Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

gh-128639: Don't assume one thread in subinterpreter finalization with fixed daemon thread support #134606

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
Loading
from

Conversation

ZeroIntensity
Copy link
Member

@ZeroIntensity ZeroIntensity commented May 24, 2025

This is a reapplication of #128640, with daemon thread hanging fixed.

The key part that I was missing was this:

for (PyThreadState *p = list; p != NULL; p = p->next) {
    _PyThreadState_SetShuttingDown(p);
}

My fix predated these lines, so it didn't make it in originally.

The reason this only failed on the iOS buildbots was because they're single-process. Out of random choice, I picked 100 seconds as the sleep time, but since the process would exit before that 100 seconds was over on all the other buildbots, there wasn't any problem. On iOS, the test suite continued to run in the same process after the 100 seconds were up, so the daemon threads would start trying to run again under a deallocated interpreter, causing all sorts of weird crashes.

@bedevere-bot

This comment was marked as outdated.

@ZeroIntensity ZeroIntensity added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 24, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit a569355 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134606%2Fmerge

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 24, 2025
@ZeroIntensity

This comment was marked as outdated.

@ZeroIntensity

This comment was marked as outdated.

@bedevere-bot

This comment was marked as outdated.

@ZeroIntensity
Copy link
Member Author

!buildbot refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit f85a291 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134606%2Fmerge

The command will test the builders whose names match following regular expression: refleak

The builders matched are:

  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • PPC64LE RHEL8 Refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • PPC64LE CentOS9 Refleaks PR
  • s390x Fedora Rawhide NoGIL refleaks PR
  • AMD64 Fedora Stable Refleaks PR
  • AMD64 RHEL8 Refleaks PR
  • AMD64 Windows11 Refleaks PR
  • aarch64 Fedora Stable Refleaks PR
  • AMD64 Fedora Rawhide Refleaks PR
  • ARM64 MacOS M1 Refleaks NoGIL PR
  • aarch64 Fedora Rawhide Refleaks PR
  • aarch64 RHEL8 Refleaks PR
  • s390x RHEL8 Refleaks PR
  • s390x Fedora Rawhide Refleaks PR
  • AMD64 FreeBSD Refleaks PR
  • aarch64 CentOS9 Refleaks PR
  • AMD64 CentOS9 NoGIL Refleaks PR
  • PPC64LE Fedora Stable Refleaks PR
  • PPC64LE Fedora Rawhide Refleaks PR
  • s390x RHEL9 Refleaks PR
  • AMD64 CentOS9 Refleaks PR
  • s390x Fedora Stable Refleaks PR

@ZeroIntensity
Copy link
Member Author

@ericsnowcurrently Yay, buildbots passed on this one!

Copy link
Member

@ericsnowcurrently ericsnowcurrently left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like with gh-128640, mostly LGTM. I've left a few minor comments.

Comment on lines 2451 to 2466
_PyAtExit_Call(tstate->interp);

if (tstate != interp->threads.head || tstate->next != NULL) {
Py_FatalError("not the last thread");
}

_PyRuntimeState *runtime = interp->runtime;
_PyEval_StopTheWorldAll(runtime);
/* Remaining daemon threads will automatically exit
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
_PyInterpreterState_SetFinalizing(interp, tstate);

PyThreadState *list = _PyThreadState_RemoveExcept(tstate);
for (PyThreadState *p = list; p != NULL; p = p->next) {
_PyThreadState_SetShuttingDown(p);
}

_PyEval_StartTheWorldAll(runtime);
_PyThreadState_DeleteList(list, /*is_after_fork=*/0);

// XXX Call something like _PyImport_Disable() here?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I've understood correctly, this is the only difference from the original PR, which had the following change in this part:

     _PyAtExit_Call(tstate->interp);
+    _PyRuntimeState *runtime = interp->runtime;
+    _PyEval_StopTheWorldAll(runtime);
+    PyThreadState *list = _PyThreadState_RemoveExcept(tstate);

     /* Remaining daemon threads will automatically exit
        when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
     _PyInterpreterState_SetFinalizing(interp, tstate);
+    _PyEval_StartTheWorldAll(runtime);
+    _PyThreadState_DeleteList(list, /*is_after_fork=*/0);

     // XXX Call something like _PyImport_Disable() here?

The difference entails:

  • call _PyThreadState_RemoveExcept() after calling _PyInterpreterState_SetFinalizing(), instead of right before
  • then immediately call _PyThreadState_SetShuttingDown() on each of the removed thread states, before starting the world again

Is that right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that looks about right. Take a look at how the main interpreter does this for reference. The particularly important part was _PyThreadState_SetShuttingDown, because without it, daemon threads don't hang.

@@ -2508,27 +2522,17 @@ finalize_subinterpreters(void)

/* Clean up all remaining subinterpreters. */
while (interp != NULL) {
assert(!_PyInterpreterState_IsRunningMain(interp));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this assert removed? It should definitely still hold.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, couldn't a daemon thread be running the interpreter? They haven't shutdown at this point, because we had to move where finalize_subinterpreters is called.

Lib/test/test_interpreters/test_api.py Outdated Show resolved Hide resolved
@bedevere-app
Copy link

bedevere-app bot commented May 27, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@ZeroIntensity
Copy link
Member Author

It looks like we're triggering this assertion somehow. Removing it and running the test locally seems to pass just fine, so I'm not sure what's up. Is it possible that this is a bug in _PyCode_CheckNoExternalState?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
Morty Proxy This is a proxified and sanitized view of the page, visit original site.