-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-128639: Don't assume one thread in subinterpreter finalization with fixed daemon thread support #134606
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-128639: Don't assume one thread in subinterpreter finalization with fixed daemon thread support #134606
Conversation
…inalization (pythongh-128640)" (pythongh-134256) This reverts commit 27bd082.
This comment was marked as outdated.
This comment was marked as outdated.
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit a569355 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134606%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
!buildbot refleak |
🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit f85a291 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134606%2Fmerge The command will test the builders whose names match following regular expression: The builders matched are:
|
@ericsnowcurrently Yay, buildbots passed on this one! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like with gh-128640, mostly LGTM. I've left a few minor comments.
_PyAtExit_Call(tstate->interp); | ||
|
||
if (tstate != interp->threads.head || tstate->next != NULL) { | ||
Py_FatalError("not the last thread"); | ||
} | ||
|
||
_PyRuntimeState *runtime = interp->runtime; | ||
_PyEval_StopTheWorldAll(runtime); | ||
/* Remaining daemon threads will automatically exit | ||
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */ | ||
_PyInterpreterState_SetFinalizing(interp, tstate); | ||
|
||
PyThreadState *list = _PyThreadState_RemoveExcept(tstate); | ||
for (PyThreadState *p = list; p != NULL; p = p->next) { | ||
_PyThreadState_SetShuttingDown(p); | ||
} | ||
|
||
_PyEval_StartTheWorldAll(runtime); | ||
_PyThreadState_DeleteList(list, /*is_after_fork=*/0); | ||
|
||
// XXX Call something like _PyImport_Disable() here? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I've understood correctly, this is the only difference from the original PR, which had the following change in this part:
_PyAtExit_Call(tstate->interp);
+ _PyRuntimeState *runtime = interp->runtime;
+ _PyEval_StopTheWorldAll(runtime);
+ PyThreadState *list = _PyThreadState_RemoveExcept(tstate);
/* Remaining daemon threads will automatically exit
when they attempt to take the GIL (ex: PyEval_RestoreThread()). */
_PyInterpreterState_SetFinalizing(interp, tstate);
+ _PyEval_StartTheWorldAll(runtime);
+ _PyThreadState_DeleteList(list, /*is_after_fork=*/0);
// XXX Call something like _PyImport_Disable() here?
The difference entails:
- call
_PyThreadState_RemoveExcept()
after calling_PyInterpreterState_SetFinalizing()
, instead of right before - then immediately call
_PyThreadState_SetShuttingDown()
on each of the removed thread states, before starting the world again
Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that looks about right. Take a look at how the main interpreter does this for reference. The particularly important part was _PyThreadState_SetShuttingDown
, because without it, daemon threads don't hang.
@@ -2508,27 +2522,17 @@ finalize_subinterpreters(void) | ||
|
||
/* Clean up all remaining subinterpreters. */ | ||
while (interp != NULL) { | ||
assert(!_PyInterpreterState_IsRunningMain(interp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this assert removed? It should definitely still hold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, couldn't a daemon thread be running the interpreter? They haven't shutdown at this point, because we had to move where finalize_subinterpreters
is called.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Co-authored-by: Eric Snow <ericsnowcurrently@gmail.com>
It looks like we're triggering this assertion somehow. Removing it and running the test locally seems to pass just fine, so I'm not sure what's up. Is it possible that this is a bug in |
This is a reapplication of #128640, with daemon thread hanging fixed.
The key part that I was missing was this:
My fix predated these lines, so it didn't make it in originally.
The reason this only failed on the iOS buildbots was because they're single-process. Out of random choice, I picked 100 seconds as the sleep time, but since the process would exit before that 100 seconds was over on all the other buildbots, there wasn't any problem. On iOS, the test suite continued to run in the same process after the 100 seconds were up, so the daemon threads would start trying to run again under a deallocated interpreter, causing all sorts of weird crashes.