bpo-36888: Add multiprocessing.parent_process() #13247
Conversation
Compare 249fa8d to 8839770
Some comments
Lib/multiprocessing/spawn.py
Outdated
@@ -100,25 +100,22 @@ def spawn_main(pipe_handle, parent_pid=None, tracker_fd=None):
     if parent_pid is not None:
         source_process = _winapi.OpenProcess(
-            _winapi.PROCESS_DUP_HANDLE, False, parent_pid)
+            _winapi.PROCESS_ALL_ACCESS, False, parent_pid)
Nice!
Out of curiosity, is it possible to narrow the permissions? (For example PROCESS_ALL_ACCESS + SYNCHRONIZE?)
You mean PROCESS_DUP_HANDLE and SYNCHRONIZE? I tried a few days ago and I got an access denied.
Ok. Perhaps @zooba has more insight about this.
Update: after a good night's sleep, PROCESS_DUP_HANDLE | SYNCHRONIZE does work.
@pitrou want to take a look? ;)
Same as in #13276, this code will not always be functional if several Python processes manipulate the same
Why is it not functional? The parent process handle should be open only in the parent process, no?
@tomMoral Pardon me, I realized my last message was not very clear. Thus technically, a
Is this still WIP?
Not anymore.
I have reservations about the implementation. Also, a documentation addition will be required.
Lib/multiprocessing/popen_fork.py
Outdated
@@ -70,7 +70,7 @@ def _launch(self, process_obj):
     if self.pid == 0:
         try:
             os.close(parent_r)
-            code = process_obj._bootstrap()
+            code = process_obj._bootstrap(parent_sentinel=child_w)
I don't understand. child_w will always be "ready" (writable) except if the pipe buffer is full, so how can you use it as a sentinel?
The idea is to use _ParentProcess.is_alive, which itself calls multiprocessing.connection.wait on the sentinel. Does that answer your question?
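Concretely, the check can be sketched like this (a hypothetical standalone version of what _ParentProcess.is_alive does, assuming a POSIX file descriptor as the sentinel; the helper name is made up for illustration):

```python
from multiprocessing import connection

def parent_is_alive(sentinel, timeout=0):
    # The parent is considered alive as long as its sentinel is not
    # "ready": connection.wait() returns the list of ready objects,
    # which stays empty while the other end of the pipe is open.
    return not connection.wait([sentinel], timeout=timeout)
```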
Not really, because it doesn't answer how it works concretely on the write end of the pipe.
>>> import os, select
>>> r, w = os.pipe()
>>> select.select([r], [r], [r])
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> select.select([w], [w], [w])
([], [4], [])
Though, at least on Linux:
>>> select.select([w], [], [w])
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> os.close(r)
>>> select.select([w], [], [w])
([4], [], [])
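For reference, the interactive demo above can be run as a self-contained script (assuming Linux semantics, where an error condition on the write end is folded into the read set by select()):

```python
import os
import select

r, w = os.pipe()
# With the read end still open, the write end does not report as "readable".
before, _, _ = select.select([w], [], [], 0.1)
# Once the read end is closed, polling the write end for readability
# reports it ready, because the pipe is now in an error state.
os.close(r)
after, _, _ = select.select([w], [], [], 0.1)
os.close(w)
```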
I am not a frequent user of select, but it does seem odd to put the write end of a pipe into the rlist of select (which is the behavior we rely on when using multiprocessing.connection.wait). But actually, I did not write this line, @tomMoral did. Maybe he has other insights on this.
@pitrou would you be more comfortable if we used the read end of a new dedicated pipe as a sentinel instead, to mimic the sentinel implementation of the parent process for fork?
Yes, that would seem like a more strictly POSIX-compliant solution.
Yes indeed, it seems weird that way. I did not realize this was different compared to spawn_posix, where you have access to child_r. (But there seems to be a leak with parent_w with this implementation, cf. other comment.)
That being said, the goal of the sentinel is simply to know whether this pipe is still open or not. That is why it feels like a waste to create a new pipe. But there might not be a better answer.
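A minimal sketch of the dedicated-pipe alternative (POSIX only; the exact layout here is hypothetical): the child keeps the read end of a pipe whose write end lives only in the parent, so the read end becomes readable (EOF) as soon as the parent's write end is closed, at the latest when the parent dies.

```python
import os
import select

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    os.close(w)                 # the child keeps only the read end
    select.select([r], [], [])  # blocks until the parent's write end closes
    os._exit(0)
else:
    os.close(r)
    os.close(w)                 # simulates parent exit: wakes the child
    _, status = os.waitpid(pid, 0)
    exit_code = os.waitstatus_to_exitcode(status)
```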
Lib/multiprocessing/spawn.py
Outdated
-def _main(fd):
-    with os.fdopen(fd, 'rb', closefd=True) as from_parent:
+def _main(fd, parent_sentinel):
+    with os.fdopen(fd, 'rb', closefd=False) as from_parent:
Not very nice if fd and parent_sentinel are different fds... I think the caller should instead ensure those are different fds, possibly using os.dup.
Whoops, agreed.
@@ -50,7 +50,7 @@ def _launch(self, process_obj):
     self.sentinel, w = forkserver.connect_to_new_process(self._fds)
     self.finalizer = util.Finalize(self, os.close, (self.sentinel,))
-    with open(w, 'wb', closefd=True) as f:
+    with open(w, 'wb', closefd=False) as f:
Doesn't seem right. See comment about using os.dup instead where necessary.
Lib/test/_test_multiprocessing.py
Outdated
    parent_pid, parent_name = rconn.recv()
    self.assertEqual(parent_pid, current_process().pid)
    self.assertEqual(parent_pid, os.getpid())
    self.assertEqual(parent_name, current_process().name)
Please also test parent_process() from the parent process.
OK (should return None).
Lib/test/_test_multiprocessing.py
Outdated
def test_parent_process_attributes(self):
    if self.TYPE == "threads":
        self.skipTest('test not appropriate for {}'.format(self.TYPE))
    from multiprocessing.process import current_process
Shouldn't this be at the top-level?
I often see import statements inside tests, and I have to admit I do not know what rule is used to determine whether an import should be top-level or not.
Generally, it's better to put imports at the top level, except if the module might not be available, or if importing it may have undesirable side-effects.
By the way, multiprocessing.current_process() works, and perhaps multiprocessing.parent_process() should work, too.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase
I have made the requested changes; please review again.
Thanks for the update. There seems to be a couple of issues still.
@@ -50,6 +50,7 @@ def _launch(self, process_obj):
     self.sentinel, w = forkserver.connect_to_new_process(self._fds)
     self.finalizer = util.Finalize(self, os.close, (self.sentinel,))
+    parent_sentinel = os.dup(w)
This variable isn't used anywhere. Is it a bug?
It leaves the write end open, as w is closed at the end of the following context manager (upon your request). But now that I fully understood #13247 (comment), I guess it is better to have it as a private attribute of the process object and register a Finalizer.
@@ -68,7 +68,7 @@ def __init__(self, process_obj):
     else:
         env = None

-    with open(wfd, 'wb', closefd=True) as to_child:
+    with open(wfd, 'wb', closefd=False) as to_child:
Why is that? Is it a leftover from previous attempts?
Good catch, sorry about that.
Lib/multiprocessing/process.py
Outdated
    self._sentinel = sentinel
    self._config = {}

def close(self):
It doesn't look necessary to override this method.
Lib/test/_test_multiprocessing.py
Outdated
    wconn.send("alive" if parent_process().is_alive() else "not alive")

    start_time = time.monotonic()
    while (parent_process().is_alive() and
How about calling join(timeout=5)? With the sentinel, it seems like it should work.
    self.finalizer = util.Finalize(self, os.close, (self.sentinel,))
    self.finalizer = util.Finalize(self, os.close, (self._parent_w,))
_parent_w is not the best name, but I did not find convincing, short alternatives. I thought _child_sentinel was somewhat unclear. Any ideas, anyone?
I think _parent_w
is ok. Also you don't need to make it an attribute (see other comment).
However, you're also clobbering the previous finalizer here.
@pitrou thanks for the review.
Just a couple more comments.
Lib/multiprocessing/popen_fork.py
Outdated
    self.sentinel = parent_r
    self._parent_w = parent_w
This is not required (but it doesn't hurt either).
Lib/test/_test_multiprocessing.py
Outdated
    from multiprocessing.process import parent_process
    wconn.send("alive" if parent_process().is_alive() else "not alive")

    start_time = time.monotonic()
This isn't used anymore.
Lib/multiprocessing/popen_fork.py
Outdated
    self.finalizer = util.Finalize(self, os.close, (parent_r,))
    self.finalizer = util.Finalize(self, os.close, (parent_w,))
Hmm, this is clobbering the previous finalizer. You should use a single callback that closes both fds.
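A sketch of that suggestion (assuming multiprocessing.util.close_fds(), which closes each fd passed to it; the helper name below is hypothetical):

```python
import os
from multiprocessing import util

def make_close_finalizer(owner, fds):
    # One Finalize callback that closes every fd at once. Assigning two
    # Finalize objects to the same attribute keeps only the second, so the
    # first callback is clobbered and its fd is never closed.
    return util.Finalize(owner, util.close_fds, tuple(fds))
```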
I have made the requested changes; please review again.
Thank you :-)
In the std lib, the semaphore_tracker and the Manager rely on daemonized processes that are launched with server-like loops. The cleanup of such processes is made complicated by the fact that there is no canonical way to check whether the parent process is alive.

I propose to add in context a parent_process function that would give access to a Process object representing the parent process. This way, we could benefit from its sentinel to improve the cleanup of these processes, which can be left dangling in case of a hard stop of the main interpreter.

https://bugs.python.org/issue36888