bpo-36888: Add multiprocessing.parent_process() #13247


Merged — 25 commits merged into python:master on May 20, 2019

Conversation

@tomMoral (Contributor) commented May 11, 2019

In the stdlib, the semaphore_tracker and the Manager rely on daemonized processes that run server-like loops. Cleaning up such processes is complicated by the fact that there is no canonical way to check whether the parent process is alive.

I propose adding a parent_process() function to the contexts that gives access to a Process object representing the parent process. This way, we could use its sentinel to improve the cleanup of these processes, which can otherwise be left dangling after a hard stop of the main interpreter.

https://bugs.python.org/issue36888
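For context, the API that was eventually merged (available since Python 3.8) can be used like this; a minimal sketch:

```python
import multiprocessing as mp

def child():
    # In a child, parent_process() returns a Process object for the
    # parent; its sentinel can be waited on, e.g. via join()/is_alive().
    parent = mp.parent_process()
    assert parent is not None
    assert parent.is_alive()

if __name__ == "__main__":
    # In the main process there is no parent: parent_process() is None.
    assert mp.parent_process() is None
    p = mp.Process(target=child)
    p.start()
    p.join()
    assert p.exitcode == 0
```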

@tomMoral (Contributor, Author) left a comment:

Some comments

Lib/test/_test_multiprocessing.py (outdated; resolved)
Lib/test/_test_multiprocessing.py (outdated; resolved)
Lib/test/_test_multiprocessing.py (resolved)
Lib/test/_test_multiprocessing.py (outdated; resolved)
@@ -100,25 +100,22 @@ def spawn_main(pipe_handle, parent_pid=None, tracker_fd=None):

     if parent_pid is not None:
         source_process = _winapi.OpenProcess(
-            _winapi.PROCESS_DUP_HANDLE, False, parent_pid)
+            _winapi.PROCESS_ALL_ACCESS, False, parent_pid)
Contributor (Author):
Nice!

Member:

Out of curiosity, is it possible to narrow the permissions? (for example PROCESS_ALL_ACCESS + SYNCHRONIZE?)

Contributor:

You mean PROCESS_DUP_HANDLE and SYNCHRONIZE? I tried a few days ago and I got an access denied.

Member:

Ok. Perhaps @zooba has more insight about this.

@pierreglaser (Contributor) commented May 20, 2019:

Update: after a good night's sleep, PROCESS_DUP_HANDLE | SYNCHRONIZE does work.

@pierreglaser (Contributor):

@pitrou want to take a look? ;)

@pierreglaser (Contributor):

Same as in #13276, this code will not always work if several Python processes manipulate the same SharedMemoryManager. I am not sure that this represents the canonical use case of SharedMemoryManager, though.

@tomMoral (Contributor, Author):

Why is it not functional? The parent process handle should only be open in the parent process, no?

@pierreglaser (Contributor):

@tomMoral Pardon me, I realized my last message was not very clear.

Manager objects can establish connections to several Python processes. It does not make much sense to differentiate the parent process from the other processes connected to it: in particular, we do not want the Manager process to start cleaning up resources when its parent dies while other processes may still be connected to the Manager.

Thus, technically, a Manager would need one sentinel per process it is connected to (not only one for its parent process, as is the case right now). This is particularly hard to implement for SharedMemoryManager objects, because when asked to create a new resource, a SharedMemoryManager does not return a proxy type but an actual SharedMemory object. Thus, processes accessing the SharedMemory object created by the Manager do not need to communicate with the Manager when they modify it. Therefore, it seems hard, maybe impossible, for a SharedMemoryManager to track all the processes manipulating the objects it delivers.

@pitrou (Member) commented May 19, 2019:

Is this still WIP?

@pierreglaser (Contributor):

Not anymore.

@pitrou (Member) left a comment:

I have reservations about the implementation. Also, a documentation addition will be required.


@@ -70,7 +70,7 @@ def _launch(self, process_obj):
     if self.pid == 0:
         try:
             os.close(parent_r)
-            code = process_obj._bootstrap()
+            code = process_obj._bootstrap(parent_sentinel=child_w)
Member:

I don't understand. child_w will always be "ready" (writable) except if the pipe buffer is full, so how can you use it as a sentinel?

Contributor:

The idea is to use _ParentProcess.is_alive, which itself calls multiprocessing.connection.wait on the sentinel. Does that answer your question?

Member:

Not really, because it doesn't answer how it works concretely on the write end of the pipe.

>>> import os, select
>>> r, w = os.pipe()
>>> select.select([r], [r], [r])
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> select.select([w], [w], [w])
([], [4], [])

Member:

Though, at least on Linux:

>>> select.select([w], [], [w])
^CTraceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyboardInterrupt
>>> os.close(r)
>>> select.select([w], [], [w])
([4], [], [])

@pierreglaser (Contributor) commented May 19, 2019:

I am not a frequent user of select, but it does seem odd to pass the write end of a pipe in the rlist of select (which is the behavior we rely on via multiprocessing.connection.wait). Actually, I did not write this line, @tomMoral did. Maybe he has other insights on this.
@pitrou, would you be more comfortable if we used the read end of a new, dedicated pipe as a sentinel instead, to mimic the parent-process sentinel implementation for fork?

Member:

Yes, that would seem like a more strictly POSIX-compliant solution.

Contributor (Author):

Yes indeed, it seems weird that way. I did not realize this was different from spawn_posix, where you have access to child_r. (But there seems to be a leak of parent_w with this implementation, cf. the other comment.)

That being said, the goal of the sentinel is simply to know whether this pipe is still open. That is why creating a new pipe feels wasteful. But there might not be a better answer.
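The dedicated-pipe approach discussed here can be sketched as follows (a simplified, hypothetical illustration of the fork path, not the merged code): the parent keeps the write end open for its lifetime, and the read end becomes readable (EOF) in the child as soon as the parent's end is closed, so it works as a parent sentinel.

```python
import os
import select

def fork_with_parent_sentinel():
    # Hypothetical sketch: a dedicated pipe whose read end serves as
    # the child's parent sentinel, mimicking the fork implementation.
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the read end; it becomes ready (EOF) once
        # the parent's write end is closed, i.e. when the parent exits.
        os.close(w)
        ready, _, _ = select.select([r], [], [], 5)
        os._exit(0 if ready else 1)
    # Parent: keep only the write end open for its lifetime.
    os.close(r)
    return pid, w
```

Closing w in the parent (or the parent dying, which closes it implicitly) wakes the child's select immediately.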

-def _main(fd):
-    with os.fdopen(fd, 'rb', closefd=True) as from_parent:
+def _main(fd, parent_sentinel):
+    with os.fdopen(fd, 'rb', closefd=False) as from_parent:
Member:

Not very nice if fd and parent_sentinel are different fds... I think the caller should instead ensure those are different fds, possibly using os.dup.

Contributor:

Woops, agreed.
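The os.dup fix suggested above can be illustrated with a small sketch (hypothetical, not the merged code): duplicating the fd gives the sentinel its own descriptor, so closing the original through a file object does not invalidate it.

```python
import os

def write_via_dup():
    r, w = os.pipe()
    # Duplicate the write end before handing the original to a file
    # object whose close would otherwise also close the sentinel fd.
    parent_sentinel = os.dup(w)
    with open(w, "wb", closefd=True) as f:   # closes w on exit
        f.write(b"hello")
    # The duplicate stays valid even though w itself is now closed.
    os.write(parent_sentinel, b"!")
    os.close(parent_sentinel)
    data = os.read(r, 64)
    os.close(r)
    return data

print(write_via_dup())   # b'hello!'
```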

@@ -50,7 +50,7 @@ def _launch(self, process_obj):

     self.sentinel, w = forkserver.connect_to_new_process(self._fds)
     self.finalizer = util.Finalize(self, os.close, (self.sentinel,))
-    with open(w, 'wb', closefd=True) as f:
+    with open(w, 'wb', closefd=False) as f:
Member:

Doesn't seem right. See comment about using os.dup instead where necessary.

     parent_pid, parent_name = rconn.recv()
-    self.assertEqual(parent_pid, current_process().pid)
+    self.assertEqual(parent_pid, os.getpid())
     self.assertEqual(parent_name, current_process().name)
Member:

Please also test parent_process() from the parent process.

Contributor:

OK (Should return None)

def test_parent_process_attributes(self):
    if self.TYPE == "threads":
        self.skipTest('test not appropriate for {}'.format(self.TYPE))
    from multiprocessing.process import current_process
Member:

Shouldn't this be at the top-level?

Contributor:

I often see import statements inside tests, and I have to admit I do not know what rule is used to determine if an import should be top-level or not.

Member:

Generally, it's better to put imports at the top level, except if the module might not be available, or if importing it may have undesirable side-effects.

Member:

By the way, multiprocessing.current_process() works, and perhaps multiprocessing.parent_process() should work, too.

@bedevere-bot:

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Lib/multiprocessing/popen_spawn_posix.py (resolved)

@tomMoral tomMoral requested a review from 1st1 as a code owner May 20, 2019 10:27
@pierreglaser (Contributor):

I have made the requested changes; please review again.

@pitrou (Member) left a comment:

Thanks for the update. There seem to be a couple of issues still.

@@ -50,6 +50,7 @@ def _launch(self, process_obj):

     self.sentinel, w = forkserver.connect_to_new_process(self._fds)
     self.finalizer = util.Finalize(self, os.close, (self.sentinel,))
+    parent_sentinel = os.dup(w)
Member:

This variable isn't used anywhere. Is it a bug?

Contributor:

It keeps the write end open, as w is closed at the end of the following context manager (upon your request). But now that I have fully understood #13247 (comment), I guess it is better to keep it as a private attribute of the process object and register a Finalizer.

@@ -68,7 +68,7 @@ def __init__(self, process_obj):
     else:
         env = None

-    with open(wfd, 'wb', closefd=True) as to_child:
+    with open(wfd, 'wb', closefd=False) as to_child:
Member:

Why is that? Is it a leftover from previous attempts?

Contributor:

Good catch, sorry about that.

Lib/multiprocessing/popen_spawn_posix.py (resolved)
Lib/multiprocessing/process.py (resolved)
Lib/multiprocessing/popen_fork.py (outdated; resolved)
self._sentinel = sentinel
self._config = {}

def close(self):
Member:

It doesn't look necessary to override this method.

wconn.send("alive" if parent_process().is_alive() else "not alive")

start_time = time.monotonic()
while (parent_process().is_alive() and
Member:

How about calling join(timeout=5)? With the sentinel, it seems like it should work.
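The suggestion above — replacing the polling loop with join() on the parent's sentinel — could look roughly like this in a child process (a sketch, not the test as merged):

```python
import multiprocessing as mp

def child(conn):
    parent = mp.parent_process()
    # Instead of polling is_alive() in a loop, block on the parent's
    # sentinel: join() returns as soon as the parent exits, or after
    # the timeout if it is still running.
    parent.join(timeout=0.5)
    conn.send("alive" if parent.is_alive() else "not alive")

if __name__ == "__main__":
    rconn, wconn = mp.Pipe(duplex=False)
    p = mp.Process(target=child, args=(wconn,))
    p.start()
    print(rconn.recv())   # "alive" — the parent is still running
    p.join()
```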

-    self.finalizer = util.Finalize(self, os.close, (self.sentinel,))
+    self.finalizer = util.Finalize(self, os.close, (self._parent_w,))
Contributor:

_parent_w is not the best name, but I did not find a convincing, short alternative. I thought _child_sentinel was somewhat unclear. Any ideas, anyone?

Member:

I think _parent_w is ok. Also you don't need to make it an attribute (see other comment).

Member:

However, you're also clobbering the previous finalizer here.

@pierreglaser (Contributor):

@pitrou thanks for the review.
I have made the requested changes; please review again.

@pitrou (Member) left a comment:

Just a couple more comments.

self.sentinel = parent_r
self._parent_w = parent_w
Member:

This is not required (but it doesn't hurt either).


Lib/multiprocessing/popen_spawn_posix.py (resolved)
Lib/multiprocessing/process.py (outdated; resolved)
from multiprocessing.process import parent_process
wconn.send("alive" if parent_process().is_alive() else "not alive")

start_time = time.monotonic()
Member:

This isn't used anymore.

-    self.finalizer = util.Finalize(self, os.close, (parent_r,))
+    self.finalizer = util.Finalize(self, os.close, (parent_w,))
Member:

Hmm, this is clobbering the previous finalizer. You should use a single callback that closes both fds.
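A single finalizer closing both descriptors, as requested above, might look like this (a hypothetical sketch using multiprocessing.util.Finalize, not the merged code):

```python
import os
from multiprocessing import util

def close_fds(*fds):
    # One callback closing every fd, so a single Finalize suffices and
    # nothing clobbers a previously registered finalizer.
    for fd in fds:
        os.close(fd)

class PopenSketch:
    # Hypothetical stand-in for the Popen classes discussed above.
    def __init__(self):
        parent_r, parent_w = os.pipe()
        self.sentinel = parent_r
        self._parent_w = parent_w
        self.finalizer = util.Finalize(
            self, close_fds, (parent_r, parent_w))
```

Calling self.finalizer() (or letting finalization run at exit) then closes both ends exactly once.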


@pierreglaser (Contributor) commented May 20, 2019:

I have made the requested changes; please review again.
@pitrou hopefully this will not make you cringe again :)

@pitrou pitrou changed the title bpo-36888 ENH add parent_process in multiprocessing contexts bpo-36888: Add multiprocessing.parent_process() May 20, 2019
@pitrou pitrou merged commit c09a9f5 into python:master May 20, 2019
@pitrou (Member) commented May 20, 2019:

Thank you :-)
