bpo-36977: Make SharedMemoryManager release its resources if its parent process dies #13451
Conversation
Hello. IMO you should test the effect of your changes; I don't think you are testing it, are you? Are you making these changes just for the sake of the test? If you want to assert something (in this example, that the child process is still alive), you should express it in the code, not only in the test, so that anyone reading knows the process is alive.
Force-pushed from 56c0933 to c8add0d.
Force-pushed from c8add0d to 87c889f.
deadline = start_time + 60
t = 0.1
while time.monotonic() < deadline:
    time.sleep(t)
I don't feel comfortable having tests that rely on sleep; they are almost assured to fail on some of the buildbots (check #10700 for example). Could you restructure these tests to use synchronization primitives?
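As a rough illustration of that suggestion, here is a minimal sketch of a sleep-free test shape built on an Event (the 60-second bound and the child/started names are illustrative, not from this PR):

import multiprocessing as mp

def child(started):
    # Signal the parent explicitly instead of relying on timing.
    started.set()

if __name__ == "__main__":
    started = mp.Event()
    p = mp.Process(target=child, args=(started,))
    p.start()
    # wait() returns True as soon as the event is set, or False on
    # timeout, so the happy path is fast while slow buildbots still
    # get a generous bound.
    assert started.wait(timeout=60)
    p.join()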
I understand your worries. I am having trouble thinking of a test that does not require a timeout, though (otherwise we risk hanging forever waiting for the synchronization event to occur). Inherently, this tests a feature that relies on connection.wait, which can block for a non-deterministic amount of time.
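(For context, a minimal sketch of what that wait looks like, assuming a Python version where process.parent_process() is available:)

from multiprocessing import connection, process

# wait() blocks until the parent's sentinel becomes ready, i.e. until
# the parent terminates; how long that takes is not deterministic.
parent = process.parent_process()
if parent is not None:
    connection.wait([parent.sentinel])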
Note that using this pattern:
while time.monotonic() < deadline:
    time.sleep(t)
    if condition():
        break
else:
    raise AssertionError
is already more robust than a simple
time.sleep(dt)
self.assertTrue(condition())
It was discussed here: https://bugs.python.org/issue36867, and @vstinner seemed OK with its usage. I may be missing something though.
multiprocessing remains the module with the highest number of random failures due to sleeps/race conditions. I will dismiss my review in case there is no other way, but I'm still uneasy about adding more sleep() calls :(
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers, that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase: I have made the requested changes; please review again.
Maybe @pitrou wants to have a look at this.
Some comments below. It would also be nice if @applio could take a look and say whether it looks fine in principle.
Lib/multiprocessing/managers.py (Outdated)
@@ -159,6 +159,10 @@ def __init__(self, registry, address, authkey, serializer):
        self.id_to_local_proxy_obj = {}
        self.mutex = threading.Lock()

    def _track_parent(self):
        process.parent_process().join()
What if process.parent_process() returns None? For example, if the Server was launched in an independent Python instance?
I did not know this was a supported use case, but we can add a guard.
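A minimal sketch of such a guard in _track_parent (the stop_event shutdown call is an assumption about how the server loop would be told to exit, not code from this PR):

    def _track_parent(self):
        # parent_process() returns None when this server was not started
        # by another Python process (e.g. an independent instance), in
        # which case there is no parent to watch.
        parent = process.parent_process()
        if parent is None:
            return
        parent.join()          # blocks until the parent terminates
        self.stop_event.set()  # assumed shutdown hook; name illustrative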
However, adding a guard will simply discard the case where parent_process() is None, which is not the best solution. Yet I feel that tracking the usage of manager-allocated resources by processes other than the manager's direct parent is tricky, especially for SharedMemoryManager (see the more detailed explanation in #13247 (comment)).
Do we want to make this feature more robust by trying to make the manager wait on any process that tries to connect to it?
I'm afraid I'm not following you here. We're simply trying to fix the case where the manager was launched by its parent process, no? (that's what the issue title says)
The case where the manager is autonomous looks pretty much unfixable on its own.
"We're simply trying to fix the case where the manager was launched by its parent process"

OK, let's stick to that in this PR.

"The case where the manager is autonomous looks pretty much unfixable on its own."

I agree, but there are in-betweens: if a manager is created by a process (its parent) and then connected to by other processes, should the manager watch only its parent? Technically, if other processes are also connected to it, it probably should not shut down until ALL processes connected to it (not only its parent) have terminated.
Hmm... So, conversely, if the manager loses all its connections and was created by a parent process, then it should die? Perhaps that is the right heuristic?
In other words: SharedMemoryServer.handle_request can easily keep track of the number of connected clients. At the end, it decrements that number and, if it falls to 0 and there is a parent process, the server dies?
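A sketch of that counting heuristic (the ClientTracker helper and its method names are hypothetical, not part of the PR):

import threading
from multiprocessing import process

class ClientTracker:
    # Count live client connections; when the last one goes away while
    # a parent process exists, ask the server to shut down.
    def __init__(self, stop_event):
        self._lock = threading.Lock()
        self._count = 0
        self._stop_event = stop_event

    def connected(self):
        with self._lock:
            self._count += 1

    def disconnected(self):
        with self._lock:
            self._count -= 1
            last = (self._count == 0)
        # An autonomous server (no parent) keeps running; only a server
        # launched by a parent auto-stops with its last client.
        if last and process.parent_process() is not None:
            self._stop_event.set()

handle_request would call connected() when a client arrives and disconnected() in a finally block once it is done serving it.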
Yes, I think this is already a more robust solution. However, unlike with SyncManager, manipulating or modifying the memory segments delivered by SharedMemoryManager does not necessarily trigger communication with the manager's server process, because SharedMemoryManager delivers SharedMemory objects rather than proxies. Thus, keeping track of the number of connected clients does not equate to keeping track of the processes manipulating those objects. But maybe this approach is still good enough for now.
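To illustrate the point: any process that knows a segment's name can attach and modify it without ever talking to the manager (plain shared_memory API, no manager involved):

from multiprocessing import shared_memory

seg = shared_memory.SharedMemory(create=True, size=16)
seg.buf[:5] = b"hello"

# A second handle attaches purely by name; no manager traffic happens.
other = shared_memory.SharedMemory(name=seg.name)
print(bytes(other.buf[:5]))  # b'hello'

other.close()
seg.close()
seg.unlink()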
Also, the recent os.memfd_create mentioned in this Python issue looks interesting, but I need to look into it more.
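For reference, a minimal sketch of why os.memfd_create (Linux-only, added in Python 3.8) is interesting here: the segment's lifetime follows ordinary file-descriptor semantics, so it cannot outlive its users:

import os

fd = os.memfd_create("demo")  # anonymous, memory-backed fd
os.write(fd, b"hello")
os.lseek(fd, 0, os.SEEK_SET)
print(os.read(fd, 5))  # b'hello'
os.close(fd)  # last descriptor closed -> kernel reclaims the memory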
shm_name = p.stdout.readline().decode().strip()
p.terminate()
start_time = time.monotonic()
deadline = start_time + 60
10 seconds sounds enough, no?
In case of overloaded CI workers, one can see weird things on rare occasions. 99.9% of the time it will probably take less than 10 ms (although I have not checked); this limit is just a timeout, so I think it's fine to keep a large value to make false-positive failures on CI servers extremely unlikely.
These are seconds here, not milliseconds.
But no strong feelings either way.
https://bugs.python.org/issue36977