-
-
Notifications
You must be signed in to change notification settings - Fork 32k
gh-109564: Add None
check for asyncio.Server._wakeup
#126660
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
gh-109564: Add None
check for asyncio.Server._wakeup
#126660
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a test, too. Please add one to the server tests.
Misc/NEWS.d/next/Library/2024-11-11-11-13-30.gh-issue-109564._f7W0v.rst
Outdated
Show resolved
Hide resolved
Lib/asyncio/base_events.py
Outdated
@@ -303,6 +303,8 @@ def _detach(self, transport): | ||
self._wakeup() | ||
|
||
def _wakeup(self): | ||
if self._waiters is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well do the None check on the waiters
local after it's been assigned from self._waiters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in d963237
…f7W0v.rst Co-authored-by: Peter Bierma <zintensitydev@gmail.com>
Lib/asyncio/base_events.py
Outdated
@@ -303,6 +303,8 @@ def _detach(self, transport): | ||
self._wakeup() | ||
|
||
def _wakeup(self): | ||
if self._waiters is None: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a detailed comment on how a double call can happen (sort of a shorter version of your main PR message)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in d963237
Sorry for the late reply, @ZeroIntensity! I'm having a hard time creating a unit test for this scenario. Would appreciate some pointers on how to approach a unit test for this. Below is a snippet that can reproduce the error in a unit-test, however the class TestServer2():
# ...
async def test_close_race(self):
srv = await asyncio.start_server(lambda *_: None, socket_helper.HOSTv4, 0)
# Do not await connection so that we can recreate the race condition by closing
# the server after the connection is accepted, but before it is handled.
addr = srv.sockets[0].getsockname()
conn_task = asyncio.create_task(asyncio.open_connection(addr[0], addr[1]))
for _ in range(3):
# 1st yield: client socket connect
# 2nd yield: selector reader
# 3rd yield: server accept connection
await asyncio.sleep(0)
srv.close()
# server accept connection 2
await asyncio.sleep(0)
# Complete the client connection to close the socket
_, wr = await conn_task
self.addCleanup(wr.close)
await srv.wait_closed() I also considered trying to get a reference to the server transport to handle the accepted connection. However, this fails to construct completely (failing when it attempts to attach to the server that just closed) and the object is destroyed (calling Additional errorsI discovered an unrelated error on the client side as well; This occurs when the client attempts to create a transport before the server is able to accept the connection. In our unit test above the timing of when the
It appears this issue is observed for Mac OS: envoyproxy/envoy#1446 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this, that said, I think we should do a broader change - instead of relying on "implicit" notion of the actual state the Server is in by checking some attributes if they are None or not, we should add an explicit self._state
enum member with states like INITIALIZED
, SERVING
, CLOSED
, etc.
The PR looks good to me, but I'll wait to hear from @ZeroIntensity before I hit the green button. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what's going on with this bug. On Linux, I wasn't able to reproduce the issue given the reproducer in the PR description, and the two reproducer files from gh-109564 don't cause any errors on my end either. In fact, on this PR, the original reproducer causes a bunch of ResourceWarning
spam.
Regarding the test--this PR got lost in notifications, sorry for not replying! I think as long as we have something that reproduces the error that should work. But, I ran that test locally and I don't see any error (and again, it instead spams ResourceWarning
with this change).
Sorry @ZeroIntensity, the only thing this PR addresses is an unraisable exception raised by I've manage to write a unit-test that captures this exception in 001c286. I also added a Let me know if the unraisable exception is a trivial matter that doesn't need to be cleaned-up. Happy to close the PR if that's the case.
If the team prefers, I'm happy to close this PR and instead work on a more robust change like @1st1 mentioned. LMK! |
# When the server is closed before a connection is handled but after the | ||
# connection is accepted, then a race-condition exists between the handler | ||
# transport and the server, both which will attempt to wakeup the server to set | ||
# any server waiters. We can recreate race-condition by opening a connection and | ||
# waiting for the server reader callback before closing the server | ||
loop = asyncio.get_running_loop() | ||
srv_reader, _ = loop._selector.get_key(srv_sock.fileno()).data | ||
conn_task = asyncio.create_task(asyncio.open_connection(addr[0], addr[1])) | ||
for _ in range(10): | ||
await asyncio.sleep(0) | ||
if srv_reader in loop._ready: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, forgot about Windows here.
If I set this to a fixed 3 yields without relying on if srv_reader in loop._ready
, then the test is flaky (sometimes the error won't be observed and the test will have a false-positive)
The first yield is to let the client connection to start the connection, the second for reader to be selected, and the last for the server to accept the connection. But, I'm not sure why the accept connection tasks sometimes happens much later, and hence the false-positive.
Thanks for adding a test! I've temporarily marked this as I'll look closer at this again tomorrow morning (it's 9:30 PM here). In the meantime, do you have a reproducer for the unraisable exception? Everything I've seen so far in this PR and the issue haven't emitted any warnings or errors on my end. (It's possible that my system is just dodging the race condition by being either too fast or too slow--I'll investigate on a few different machines later.)
Assuming asyncio is just being weird on my end and this PR is really fixing a problem, then we can merge it as-is and backport it, and then put a broader/feature-ish fix on main only. |
Sorry, I don't have anything other than the test on Test on main (acbd5c9)
Reproducer from bug ticket<sys>:0: ResourceWarning: unclosed <socket.socket fd=7, family=2, type=1, proto=0, laddr=('127.0.0.1', 4445), raddr=('127.0.0.1', 50638)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
<sys>:0: ResourceWarning: unclosed <socket.socket fd=8, family=2, type=1, proto=0, laddr=('127.0.0.1', 4445), raddr=('127.0.0.1', 50639)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/cpython/Lib/asyncio/selector_events.py:869: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=7>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Exception ignored in: <function _SelectorTransport.__del__ at 0x10415f710>
Traceback (most recent call last):
File "/cpython/Lib/asyncio/selector_events.py", line 872, in __del__
self._server._detach(self)
File "/cpython/Lib/asyncio/base_events.py", line 303, in _detach
self._wakeup()
File "/cpython/Lib/asyncio/base_events.py", line 308, in _wakeup
for waiter in waiters:
TypeError: 'NoneType' object is not iterable
/cpython/Lib/asyncio/selector_events.py:869: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=8>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Exception ignored in: <function _SelectorTransport.__del__ at 0x10415f710>
Traceback (most recent call last):
File "/cpython/Lib/asyncio/selector_events.py", line 872, in __del__
self._server._detach(self)
File "/cpython/Lib/asyncio/base_events.py", line 303, in _detach
self._wakeup()
File "/cpython/Lib/asyncio/base_events.py", line 308, in _wakeup
for waiter in waiters:
TypeError: 'NoneType' object is not iterable This is my system information:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your hard work and responsiveness! Since the reproducers don't seem to work reliably on Linux, and this PR actually causes somewhat of a regression in terms of ResourceWarning
spam, I'd like to go for a broader change like Yury suggested. (I'm fine with implementing it in this PR instead of opening a new one, FWIW.)
Running the test locally on my end:
import asyncio
import gc
async def test_close_race():
srv = await asyncio.start_server(lambda *_: None, '0.0.0.0', 5000)
srv_sock = srv.sockets[0]
addr = srv_sock.getsockname()
# When the server is closed before a connection is handled but after the
# connection is accepted, then a race-condition exists between the handler
# transport and the server, both which will attempt to wakeup the server to set
# any server waiters. We can recreate race-condition by opening a connection and
# waiting for the server reader callback before closing the server
loop = asyncio.get_running_loop()
srv_reader, _ = loop._selector.get_key(srv_sock.fileno()).data
conn_task = asyncio.create_task(asyncio.open_connection(addr[0], addr[1]))
for _ in range(10):
await asyncio.sleep(0)
if srv_reader in loop._ready:
break
# Ensure accepted connection task is scheduled by the server reader, but not
# completed, before closing the server.
await asyncio.sleep(0)
srv.close()
# Complete the client connection to close the socket. Suppress errors in the
# handler transport due to failing to attach to closed server.
try:
_, wr = await conn_task
except OSError as e:
print(e) # Just to see something for insight
await srv.wait_closed()
# Verify the handler transport does not raise an error due to multiple calls to
# the server wakeup. Suppress expected ResourceWarnings from the handler
# transport failing to attach to the closed server.
gc.collect()
asyncio.run(test_close_race())
It seems that I've been wrong about the affected versions. Python 3.12.3 (a little bit outdated but I don't think it matters) outputs nothing, but 3.13.0 and the main branch do indeed cause an error:
Exception ignored in: <function _SelectorTransport.__del__ at 0x7b23f43aca40>
Traceback (most recent call last):
File "/usr/lib/python3.13/asyncio/selector_events.py", line 872, in __del__
self._server._detach(self)
File "/usr/lib/python3.13/asyncio/base_events.py", line 304, in _detach
self._wakeup()
File "/usr/lib/python3.13/asyncio/base_events.py", line 309, in _wakeup
for waiter in waiters:
TypeError: 'NoneType' object is not iterable
With this PR, I now see this:
/root/cpython/Lib/asyncio/selector_events.py:871: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=8 read=idle write=<idle, bufsize=0>>
_warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/root/cpython/Lib/asyncio/streams.py:410: ResourceWarning: unclosed <StreamWriter transport=<_SelectorSocketTransport closing fd=7 read=idle write=<idle, bufsize=0>> reader=<StreamReader eof transport=<_SelectorSocketTransport closing fd=7 read=idle write=<idle, bufsize=0>>>>
warnings.warn(f"unclosed {self!r}", ResourceWarning)
Which is better! But my main concern (and why I'm requesting changes) is because this causes ResourceWarning
s to be shown where there was no error or warning before (such as with the reproducer in the PR description--that doesn't emit anything on the current main, but it does with this change). Please ping me if you need help :)
@1st1 Were you able to get the reproducer in GH-109564 or the PR description to work locally?
No I haven't tried reproducing this bug. I'm away this week, can try doing that next week. |
Adds a
None
check forasyncio.Server._wakeup
to prevent crashing when it is called multiple times.Because the method clears
self._waiters
the next call to this method will fail when it attempts to iterate onNone
.This can happen when the public
asyncio.Server.close
API and internalasyncio.Server._detach
are both called, and this is demonstrated in #109564, when the server is closed after a socket connection is accepted but before it is handled.Example error of missing
None
checkRepro:
Usage:
Retracted stack trace: