gh-109564: Add None check for asyncio.Server._wakeup #126660


Open

wants to merge 8 commits into main
Conversation

ordinary-jamie
Contributor

@ordinary-jamie ordinary-jamie commented Nov 11, 2024

Adds a None check to asyncio.Server._wakeup to prevent a crash when it is called multiple times.

Because the method clears self._waiters, a subsequent call fails when it attempts to iterate over None.

This can happen when the public asyncio.Server.close API and the internal asyncio.Server._detach are both called, as demonstrated in #109564, where the server is closed after a socket connection is accepted but before it is handled.

Example error of missing None check

Repro:

# repro.py

import asyncio
import sys
import socket

async def server():
    server = await asyncio.start_server(lambda *_: None, host="127.0.0.1", port=4445)
    await asyncio.sleep(0)
    server.close()
    await server.wait_closed()


def client():

    while True:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.connect(("127.0.0.1", 4445))
            except IOError:
                pass
            finally:
                sock.close()


if __name__ == '__main__':
    if len(sys.argv) > 1 and sys.argv[1] == 'server':
        asyncio.run(server())
    else:
        client()

Usage:

python repro.py &

# May have to call multiple times before observing error
python repro.py server 

Truncated stack trace:

/cpython/Lib/asyncio/selector_events.py:869: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=10>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Exception ignored in: <function _SelectorTransport.__del__ at 0x10194b650>
Traceback (most recent call last):
  File "/cpython/Lib/asyncio/selector_events.py", line 872, in __del__
    self._server._detach(self)
  File "/cpython/Lib/asyncio/base_events.py", line 303, in _detach
    self._wakeup()
  File "/cpython/Lib/asyncio/base_events.py", line 308, in _wakeup
    for waiter in waiters:
TypeError: 'NoneType' object is not iterable
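The failure mode and the guard can be modeled in isolation. Below is a toy sketch (not the actual asyncio.base_events.Server class; the waiter-handling body is elided) showing why a second call crashed and how the None check makes it a no-op:

```python
# Minimal standalone model of the failure and the guard added in this PR
# (a toy sketch; not the actual asyncio.base_events.Server class).

class ToyServer:
    def __init__(self):
        self._waiters = []  # futures created by wait_closed()

    def _wakeup(self):
        # The guard: a second call (e.g. close() followed by a late
        # _detach()) finds _waiters already cleared, so return instead
        # of iterating over None.
        if self._waiters is None:
            return
        waiters = self._waiters
        self._waiters = None
        for waiter in waiters:
            pass  # the real method sets each pending waiter's result

srv = ToyServer()
srv._wakeup()
srv._wakeup()  # without the guard: TypeError: 'NoneType' object is not iterable
```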

Member

@ZeroIntensity ZeroIntensity left a comment


This needs a test, too. Please add one to the server tests.

@@ -303,6 +303,8 @@ def _detach(self, transport):
             self._wakeup()

     def _wakeup(self):
+        if self._waiters is None:
Contributor

@graingert graingert Nov 12, 2024


Might as well do the None check on the waiters local after it's been assigned from self._waiters
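The variant being suggested would swap self._waiters into a local first and None-check that local. A toy sketch (same simplified model as above, not the CPython source):

```python
# Sketch of the suggested variant: assign the local `waiters` from
# self._waiters first, then None-check the local (toy model, not CPython).

class ToyServer:
    def __init__(self):
        self._waiters = []

    def _wakeup(self):
        waiters = self._waiters
        self._waiters = None
        if waiters is None:  # second call: already woken, nothing to do
            return
        for waiter in waiters:
            pass  # the real method sets each pending waiter's result
```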

Contributor Author


Addressed in d963237

@@ -303,6 +303,8 @@ def _detach(self, transport):
             self._wakeup()

     def _wakeup(self):
+        if self._waiters is None:
+            return
Member


Please add a detailed comment on how a double call can happen (sort of a shorter version of your main PR message)

Contributor Author


Addressed in d963237

@ordinary-jamie
Contributor Author

Sorry for the late reply, @ZeroIntensity!

I'm having a hard time creating a unit test for this scenario. Would appreciate some pointers on how to approach a unit test for this.

Below is a snippet that can reproduce the error in a unit test; however, the _accept_connection2 task is not easily accessible from this test. And even if I filtered it out of asyncio.tasks.all_tasks(), I couldn't do much with it, since the error is caught and the future's result is simply set to None.

class TestServer2:
    # ...

    async def test_close_race(self):
        srv = await asyncio.start_server(lambda *_: None, socket_helper.HOSTv4, 0)

        # Do not await connection so that we can recreate the race condition by closing
        # the server after the connection is accepted, but before it is handled.
        addr = srv.sockets[0].getsockname()
        conn_task = asyncio.create_task(asyncio.open_connection(addr[0], addr[1]))
        for _ in range(3):
            # 1st yield: client socket connect
            # 2nd yield: selector reader
            # 3rd yield: server accept connection
            await asyncio.sleep(0)

        srv.close()

        # server accept connection 2
        await asyncio.sleep(0)

        # Complete the client connection to close the socket
        _, wr = await conn_task
        self.addCleanup(wr.close)

        await srv.wait_closed()

I also considered trying to get a reference to the server transport to handle the accepted connection. However, that transport fails to construct completely (failing when it attempts to attach to the server that has just closed), and the object is destroyed, calling _SelectorTransport.__del__ and then the server wakeup, which triggers the error this PR is patching.

Additional errors

I discovered an unrelated error on the client side as well:

This occurs when the client attempts to create a transport before the server is able to accept the connection. In the unit test above, the timing of the _accept_connection reader callback varies relative to when the client starts creating the transport; I'm not sure why at this stage and will investigate in a separate issue.

File "/cpython/Lib/test/test_asyncio/test_server.py", line 301, in test_close_race
    _, wr = await conn_task
            ^^^^^^^^^^^^^^^
  File "/cpython/Lib/asyncio/streams.py", line 48, in open_connection
    transport, _ = await loop.create_connection(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        lambda: protocol, host, port, **kwds)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cpython/Lib/asyncio/base_events.py", line 1192, in create_connection
    transport, protocol = await self._create_connection_transport(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    ...<2 lines>...
        ssl_shutdown_timeout=ssl_shutdown_timeout)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cpython/Lib/asyncio/base_events.py", line 1223, in _create_connection_transport
    transport = self._make_socket_transport(sock, protocol, waiter)
  File "/cpython/Lib/asyncio/selector_events.py", line 72, in _make_socket_transport
    return _SelectorSocketTransport(self, sock, protocol, waiter,
                                    extra, server)
  File "/cpython/Lib/asyncio/selector_events.py", line 946, in __init__
    base_events._set_nodelay(self._sock)
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
  File "/cpython/Lib/asyncio/base_events.py", line 196, in _set_nodelay
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
OSError: [Errno 22] Invalid argument

It appears this issue is observed on macOS: envoyproxy/envoy#1446
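One way to tolerate that EINVAL would be to wrap the TCP_NODELAY call. This helper mirrors the checks base_events._set_nodelay performs, plus a guard; it is an illustration of the idea, not CPython code, and the helper name is made up:

```python
# Hedged sketch: apply TCP_NODELAY the way base_events._set_nodelay does,
# but swallow the macOS-specific EINVAL (hypothetical helper, not CPython).
import errno
import socket

def set_nodelay_tolerant(sock: socket.socket) -> None:
    if (sock.family in (socket.AF_INET, socket.AF_INET6)
            and sock.type == socket.SOCK_STREAM
            and sock.proto == socket.IPPROTO_TCP):
        try:
            sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        except OSError as exc:
            # macOS can report EINVAL when the peer has already reset
            # the connection before the option is applied.
            if exc.errno != errno.EINVAL:
                raise
```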

Member

@1st1 1st1 left a comment


Approving this; that said, I think we should do a broader change: instead of relying on an "implicit" notion of the Server's actual state by checking whether some attributes are None, we should add an explicit self._state enum member with states like INITIALIZED, SERVING, CLOSED, etc.
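A hypothetical sketch of that explicit-state idea (the enum, its members, and the transitions are assumptions for illustration, not CPython code):

```python
# Toy sketch of an explicit server-state enum (assumed design, not CPython).
import enum

class ServerState(enum.Enum):
    INITIALIZED = enum.auto()
    SERVING = enum.auto()
    CLOSED = enum.auto()

class ToyServer:
    def __init__(self):
        self._state = ServerState.INITIALIZED
        self._waiters = []

    def _wakeup(self):
        # The state names the condition directly instead of encoding it
        # as "self._waiters is None".
        if self._state is ServerState.CLOSED:
            return
        self._state = ServerState.CLOSED
        waiters, self._waiters = self._waiters, []
        for waiter in waiters:
            pass  # real code would set each pending waiter's result
```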

cc @ZeroIntensity

@1st1
Member

1st1 commented Nov 17, 2024

The PR looks good to me, but I'll wait to hear from @ZeroIntensity before I hit the green button.

Member

@ZeroIntensity ZeroIntensity left a comment


I'm not sure what's going on with this bug. On Linux, I wasn't able to reproduce the issue given the reproducer in the PR description, and the two reproducer files from gh-109564 don't cause any errors on my end either. In fact, on this PR, the original reproducer causes a bunch of ResourceWarning spam.

Regarding the test--this PR got lost in notifications, sorry for not replying! I think as long as we have something that reproduces the error that should work. But, I ran that test locally and I don't see any error (and again, it instead spams ResourceWarning with this change).

@ordinary-jamie
Contributor Author

I'm not sure what's going on with this bug. On Linux, I wasn't able to reproduce the issue given the reproducer in the PR description, and the two reproducer files from #109564 don't cause any errors on my end either. In fact, on this PR, the original reproducer causes a bunch of ResourceWarning spam.

Sorry @ZeroIntensity, the only thing this PR addresses is the unraisable exception reported as Exception ignored in: <function _SelectorTransport.__del__ at ...>

I've managed to write a unit test that captures this exception in 001c286. I also added a conn.close() in _accept_connection2 to reduce the number of ResourceWarnings (I initially omitted this because I thought it was unrelated to the PR).

Let me know if the unraisable exception is a trivial matter that doesn't need to be cleaned up. Happy to close the PR if that's the case.

I think we should do a broader change - instead of relying on "implicit" notion of the actual state the Server is in by checking some attributes if they are None or not, we should add an explicit self._state enum member with states like INITIALIZED, SERVING, CLOSED, etc.

If the team prefers, I'm happy to close this PR and instead work on a more robust change like @1st1 mentioned. LMK!

Comment on lines +278 to +289
# When the server is closed before a connection is handled but after the
# connection is accepted, a race condition exists between the handler
# transport and the server, both of which will attempt to wake up the
# server to set any server waiters. We can recreate the race condition by
# opening a connection and waiting for the server reader callback before
# closing the server.
loop = asyncio.get_running_loop()
srv_reader, _ = loop._selector.get_key(srv_sock.fileno()).data
conn_task = asyncio.create_task(asyncio.open_connection(addr[0], addr[1]))
for _ in range(10):
    await asyncio.sleep(0)
    if srv_reader in loop._ready:
        break
Contributor Author


Sorry, forgot about Windows here.

If I set this to a fixed 3 yields without relying on if srv_reader in loop._ready, the test is flaky (sometimes the error won't be observed and the test will pass as a false positive).

The first yield lets the client start the connection, the second lets the selector reader run, and the last lets the server accept the connection. But I'm not sure why the accept-connection task sometimes happens much later, hence the false positive.

@ZeroIntensity
Member

Thanks for adding a test! I've temporarily marked this as DO-NOT-MERGE while we figure things out :)

I'll look closer at this again tomorrow morning (it's 9:30 PM here). In the meantime, do you have a reproducer for the unraisable exception? Everything I've seen so far in this PR and the issue hasn't emitted any warnings or errors on my end. (It's possible that my system is just dodging the race condition by being either too fast or too slow--I'll investigate on a few different machines later.)

If the team prefers, I'm happy to close this PR and instead work on a more robust change like @1st1 mentioned.

Assuming asyncio is just being weird on my end and this PR is really fixing a problem, then we can merge it as-is and backport it, and then put a broader/feature-ish fix on main only.

@ordinary-jamie
Contributor Author

ordinary-jamie commented Nov 18, 2024

In the meantime, do you have a reproducer for the unraisable exception? Everything I've seen so far in this PR and the issue hasn't emitted any warnings or errors on my end.

Sorry, I don't have anything else; both the test on main and the reproducer in the bug ticket show me the unraisable exception. I will try to investigate where the variations in the task scheduling are happening.

Test on main (acbd5c9)
test_close_race (test.test_asyncio.test_server.TestServer2.test_close_race) ... FAIL

======================================================================
FAIL: test_close_race (test.test_asyncio.test_server.TestServer2.test_close_race)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/cpython/Lib/asyncio/runners.py", line 127, in run
    return self._loop.run_until_complete(task)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/cpython/Lib/asyncio/base_events.py", line 720, in run_until_complete
    return future.result()
           ~~~~~~~~~~~~~^^
  File "/cpython/Lib/test/test_asyncio/test_server.py", line 310, in test_close_race
    self.assertIsNone(cm.unraisable)
    ~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^
AssertionError: UnraisableHookArgs(exc_type=<class 'TypeError'>, exc_value=TypeError("'NoneType' object is not iterable"), exc_traceback=<traceback object at 0x1072792c0>, err_msg=None, object=<function _SelectorTransport.__del__ at 0x10689e450>) is not None

----------------------------------------------------------------------
Ran 1 test in 0.024s

FAILED (failures=1)
Reproducer from bug ticket
<sys>:0: ResourceWarning: unclosed <socket.socket fd=7, family=2, type=1, proto=0, laddr=('127.0.0.1', 4445), raddr=('127.0.0.1', 50638)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
<sys>:0: ResourceWarning: unclosed <socket.socket fd=8, family=2, type=1, proto=0, laddr=('127.0.0.1', 4445), raddr=('127.0.0.1', 50639)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/cpython/Lib/asyncio/selector_events.py:869: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=7>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Exception ignored in: <function _SelectorTransport.__del__ at 0x10415f710>
Traceback (most recent call last):
  File "/cpython/Lib/asyncio/selector_events.py", line 872, in __del__
    self._server._detach(self)
  File "/cpython/Lib/asyncio/base_events.py", line 303, in _detach
    self._wakeup()
  File "/cpython/Lib/asyncio/base_events.py", line 308, in _wakeup
    for waiter in waiters:
TypeError: 'NoneType' object is not iterable
/cpython/Lib/asyncio/selector_events.py:869: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=8>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Exception ignored in: <function _SelectorTransport.__del__ at 0x10415f710>
Traceback (most recent call last):
  File "/cpython/Lib/asyncio/selector_events.py", line 872, in __del__
    self._server._detach(self)
  File "/cpython/Lib/asyncio/base_events.py", line 303, in _detach
    self._wakeup()
  File "/cpython/Lib/asyncio/base_events.py", line 308, in _wakeup
    for waiter in waiters:
TypeError: 'NoneType' object is not iterable

This is my system information:

  • uname -a: Darwin MacBookPro.modem 24.0.0 Darwin Kernel Version 24.0.0: Tue Sep 24 23:39:07 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T6000 arm64
  • ./python.exe --version: Python 3.14.0a1+
  • git rev-parse HEAD: acbd5c9c6c62dac34d2ed1a789d36fe61841c16d

Member

@ZeroIntensity ZeroIntensity left a comment


Thanks for all your hard work and responsiveness! Since the reproducers don't seem to work reliably on Linux, and this PR actually causes somewhat of a regression in terms of ResourceWarning spam, I'd like to go for a broader change like Yury suggested. (I'm fine with implementing it in this PR instead of opening a new one, FWIW.)

Running the test locally on my end:

import asyncio
import gc

async def test_close_race():
    srv = await asyncio.start_server(lambda *_: None, '0.0.0.0', 5000)
    srv_sock = srv.sockets[0]
    addr = srv_sock.getsockname()

    # When the server is closed before a connection is handled but after the
    # connection is accepted, a race condition exists between the handler
    # transport and the server, both of which will attempt to wake up the
    # server to set any server waiters. We can recreate the race condition by
    # opening a connection and waiting for the server reader callback before
    # closing the server.
    loop = asyncio.get_running_loop()
    srv_reader, _ = loop._selector.get_key(srv_sock.fileno()).data
    conn_task = asyncio.create_task(asyncio.open_connection(addr[0], addr[1]))
    for _ in range(10):
        await asyncio.sleep(0)
        if srv_reader in loop._ready:
            break

    # Ensure accepted connection task is scheduled by the server reader, but not
    # completed, before closing the server.
    await asyncio.sleep(0)
    srv.close()

    # Complete the client connection to close the socket. Suppress errors in the
    # handler transport due to failing to attach to closed server.
    try:
        _, wr = await conn_task
    except OSError as e:
        print(e)  # Just to see something for insight

    await srv.wait_closed()

    # Verify the handler transport does not raise an error due to multiple calls to
    # the server wakeup. Suppress expected ResourceWarnings from the handler
    # transport failing to attach to the closed server.
    gc.collect()

asyncio.run(test_close_race())

It seems that I've been wrong about the affected versions. Python 3.12.3 (a little bit outdated but I don't think it matters) outputs nothing, but 3.13.0 and the main branch do indeed cause an error:

Exception ignored in: <function _SelectorTransport.__del__ at 0x7b23f43aca40>
Traceback (most recent call last):
  File "/usr/lib/python3.13/asyncio/selector_events.py", line 872, in __del__
    self._server._detach(self)
  File "/usr/lib/python3.13/asyncio/base_events.py", line 304, in _detach
    self._wakeup()
  File "/usr/lib/python3.13/asyncio/base_events.py", line 309, in _wakeup
    for waiter in waiters:
TypeError: 'NoneType' object is not iterable

With this PR, I now see this:

/root/cpython/Lib/asyncio/selector_events.py:871: ResourceWarning: unclosed transport <_SelectorSocketTransport fd=8 read=idle write=<idle, bufsize=0>>
  _warn(f"unclosed transport {self!r}", ResourceWarning, source=self)
ResourceWarning: Enable tracemalloc to get the object allocation traceback
/root/cpython/Lib/asyncio/streams.py:410: ResourceWarning: unclosed <StreamWriter transport=<_SelectorSocketTransport closing fd=7 read=idle write=<idle, bufsize=0>> reader=<StreamReader eof transport=<_SelectorSocketTransport closing fd=7 read=idle write=<idle, bufsize=0>>>>
  warnings.warn(f"unclosed {self!r}", ResourceWarning)

Which is better! But my main concern (and why I'm requesting changes) is that this causes ResourceWarnings to be shown where there was no error or warning before (such as with the reproducer in the PR description--that doesn't emit anything on the current main, but it does with this change). Please ping me if you need help :)

@1st1 Were you able to get the reproducer in GH-109564 or the PR description to work locally?

@1st1
Member

1st1 commented Nov 25, 2024

No, I haven't tried reproducing this bug. I'm away this week; I can try next week.
