ssl.SSLSocket.read() / write() missing ERR_clear_error() before SSL_read_ex() / SSL_write_ex() causes spurious errors with cooperative threading #148594

@kswia

Description


Bug report

Bug description:

Summary

_ssl__SSLSocket_read_impl and _ssl__SSLSocket_write_impl in Modules/_ssl.c do not call ERR_clear_error() before SSL_read_ex() / SSL_write_ex(). This allows stale entries on the per-thread OpenSSL error queue to corrupt the result of SSL_get_error(), causing spurious BrokenPipeError or OSError on healthy SSL connections.

Affected versions

All current CPython versions. Confirmed in 3.12 branch (line 2544) and main / 3.15-dev (line 2941).

Root cause

The do { ... } while() retry loop in _ssl__SSLSocket_read_impl (Modules/_ssl.c L2939-2942 on main):

do {
    Py_BEGIN_ALLOW_THREADS;
    retval = SSL_read_ex(self->ssl, mem, (size_t)len, &count);
    err = _PySSL_errno(retval == 0, self->ssl, retval);
    Py_END_ALLOW_THREADS;
    // ...
} while (err.ssl == SSL_ERROR_WANT_READ || err.ssl == SSL_ERROR_WANT_WRITE);

_PySSL_errno() calls SSL_get_error(ssl, retcode), which internally calls ERR_peek_last_error(). Per the OpenSSL documentation:

In addition to ssl and ret, SSL_get_error() inspects the current thread's OpenSSL error queue. Thus, SSL_get_error() must be called in the same thread that performed the TLS/SSL I/O operation, and no other OpenSSL function calls should appear in between. The current thread's error queue must be empty before the TLS/SSL I/O operation is attempted, or SSL_get_error() will not work reliably.

If stale error entries are present on the queue from a prior SSL operation (on the same thread but a different SSL object), SSL_get_error() misattributes them and returns SSL_ERROR_SYSCALL instead of the correct SSL_ERROR_WANT_READ.

The same issue exists in _ssl__SSLSocket_write_impl.
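To make the misattribution concrete, here is a toy Python model of the queue check described above. It is not OpenSSL's code: `error_queue`, `toy_get_error`, and `toy_read` are hypothetical names, and the logic is reduced to the single rule from the documentation excerpt (a non-empty queue changes the verdict).

```python
# Toy model only: a Python list stands in for the per-thread OpenSSL error queue.
SSL_ERROR_WANT_READ = "SSL_ERROR_WANT_READ"
SSL_ERROR_SYSCALL = "SSL_ERROR_SYSCALL"

error_queue = []  # hypothetical stand-in for the thread's error queue


def toy_get_error():
    """Classify a read that found no data, following the report's description.

    A non-empty queue is attributed to the current operation, even if the
    entry was pushed by an earlier operation on a different SSL object.
    """
    if error_queue:
        return SSL_ERROR_SYSCALL
    return SSL_ERROR_WANT_READ


def toy_read(clear_first):
    """Model a non-blocking read with no data available (EAGAIN)."""
    if clear_first:
        error_queue.clear()  # models ERR_clear_error() before the I/O call
    return toy_get_error()


# Healthy read on an empty queue: correctly reported as "retry later".
first = toy_read(clear_first=False)

# A failed write on another connection leaves an entry behind.
error_queue.append("EPIPE from an unrelated connection")
stale = toy_read(clear_first=False)  # misattributed to the healthy read
fixed = toy_read(clear_first=True)   # queue cleared first, verdict is correct

print(first, stale, fixed)
```

In this model the only difference between the `stale` and `fixed` calls is whether the queue is emptied before the operation, which is the change the report asks for.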

When this manifests

This bug is invisible in multi-threaded programs because each OS thread has its own OpenSSL error queue. It becomes critical in cooperative multitasking frameworks (gevent, eventlet, asyncio with SSL) where multiple coroutines/greenlets share a single OS thread and thus a single OpenSSL error queue.
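The difference between the two execution models can be illustrated with a small standard-library sketch. Here `threading.local` plays the role of the per-thread error queue and plain generators stand in for greenlets; all names (`state`, `queue`, `failing_task`, `healthy_task`, `run_cooperatively`) are hypothetical and no OpenSSL code is involved.

```python
import threading

# threading.local gives each OS thread its own value, much like the
# per-thread OpenSSL error queue described in the report.
state = threading.local()


def queue():
    """Return the current OS thread's list, creating it on first use."""
    if not hasattr(state, "errors"):
        state.errors = []
    return state.errors


def failing_task():
    queue().append("stale error")  # the task fails and does not clean up
    yield


def healthy_task(seen):
    seen.append(list(queue()))  # record what this task finds on the queue
    yield


def run_cooperatively(seen):
    """Step two generators on the current thread (a stand-in for a hub)."""
    for task in (failing_task(), healthy_task(seen)):
        next(task)


# Case 1: both tasks run cooperatively on ONE OS thread, so they share a queue.
same_thread = []
t = threading.Thread(target=run_cooperatively, args=(same_thread,))
t.start()
t.join()

# Case 2: each task runs on its OWN OS thread, so the queues are separate.
separate_threads = []
a = threading.Thread(target=lambda: next(failing_task()))
a.start()
a.join()
b = threading.Thread(target=lambda: next(healthy_task(separate_threads)))
b.start()
b.join()

print(same_thread)       # [['stale error']]
print(separate_threads)  # [[]]
```

The healthy task only sees the leftover entry when it shares an OS thread with the failing one, which is the situation cooperative frameworks create.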

Concrete scenario (gevent):

  1. Greenlet A performs an SSL write on an HTTPS connection. The remote client has disconnected, so the send() call issued by SSL_write_ex() fails with EPIPE. OpenSSL pushes an error entry onto the (per-thread) error queue. The greenlet handles the exception, but the error queue is not cleared.
  2. The gevent hub switches to Greenlet B, which is an AMQP consumer doing SSL_read_ex() on a healthy RabbitMQ connection.
  3. The recv() call issued by SSL_read_ex() returns EAGAIN (no data available, which is normal for a non-blocking socket).
  4. SSL_get_error() finds the stale error from step 1 via ERR_peek_last_error() and returns SSL_ERROR_SYSCALL instead of SSL_ERROR_WANT_READ.
  5. _PySSL_errno() captures errno = 32 (stale EPIPE from step 1).
  6. CPython exits the retry loop, enters PySSL_SetError(), and raises BrokenPipeError(errno=32, "Broken pipe") on a perfectly healthy connection.

Evidence

  • Disassembly: The compiled _ssl.cpython-312-x86_64-linux-gnu.so confirms no ERR_clear_error (PLT 0x9050) before SSL_read_ex (PLT 0x93b0) at the call site.
  • Production telemetry: At the moment of every BrokenPipeError, getsockopt(SO_ERROR) returns 0 (no kernel-level error), and tcpdump shows no FIN/RST from the remote side — the TCP connection is healthy.
  • Workaround validation: Calling ERR_clear_error() (via ctypes) before every _sslobj.read() in a monkey-patched ssl.SSLSocket.read() completely eliminates the spurious errors. Tested for 15+ minutes under production load with zero errors, after months of constant failures every ~90 seconds.
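A ctypes-based workaround along these lines might look like the sketch below. It is not the exact patch used in production. The helper names (`load_err_clear_error`, `patch_read`, `install`) are hypothetical, and it assumes `_ssl` is a shared extension module dynamically linked against libcrypto, so that the symbol resolved through it belongs to the same OpenSSL instance the `ssl` module uses. Where that assumption does not hold, the loader returns `None`.

```python
import ctypes
import functools
import ssl


def load_err_clear_error():
    """Best-effort lookup of OpenSSL's ERR_clear_error(); None on failure.

    Assumption: _ssl is a shared extension module linked dynamically against
    libcrypto, so the symbol can be resolved through that module's handle.
    """
    import _ssl

    path = getattr(_ssl, "__file__", None)
    if not path:
        return None  # _ssl is compiled in statically; this approach does not apply
    try:
        fn = ctypes.CDLL(path).ERR_clear_error
    except (OSError, AttributeError):
        return None
    fn.argtypes = []   # void ERR_clear_error(void)
    fn.restype = None
    return fn


def patch_read(cls, clear):
    """Wrap cls.read so that clear() runs immediately before each call.

    Returns the original method so the patch can be undone.
    """
    original = cls.read

    @functools.wraps(original)
    def read(self, *args, **kwargs):
        clear()  # empty the current thread's error queue first
        return original(self, *args, **kwargs)

    cls.read = read
    return original


def install():
    """Apply the workaround to ssl.SSLSocket (call once at startup)."""
    clear = load_err_clear_error()
    if clear is None:
        raise RuntimeError("could not locate ERR_clear_error via _ssl")
    return patch_read(ssl.SSLSocket, clear)
```

Note that this sketch clears the queue once per `read()` call. The workaround described above clears it before every `_sslobj.read()`, which matters if the `read()` implementation retries internally and can yield to other greenlets between attempts. In that case the wrapper would need to be applied at the `_sslobj` level instead.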

Proposed fix

Add ERR_clear_error() before SSL_read_ex() and SSL_write_ex() in their respective retry loops:

do {
    Py_BEGIN_ALLOW_THREADS;
    ERR_clear_error();  /* Prevent stale errors from affecting SSL_get_error() */
    retval = SSL_read_ex(self->ssl, mem, (size_t)len, &count);
    err = _PySSL_errno(retval == 0, self->ssl, retval);
    Py_END_ALLOW_THREADS;
    // ...
} while (err.ssl == SSL_ERROR_WANT_READ || err.ssl == SSL_ERROR_WANT_WRITE);

This matches OpenSSL's documented requirement and is consistent with how CPython already calls ERR_clear_error() in other SSL functions (e.g., _ssl__SSLSocket_do_handshake_impl, _ssl_ctx_new).

Reproducer

A minimal reproducer requires two SSL connections on the same OS thread. In pseudocode:

from gevent import monkey
monkey.patch_all()  # make socket/ssl cooperative so both greenlets share one OS thread

import ssl, socket, gevent

def writer_greenlet():
    """SSL connection that will fail, leaving a stale error on the queue."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(socket.socket(), server_hostname="...")
    sock.connect(...)
    # Remote side disconnects
    try:
        sock.write(b"data")  # raises BrokenPipeError and leaves a stale OpenSSL error
    except BrokenPipeError:
        pass  # handled by the application; the OpenSSL error queue is NOT cleared

def reader_greenlet():
    """Healthy SSL connection that reads and gets a spurious BrokenPipeError."""
    ctx = ssl.create_default_context()
    sock = ctx.wrap_socket(socket.socket(), server_hostname="...")
    sock.connect(...)
    # This should block waiting for data, but instead raises BrokenPipeError
    sock.read(4096)  # BrokenPipeError on a HEALTHY connection

gevent.joinall([
    gevent.spawn(writer_greenlet),
    gevent.spawn(reader_greenlet),
])

Versions

  • CPython: 3.12.12, also present in main (3.15-dev, commit d14e31e)
  • OpenSSL: 3.5.1 (also reproducible with 3.0.x, 3.2.x)
  • OS: RHEL 10.1
  • gevent: 25.4.1 / 25.8.2

The pseudocode reproducer is schematic — in practice, the trigger requires precise greenlet switching timing. The production scenario (AMQP consumers + HTTPS server in gevent) triggers it reliably every ~90 seconds.

CPython versions tested on:

3.12

Operating systems tested on:

Linux

Labels: 3.13, 3.14, 3.15, extension-modules, topic-SSL, type-bug