Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

JIT can trigger assert(uop_buffer_remaining_space(trace) > space_needed); for widened uops #149335

Copy link
Copy link
@eendebakpt

Description

@eendebakpt
Issue body actions

Bug report

Bug description:

The assert(uop_buffer_remaining_space(trace) > space_needed); in _PyJit_translate_single_bytecode_to_trace can be triggered if a uop is widened. Found when the CI for #148271 was failing (there TO_BOOL gets an extra _RECORD_TOS_TYPE). @markshannon

Reporting the issue already, I will try to create a simple reproducer and more concise explanation.

Full explanation by Claude

JIT trace-buffer assertion: full explanation

The assertion

In _PyJit_translate_single_bytecode_to_trace (Python/optimizer.c), each bytecode appended to a tier-2 trace is gated by a Py_DEBUG assertion:

assert(uop_buffer_remaining_space(trace) > space_needed);

where

space_needed = expansion->nuops + needs_guard_ip + 2 + (!OPCODE_HAS_NO_SAVE_IP(opcode))

expansion->nuops is the number of uops the bytecode's macro expands to. The + 2 covers the _CHECK_VALIDITY uop emitted at the start of every bytecode plus a reserved _DEOPT slot for it. + needs_guard_ip covers the _GUARD_IP_* slot for opcodes that need it. + (!OPCODE_HAS_NO_SAVE_IP) covers the _SET_IP slot.

Just before the assertion, trace->end has already been decremented to reserve the side-exit/error/deopt/guard-ip stubs at the buffer's tail:

trace->end -= 2 + has_exit + has_error + has_deopt + needs_guard_ip

So if R is the buffer's true remaining space at the start of the bytecode, the assertion is checking R - tail > space_needed, i.e.

R > 4 + has_exit + has_error + has_deopt + 2*needs_guard_ip + nuops + has_save_ip

The fitness mechanism

_PyJit_translate_single_bytecode_to_trace also runs a fitness gate before the assertion:

if (fitness < eq) goto done;

eq is compute_exit_quality(opcode). For specializable opcodes (like TO_BOOL, CALL_*) it returns EXIT_QUALITY_SPECIALIZABLE, defined as FITNESS_INITIAL / 80. With FITNESS_INITIAL == UOP_MAX_TRACE_LENGTH == 1000 in debug builds, that's 12.

Per bytecode, fitness is debited by slots_used = remaining_before − remaining_after, i.e. exactly the number of buffer slots the bytecode just consumed (uops emitted + tail reservations made). So fitness and remaining_space are intended to track in lockstep, with branch/frame penalties only making fitness drop faster (which would only ever make the assertion safer).

The intended invariant is therefore fitness <= remaining_space. If that held, then fitness >= eq = 12 would imply remaining_space >= 12 too, and 12 slots is enough for every existing space_needed.

Where the invariant breaks

Two accounting holes push fitness above remaining_space:

  1. Trace prelude not charged. _PyJit_StartTracing emits _START_EXECUTOR and _MAKE_WARM into the buffer, costing 2 slots. Fitness is initialized to FITNESS_INITIAL (1000) right after, with no debit. So before the first bytecode: fitness = 1000, remaining = 998. Drift = +2.

  2. needs_guard_ip under-reserved. When needs_guard_ip is true (e.g. after _PUSH_FRAME), the code emits up to three uops (_RECORD_CODE, the guard_ip uop itself, and guard_code_version_uop if the called object is a code object). But space_needed only adds + needs_guard_ip (i.e. 1). Each such occurrence under-reserves by 1–2 slots, but those slots are still consumed by next++, so slots_used does include them and fitness drops correctly. The mismatch is between the predicted space_needed and the actual emission, not between fitness and buffer — but it tightens the assertion further.

I confirmed the drift directly. With a release build with UOP_MAX_TRACE_LENGTH hacked down to 1000 (matching debug) and a printf inserted at the trace-finalize site, a long branchless reproducer produced:

TRACE_LEN: length=666 remaining=14 fitness=18 UOP_MAX=1000

fitness = 18, remaining = 14 — fitness sits 4 slots higher than the buffer.

Why this PR exposes it

The PR for gh-143732 changes the TO_BOOL macro from

macro(TO_BOOL) = _SPECIALIZE_TO_BOOL + unused/2 + _TO_BOOL;

to

macro(TO_BOOL) = _SPECIALIZE_TO_BOOL + _RECORD_TOS_TYPE + unused/2 + _TO_BOOL;

That bumps expansion->nuops for TO_BOOL from 1 to 2.

Concrete space_needed for TO_BOOL (no exit/deopt/guard_ip, has_error, has_save_ip):

nuops needs_guard_ip +2 has_save_ip space_needed
main 1 0 2 1 4
PR 2 0 2 1 5

After tail reservations (trace->end -= 2 + has_error = 3), the post-decrement remaining must exceed space_needed. So the true-remaining R we need is:

tail space_needed R required
main 3 4 > 7
PR 3 5 > 8

Worst case at the fitness gate: fitness == 12 and the drift is fitness − remaining = +4, so R = 8. On main, 8 > 7 ✓. On the PR, 8 > 8 ✗ — assertion fires.

Empirical confirmation

Debug + JIT build, 30 runs of ./python -m test test_strtod --multiprocess 0 --verbose2 --verbose3:

Build TO_BOOL macro Pass Fail
main original 30 0
main + PR's macro change with _RECORD_TOS_TYPE 28 2

Failures are always in test_short_halfway_cases, which uses random.randrange, so the failing trace shape varies and the failure is flaky (~5–10%). The C backtrace lands in _PyJit_translate_single_bytecode_to_trace reached from the test's if e + q.bit_length() > max_exp: line — so the bytecode that actually trips the assertion is plausibly a wide CALL (e.g. CALL_METHOD_DESCRIPTOR_NOARGS for q.bit_length()), not TO_BOOL itself; widening TO_BOOL just shifts trace timing enough to expose it.

Bottom line

It's a thin-margin coupling, not a PR bug. The intended invariant fitness <= remaining_space doesn't actually hold today; in some traces fitness is ~4 slots above remaining. With EXIT_QUALITY_SPECIALIZABLE = 12, the worst-case assertion threshold is right at the boundary for the PR's wider TO_BOOL, so any future macro widening can resurface this. Fixes (in increasing principled-ness): (a) graceful goto done instead of asserting; (b) raise EXIT_QUALITY_SPECIALIZABLE to ~25 to leave room for the widest opcode plus the +4 drift; (c) fix the two accounting holes so the invariant actually holds.

CPython versions tested on:

CPython main branch

Operating systems tested on:

Linux

Linked PRs

Reactions are currently unavailable

Metadata

Metadata

Assignees

No one assigned

    Labels

    interpreter-core(Objects, Python, Grammar, and Parser dirs)(Objects, Python, Grammar, and Parser dirs)topic-JITtype-bugAn unexpected behavior, bug, or errorAn unexpected behavior, bug, or error
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      Morty Proxy This is a proxified and sanitized view of the page, visit original site.