GH-136410: Faster side exits by markshannon · Pull Request #136411 · python/cpython

markshannon · Jul 8, 2025

This PR reinstates the _COLD_EXIT uop, but this time the uop expects to be passed the exit rather than the executor.
This way we need only one jitted stub, not hundreds.

Using stubs hugely simplifies _EXIT_TRACE as all it needs to do is jump to the exit's executor.
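
As a rough illustration of the idea described above, here is a minimal sketch in which every side exit starts out pointing at one shared cold stub, so the exit only has to follow a pointer. All names (`toy_executor`, `toy_exit`, `toy_cold`, `toy_init_exits`, `toy_exit_trace`) are hypothetical stand-ins and not the real CPython structs or uops.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the real CPython structs. */
typedef struct toy_executor {
    const char *name;
} toy_executor;

typedef struct toy_exit {
    toy_executor *executor;   /* where this side exit currently jumps */
} toy_exit;

/* One shared stub for every exit that has no compiled target yet. */
static toy_executor toy_cold = { "cold" };

/* Point every exit of a new trace at the shared cold stub. */
static void toy_init_exits(toy_exit *exits, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        exits[i].executor = &toy_cold;
    }
}

/* The exit itself does no bookkeeping: it only follows the pointer. */
static toy_executor *toy_exit_trace(const toy_exit *exit)
{
    assert(exit->executor != NULL);
    return exit->executor;
}
```

In this model the number of stubs stays at one no matter how many exits exist, which is the property the description relies on.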

The x86-64 stencil for _EXIT_TRACE shrinks from 384 bytes to 36 bytes, although it does require one extra _CHECK_VALIDITY (19 bytes) to be added to each trace.
Since traces often contain multiple _EXIT_TRACEs this is a substantial space saving.
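
To make the space saving concrete, the arithmetic can be sketched as follows, using only the x86-64 figures quoted above. The helper name `net_bytes_saved` is hypothetical, and the model assumes the only per-trace cost is the one extra _CHECK_VALIDITY.

```c
#include <assert.h>

/* Figures quoted in the PR description (x86-64 stencil sizes, in bytes). */
enum {
    EXIT_TRACE_OLD_BYTES = 384,
    EXIT_TRACE_NEW_BYTES = 36,
    CHECK_VALIDITY_BYTES = 19   /* one extra check per trace */
};

/* Net bytes saved for a trace containing n_exits _EXIT_TRACE uops. */
static long net_bytes_saved(long n_exits)
{
    assert(n_exits >= 0);
    return n_exits * (EXIT_TRACE_OLD_BYTES - EXIT_TRACE_NEW_BYTES)
           - CHECK_VALIDITY_BYTES;
}
```

Under these assumptions a trace with a single exit already saves 329 bytes, and a trace with four exits saves 1373 bytes.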

Issue: Switching between the JIT and interpreter is too slow. #136410

Fidget-Spinner · Jul 8, 2025

Python/optimizer.c

+    // from being immediately detected as cold and invalidated.
+    cold->vm_data.warm = true;
+    if (_PyJIT_Compile(cold, cold->trace, 1)) {
+        Py_DECREF(cold);


It's immortal. Decrefing won't do anything. I think you mean to call the dealloc function on it?

Good spot. I think I need to move making it immortal to after it is fully created.

markshannon · Jul 8, 2025

@pablogsal any ideas what this failure means? https://github.com/python/cpython/actions/runs/16140914160/job/45547999926?pr=136411

pablogsal · Jul 8, 2025

> @pablogsal any ideas what this failure means? https://github.com/python/cpython/actions/runs/16140914160/job/45547999926?pr=136411

It's a race condition in the tests that will be fixed by #136347.

markshannon · Jul 8, 2025

Performance is in the noise:

Linux x86 +0.4%
Windows x86 -0.1%
Mac ARM +0.2%

However, I wouldn't expect much of a speedup by reducing the size of exits, so this seems reasonable.

Fidget-Spinner · Jul 8, 2025

I think the benchmarking public mirror is down. Can you please share the link when it comes up again?

markshannon · Jul 10, 2025

Note to self: The cold exit executor should be freed when freeing the interpreter to avoid leaking memory.

sergey-miryanov · Jul 22, 2025

Python/optimizer.c

-        executor->exits[i].executor = NULL;
+        executor->exits[i].index = i;
        executor->exits[i].temperature = initial_temperature_backoff_counter();
+        executor->exits[i].executor = cold;


We store cold_executor here without an INCREF because cold_executor is immortal. But executor_clear DECREFs each executor in exits, because there can be other executors there as well. That leaves the INCREF/DECREF unbalanced. Maybe it is worth adding a comment here saying that we omit the INCREF because the cold executor is immortal?
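
Why the unbalanced DECREF is harmless for an immortal object can be sketched with a toy reference-counting model. This is only an illustration: `toy_object`, `toy_incref` and `toy_decref` are hypothetical, and the explicit `immortal` flag is an assumption made for readability (CPython encodes immortality in the reference count itself).

```c
#include <assert.h>
#include <stdbool.h>

typedef struct toy_object {
    long refcnt;
    bool immortal;   /* assumption: a flag; CPython encodes this in the refcount */
    bool freed;
} toy_object;

static void toy_incref(toy_object *op)
{
    if (!op->immortal) {
        op->refcnt++;
    }
}

static void toy_decref(toy_object *op)
{
    if (op->immortal) {
        return;   /* never freed by reference counting */
    }
    assert(op->refcnt > 0);
    if (--op->refcnt == 0) {
        op->freed = true;
    }
}
```

In this model, any number of extra `toy_decref` calls on the immortal object leaves it untouched, while a mortal object still needs its increments and decrements to balance.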

brandtbucher

A couple of notes. I like the general idea, not a huge fan of the jit_exit side-channel (but I get why we have it).

brandtbucher · Jul 30, 2025

Python/optimizer.c

+        Py_FatalError("Cannot allocate core JIT code");
+    }
+#endif
+    _Py_SetImmortal((PyObject *)cold);


Why does it need to be immortal? This means that we'll leak one of these per interpreter, along with about a page of JIT code. I think the interpreter can just hold a normal reference that we free at shutdown, right?

It is cleared when the interpreter is freed https://github.com/python/cpython/pull/136411/files#diff-7ac11e526f79b42d6ea9d3592cb99da46775640c69fa5510f4a6de87cced7141R818

Then why is it immortal? I'm worried it could get shared at some point, which is generally safe to do with immortal objects. An immortal object that gets cleared at interpreter shutdown seems like it could lead to hard-to-debug problems down the road.

Maybe just not make it immortal, and refcount it normally?

It is immortal: it outlives all mortal objects in the same interpreter. Reference counting it is just extra overhead.

Python/optimizer.c

brandtbucher · Jul 30, 2025

Python/bytecodes.c

+                }
+                exit->temperature = initial_temperature_backoff_counter();
+            }
+            assert(tstate->jit_exit == exit);


Maybe set it to NULL now?

We could do, but I'd rather leave it pointing to the last exit.
Cold exit relies on jit_exit being the last exit, and I can imagine us adding "compile me now" stubs later that would also rely on it.

Not a big deal, but I'd lean towards clearing it here. A possibly-dangling pointer seems dangerous, and we could assert it's NULL when setting it to make sure we didn't mess up somewhere.

It is only meaningful when starting an executor, and it will always have been set to a live executor at that point.
I'd like to keep it this way as it will simplify changing it to be passed in a register with TOS caching.
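
The side-channel discussed in this thread can be sketched roughly as below: the exit that was taken is recorded on the thread state, and the shared cold stub reads it back because it has no per-exit code of its own. The types and functions (`toy_tstate`, `toy_take_exit`, `toy_cold_exit`) are hypothetical, and the temperature is modelled as a plain countdown, which is an assumption and not CPython's backoff counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct toy_exit {
    int index;
    int temperature;   /* assumption: a plain countdown */
} toy_exit;

typedef struct toy_tstate {
    toy_exit *jit_exit;   /* last exit taken; only meaningful when an executor starts */
} toy_tstate;

/* Taking a side exit: record which exit it was before transferring control. */
static void toy_take_exit(toy_tstate *ts, toy_exit *exit)
{
    ts->jit_exit = exit;
}

/* The shared cold stub reads the side-channel to find the exit.
   Returns 1 when the exit has become hot enough to be worth compiling. */
static int toy_cold_exit(toy_tstate *ts)
{
    toy_exit *exit = ts->jit_exit;
    assert(exit != NULL);
    if (exit->temperature > 0) {
        exit->temperature--;
        return 0;
    }
    return 1;
}
```

Note that the sketch leaves `jit_exit` pointing at the last exit after the stub runs, matching the behaviour argued for above; clearing it to NULL instead would be a one-line change in `toy_cold_exit`.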

Python/optimizer.c

bedevere-app · Jul 30, 2025

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

markshannon · Jul 30, 2025

> A couple of notes. I like the general idea, not a huge fan of the jit_exit side-channel (but I get why we have it).

Once we have TOS caching we can pass the exit in one of the cache registers. On occasion we will need to spill a value on the stack, but it should be mostly free.

markshannon · Jul 30, 2025

With the changes to _START_EXECUTOR to check for invalidation, _START_EXECUTOR has grown from 0 to 62 bytes (x86-64) which is still pretty good given the savings of 350 bytes per _EXIT_TRACE.

brandtbucher

A couple suggestions for possible improvements (and in the discussions above), but nothing blocking. This is a good change, thanks!

brandtbucher · Jul 31, 2025

Include/internal/pycore_uop_metadata.h

+    [_START_EXECUTOR] = HAS_DEOPT_FLAG | HAS_ESCAPES_FLAG,
    [_MAKE_WARM] = 0,
    [_FATAL_ERROR] = 0,
    [_DEOPT] = 0,
    [_ERROR_POP_N] = HAS_ARG_FLAG,
    [_TIER2_RESUME_CHECK] = HAS_DEOPT_FLAG,
+    [_COLD_EXIT] = HAS_ESCAPES_FLAG,


Can you mark _PyExecutor_ClearExit and _PyExecutor_FromExit as non-escaping?

...also, am I the only one around here who reviews generated code? ;)

Ah wait, _PyExecutor_ClearExit can escape.

That's annoying... the traces are going to have a _CHECK_VALIDITY op after the _START_EXECUTOR anyways, because the deopt path in _START_EXECUTOR can escape. But it will never actually reach the next instruction. Seems like something that should be addressed, actually.

Maybe just go back to checking in _EXIT_TRACE like we were before? It's going to get checked anyways, so might as well just do it there.

> ...also, am I the only one around here who reviews generated code? ;)

Maybe 🙂

Checking validity in _START_EXECUTOR is much more efficient than in _EXIT_TRACE as it is only a single predictable branch.
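
A minimal sketch of checking validity once on entry, and resetting the exit that led to an invalid executor, might look like this. Everything here is a toy model: `toy_start_executor` and `toy_clear_exit` are hypothetical names, the `refcnt` field is a simplified counter, and the real uops and helpers differ in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct toy_executor {
    bool valid;
    int refcnt;
} toy_executor;

typedef struct toy_exit {
    toy_executor *executor;
} toy_exit;

/* Shared cold stub; treated as immortal, so it is never decref'ed. */
static toy_executor toy_cold = { true, 1 };

/* Point the exit back at the cold stub and drop its reference to the old target. */
static void toy_clear_exit(toy_exit *exit)
{
    toy_executor *old = exit->executor;
    assert(old != NULL);
    exit->executor = &toy_cold;
    if (old != &toy_cold) {
        old->refcnt--;
    }
}

/* Checked once on entry: a single branch instead of a check at every exit.
   Returns true if the trace may run, false if it must deoptimize. */
static bool toy_start_executor(toy_executor *self, toy_exit *came_from)
{
    if (!self->valid) {
        if (came_from != NULL) {
            toy_clear_exit(came_from);
        }
        return false;
    }
    return true;
}
```

The design point is that the common case (a valid executor) costs one well-predicted branch, and only the rare invalid case pays for the cleanup.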

brandtbucher · Jul 31, 2025

Python/optimizer.c

+        _PyExecutorObject *e = executor->exits[i].executor;
+        executor->exits[i].executor = cold;
+        Py_DECREF(e);


Could use _PyExecutor_ClearExit here to avoid repeating the same logic twice.

But that couples the two functions, and they do have distinct uses.

markshannon · Jul 31, 2025

> That's annoying... the traces are going to have a _CHECK_VALIDITY op after the _START_EXECUTOR anyways, because the deopt path in _START_EXECUTOR can escape.

It can't really escape, because it is on the deopt path. But that's a separate issue: #137276

markshannon added 4 commits July 7, 2025 16:56

Faster side exits.

a281775

Fix copy-and-paste error

ec7f0e2

Check validity at start of trace

911520a

Tidy up

73832b2

markshannon requested review from ZeroIntensity, brandtbucher, ericsnowcurrently and savannahostrowski as code owners July 8, 2025 09:44

bedevere-app bot added the awaiting core review label Jul 8, 2025

bedevere-app bot mentioned this pull request Jul 8, 2025

Switching between the JIT and interpreter is too slow. #136410

Closed

markshannon added the skip news label Jul 8, 2025

markshannon added 3 commits July 8, 2025 10:56

Null test before use

a9297a0

Update asserts

47b53d1

Properly fix assert

2564d41

Fidget-Spinner reviewed Jul 8, 2025

Don't make object immortal until it is fully initialized

779bd7d

markshannon added 2 commits July 21, 2025 10:59

Merge branch 'main' into fast-side-exits

db336ce

Free cold executor when closing of interpreter

0b05acf

markshannon requested a review from diegorusso as a code owner July 21, 2025 10:10

markshannon added 3 commits July 21, 2025 14:47

Add NULL check

142bab6

Free cold executor after final GC

af696ea

Add empty _PyExecutor_Free for non-jit builds

264559a

sergey-miryanov reviewed Jul 22, 2025

brandtbucher requested changes Jul 30, 2025

bedevere-app bot removed the awaiting core review label Jul 30, 2025

bedevere-app bot added the awaiting changes label Jul 30, 2025

markshannon added 2 commits July 30, 2025 10:42

Clear exit if START_EXECUTOR detects that the executor is invalid

989a882

Add some asserts

656ada0

brandtbucher approved these changes Jul 31, 2025

bedevere-app bot added awaiting merge and removed awaiting changes labels Jul 31, 2025

markshannon merged commit e7b55f5 into python:main Aug 1, 2025
72 of 73 checks passed

bedevere-app bot removed the awaiting merge label Aug 1, 2025

markshannon deleted the fast-side-exits branch August 2, 2025 15:51

Agent-Hellboy pushed a commit to Agent-Hellboy/cpython that referenced this pull request Aug 19, 2025

pythonGH-136410: Faster side exits by using a cold exit stub (pythonG…

7ad8f37

…H-136411)

This was referenced Sep 1, 2025

JIT: assertion failure in _PyObject_GC_UNTRACK #137007

Closed

JIT: executor->vm_data.valid assertion failure in unlink_executor #136996

Open

Zheaoli mentioned this pull request Oct 4, 2025

JIT executors are not properly freed #139540

Open
