GH-136410: Faster side exits#136411
GH-136410: Faster side exits#136411markshannon merged 15 commits intopython:mainpython/cpython:mainfrom faster-cpython:fast-side-exitsfaster-cpython/cpython:fast-side-exitsCopy head branch name to clipboard
Conversation
| // from being immediately detected as cold and invalidated. | ||
| cold->vm_data.warm = true; | ||
| if (_PyJIT_Compile(cold, cold->trace, 1)) { | ||
| Py_DECREF(cold); |
There was a problem hiding this comment.
It's immortal. Decrefing won't do anything. I think you mean to call the dealloc function on it?
There was a problem hiding this comment.
Good spot. I think I need to move making it immortal to after it is fully created.
|
@pablogsal any ideas what this failure means? https://github.com/python/cpython/actions/runs/16140914160/job/45547999926?pr=136411 |
Is a race condition in the tests that will be fixed by #136347 |
|
Performance is in the noise:
However, I wouldn't expect much of a speedup by reducing the size of exits, so this seems reasonable. |
|
I think the benchmarking public mirror is down. Can you please share the link when it comes up again? |
|
Note to self: The cold exit executor should be freed when freeing the interpreter to avoid leaking memory. |
| executor->exits[i].executor = NULL; | ||
| executor->exits[i].index = i; | ||
| executor->exits[i].temperature = initial_temperature_backoff_counter(); | ||
| executor->exits[i].executor = cold; |
There was a problem hiding this comment.
We put cold_executor here without INCREF because cold_executor is immortal. But in executor_clear we DECREF each executor from exits because there can be other executors as well. But this is imbalance of INCREF/DECREF. Maybe it is worth to add a comment here that we omit INCREF because of immortality of cold executor?
brandtbucher
left a comment
There was a problem hiding this comment.
A couple of notes. I like the general idea, not a huge fan of the jit_exit side-channel (but I get why we have it).
| Py_FatalError("Cannot allocate core JIT code"); | ||
| } | ||
| #endif | ||
| _Py_SetImmortal((PyObject *)cold); |
There was a problem hiding this comment.
Why does it need to be immortal? This means that we'll leak one of these per interpreter, along with about a page of JIT code. I think the interpreter can just hold a normal reference that we free at shutdown, right?
There was a problem hiding this comment.
It is cleared when the interpreter is freed https://github.com/python/cpython/pull/136411/files#diff-7ac11e526f79b42d6ea9d3592cb99da46775640c69fa5510f4a6de87cced7141R818
There was a problem hiding this comment.
Then why is it immortal? I'm worried it could get shared at some point, which is generally safe to do with immortal objects. An immortal object that gets cleared at interpreter shutdown seems like it could lead to hard-to-debug problems down the road.
Maybe just not make it immortal, and refcount it normally?
There was a problem hiding this comment.
It is immortal: it outlives all mortal objects in the same interpreter. Reference counting it is just extra overhead.
| } | ||
| exit->temperature = initial_temperature_backoff_counter(); | ||
| } | ||
| assert(tstate->jit_exit == exit); |
There was a problem hiding this comment.
We could do, but I'd rather leave it pointing to the last exit.
Cold exit relies on jit_exit being the last exit, and I can imagine us adding "compile me now" stubs later that would also rely on it.
There was a problem hiding this comment.
Not a big deal, but I'd lean towards clearing it here. A possibly-dangling pointer seems dangerous, and we could assert it's NULL when setting it to make sure we didn't mess up somewhere.
There was a problem hiding this comment.
It is only meaningful when starting an executor, and it will always have been set to a live executor at that point.
I'd like to keep it this way as it will simplify changing it to be passed in a register with TOS caching.
|
When you're done making the requested changes, leave the comment: |
Once we have TOS caching we can pass the exit in one of the cache registers. On occasion we will need to spill a value on the stack, but it should be mostly free. |
|
With the changes to |
brandtbucher
left a comment
There was a problem hiding this comment.
A couple suggestions for possible improvements (and in the discussions above), but nothing blocking. This is a good change, thanks!
| [_START_EXECUTOR] = HAS_DEOPT_FLAG | HAS_ESCAPES_FLAG, | ||
| [_MAKE_WARM] = 0, | ||
| [_FATAL_ERROR] = 0, | ||
| [_DEOPT] = 0, | ||
| [_ERROR_POP_N] = HAS_ARG_FLAG, | ||
| [_TIER2_RESUME_CHECK] = HAS_DEOPT_FLAG, | ||
| [_COLD_EXIT] = HAS_ESCAPES_FLAG, |
There was a problem hiding this comment.
Can you mark _PyExecutor_ClearExit and _PyExecutor_FromExit as non-escaping?
...also, am I the only one around here who reviews generated code? ;)
There was a problem hiding this comment.
Ah wait, _PyExecutor_ClearExit can escape.
That's annoying... the traces are going to have a _CHECK_VALIDITY op after the _START_EXECUTOR anyways, because the deopt path in _START_EXECUTOR can escape. But it will never actually reach the next instruction. Seems like something that should be addressed, actually.
Maybe just go back to checking in _EXIT_TRACE like we were before? It's going to get checked anyways, so might as well just do it there.
There was a problem hiding this comment.
...also, am I the only one around here who reviews generated code? ;)
Maybe 🙂
There was a problem hiding this comment.
Checking validity in _START_EXECUTOR is much more efficient than in _EXIT_TRACE as it is only a single predictable branch.
| _PyExecutorObject *e = executor->exits[i].executor; | ||
| executor->exits[i].executor = cold; | ||
| Py_DECREF(e); |
There was a problem hiding this comment.
Could use _PyExecutor_ClearExit here to avoid repeating the same logic twice.
There was a problem hiding this comment.
But that couples the two functions, and they do have distinct uses.
It can't really escape, because it is on the deopt path. But that's a separate issue: #137276 |
This PR reinstates the
_COLD_EXITuop, but this time expects the exit to be passed, not the executor.This way we only need one jitted stub, not hundreds.
Using stubs hugely simplifies
_EXIT_TRACEas all it needs to do is jump to the exit's executor.The x86-64 stencil for
_EXIT_TRACEshrinks from 384 bytes to 36 bytes, although it does require one extra_CHECK_VALIDITY(19 bytes) to be added to each trace.Since traces often contain multiple
_EXIT_TRACEs this is a substantial space saving.