bpo-44525: Copy freevars in bytecode to allow calls to inner functions to be specialized#29595
bpo-44525: Copy freevars in bytecode to allow calls to inner functions to be specialized#29595markshannon merged 8 commits intopython:mainpython/cpython:mainfrom faster-cpython:copy-free-vars-in-bytecodefaster-cpython/cpython:copy-free-vars-in-bytecodeCopy head branch name to clipboard
Conversation
|
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 4fbaa99 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
ericsnowcurrently
left a comment
There was a problem hiding this comment.
Mostly LGTM.
I'm pretty sure there are a few slight (and very unlikely) races here. I also have a couple questions about the purpose of f_globals and f_builtins on the frame.
| PyObject *f_globals; /* Borrowed reference */ | ||
| PyObject *f_builtins; /* Borrowed reference */ |
There was a problem hiding this comment.
So, we avoid extra refcount operations for these, since f_func is guaranteed to hold a ref, right? Cool!
The only problem I see is that the function's func_globals (or func_builtins) could get swapped out in another thread while this frame is executing. Then we'd lose that borrowed reference and potentially segfault. It's an unlikely case, but still a possibility.
There was a problem hiding this comment.
Having these on struct _interpreter_frame is helpful because we avoid extra pointer indirection, right?
However, that only matters in cases where these are used relatively frequently (so the cost of the indirection adds up), i.e. in the eval loop (via GLOBALS()). If we do not expect frame->f_globals to change during the eval loop then could we move both to local variables before starting the loop (and drop them from struct _interpreter_frame altogether)?
Note that then we would have a different possible race where the function's func_globals could get swapped out while the frame is running, leading to weirdness.
There was a problem hiding this comment.
Also, why don't we similarly add f_closure to the frame? Is it because we want to avoid increasing the size of struct _interpreter_frame (and we don't actually use the closure that much, so the cost of the indirection is bearable)?
There was a problem hiding this comment.
Also, why don't we similarly add
f_closureto the frame? Is it because we want to avoid increasing the size ofstruct _interpreter_frame(and we don't actually use the closure that much, so the cost of the indirection is bearable)?
We might want to avoid creating a closure object at all. We could put the cells at the end of the function object, rather than in a separate tuple. but that's future work.
There was a problem hiding this comment.
Looks like you skipped over the two other comments here.
There was a problem hiding this comment.
The frame has a strong reference to the function, which has a strong reference to both the globals and builtins.
So borrowed references to the globals and builtins are OK.
If we do not expect frame->f_globals to change during the eval loop then could we move both to local variables before starting the loop (and drop them from struct _interpreter_frame altogether)?
And where would you store those references across calls?
There was a problem hiding this comment.
If we keep f_func on the frame then that's where we'd get them.
|
When you're done making the requested changes, leave the comment: |
|
🤖 New build scheduled with the buildbot fleet by @markshannon for commit 51f2e49 🤖 If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again. |
…nctions to be specialized (pythonGH-29595)
…nctions to be specialized (pythonGH-29595) * Make internal APIs that take PyFrameConstructor take a PyFunctionObject instead. * Add reference to function to frame, borrow references to builtins and globals. * Add COPY_FREE_VARS instruction to allow specialization of calls to inner functions.
…er (#87) This PR skips frame annotation completeness tests for Python >= 3.11. Python 3.11 added a reference to the function from the frame (see python/cpython#29595), but didn't expose that reference at Python level.
This should speedup calls to functions, by allowing us to specialize calls to functions with free variables without penalizing calls to other functions.
However, there is a bit of yak-shaving involved:
evaland friends, rather than just passing aFrameDescriptor.Although the speedup is probably not significant this is worthwhile as it makes more operations explicit in the bytecode which what we want when optimizing.
https://bugs.python.org/issue44525