gh-115999: Add free-threaded specialization for UNPACK_SEQUENCE#126600
gh-115999: Add free-threaded specialization for UNPACK_SEQUENCE#126600Eclips4 merged 7 commits intopython:mainpython/cpython:mainfrom Eclips4:ft-specialize-unpack-sequenceEclips4/cpython:ft-specialize-unpack-sequenceCopy head branch name to clipboard
UNPACK_SEQUENCE#126600Conversation
| def test_unpack_sequence(self): | ||
| def f(): | ||
| for _ in range(100): | ||
| a, b = 1, 2 |
There was a problem hiding this comment.
Sort of unrelated to this PR, but I'm surprised constant unpacking like this isn't peepholed into LOAD_CONST, STORE_FAST, LOAD_CONST, STORE_FAST. Maybe because it's actually a longer instruction sequence than the original? It certainly does less work.
@iritkatriel, thoughts?
There was a problem hiding this comment.
Thanks for doing this! The changes look good. I think the current implementations of UNPACK_SEQUENCE_{TUPLE,TWO_TUPLE} are already thread-safe since tuples are immutable. The implementation of UNPACK_SEQUENCE_LIST isn't thread-safe (there is nothing preventing another thread from adding or removing items from the list while the instruction is executing) so we're going to need a slightly different implementation in the free-threaded build.
I would suggest taking a critical section around the list size check and the code that pushes items onto the stack. Something like:
inst(UNPACK_SEQUENCE_LIST, (unused/1, seq -- values[oparg])) {
PyObject *seq_o = PyStackRef_AsPyObjectBorrow(seq);
DEOPT_IF(!PyList_CheckExact(seq_o));
int should_deopt = 0;
Py_BEGIN_CRITICAL_SECTION(seq_o);
should_deopt = PyList_GET_SIZE(seq_o) != oparg;
if (!should_deopt) {
STAT_INC(UNPACK_SEQUENCE, hit);
PyObject **items = _PyList_ITEMS(seq_o);
for (int i = oparg; --i >= 0; ) {
*values++ = PyStackRef_FromPyObjectNew(items[i]);
}
}
Py_END_CRITICAL_SECTION();
DEOPT_IF(should_deopt);
DECREF_INPUTS();
}
|
When you're done making the requested changes, leave the comment: |
Python/bytecodes.c
Outdated
| *values++ = PyStackRef_FromPyObjectNew(items[i]); | ||
| int should_deopt = 0; | ||
| Py_BEGIN_CRITICAL_SECTION(seq_o); | ||
| should_deopt = PyList_GET_SIZE(seq_o) != oparg; |
There was a problem hiding this comment.
Could this be rewritten as something like:
if (PyList_GET_SIZE(seq_o) != oparg) {
END_CRITICAL_SECTION
DEOPT_IF(true);
}There was a problem hiding this comment.
We can't use the macros because they introduce and close a new scope.
I don't think there's precedence in CPython for using the critical section functions manually (everything uses the macros), but if it's important that we maintain the structure you're suggesting we could do something like
#ifdef Py_GIL_DISABLED
PyCriticalSection _py_cs;
PyCriticalSection_Begin(&_py_cs, seq_o);
#endif
if (PyList_GET_SIZE(seq_o) != oparg) {
#ifdef Py_GIL_DISABLED
PyCriticalSection_End(&_py_cs);
#endif
DEOPT_IF(true);
}
STAT_INC(UNPACK_SEQUENCE, hit);
PyObject **items = _PyList_ITEMS(seq_o);
for (int i = oparg; --i >= 0; ) {
*values++ = PyStackRef_FromPyObjectNew(items[i]);
}
#ifdef Py_GIL_DISABLED
PyCriticalSection_End(&_py_cs);
#endif
DECREF_INPUTS();Note that the preprocessor guards are not necessary for correctness; the critical section functions are a no-op in default builds.
markshannon
left a comment
There was a problem hiding this comment.
This doesn't fit into the general pattern of specialized instructions:
- One or more guards
- Zero or one actions
See the docs for why this matters.
Has this been benchmarked?
Performance is neutral overall on the default build. Though strangely the results for the unpack sequence benchmark look like this change is significantly faster. I'm not sure how reliable that benchmark is, however. Performance is improved by 2% overall on the free-threaded build. |
|
@markshannon - Please have another look at this. |
Co-authored-by: mpage <mpage@cs.stanford.edu>
|
@Eclips4 - I think it would be fine to merge this PR. We can put up follow up PRs if any subsequent changes are required. |
…E` (python#126600) Add free-threaded specialization for `UNPACK_SEQUENCE` opcode. `UNPACK_SEQUENCE_TUPLE/UNPACK_SEQUENCE_TWO_TUPLE` are already thread safe since tuples are immutable. `UNPACK_SEQUENCE_LIST` is not thread safe because of nature of lists (there is nothing preventing another thread from adding items to or removing them the list while the instruction is executing). To achieve thread safety we add a critical section to the implementation of `UNPACK_SEQUENCE_LIST`, especially around the parts where we check the size of the list and push items onto the stack. --------- Co-authored-by: Matt Page <mpage@meta.com> Co-authored-by: mpage <mpage@cs.stanford.edu>
--disable-gilbuilds #115999