gh-115999: Add partial free-thread specialization for BINARY_SUBSCR#127227
gh-115999: Add partial free-thread specialization for BINARY_SUBSCR#127227corona10 merged 7 commits intopython:mainpython/cpython:mainfrom corona10:gh-115999-binary-subscrcorona10/cpython:gh-115999-binary-subscrCopy head branch name to clipboard
Conversation
|
I am going to write test codes at this week. |
| ht->_spec_cache.getitem = descriptor; | ||
| ht->_spec_cache.getitem_version = version; |
There was a problem hiding this comment.
It should make to be thread-safe by using LOCK_TYPE(), but it will be handled with separated PRs.
1dacf5d to
1a901a3
Compare
|
@mpage Would you like to take a look? |
mpage
left a comment
There was a problem hiding this comment.
Thanks for doing this! I left one comment inline, but otherwise looks good to me. I kicked off benchmark runs for default/free-threaded builds and will post the results once they're finished.
|
@mpage Done thanks! |
|
FYI, aarch64 linux CI failure is not related to this PR, #127332 will fix the issue. |
|
I should have caught this last time, but I think we want to use the fast-path for list accesses in |
You mean that the fast-path that can avoid locking object? |
Yep |
|
What about checking |
Yes, those are needed for correctness. I think for now we probably want to have two slightly different code paths for the free-threaded and default builds. The default build should continue to use the existing logic to fetch the item from the list. The free-threaded builds should use |
|
Matt, I define a new |
Python/bytecodes.c
Outdated
| Py_ssize_t index = ((PyLongObject*)sub)->long_value.ob_digit[0]; | ||
| #ifdef Py_GIL_DISABLED | ||
| STAT_INC(BINARY_SUBSCR, hit); | ||
| PyObject *res_o = _PyList_GetItemRef(list, index); |
There was a problem hiding this comment.
The default build of list_get_item_ref is almost same as current implementation,
we might able to this implementation to the default build also.
Lines 352 to 361 in 328187c
There was a problem hiding this comment.
It would certainly be cleaner, but it might come with some performance cost in the default build.
mpage
left a comment
There was a problem hiding this comment.
Looking good! I kicked off new benchmark runs and will post the results here once they're finished. They take a few hours to run and there are other runs queued ahead of them, so that will probably be sometime tomorrow.
There are a couple tests that should be enabled using requires_specialization_ft now that we're specializing BINARY_SUBSCR in free-threaded builds:
- test_opcache.TestRacesDoNotCrash.test_binary_subscr_list_int
- test_dis.DisTests.test_binary_subscr_specialize - This should probably be deleted. It's covered by the tests that you've added and we want to move the specialization tests out of
test_dis.
mpage
left a comment
There was a problem hiding this comment.
Nice! Most recent performance results look good:
- Neutral on the default build.
- ~1% improvement on free-threaded builds.
Include/internal/pycore_list.h
Outdated
|
|
||
| PyAPI_FUNC(PyObject*) _PyList_Extend(PyListObject *, PyObject *); | ||
| extern void _PyList_DebugMallocStats(FILE *out); | ||
| // _PyList_GetItemRef should be used only when the object is known to be a list. |
There was a problem hiding this comment.
I think it would also be worth mentioning that this does not raise errors whereas PyList_GetItemRef does.
|
It seems like the change introduced a regression: #127521. |
--disable-gilbuilds #115999STORE_SUBSCR#127169 would be merged.PyDict_GetItemRefis thread-safe internally.CALLinstructions in free-threaded builds #127123 is merged, I should follow several strategies from here.