You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
There's room for tiny refactorings in the Objects/dictobject.c that may allow speedups, for example via better CPU cache behavior, compiler hints, cheap checks & fast exits for common cases. In my guess-timation, they may bring about upto 10% overall lookup/insert performance gains. That's more of an un-scientific intuition at best right now, but I'm laying some ideas out here:
compare_unicode_unicode(): can benefit from pointer-equality and branch prediction, and reducing unnecessary function calls
compare_generic(): can use early rejections on hash-mismatch, avoiding RichCompareBool calls
unicodekeys_lookup_unicode(): pointer-equality for the common case can precede do_lookup which today is invoked everytime for all cases
find_empty_slot(): early termination / separate path for first iteration can speed up insertions, especially for lightly-populated dicts
build_indices_generic / build_indices_unicode (): breaking out the first-probe success case may speed up dict resize operations
I welcome feedback on how to raise this proposal differently, but otherwise I'd like to work on these ideas and quantify them. I feel modern CPUs have deep pipelines and these optimizations may help us avoid some cache-thrashing. These changes are purely performance oriented and don't change functionality / API / ABI and should be thread-safe too.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Proposal:
There's room for tiny refactorings in the
Objects/dictobject.cthat may allow speedups, for example via better CPU cache behavior, compiler hints, cheap checks & fast exits for common cases. In my guess-timation, they may bring about upto 10% overall lookup/insert performance gains. That's more of an un-scientific intuition at best right now, but I'm laying some ideas out here:compare_unicode_unicode(): can benefit from pointer-equality and branch prediction, and reducing unnecessary function callscompare_generic(): can use early rejections on hash-mismatch, avoidingRichCompareBoolcallsunicodekeys_lookup_unicode(): pointer-equality for the common case can precededo_lookupwhich today is invoked everytime for all casesfind_empty_slot(): early termination / separate path for first iteration can speed up insertions, especially for lightly-populated dictsbuild_indices_generic / build_indices_unicode (): breaking out the first-probe success case may speed up dict resize operationsI welcome feedback on how to raise this proposal differently, but otherwise I'd like to work on these ideas and quantify them. I feel modern CPUs have deep pipelines and these optimizations may help us avoid some cache-thrashing. These changes are purely performance oriented and don't change functionality / API / ABI and should be thread-safe too.
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response