There are a few remaining scaling bottlenecks in the free-threaded build that we should fix.
I have been using the following benchmark to detect bottlenecks that were previously issues in older versions of the nogil forks:
https://gist.github.com/colesbury/429fe9f90036d43ad43576c3d357a12e
Note that for reliable results the above benchmark requires some setup:
Adjust NTHREADS if necessary on your system
Disable turbo boost or equivalent on your system
Avoid running on hyper-threading siblings (i.e., use taskset -c 0-<N> to choose separate physical cores)
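For readers without the gist at hand, here is a minimal sketch of how a scaling measurement of this kind can be structured. It is not the benchmark linked above: `run_in_threads`, `scaling_factor`, and the default iteration count are hypothetical names and values chosen for illustration, and only `NTHREADS` is borrowed from the setup notes.

```python
import threading
import time

NTHREADS = 4  # assumption: adjust to the number of physical cores you pin to


def run_in_threads(work, nthreads, iterations):
    """Run work(iterations) in nthreads threads; return elapsed wall time in seconds."""
    # One extra party so the main thread can release all workers at once.
    barrier = threading.Barrier(nthreads + 1)

    def runner():
        barrier.wait()
        work(iterations)

    threads = [threading.Thread(target=runner) for _ in range(nthreads)]
    for t in threads:
        t.start()
    barrier.wait()
    start = time.perf_counter()
    for t in threads:
        t.join()
    return time.perf_counter() - start


def scaling_factor(work, nthreads=NTHREADS, iterations=100_000):
    """Throughput with nthreads threads relative to one thread (ideal value: nthreads)."""
    t1 = run_in_threads(work, 1, iterations)
    tn = run_in_threads(work, nthreads, iterations)
    return nthreads * t1 / tn
```

A workload that scales well should report a factor close to `nthreads` on a free-threaded build. A factor near or below 1 suggests contention, or simply the GIL on a default build.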
Current bottlenecks
cmodule_function
load_string_const
load_tuple_const
create_closure
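The names above are benchmark cases in the gist. As a rough guide to what each one plausibly exercises, the following sketch gives one guess per name. The bodies are assumptions made for illustration and may differ from the gist's actual code.

```python
import math


def cmodule_function(n):
    # Repeatedly call a function implemented in a C extension module.
    for i in range(n):
        math.floor(i)


def load_string_const(n):
    # Repeatedly load a string constant from the code object's constants.
    for _ in range(n):
        x = "a string constant"


def load_tuple_const(n):
    # Repeatedly load a constant tuple from the code object's constants.
    for _ in range(n):
        x = (1, 2, 3)


def create_closure(n):
    # Repeatedly create a new function object that closes over a local variable.
    for i in range(n):
        def inner():
            return i
```

Each function takes an iteration count, so it can be run concurrently from several threads and compared against a single-threaded run.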
Underlying issues
Reference count contention on non-string constants. We will want to immortalize most constants in PyCodeObject.
Reference count contention on func.__qualname__ or code.co_qualname (when creating a closure)
Reference count contention on module-level PyCFunctionObjects
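To illustrate why these objects are points of contention, the sketch below shows that in CPython the same object is handed out each time, so every thread ends up adjusting the reference count of one shared object. The helper names (`make_closure`, `tuple_const`) are hypothetical, and the identity checks rely on CPython implementation behaviour rather than language guarantees.

```python
import math


def make_closure():
    def inner():
        return None
    return inner


def tuple_const():
    return (1, 2, 3)


first, second = make_closure(), make_closure()

shared = {
    # Both closures reference the same code object and the same qualname string.
    "code object": first.__code__ is second.__code__,
    "qualname": first.__qualname__ is second.__qualname__,
    # Each call returns the same tuple object stored in the code object's constants.
    "tuple constant": tuple_const() is tuple_const(),
    # A module-level C function is a single object shared by all callers.
    "C function": math.floor is math.floor,
}

for name, is_shared in shared.items():
    print(f"{name}: shared={is_shared}")
```

Immortalizing such objects means their reference counts no longer need to be modified, which removes the shared write.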
Linked PRs
_Py_ID(__main__) for main module name #118528