[C API] Replace PyTuple_Pack(1,2) with PyTuple_Make[Single,Pair] to optimize creation of tuples #140052

@sergey-miryanov

Description


Feature or enhancement

Proposal:

I ran the pyperformance benchmarks and found that tuples with one or two elements account for about 80% of all tuples created.

Image

I checked the code and filled the following table with the number of occurrences of one- and two-element tuples:

Table with number of occurrences
+-----------------------+-----------------+-------+
| File                  | Function        | Count |
+=======================+=================+=======+
| _asyncmodule.c        | PyTuple_New(2)  | 2     |
| _collectionsmodule.c  | PyTuple_Pack(1) | 2     |
| _csv.c                | PyTuple_Pack(1) | 1     |
| _datetimemodule.c     | PyTuple_Pack(2) | 4     |
|                       | PyTuple_Pack(1) | 3     |
| _elementtree.c        | PyTuple_Pack(2) | 4     |
| _functoolsmodule.c    | PyTuple_New(2)  | 2     |
| _interpretersmodule.c | PyTuple_Pack(2) | 1     |
| _json.c               | PyTuple_New(2)  | 1     |
|                       | PyTuple_Pack(2) | 1     |
|                       | PyTuple_Pack(1) | 1     |
| _operator.c           | PyTuple_Pack(2) | 1     |
| _pickle.c             | PyTuple_Pack(2) | 3     |
|                       | PyTuple_New(2)  | 2     |
|                       | PyTuple_New(1)  | 2     |
| _ssl.c                | PyTuple_New(2)  | 6     |
|                       | PyTuple_Pack(2) | 1     |
| _threadmodule.c       | PyTuple_New(2)  | 1     |
| _tkinter.c            | PyTuple_Pack(1) |       |
| arraymodule.c         | PyTuple_New(2)  | 2     |
| itertoolsmodule.c     | PyTuple_Pack(2) | 2     |
|                       | PyTuple_New(2)  | 1     |
| main.c                | PyTuple_Pack(2) | 1     |
| overlapped.c          | PyTuple_New(2)  | 2     |
| posixmodule.c         | PyTuple_Pack(2) | 1     |
| pyexpat.c             | PyTuple_New(1)  | 1     |
| selectmodule.c        | PyTuple_Pack(2) | 1     |
|                       | PyTuple_New(2)  | 1     |
| signal_module.c       | PyTuple_New(2)  | 1     |
| socket_module.c       | PyTuple_Pack(2) | 3     |
| termios.c             | PyTuple_New(2)  | 2     |
| _ctypes.c             | PyTuple_Pack(2) | 2     |
| stgdict.c             | PyTuple_Pack(2) | 1     |
| decimal.c             | PyTuple_Pack(2) | 7     |
|                       | PyTuple_Pack(1) | 2     |
| microprotocol.c       | PyTuple_Pack(2) | 2     |
| _sre.c                | PyTuple_New(2)  | 1     |
| datetime.c            | PyTuple_Pack(1) | 2     |
|                       | PyTuple_Pack(2) | 1     |
| getargs.c             | PyTuple_Pack(1) | 1     |
| heaptype.c            | PyTuple_Pack(2) | 1     |
|                       | PyTuple_Pack(1) | 2     |
|                       | PyTuple_New(2)  | 1     |
| vectorcall_limited.c  | PyTuple_New(1)  | 2     |
| multibytecodec.c      | PyTuple_New(2)  | 1     |
| codeobject.c          | PyTuple_Pack(2) | 7     |
| dictobject.c          | PyTuple_Pack(2) | 2     |
|                       | PyTuple_New(2)  | 4     |
| enumobject.c          | PyTuple_Pack(2) | 1     |
|                       | PyTuple_New(2)  | 2     |
| exceptions.c          | PyTuple_Pack(2) | 7     |
| floatobject.c         | PyTuple_Pack(2) | 1     |
| frameobject.c         | PyTuple_Pack(2) | 2     |
|                       | PyTuple_Pack(1) | 1     |
| genericaliasobject.c  | PyTuple_Pack(1) | 2     |
| listobject.c          | PyTuple_Pack(2) | 1     |
| longobject.c          | PyTuple_Pack(2) | 1     |
|                       | PyTuple_New(2)  | 2     |
| odictobject.c         | PyTuple_Pack(2) | 2     |
|                       | PyTuple_New(2)  | 1     |
| setobject.c           | PyTuple_Pack(1) | 1     |
| typeobject.c          | PyTuple_Pack(2) | 2     |
|                       | PyTuple_Pack(1) | 5     |
| typevarobject.c       | PyTuple_Pack(2) | 1     |
|                       | PyTuple_Pack(1) | 2     |
| unicode_format.h      | PyTuple_Pack(2) | 2     |
| pegen_errors.c        | PyTuple_Pack(2) | 2     |
| _warnings.c           | PyTuple_Pack(2) | 1     |
| bltnmodule.c          | PyTuple_Pack(2) | 1     |
| ceval.c               | PyTuple_Pack(1) | 1     |
| _codegen.c            | PyTuple_Pack(1) | 2     |
| compile.c             | PyTuple_Pack(2) | 1     |
| crossinterp.c         | PyTuple_Pack(1) | 1     |
| errors.c              | PyTuple_Pack(1) | 1     |
| hamt.c                | PyTuple_Pack(2) | 1     |
| marshal.c             | PyTuple_Pack(2) | 1     |
| pylifecycle.c         | PyTuple_Pack(2) | 1     |
| Python-tokenize.c     | PyTuple_Pack(2) | 1     |
| sysmodule.c           | PyTuple_Pack(1) | 1     |
| tracemalloc.c         | PyTuple_New(2)  | 1     |
+-----------------------+-----------------+-------+

I came up with the idea of adding PyTuple_MakeSingle and PyTuple_MakePair for such cases to improve performance.
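To make the idea concrete, here is a toy model in standalone C. It is not CPython code: `Obj`, `Tuple`, `toy_pack`, `toy_make_single` and `toy_make_pair` are hypothetical stand-ins. It only illustrates the difference between a variadic constructor in the style of PyTuple_Pack(n, ...), which walks a `va_list` in a loop, and fixed-arity constructors that store their arguments directly.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for PyObject and PyTupleObject; NOT CPython's real layout. */
typedef struct { long refcnt; } Obj;

typedef struct {
    size_t size;
    Obj *items[];               /* flexible array member holding the elements */
} Tuple;

Tuple *tuple_alloc(size_t n)
{
    Tuple *t = malloc(sizeof(Tuple) + n * sizeof(Obj *));
    if (t != NULL) {
        t->size = n;
    }
    return t;
}

/* Generic path, modeled on PyTuple_Pack(n, ...): variadic call plus a loop. */
Tuple *toy_pack(size_t n, ...)
{
    Tuple *t = tuple_alloc(n);
    if (t == NULL) {
        return NULL;
    }
    va_list args;
    va_start(args, n);
    for (size_t i = 0; i < n; i++) {
        Obj *o = va_arg(args, Obj *);
        o->refcnt++;            /* the tuple takes its own reference */
        t->items[i] = o;
    }
    va_end(args);
    return t;
}

/* Specialized path (hypothetical): one element, no va_list, no loop. */
Tuple *toy_make_single(Obj *a)
{
    assert(a != NULL);
    Tuple *t = tuple_alloc(1);
    if (t == NULL) {
        return NULL;
    }
    a->refcnt++;
    t->items[0] = a;
    return t;
}

/* Specialized path (hypothetical): two elements, no va_list, no loop. */
Tuple *toy_make_pair(Obj *a, Obj *b)
{
    assert(a != NULL && b != NULL);
    Tuple *t = tuple_alloc(2);
    if (t == NULL) {
        return NULL;
    }
    a->refcnt++;
    b->refcnt++;
    t->items[0] = a;
    t->items[1] = b;
    return t;
}
```

In this model, `toy_pack(2, &a, &b)` and `toy_make_pair(&a, &b)` build the same tuple and leave the same reference counts; only the call path differs. Whether that difference is measurable in CPython is exactly what the benchmarks are meant to show.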

Afterwards, @eendebakpt sent me a link with a previous attempt at this (many thanks!) - #118222.

Anyway, I implemented these changes and ran benchmarks.

If we replace PyTuple_Pack(1, ...) with PyTuple_MakeSingle and PyTuple_Pack(2, ...) with PyTuple_MakePair, we get the following results (run on Ubuntu 24.04 x64, compiled with LTO):

Geometric mean - 1.00x faster
+--------------------------+----------+------------------------+
| Benchmark                | main     | opt                    |
+==========================+==========+========================+
| async_generators         | 277 ms   | 279 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| asyncio_websockets       | 242 ms   | 241 ms: 1.00x faster   |
+--------------------------+----------+------------------------+
| chaos                    | 36.0 ms  | 36.6 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| comprehensions           | 10.3 us  | 10.1 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| bench_mp_pool            | 66.1 ms  | 43.0 ms: 1.54x faster  |
+--------------------------+----------+------------------------+
| coroutines               | 15.0 ms  | 14.5 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| coverage                 | 54.9 ms  | 56.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| crypto_pyaes             | 45.3 ms  | 46.1 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| deepcopy                 | 171 us   | 168 us: 1.02x faster   |
+--------------------------+----------+------------------------+
| deepcopy_reduce          | 1.87 us  | 1.84 us: 1.02x faster  |
+--------------------------+----------+------------------------+
| deepcopy_memo            | 16.5 us  | 17.5 us: 1.06x slower  |
+--------------------------+----------+------------------------+
| deltablue                | 1.97 ms  | 1.99 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| django_template          | 23.1 ms  | 23.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| docutils                 | 1.70 sec | 1.68 sec: 1.01x faster |
+--------------------------+----------+------------------------+
| fannkuch                 | 245 ms   | 243 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| float                    | 42.4 ms  | 43.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| gc_traversal             | 2.93 ms  | 2.82 ms: 1.04x faster  |
+--------------------------+----------+------------------------+
| generators               | 19.1 ms  | 19.5 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| genshi_text              | 14.2 ms  | 14.4 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| genshi_xml               | 32.8 ms  | 33.1 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| go                       | 68.7 ms  | 70.2 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| hexiom                   | 3.64 ms  | 3.61 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| json_dumps               | 6.34 ms  | 6.26 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| json_loads               | 15.7 us  | 16.1 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| logging_silent           | 63.6 ns  | 59.4 ns: 1.07x faster  |
+--------------------------+----------+------------------------+
| logging_simple           | 3.49 us  | 3.52 us: 1.01x slower  |
+--------------------------+----------+------------------------+
| mako                     | 7.00 ms  | 6.94 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| mdp                      | 788 ms   | 780 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| meteor_contest           | 68.0 ms  | 68.2 ms: 1.00x slower  |
+--------------------------+----------+------------------------+
| nbody                    | 55.4 ms  | 55.0 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| pickle_dict              | 18.5 us  | 18.8 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| pickle_list              | 2.93 us  | 2.98 us: 1.02x slower  |
+--------------------------+----------+------------------------+
| pickle_pure_python       | 208 us   | 205 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| pidigits                 | 143 ms   | 143 ms: 1.00x slower   |
+--------------------------+----------+------------------------+
| pprint_safe_repr         | 489 ms   | 496 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| pprint_pformat           | 997 ms   | 1.01 sec: 1.01x slower |
+--------------------------+----------+------------------------+
| pyflate                  | 259 ms   | 260 ms: 1.00x slower   |
+--------------------------+----------+------------------------+
| regex_compile            | 82.7 ms  | 83.5 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| regex_dna                | 115 ms   | 113 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| regex_v8                 | 14.7 ms  | 14.3 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| richards                 | 27.3 ms  | 27.1 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| richards_super           | 31.2 ms  | 31.1 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| scimark_fft              | 174 ms   | 178 ms: 1.02x slower   |
+--------------------------+----------+------------------------+
| scimark_lu               | 71.5 ms  | 69.3 ms: 1.03x faster  |
+--------------------------+----------+------------------------+
| scimark_monte_carlo      | 41.9 ms  | 41.2 ms: 1.02x faster  |
+--------------------------+----------+------------------------+
| scimark_sor              | 70.4 ms  | 72.0 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| scimark_sparse_mat_mult  | 2.67 ms  | 2.71 ms: 1.02x slower  |
+--------------------------+----------+------------------------+
| spectral_norm            | 59.2 ms  | 58.8 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| sqlglot_normalize        | 176 ms   | 179 ms: 1.01x slower   |
+--------------------------+----------+------------------------+
| sqlglot_optimize         | 34.0 ms  | 34.1 ms: 1.00x slower  |
+--------------------------+----------+------------------------+
| sqlglot_parse            | 801 us   | 792 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| sqlglot_transpile        | 1.01 ms  | 994 us: 1.01x faster   |
+--------------------------+----------+------------------------+
| sympy_expand             | 311 ms   | 306 ms: 1.02x faster   |
+--------------------------+----------+------------------------+
| sympy_sum                | 92.8 ms  | 92.4 ms: 1.00x faster  |
+--------------------------+----------+------------------------+
| sympy_str                | 177 ms   | 176 ms: 1.01x faster   |
+--------------------------+----------+------------------------+
| telco                    | 112 ms   | 111 ms: 1.00x faster   |
+--------------------------+----------+------------------------+
| tomli_loads              | 1.21 sec | 1.23 sec: 1.02x slower |
+--------------------------+----------+------------------------+
| typing_runtime_protocols | 107 us   | 109 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| unpack_sequence          | 25.2 ns  | 26.3 ns: 1.04x slower  |
+--------------------------+----------+------------------------+
| unpickle_list            | 2.93 us  | 3.01 us: 1.03x slower  |
+--------------------------+----------+------------------------+
| unpickle_pure_python     | 137 us   | 138 us: 1.01x slower   |
+--------------------------+----------+------------------------+
| xml_etree_iterparse      | 62.9 ms  | 62.0 ms: 1.01x faster  |
+--------------------------+----------+------------------------+
| xml_etree_generate       | 54.7 ms  | 55.4 ms: 1.01x slower  |
+--------------------------+----------+------------------------+
| xml_etree_process        | 39.0 ms  | 40.3 ms: 1.03x slower  |
+--------------------------+----------+------------------------+
| Geometric mean           | (ref)    | 1.00x faster           |
+--------------------------+----------+------------------------+

Benchmark hidden because not significant (19): 2to3, asyncio_tcp, asyncio_tcp_ssl, bench_thread_pool, dulwich_log, create_gc_cycles, html5lib, logging_format, nqueens, pathlib, pickle, python_startup, python_startup_no_site, raytrace, regex_effbot, sqlite_synth, sympy_integrate, unpickle, xml_etree_parse

I plan to implement reference-stealing variants, PyTuple_Make[Single,Pair]Steal, and also replace the PyTuple_New(1) and PyTuple_New(2) call sites.
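The difference between a regular and a "Steal" constructor can be sketched with another toy model in standalone C. Again, this is not CPython code: `Obj`, `Pair`, `toy_make_pair` and `toy_make_pair_steal` are hypothetical names. Releasing the stolen references on allocation failure is an assumption about how such an API might behave, modeled on how PyTuple_SetItem treats the reference it steals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins; NOT CPython's real object layout. */
typedef struct { long refcnt; } Obj;
typedef struct { Obj *first; Obj *second; } Pair;

/* Non-stealing variant: the caller keeps its references and the pair
   takes new ones, as PyTuple_Pack does. */
Pair *toy_make_pair(Obj *a, Obj *b)
{
    assert(a != NULL && b != NULL);
    Pair *p = malloc(sizeof(Pair));
    if (p == NULL) {
        return NULL;
    }
    a->refcnt++;
    b->refcnt++;
    p->first = a;
    p->second = b;
    return p;
}

/* Stealing variant: the pair takes over the caller's references, so the
   reference counts do not change on success. */
Pair *toy_make_pair_steal(Obj *a, Obj *b)
{
    assert(a != NULL && b != NULL);
    Pair *p = malloc(sizeof(Pair));
    if (p == NULL) {
        /* Assumption: stolen references are released on failure. */
        a->refcnt--;
        b->refcnt--;
        return NULL;
    }
    p->first = a;
    p->second = b;
    return p;
}
```

A stealing variant would suit call sites that currently use PyTuple_New(2) followed by two PyTuple_SET_ITEM calls, since those already hand ownership of freshly created items to the tuple.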

Has this already been discussed elsewhere?

This is a minor feature, which does not need previous discussion elsewhere

Links to previous discussion of this feature:

No response
