Commit df3ed54

[3.11] gh-95913: Edit Faster CPython section in 3.11 WhatsNew (GH-98429) (GH-102490)
gh-95913: Edit Faster CPython section in 3.11 WhatsNew (GH-98429) (cherry picked from commit 80b19a3) Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
1 parent b6fd4e6 commit df3ed54
‎Doc/whatsnew/3.11.rst

Lines changed: 109 additions & 77 deletions
@@ -1319,14 +1319,17 @@ This section covers specific optimizations independent of the
13191319
Faster CPython
13201320
==============
13211321

1322-
CPython 3.11 is on average `25% faster <https://github.com/faster-cpython/ideas#published-results>`_
1323-
than CPython 3.10 when measured with the
1322+
CPython 3.11 is an average of
1323+
`25% faster <https://github.com/faster-cpython/ideas#published-results>`_
1324+
than CPython 3.10 as measured with the
13241325
`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
1325-
and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
1326-
could be up to 10-60% faster.
1326+
when compiled with GCC on Ubuntu Linux.
1327+
Depending on your workload, the overall speedup could be 10-60%.
13271328

1328-
This project focuses on two major areas in Python: faster startup and faster
1329-
runtime. Other optimizations not under this project are listed in `Optimizations`_.
1329+
This project focuses on two major areas in Python:
1330+
:ref:`whatsnew311-faster-startup` and :ref:`whatsnew311-faster-runtime`.
1331+
Optimizations not covered by this project are listed separately under
1332+
:ref:`whatsnew311-optimizations`.
13301333

13311334

13321335
.. _whatsnew311-faster-startup:
@@ -1339,8 +1342,8 @@ Faster Startup
13391342
Frozen imports / Static code objects
13401343
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
13411344

1342-
Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
1343-
speed up module loading.
1345+
Python caches :term:`bytecode` in the :ref:`__pycache__ <tut-pycache>`
1346+
directory to speed up module loading.
13441347

13451348
Previously in 3.10, Python module execution looked like this:
13461349

@@ -1349,8 +1352,9 @@ Previously in 3.10, Python module execution looked like this:
13491352
Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
13501353
13511354
In Python 3.11, the core modules essential for Python startup are "frozen".
1352-
This means that their code objects (and bytecode) are statically allocated
1353-
by the interpreter. This reduces the steps in module execution process to this:
1355+
This means that their :ref:`codeobjects` (and bytecode)
1356+
are statically allocated by the interpreter.
1357+
This reduces the steps in the module execution process to:
13541358

13551359
.. code-block:: text
13561360
@@ -1359,7 +1363,7 @@ by the interpreter. This reduces the steps in module execution process to this:
13591363
Interpreter startup is now 10-15% faster in Python 3.11. This has a big
13601364
impact for short-running programs using Python.
13611365

1362-
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
1366+
(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in many issues.)
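As a rough illustration of the "frozen" modules described in this hunk, the sketch below lists the already-imported modules whose import spec reports an origin of ``frozen``. The helper name ``frozen_modules`` is hypothetical, and the result depends on the interpreter version and build, so treat it as a way to inspect a given installation rather than a guaranteed list.

```python
import sys


def frozen_modules():
    """Return the sorted names of loaded modules whose spec origin is 'frozen'."""
    names = []
    # Copy the items so that imports triggered elsewhere cannot change the
    # dictionary while we iterate over it.
    for name, module in list(sys.modules.items()):
        spec = getattr(module, "__spec__", None)
        if getattr(spec, "origin", None) == "frozen":
            names.append(name)
    return sorted(names)


if __name__ == "__main__":
    for name in frozen_modules():
        print(name)
```

Comparing the output between a 3.10 and a 3.11 interpreter is one way to see which startup modules became frozen, assuming both are available locally.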
13631367

13641368

13651369
.. _whatsnew311-faster-runtime:
@@ -1372,17 +1376,19 @@ Faster Runtime
13721376
Cheaper, lazy Python frames
13731377
^^^^^^^^^^^^^^^^^^^^^^^^^^^
13741378

1375-
Python frames are created whenever Python calls a Python function. This frame
1376-
holds execution information. The following are new frame optimizations:
1379+
Python frames, holding execution information,
1380+
are created whenever Python calls a Python function.
1381+
The following are new frame optimizations:
13771382

13781383
- Streamlined the frame creation process.
13791384
- Avoided memory allocation by generously re-using frame space on the C stack.
13801385
- Streamlined the internal frame struct to contain only essential information.
13811386
Frames previously held extra debugging and memory management information.
13821387

1383-
Old-style frame objects are now created only when requested by debuggers or
1384-
by Python introspection functions such as ``sys._getframe`` or
1385-
``inspect.currentframe``. For most user code, no frame objects are
1388+
Old-style :ref:`frame objects <frame-objects>`
1389+
are now created only when requested by debuggers
1390+
or by Python introspection functions such as :func:`sys._getframe` and
1391+
:func:`inspect.currentframe`. For most user code, no frame objects are
13861392
created at all. As a result, nearly all Python function calls have sped
13871393
up significantly. We measured a 3-7% speedup in pyperformance.
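The introspection functions named in this hunk can be exercised directly. The minimal sketch below requests a frame object through both :func:`inspect.currentframe` and :func:`sys._getframe`; the function names ``who_am_i``, ``caller_name`` and ``outer`` are illustrative only. Note that ``sys._getframe`` is a CPython implementation detail and ``inspect.currentframe`` may return ``None`` on other implementations.

```python
import inspect
import sys


def who_am_i():
    """Return this function's name and current line via an explicit frame object."""
    frame = inspect.currentframe()
    try:
        return frame.f_code.co_name, frame.f_lineno
    finally:
        # Drop the reference promptly to avoid a reference cycle.
        del frame


def caller_name():
    """Return the name of the function that called this one."""
    return sys._getframe(1).f_code.co_name


def outer():
    return caller_name()


if __name__ == "__main__":
    print(who_am_i())
    print(outer())
```

Code that never asks for a frame in this way is the "most user code" case the text refers to.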
13881394

@@ -1403,10 +1409,11 @@ In 3.11, when CPython detects Python code calling another Python function,
14031409
it sets up a new frame, and "jumps" to the new code inside the new frame. This
14041410
avoids calling the C interpreting function altogether.
14051411

1406-
Most Python function calls now consume no C stack space. This speeds up
1407-
most of such calls. In simple recursive functions like fibonacci or
1408-
factorial, a 1.7x speedup was observed. This also means recursive functions
1409-
can recurse significantly deeper (if the user increases the recursion limit).
1412+
Most Python function calls now consume no C stack space, speeding them up.
1413+
In simple recursive functions like fibonacci or
1414+
factorial, we observed a 1.7x speedup. This also means recursive functions
1415+
can recurse significantly deeper
1416+
(if the user increases the recursion limit with :func:`sys.setrecursionlimit`).
14101417
We measured a 1-3% improvement in pyperformance.
14111418

14121419
(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
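A minimal sketch of how one might probe the recursion claims in this hunk: it times a naive recursive Fibonacci function and temporarily raises the recursion limit with :func:`sys.setrecursionlimit` to recurse deeper than the default allows. The function names, the depth of 3000 and the limit of 5000 are arbitrary choices for illustration, and the timings will vary by machine and interpreter version.

```python
import sys
import timeit


def fib(n):
    """Naive recursive Fibonacci, dominated by Python-to-Python calls."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def depth(n):
    """Recurse n levels deep and report how many levels were reached."""
    return 0 if n == 0 else 1 + depth(n - 1)


# Time the call-heavy workload; run the same script under 3.10 and 3.11
# to compare.
elapsed = timeit.timeit(lambda: fib(15), number=20)

# Raise the limit only for the deep call, then restore the previous value.
old_limit = sys.getrecursionlimit()
sys.setrecursionlimit(max(old_limit, 5000))
try:
    reached = depth(3000)
finally:
    sys.setrecursionlimit(old_limit)

print(f"fib(15) x 20: {elapsed:.4f}s, recursion depth reached: {reached}")
```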
@@ -1417,7 +1424,7 @@ We measured a 1-3% improvement in pyperformance.
14171424
PEP 659: Specializing Adaptive Interpreter
14181425
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
14191426

1420-
:pep:`659` is one of the key parts of the faster CPython project. The general
1427+
:pep:`659` is one of the key parts of the Faster CPython project. The general
14211428
idea is that while Python is a dynamic language, most code has regions where
14221429
objects and types rarely change. This concept is known as *type stability*.
14231430

@@ -1426,17 +1433,18 @@ in the executing code. Python will then replace the current operation with a
14261433
more specialized one. This specialized operation uses fast paths available only
14271434
to those use cases/types, which generally outperform their generic
14281435
counterparts. This also brings in another concept called *inline caching*, where
1429-
Python caches the results of expensive operations directly in the bytecode.
1436+
Python caches the results of expensive operations directly in the
1437+
:term:`bytecode`.
14301438

14311439
The specializer will also combine certain common instruction pairs into one
1432-
superinstruction. This reduces the overhead during execution.
1440+
superinstruction, reducing the overhead during execution.
14331441

14341442
Python will only specialize
14351443
when it sees code that is "hot" (executed multiple times). This prevents Python
1436-
from wasting time for run-once code. Python can also de-specialize when code is
1444+
from wasting time on run-once code. Python can also de-specialize when code is
14371445
too dynamic or when the use changes. Specialization is attempted periodically,
1438-
and specialization attempts are not too expensive. This allows specialization
1439-
to adapt to new circumstances.
1446+
and specialization attempts are not too expensive,
1447+
allowing specialization to adapt to new circumstances.
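One way to observe specialization, sketched under the assumption that the interpreter is CPython 3.11 or later, is to disassemble a function before and after it has run many times. The ``adaptive`` argument of :func:`dis.get_instructions` was added in 3.11, so the helper below (a hypothetical name, ``opnames``) only passes it when available. Which specialized instruction names appear, if any, depends on the exact version.

```python
import dis
import sys


def add_ints(a, b):
    return a + b


def opnames(func):
    """List the opcode names of func, including specialized forms where supported."""
    # The adaptive keyword exists only on Python 3.11 and later.
    kwargs = {"adaptive": True} if sys.version_info >= (3, 11) else {}
    return [ins.opname for ins in dis.get_instructions(func, **kwargs)]


before = opnames(add_ints)

# Make the function "hot" with type-stable arguments.
for _ in range(10_000):
    add_ints(1, 2)

after = opnames(add_ints)

print("before:", before)
print("after: ", after)
```

If the two listings differ, the changed entries are the instructions the interpreter replaced with specialized variants.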
14401448

14411449
(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
14421450
See :pep:`659` for more information. Implementation by Mark Shannon and Brandt
@@ -1449,32 +1457,32 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
14491457
| Operation | Form | Specialization | Operation speedup | Contributor(s) |
14501458
| | | | (up to) | |
14511459
+===============+====================+=======================================================+===================+===================+
1452-
| Binary | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
1453-
| operations | | such as ``int``, ``float``, and ``str`` take custom | | Dong-hee Na, |
1454-
| | | fast paths for their underlying types. | | Brandt Bucher, |
1460+
| Binary | ``x + x`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, |
1461+
| operations | | such as :class:`int`, :class:`float` and :class:`str` | | Dong-hee Na, |
1462+
| | ``x - x`` | take custom fast paths for their underlying types. | | Brandt Bucher, |
14551463
| | | | | Dennis Sweeney |
1464+
| | ``x * x`` | | | |
14561465
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1457-
| Subscript | ``a[i]`` | Subscripting container types such as ``list``, | 10-25% | Irit Katriel, |
1458-
| | | ``tuple`` and ``dict`` directly index the underlying | | Mark Shannon |
1459-
| | | data structures. | | |
1466+
| Subscript | ``a[i]`` | Subscripting container types such as :class:`list`, | 10-25% | Irit Katriel, |
1467+
| | | :class:`tuple` and :class:`dict` directly index | | Mark Shannon |
1468+
| | | the underlying data structures. | | |
14601469
| | | | | |
1461-
| | | Subscripting custom ``__getitem__`` | | |
1470+
| | | Subscripting custom :meth:`~object.__getitem__` | | |
14621471
| | | is also inlined similar to :ref:`inline-calls`. | | |
14631472
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14641473
| Store | ``a[i] = z`` | Similar to subscripting specialization above. | 10-25% | Dennis Sweeney |
14651474
| subscript | | | | |
14661475
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14671476
| Calls | ``f(arg)`` | Calls to common builtin (C) functions and types such | 20% | Mark Shannon, |
1468-
| | ``C(arg)`` | as ``len`` and ``str`` directly call their underlying | | Ken Jin |
1469-
| | | C version. This avoids going through the internal | | |
1470-
| | | calling convention. | | |
1471-
| | | | | |
1477+
| | | as :func:`len` and :class:`str` directly call their | | Ken Jin |
1478+
| | ``C(arg)`` | underlying C version. This avoids going through the | | |
1479+
| | | internal calling convention. | | |
14721480
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1473-
| Load | ``print`` | The object's index in the globals/builtins namespace | [1]_ | Mark Shannon |
1474-
| global | ``len`` | is cached. Loading globals and builtins require | | |
1475-
| variable | | zero namespace lookups. | | |
1481+
| Load | ``print`` | The object's index in the globals/builtins namespace | [#load-global]_ | Mark Shannon |
1482+
| global | | is cached. Loading globals and builtins require | | |
1483+
| variable | ``len`` | zero namespace lookups. | | |
14761484
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1477-
| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [2]_ | Mark Shannon |
1485+
| Load | ``o.attr`` | Similar to loading global variables. The attribute's | [#load-attr]_ | Mark Shannon |
14781486
| attribute | | index inside the class/object's namespace is cached. | | |
14791487
| | | In most cases, attribute loading will require zero | | |
14801488
| | | namespace lookups. | | |
@@ -1486,14 +1494,15 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
14861494
| Store | ``o.attr = z`` | Similar to load attribute optimization. | 2% | Mark Shannon |
14871495
| attribute | | | in pyperformance | |
14881496
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
1489-
| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | 8% | Brandt Bucher |
1490-
| Sequence | | and ``tuple``. Avoids internal calling convention. | | |
1497+
| Unpack | ``*seq`` | Specialized for common containers such as | 8% | Brandt Bucher |
1498+
| Sequence | | :class:`list` and :class:`tuple`. | | |
1499+
| | | Avoids internal calling convention. | | |
14911500
+---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
14921501

1493-
.. [1] A similar optimization already existed since Python 3.8. 3.11
1494-
specializes for more forms and reduces some overhead.
1502+
.. [#load-global] A similar optimization already existed since Python 3.8.
1503+
3.11 specializes for more forms and reduces some overhead.
14951504
1496-
.. [2] A similar optimization already existed since Python 3.10.
1505+
.. [#load-attr] A similar optimization already existed since Python 3.10.
14971506
3.11 specializes for more forms. Furthermore, all attribute loads should
14981507
be sped up by :issue:`45947`.
14991508
@@ -1503,49 +1512,72 @@ Bucher, with additional help from Irit Katriel and Dennis Sweeney.)
15031512
Misc
15041513
----
15051514

1506-
* Objects now require less memory due to lazily created object namespaces. Their
1507-
namespace dictionaries now also share keys more freely.
1515+
* Objects now require less memory due to lazily created object namespaces.
1516+
Their namespace dictionaries now also share keys more freely.
15081517
  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
15091518

1519+
* "Zero-cost" exceptions are implemented, eliminating the cost
1520+
of :keyword:`try` statements when no exception is raised.
1521+
(Contributed by Mark Shannon in :issue:`40222`.)
1522+
15101523
* A more concise representation of exceptions in the interpreter reduced the
15111524
time required for catching an exception by about 10%.
15121525
(Contributed by Irit Katriel in :issue:`45711`.)
15131526

1527+
* :mod:`re`'s regular expression matching engine has been partially refactored,
1528+
and now uses computed gotos (or "threaded code") on supported platforms. As a
1529+
result, Python 3.11 executes the `pyperformance regular expression benchmarks
1530+
<https://pyperformance.readthedocs.io/benchmarks.html#regex-dna>`_ up to 10%
1531+
faster than Python 3.10.
1532+
(Contributed by Brandt Bucher in :gh:`91404`.)
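The "zero-cost" exceptions item in the list above can be explored with a small timing comparison. This is a sketch, not a benchmark: the functions ``plain`` and ``guarded`` are illustrative names, the loop sizes are arbitrary, and the measured difference will vary by machine and Python version.

```python
import timeit


def plain(values):
    """Sum the values with no exception handling."""
    total = 0
    for v in values:
        total += v
    return total


def guarded(values):
    """Sum the values inside a try block that never raises for integers."""
    total = 0
    for v in values:
        try:
            total += v
        except TypeError:
            pass
    return total


data = list(range(1000))

# Take the best of several repeats to reduce noise.
t_plain = min(timeit.repeat(lambda: plain(data), number=200, repeat=3))
t_guarded = min(timeit.repeat(lambda: guarded(data), number=200, repeat=3))

print(f"plain: {t_plain:.4f}s  guarded: {t_guarded:.4f}s")
```

Running the same script under 3.10 and 3.11 gives a rough sense of how much the cost of entering a ``try`` block changed when no exception is raised.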
1533+
15141534

15151535
.. _whatsnew311-faster-cpython-faq:
15161536

15171537
FAQ
15181538
---
15191539

1520-
| Q: How should I write my code to utilize these speedups?
1521-
|
1522-
| A: You don't have to change your code. Write Pythonic code that follows common
1523-
best practices. The Faster CPython project optimizes for common code
1524-
patterns we observe.
1525-
|
1526-
|
1527-
| Q: Will CPython 3.11 use more memory?
1528-
|
1529-
| A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
1530-
This is offset by memory optimizations for frame objects and object
1531-
dictionaries as mentioned above.
1532-
|
1533-
|
1534-
| Q: I don't see any speedups in my workload. Why?
1535-
|
1536-
| A: Certain code won't have noticeable benefits. If your code spends most of
1537-
its time on I/O operations, or already does most of its
1538-
computation in a C extension library like numpy, there won't be significant
1539-
speedup. This project currently benefits pure-Python workloads the most.
1540-
|
1541-
| Furthermore, the pyperformance figures are a geometric mean. Even within the
1542-
pyperformance benchmarks, certain benchmarks have slowed down slightly, while
1543-
others have sped up by nearly 2x!
1544-
|
1545-
|
1546-
| Q: Is there a JIT compiler?
1547-
|
1548-
| A: No. We're still exploring other optimizations.
1540+
.. _faster-cpython-faq-my-code:
1541+
1542+
How should I write my code to utilize these speedups?
1543+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1544+
1545+
Write Pythonic code that follows common best practices;
1546+
you don't have to change your code.
1547+
The Faster CPython project optimizes for common code patterns we observe.
1548+
1549+
1550+
.. _faster-cpython-faq-memory:
1551+
1552+
Will CPython 3.11 use more memory?
1553+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1554+
1555+
Maybe not; we don't expect memory use to exceed 20% higher than 3.10.
1556+
This is offset by memory optimizations for frame objects and object
1557+
dictionaries as mentioned above.
1558+
1559+
1560+
.. _faster-cpython-ymmv:
1561+
1562+
I don't see any speedups in my workload. Why?
1563+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1564+
1565+
Certain code won't have noticeable benefits. If your code spends most of
1566+
its time on I/O operations, or already does most of its
1567+
computation in a C extension library like NumPy, there won't be significant
1568+
speedups. This project currently benefits pure-Python workloads the most.
1569+
1570+
Furthermore, the pyperformance figures are a geometric mean. Even within the
1571+
pyperformance benchmarks, certain benchmarks have slowed down slightly, while
1572+
others have sped up by nearly 2x!
1573+
1574+
1575+
.. _faster-cpython-jit:
1576+
1577+
Is there a JIT compiler?
1578+
^^^^^^^^^^^^^^^^^^^^^^^^
1579+
1580+
No. We're still exploring other optimizations.
15491581

15501582

15511583
.. _whatsnew311-faster-cpython-about:
