Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Appearance settings

bpo-27129: Use instruction offsets, not byte offsets, in bytecode and internally.#25069

Merged
markshannon merged 7 commits intopython:masterpython/cpython:masterfrom
faster-cpython:use-instruction-offsetsfaster-cpython/cpython:use-instruction-offsetsCopy head branch name to clipboard
Apr 1, 2021
Merged

bpo-27129: Use instruction offsets, not byte offsets, in bytecode and internally.#25069
markshannon merged 7 commits intopython:masterpython/cpython:masterfrom
faster-cpython:use-instruction-offsetsfaster-cpython/cpython:use-instruction-offsetsCopy head branch name to clipboard

Conversation

@markshannon
Copy link
Member

@markshannon markshannon commented Mar 29, 2021

  • Changes all offsets in jumps and branches to instruction from byte offsets. This doubles the range of instruction that can be targeted by an instruction.
  • Change the meaning of the C field f_lasti to match, which saves a bit of shifting in the interpreter.

https://bugs.python.org/issue27129

Python/ceval.c Outdated Show resolved Hide resolved
Objects/frameobject.c Outdated Show resolved Hide resolved
@gvanrossum gvanrossum self-requested a review March 29, 2021 16:36
Copy link
Contributor

@erlend-aasland erlend-aasland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing: I'd swap the two PyCode_Addr2Line's in Python/ceval.c with PyFrame_GetLineNumber for improved readability. (Unless the extra function call impacts performance.)

Python/ceval.c Outdated Show resolved Hide resolved
Python/ceval.c Outdated Show resolved Hide resolved
Copy link
Member

@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh... I didn't notice that when the wordcode work has been done. It's unfortunate that we didn't switch to instruction offset. I vaguely recall discussion about sizeof(_Py_CODEUNIT).

I support this change, here are some comments. My main worry is about PyCode_Addr2Line().

cc @serhiy-storchaka

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What a strange comment :-D I read it as "oh, Python 2.7 is too young, let's wait a little bit longer before we can use this shiny new function" !?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to treat the second argument as an instruction offset, rather than a byte offset. Rather than changing PyCode_Addr2Line() calls. It sounds unfortunate that the API allows to ask the address in the middle of a word (2 bytes).

It sounds common to pass f_lasti to PyCode_Addr2Line(): PyCode_Addr2Line(code, f_lasti) would still work, no?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depends on how you think the API is defined.
If you want the the line number from an offset derived from code.co_linetable, then the API should be unchanged.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyCode_Addr2Line(code, offset) must take a byte offset.
To getf_lasti from the frame, the only API to do so is something like PyObject_GetAttrString(frame, "f_lasti"), which would return a byte offset.

Also see:
https://www.python.org/dev/peps/pep-0626/#c-api
https://www.python.org/dev/peps/pep-0626/#out-of-process-debuggers-and-profilers

@vstinner
Copy link
Member

I wanted to ask to document properly that PyCode_Addr2Line() uses a byte offset, but the function is not documented! Can you please document the function at:
https://docs.python.org/dev/c-api/code.html

Please document f_lasti unit and the change at: https://docs.python.org/dev/library/inspect.html

@markshannon
Copy link
Member Author

The comment on PyCode_Addr2Line() says
"If you just need the line number of a frame, use PyFrame_GetLineNumber() instead."
PEP 626 provides a faster way to extract line numbers: https://www.python.org/dev/peps/pep-0626/#out-of-process-debuggers-and-profilers
So I think we should keep PyCode_Addr2Line() undocumented, unless you want to document it as "don't use this."

There are no changes to the inspect module.

@vstinner
Copy link
Member

So I think we should keep PyCode_Addr2Line() undocumented, unless you want to document it as "don't use this."

It's part of the public C API, so people uses it (for good or bad reasons). A search on the top 4000 PyPI projects give me:

vstinner@apu$ ./search.sh PyCode_Addr2Line
2021-02-18/uWSGI-2.0.19.1.tar.gz
2021-02-18/rook-0.1.138.tar.gz
2021-02-18/pyuwsgi-2.0.19.1.post0.tar.gz
2021-02-18/Pygments-2.8.0.tar.gz
2021-02-18/pydevd-2.2.0.tar.gz
2021-02-18/pydevd-pycharm-211.5538.22.tar.gz
2021-02-18/pyelftools-0.27.tar.gz
2021-02-18/mod_wsgi-4.7.1.tar.gz
2021-02-18/line_profiler-3.1.0.tar.gz
2021-02-18/graphene-federation-0.1.0.tar.gz
2021-02-18/faulthandler-3.2.tar.gz
2021-02-18/Cython-0.29.21.tar.gz

Hopefully, the list is short. The documentation should describes the behavior, and it's ok to strongly advice to use another function instead ;-)

Lib/importlib/_bootstrap_external.py Outdated Show resolved Hide resolved
Lib/dis.py Outdated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not go a step further and divide the labels in dis output by 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of reasons.

  • I wanted to keep changes to a minimum.
  • There is no guarantee that instructions will remain the same size, so using byte offsets keeps things more consistent across versions.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. However now we have a confusing discrepancy between what's displayed and how we're encouraged to think about the bytecode in the C code. I guess there's no perfect answer, so status quo wins.

@markshannon
Copy link
Member Author

@vstinner I'm fine with documenting PyCode_Addr2Line() but it hasn't been changed in this PR, so that is out of scope for this PR. I'll make another PR.

@markshannon markshannon force-pushed the use-instruction-offsets branch from 79b3b3b to 325d135 Compare March 31, 2021 10:12
@markshannon markshannon closed this Apr 1, 2021
@markshannon markshannon reopened this Apr 1, 2021
@markshannon markshannon closed this Apr 1, 2021
@markshannon markshannon reopened this Apr 1, 2021
@markshannon markshannon merged commit fcb55c0 into python:master Apr 1, 2021
@vstinner
Copy link
Member

vstinner commented Apr 1, 2021

@markshannon markshannon closed this 8 hours ago
@markshannon markshannon reopened this 8 hours ago

You should be allowed to click on the CI to "Re-run all jobs" to avoid closing/reopening the PR. Sadly, it's not possible to only re-run a single job.

sweeneyde added a commit to sweeneyde/cpython that referenced this pull request Apr 4, 2021
markshannon pushed a commit that referenced this pull request Apr 4, 2021
)

* Update magic numbers and bootstrapping for GH-25069

* add blurb

Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
@markshannon markshannon deleted the use-instruction-offsets branch April 14, 2021 13:26
esc added a commit to esc/numba that referenced this pull request Sep 14, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Nov 2, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Nov 12, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Nov 16, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
esc added a commit to esc/numba that referenced this pull request Dec 7, 2021
Fixup Numba to address the changes introduced in cpython for Python 3.10
at python/cpython#25069 . But for Numba, we have
to alter the patch slightly as Numba rewrites the bytecode to use a
`_FIXED_OFFSET` with a value of `2`.
saulshanabrook added a commit to metadsl/metadsl that referenced this pull request Jan 5, 2022
saulshanabrook added a commit to metadsl/metadsl that referenced this pull request Jan 11, 2022
saulshanabrook added a commit to metadsl/python-code-data that referenced this pull request Jan 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants

Morty Proxy This is a proxified and sanitized view of the page, visit original site.