gh-135676: Add a summary of source characters #138194
encukou merged 5 commits into python:main from encukou:lex-analysis-highlevel
AA-Turner
left a comment
I think this is a useful addition!
Doc/reference/lexical_analysis.rst
Outdated
    .. note::

       A ":dfn:`stream`" is a *sequence*, in the general sense of the word
       (not necessarily a Python :term:`sequence object <sequence>`).
I'm not sure this note is needed?
I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.
    .. list-table::
       :header-rows: 1
In general for list tables it can be useful to alternate list markers, e.g. using - to denote items of the second-level list. Not essential, though.
All my list-tables will do that from now on :)
    * * :ref:`String literal <strings>`

    * * * ASCII letter (``a``-``z``, ``A``-``Z``)
    * non-ASCII character
Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!
It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)
If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.
¹ Maybe I don't, but it certainly could do that :)
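The "lump first, validate afterwards" strategy described above can be sketched in a few lines of Python. This is only a toy illustration, not CPython's tokenizer: the function name `scan_name_like` and its helper are hypothetical, and `str.isidentifier()` stands in for the real validity check.

```python
def scan_name_like(source: str, pos: int = 0):
    """Toy sketch: treat any non-ASCII character as a possible name character,
    and only validate once the whole candidate has been collected."""

    def may_be_in_name(ch: str) -> bool:
        # ASCII letters, digits and underscore, or *any* non-ASCII character.
        return (ch.isascii() and (ch.isalnum() or ch == "_")) or ord(ch) >= 128

    first = source[pos]
    if (first.isascii() and first.isdigit()) or not may_be_in_name(first):
        return None  # the next token is not name-like

    end = pos
    while end < len(source) and may_be_in_name(source[end]):
        end += 1
    text = source[pos:end]

    # Validity is checked only now; str.isidentifier() is a stand-in here.
    if not text.isidentifier():
        raise SyntaxError(f"invalid identifier {text!r}")
    return text
```

With this sketch, `scan_name_like("naïve = 1")` returns `"naïve"`, while `scan_name_like("€uro")` raises `SyntaxError`: the character was accepted as the start of a name-like token and rejected only afterwards.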
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Thanks @encukou for the PR 🌮🎉. I'm working now to backport this PR to: 3.14.
(cherry picked from commit 59a6f9d) Co-authored-by: Petr Viktorin <encukou@gmail.com> Co-authored-by: Carol Willing <carolcode@willingconsulting.com> Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Blaise Pabon <blaise@gmail.com> Co-authored-by: Micha Albert <info@micha.zone> Co-authored-by: KeithTheEE <kmurrayis@gmail.com>
GH-139781 is a backport of this pull request to the 3.14 branch.
Thank you for the reviews!
…139781) (cherry picked from commit 59a6f9d) Co-authored-by: Petr Viktorin <encukou@gmail.com> Co-authored-by: Carol Willing <carolcode@willingconsulting.com> Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com> Co-authored-by: Blaise Pabon <blaise@gmail.com> Co-authored-by: Micha Albert <info@micha.zone> Co-authored-by: KeithTheEE <kmurrayis@gmail.com>
The lexical analysis docs have notes like this at the end:
    The period can also occur in floating-point and imaginary literals.

    The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer:

        ' " # \

    The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:

        $ ? `

The intent behind these seems to be providing a "map" of what all the ASCII characters do in Python, but that map is incomplete as it is, and isn't really kept up to date.
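As a quick check of the quoted "unconditional error" rule, the built-in `compile()` can be used to see whether a snippet is accepted. The helper name `compiles` is made up for this illustration.

```python
def compiles(source: str) -> bool:
    """Return True if `source` is accepted by the compiler, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# The unused characters are rejected outside string literals and comments...
print(compiles("price = 1 $ 2"))
print(compiles("x = a ? b : c"))
print(compiles("x = `a`"))

# ...but are fine inside them.
print(compiles("s = '$ ? `'  # $ ? `"))
```

The first three calls report a syntax error; the last one compiles, since the characters appear only inside a string literal and a comment.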
This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".
The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.
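To see informally which kind of token each leading character starts, the standard `tokenize` module can be run over a short line. This is a sketch for exploration only; the helper name `token_kinds` is hypothetical, and `tokenize` is not guaranteed to match the documented lexical rules in every detail.

```python
import io
import tokenize


def token_kinds(source: str):
    """Return (token type name, token text) pairs for a piece of source code."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for kind, text in token_kinds("x = .5 + 1j  # note\n"):
    print(f"{kind:10} {text!r}")
```

Here the letter `x` starts a NAME, the period and the digit each start a NUMBER (`.5` and `1j`), and `#` starts a COMMENT.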
📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/