gh-135676: Add a summary of source characters by encukou · Pull Request #138194 · python/cpython

encukou · Aug 27, 2025

The lexical analysis docs have notes like this at the end:

The period can also occur in floating-point and imaginary literals.
The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: ' " # \
The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error: $ ? `
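
The last quoted note is easy to check from Python itself. The snippet below is only an illustrative sketch, not part of the PR: `compiles` is a hypothetical helper name, and it relies on the built-in `compile()` raising `SyntaxError` for source it cannot accept.

```python
# Sketch: '$', '?' and '`' are rejected in code, but fine in strings/comments.
# `compiles` is a hypothetical helper introduced for this illustration.
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


for ch in "$?`":
    # Used like an operator between two names: rejected by the compiler.
    assert not compiles(f"a {ch} b")
    # Inside a string literal and inside a comment: accepted.
    assert compiles(f"s = '{ch}'  # {ch}")
```

This only observes the outcome (accepted or rejected); it says nothing about which stage of CPython reports the error.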

The intent behind these seems to be providing a "map" of what all the ASCII characters do in Python, but that map is incomplete as it is, and isn't really kept up to date.

This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".

The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.

Issue: Reword the Lexical Analysis chapter of the docs #135676

📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/

Doc/reference/lexical_analysis.rst

AA-Turner

I think this is a useful addition!

AA-Turner · Oct 8, 2025

Doc/reference/lexical_analysis.rst

+.. note::
+
+   A ":dfn:`stream`" is a *sequence*, in the general sense of the word
+   (not necessarily a Python :term:`sequence object <sequence>`).


I'm not sure this note is needed?

I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.

OK; I've removed it

AA-Turner · Oct 8, 2025

Doc/reference/lexical_analysis.rst

+.. list-table::
+   :header-rows: 1


In general for list tables it can be useful to alternate list markers, e.g. using - to denote items of the second-level list. Not essential, though.

All my list-tables will do that from now on :)

AA-Turner · Oct 8, 2025

Doc/reference/lexical_analysis.rst

+     * * :ref:`String literal <strings>`
+
+   * * * ASCII letter (``a``-``z``, ``A``-``Z``)
+       * non-ASCII character


Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!

It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)

If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.

¹ Maybe I don't, but it certainly could do that :)
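
The observable side of this can be sketched with the standard `tokenize` module and `compile()`. This is a hedged illustration under stated assumptions, not a description of the C tokenizer's internals: `first_token` and `is_valid_source` are hypothetical helper names, and `compile()` is used for the invalid case because it reports the problem as `SyntaxError`.

```python
import io
import tokenize


def first_token(source: str) -> tokenize.TokenInfo:
    """Return the first token the tokenize module yields for `source`."""
    return next(tokenize.generate_tokens(io.StringIO(source).readline))


def is_valid_source(source: str) -> bool:
    """Return True if `source` compiles as a module, False on SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A non-ASCII letter starts an identifier, so the first token is a NAME.
tok = first_token("é = 1\n")
print(tokenize.tok_name[tok.type], repr(tok.string))

# The euro sign is non-ASCII but cannot appear in an identifier,
# so source that starts a token with it is rejected.
print("€".isidentifier(), is_valid_source("€ = 1\n"))
```

Either way, a non-ASCII character outside strings and comments leads to a name or an error, which matches the table entry under discussion.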

willingc

A nice improvement @encukou. I've left a few prose suggestions but fine as is too. Thanks!

willingc · Oct 8, 2025

Doc/reference/lexical_analysis.rst

+.. note::
+
+   A ":dfn:`stream`" is a *sequence*, in the general sense of the word
+   (not necessarily a Python :term:`sequence object <sequence>`).


I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.


miss-islington-app · Oct 8, 2025

Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

(cherry picked from commit 59a6f9d)

Co-authored-by: Petr Viktorin <encukou@gmail.com>
Co-authored-by: Carol Willing <carolcode@willingconsulting.com>
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
Co-authored-by: Micha Albert <info@micha.zone>
Co-authored-by: KeithTheEE <kmurrayis@gmail.com>

bedevere-app · Oct 8, 2025

GH-139781 is a backport of this pull request to the 3.14 branch.

encukou · Oct 8, 2025

Thank you for the reviews!

pythongh-135676: Add a summary of source characters

4f2b85b

bedevere-app bot added the docs (Documentation in the Doc dir) and skip news labels Aug 27, 2025

github-project-automation bot added this to Docs PRs Aug 27, 2025

github-project-automation bot moved this to Todo in Docs PRs Aug 27, 2025

bedevere-app bot mentioned this pull request Aug 27, 2025

Reword the Lexical Analysis chapter of the docs #135676

Open

StanFromIreland reviewed Aug 27, 2025

Doc/reference/lexical_analysis.rst (resolved)

StanFromIreland reviewed Aug 27, 2025

Doc/reference/lexical_analysis.rst (outdated, resolved)

serhiy-storchaka reviewed Aug 28, 2025

Doc/reference/lexical_analysis.rst (outdated, resolved)

encukou marked this pull request as ready for review September 3, 2025 14:28

encukou requested review from AA-Turner and willingc as code owners September 3, 2025 14:28

bedevere-app bot added the awaiting core review label Sep 3, 2025

Use zero-width space instead of joiner

d9157bb

encukou mentioned this pull request Sep 3, 2025

gh-135676: Reword the Operators & Delimiters section(s) #137713

Merged

AA-Turner approved these changes Oct 8, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Oct 8, 2025

willingc approved these changes Oct 8, 2025

encukou and others added 3 commits October 8, 2025 16:05

Update Doc/reference/lexical_analysis.rst

f085358

Co-authored-by: Carol Willing <carolcode@willingconsulting.com>

Remove note explaining *stream*

a30747f

Alternate list markers in list-table

300cc8c

encukou added the needs backport to 3.14 (bugs and security fixes) label Oct 8, 2025

encukou merged commit 59a6f9d into python:main Oct 8, 2025
29 checks passed

github-project-automation bot moved this from Todo to Done in Docs PRs Oct 8, 2025

bedevere-app bot removed the awaiting merge label Oct 8, 2025

encukou deleted the lex-analysis-highlevel branch October 8, 2025 14:34

bedevere-app bot removed the needs backport to 3.14 (bugs and security fixes) label Oct 8, 2025
