gh-135676: Add a summary of source characters #138194

encukou · 2025-08-27T15:46:57Z

The lexical analysis docs have notes like this at the end:

The period can also occur in floating-point and imaginary literals.
The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: ' " # \
The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error: $ ? `
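A quick way to check these notes is with the standard `tokenize` module and the built-in `compile()`. The sketch below is illustrative only. The helper names `token_kinds` and `compiles` are made up for it. It shows that the period belongs to the NUMBER token in `1.5` and `.5j` but is a separate OP token in attribute access. It also shows that `$`, `?` and a backtick fail to compile outside a string literal or comment, and are harmless inside one.

```python
import io
import token
import tokenize

# Layout-only token types that carry no source characters of interest here.
SKIP = {token.NEWLINE, token.NL, token.ENDMARKER}


def token_kinds(source):
    """Return (token type name, token text) pairs for `source` (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in SKIP
    ]


def compiles(source):
    """Return True if `source` compiles, False on SyntaxError (hypothetical helper)."""
    try:
        compile(source, "<demo>", "exec")
    except SyntaxError:
        return False
    return True


# The period is part of the NUMBER token in float and imaginary literals...
print(token_kinds("x = 1.5 + .5j\n"))
# ...but a separate OP token in attribute access.
print(token_kinds("x.real\n"))

# $, ? and ` are errors outside string literals and comments...
print(compiles("x = $y\n"), compiles("x ?\n"), compiles("`x`\n"))
# ...but are harmless inside them.
print(compiles("s = '$?`'  # $ ? `\n"))
```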

The intent behind these seems to be providing a "map" of what all the ASCII characters do in Python, but that map is incomplete as it is, and isn't really kept up to date.

This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".

The presentation -- a table of bulleted lists -- is a bit wacky, but I think it gets the job done.
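To see the idea behind such a table, that a token's first character largely determines what kind of token it is, you can list each token's first character next to the type the tokenizer reports. This is a rough sketch over one small sample, not a reproduction of the table in the PR. The `start_characters` helper and the sample source are hypothetical. Note that keywords such as `if` come out as NAME tokens.

```python
import io
import token
import tokenize

# Token types that describe layout rather than source characters.
SKIP = {token.NEWLINE, token.NL, token.INDENT, token.DEDENT, token.ENDMARKER}


def start_characters(source):
    """Pair each token's first character with its token type name (hypothetical helper)."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        result.append((tok.string[0], token.tok_name[tok.type]))
    return result


src = "if x:\n    y = 'a' + 2  # note\n"
for char, kind in start_characters(src):
    print(repr(char), kind)
```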

Issue: Reword the Lexical Analysis chapter of the docs #135676

📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/

Doc/reference/lexical_analysis.rst

AA-Turner

I think this is a useful addition!


AA-Turner · 2025-10-08T06:14:22Z

Doc/reference/lexical_analysis.rst

+.. note::
+
+   A ":dfn:`stream`" is a *sequence*, in the general sense of the word
+   (not necessarily a Python :term:`sequence object <sequence>`).


I'm not sure this note is needed?

I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.

OK; I've removed it

AA-Turner · 2025-10-08T06:15:34Z

Doc/reference/lexical_analysis.rst

+.. list-table::
+   :header-rows: 1


In general, for list tables it can be useful to alternate list markers, e.g. using `-` to denote items of the second-level list. Not essential, though.

All my list-tables will do that from now on :)

AA-Turner · 2025-10-08T06:19:27Z

Doc/reference/lexical_analysis.rst

+     * * :ref:`String literal <strings>`
+
+   * * * ASCII letter (``a``-``z``, ``A``-``Z``)
+       * non-ASCII character


Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!

It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)

If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.

¹ Maybe I don't, but it certainly could do that :)
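Whatever the tokenizer does internally, the observable behavior matches the point above. Outside strings and comments, a non-ASCII character can only begin an identifier-like token, and that token is then either a valid NAME or an error. The sketch below checks this with `compile()` and `str.isidentifier()`. The `starts_name` helper is hypothetical, and it does not show how CPython's tokenizer is implemented.

```python
def starts_name(text):
    """Return "NAME" if `text = 1` compiles, "error" on SyntaxError (hypothetical helper)."""
    try:
        compile(f"{text} = 1", "<demo>", "exec")
    except SyntaxError:
        return "error"
    return "NAME"


# Non-ASCII letters are accepted in identifiers...
print(starts_name("π"), "π".isidentifier())
# ...but other non-ASCII characters are rejected rather than
# starting some other kind of token.
print(starts_name("€"), "€".isidentifier())
print(starts_name("x€"), "x€".isidentifier())
```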

willingc

A nice improvement @encukou. I've left a few prose suggestions but fine as is too. Thanks!

willingc · 2025-10-08T09:23:52Z


Doc/reference/lexical_analysis.rst

Co-authored-by: Carol Willing <[email protected]>

miss-islington-app · 2025-10-08T14:34:25Z

Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

(cherry picked from commit 59a6f9d) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Carol Willing <[email protected]> Co-authored-by: Stan Ulbrych <[email protected]> Co-authored-by: Blaise Pabon <[email protected]> Co-authored-by: Micha Albert <[email protected]> Co-authored-by: KeithTheEE <[email protected]>

bedevere-app · 2025-10-08T14:34:41Z

GH-139781 is a backport of this pull request to the 3.14 branch.

encukou · 2025-10-08T14:34:50Z

Thank you for the reviews!

…139781) (cherry picked from commit 59a6f9d) Co-authored-by: Petr Viktorin <[email protected]> Co-authored-by: Carol Willing <[email protected]> Co-authored-by: Stan Ulbrych <[email protected]> Co-authored-by: Blaise Pabon <[email protected]> Co-authored-by: Micha Albert <[email protected]> Co-authored-by: KeithTheEE <[email protected]>

pythongh-135676: Add a summary of source characters

4f2b85b

bedevere-app bot added the docs and skip news labels Aug 27, 2025

github-project-automation bot added this to Docs PRs Aug 27, 2025

github-project-automation bot moved this to Todo in Docs PRs Aug 27, 2025

bedevere-app bot mentioned this pull request Aug 27, 2025

Reword the Lexical Analysis chapter of the docs #135676

Open

StanFromIreland reviewed Aug 27, 2025

Doc/reference/lexical_analysis.rst

StanFromIreland reviewed Aug 27, 2025

Doc/reference/lexical_analysis.rst Outdated

serhiy-storchaka reviewed Aug 28, 2025

Doc/reference/lexical_analysis.rst Outdated

encukou marked this pull request as ready for review September 3, 2025 14:28

encukou requested review from willingc and AA-Turner as code owners September 3, 2025 14:28

bedevere-app bot added the awaiting core review label Sep 3, 2025

Use zero-width space instead of joiner

d9157bb

encukou mentioned this pull request Sep 3, 2025

gh-135676: Reword the Operators & Delimiters section(s) #137713

Merged

AA-Turner approved these changes Oct 8, 2025

bedevere-app bot added awaiting merge and removed awaiting core review labels Oct 8, 2025

willingc approved these changes Oct 8, 2025

encukou and others added 3 commits October 8, 2025 16:05

Update Doc/reference/lexical_analysis.rst

f085358

Co-authored-by: Carol Willing <[email protected]>

Remove note explaining *stream*

a30747f

Alternate list markers in list-table

300cc8c

encukou added the needs backport to 3.14 label Oct 8, 2025

encukou merged commit 59a6f9d into python:main Oct 8, 2025
29 checks passed

github-project-automation bot moved this from Todo to Done in Docs PRs Oct 8, 2025

bedevere-app bot removed the awaiting merge label Oct 8, 2025

encukou deleted the lex-analysis-highlevel branch October 8, 2025 14:34

bedevere-app bot removed the needs backport to 3.14 label Oct 8, 2025
