encukou
Member

@encukou encukou commented Aug 27, 2025

The lexical analysis docs have notes like this at the end:

  • The period can also occur in floating-point and imaginary literals.

  • The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: ' " # \

  • The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error: $ ? `

The intent behind these seems to be providing a "map" of what all the ASCII characters do in Python, but that map is incomplete as it is, and isn't really kept up to date.

This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".

The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.
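
For reference, the third quoted note (characters not used in Python) can be checked with a minimal sketch like the one below. It relies only on the built-in `compile()`; the helper name `is_rejected` is made up for this illustration and is not part of the PR.

```python
def is_rejected(source: str) -> bool:
    """Return True if compiling *source* raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# Characters the quoted note lists as unused in Python.
unused = "$?`"

# Outside string literals and comments, each one should be rejected.
outside = {ch: is_rejected(f"x {ch} 1\n") for ch in unused}

# Inside a string literal or a comment, the same characters are accepted.
inside_string = is_rejected('s = "$ ? `"\n')
inside_comment = is_rejected("x = 1  # $ ? `\n")

print(outside, inside_string, inside_comment)
```

This only exercises the three characters named in the note; it says nothing about how complete the rest of the "map" is.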


📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/

@encukou encukou marked this pull request as ready for review September 3, 2025 14:28
Member

@AA-Turner AA-Turner left a comment

I think this is a useful addition!

A

Comment on lines 15 to 18
.. note::

   A ":dfn:`stream`" is a *sequence*, in the general sense of the word
   (not necessarily a Python :term:`sequence object <sequence>`).
Member

I'm not sure this note is needed?

Contributor

I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.

Member Author

OK; I've removed it

Comment on lines +32 to +33
.. list-table::
   :header-rows: 1
Member

In general for list tables it can be useful to alternate list markers, e.g. using - to denote items of the second-level list. Not essential, though.

Member Author

All my list-tables will do that from now on :)

* * :ref:`String literal <strings>`

* * * ASCII letter (``a``-``z``, ``A``-``Z``)
* non-ASCII character
Member

Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!

Member Author

It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)

If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.

¹ Maybe I don't, but it certainly could do that :)
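
A rough way to observe this from Python is the standard-library `tokenize` module. The sketch below assumes that `tokenize` mirrors the behaviour described above closely enough for illustration; it is a separate code path from the compiler's tokenizer in some Python versions, and how it reports an invalid character differs between versions (an `ERRORTOKEN` or an exception). The helper `first_token` is hypothetical, written only for this example.

```python
import io
import tokenize


def first_token(source: str) -> tuple[str, str]:
    """Return (token type name, token text) for the first token of *source*.

    If tokenizing fails outright, return ("error", <exception class name>).
    """
    readline = io.StringIO(source).readline
    try:
        tok = next(tokenize.generate_tokens(readline))
    except (tokenize.TokenError, SyntaxError) as exc:
        return ("error", type(exc).__name__)
    return (tokenize.tok_name[tok.type], tok.string)


# A non-ASCII character that is valid in identifiers starts a NAME token.
print(first_token("é = 1\n"))

# A non-ASCII character that cannot appear in an identifier does not start
# any other kind of token; it is reported as an error in some form.
print(first_token("€ = 1\n"))
```

In both cases the outcome matches the "NAME (or error)" summary: the first call yields a `NAME` token, and the second never yields a valid token of another type.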

Contributor

@willingc willingc left a comment

A nice improvement @encukou. I've left a few prose suggestions but fine as is too. Thanks!

@encukou encukou added the needs backport to 3.14 bugs and security fixes label Oct 8, 2025
@encukou encukou merged commit 59a6f9d into python:main Oct 8, 2025
29 checks passed
@github-project-automation github-project-automation bot moved this from Todo to Done in Docs PRs Oct 8, 2025
@miss-islington-app

Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Oct 8, 2025
(cherry picked from commit 59a6f9d)

Co-authored-by: Petr Viktorin
Co-authored-by: Carol Willing
Co-authored-by: Stan Ulbrych
Co-authored-by: Blaise Pabon
Co-authored-by: Micha Albert
Co-authored-by: KeithTheEE
@encukou encukou deleted the lex-analysis-highlevel branch October 8, 2025 14:34
@bedevere-app

bedevere-app bot commented Oct 8, 2025

GH-139781 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label Oct 8, 2025
@encukou
Member Author

encukou commented Oct 8, 2025

Thank you for the reviews!

encukou added a commit that referenced this pull request Oct 8, 2025
…139781)

(cherry picked from commit 59a6f9d)

Co-authored-by: Petr Viktorin
Co-authored-by: Carol Willing
Co-authored-by: Stan Ulbrych
Co-authored-by: Blaise Pabon
Co-authored-by: Micha Albert
Co-authored-by: KeithTheEE
Labels
docs Documentation in the Doc dir skip news
Projects
Status: Done
5 participants