gh-135676: Add a summary of source characters #138194
I think this is a useful addition!
Doc/reference/lexical_analysis.rst
Outdated
.. note::

   A ":dfn:`stream`" is a *sequence*, in the general sense of the word
   (not necessarily a Python :term:`sequence object <sequence>`).
I'm not sure this note is needed?
I agree with @AA-Turner. Stream and sequence are both overloaded terms that may be better unpacked by the reader in context.
OK; I've removed it
.. list-table::
   :header-rows: 1
In general for list tables it can be useful to alternate list markers, e.g. using - to denote items of the second-level list. Not essential, though.
All my list-tables will do that from now on :)
* * :ref:`String literal <strings>`

* * * ASCII letter (``a``-``z``, ``A``-``Z``)
    * non-ASCII character
Is 'non-ASCII character' too broad here? Not all characters can form valid identifiers, especially if expanding to the full Unicode space!
It is broad, but: if the tokenizer sees a non-ASCII character, the next token can only be a NAME (or error). (Except inside strings/comments, but then it's not deciding what the next token will be.)
If I remember correctly¹, the tokenizer implementation does lump non-ASCII characters with the letters, and only checks validity after it parses an identifier-like token.
¹ Maybe I don't, but it certainly could do that :)
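The observable side of this can be checked from Python itself. The following is only a minimal sketch, not the tokenizer's actual implementation: `compiles` is a hypothetical helper introduced here for illustration, and it shows just the end result (a non-ASCII character either starts a valid name or the source is rejected).

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


# A non-ASCII letter can start an identifier, so this is an ordinary name.
pi_ok = compiles("π = 1")
pi_is_identifier = "π".isidentifier()

# The euro sign is non-ASCII too, but it is not a valid identifier
# character, so the source is rejected.
euro_ok = compiles("€ = 1")
euro_is_identifier = "€".isidentifier()

print(pi_ok, pi_is_identifier)
print(euro_ok, euro_is_identifier)
```

The first pair of checks succeeds and the second pair fails, which is consistent with "the next token can only be a NAME (or error)".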
A nice improvement @encukou. I've left a few prose suggestions but fine as is too. Thanks!
Co-authored-by: Carol Willing <[email protected]>
Thanks @encukou for the PR 🌮🎉.. I'm working now to backport this PR to: 3.14.
(cherry picked from commit 59a6f9d)
Co-authored-by: Petr Viktorin <[email protected]>
Co-authored-by: Carol Willing <[email protected]>
Co-authored-by: Stan Ulbrych <[email protected]>
Co-authored-by: Blaise Pabon <[email protected]>
Co-authored-by: Micha Albert <[email protected]>
Co-authored-by: KeithTheEE <[email protected]>
GH-139781 is a backport of this pull request to the 3.14 branch.
Thank you for the reviews!
The lexical analysis docs have notes like this at the end:
The period can also occur in floating-point and imaginary literals.
The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer:
' " # \
The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error:
$ ? `
The intent behind these seems to be providing a "map" of what all the ASCII characters do in Python, but that map is incomplete as it is, and isn't really kept up to date.
This instead provides a summary of source characters -- nominally the ones that start tokens, with notes for other notable cases.
The table can also serve as an alternate "table of contents".
The presentation -- a table of bulleted lists -- is a bit wacky but I think it gets the job done.
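As a quick check of the quoted note about `$`, `?` and `` ` ``, here is a small sketch. The `compiles` helper is hypothetical, introduced only for this example; it reports whether a snippet is accepted by the built-in `compile()`.

```python
def compiles(source: str) -> bool:
    """Return True if `source` compiles as a module (hypothetical helper)."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return False
    return True


unused = ["$", "?", "`"]

# Outside string literals and comments, each character is an error.
outside = {ch: compiles(f"a {ch} b") for ch in unused}

# Inside a string literal or a comment, the same characters are accepted.
in_string = {ch: compiles(f"s = '{ch}'") for ch in unused}
in_comment = {ch: compiles(f"x = 1  # {ch}") for ch in unused}

print(outside)
print(in_string)
print(in_comment)
```

Every entry in `outside` is `False`, while every entry in `in_string` and `in_comment` is `True`.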
📚 Documentation preview 📚: https://cpython-previews--138194.org.readthedocs.build/