Commit 3bace0a

Start on the Identifiers section
1 parent 4b54031 commit 3bace0a

1 file changed

+26
-17
lines changed

Doc/reference/lexical_analysis.rst

Lines changed: 26 additions & 17 deletions
@@ -277,29 +277,38 @@ Identifiers and keywords
 
 .. index:: identifier, name
 
-Identifiers (also referred to as *names*) are described by the following lexical
-definitions.
+:data:`~token.NAME` tokens represent *identifiers*, *keywords*, and
+*soft keywords*.
 
-The syntax of identifiers in Python is based on the Unicode standard annex
-UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
-further details.
-
-Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
-include the uppercase and lowercase letters ``A`` through
-``Z``, the underscore ``_`` and, except for the first character, the digits
+Within the ASCII range (U+0001..U+007F), the valid characters for names
+include the uppercase and lowercase letters (``A`` through
+``Z``), the underscore ``_`` and, except for the first character, the digits
 ``0`` through ``9``.
-Python 3.0 introduced additional characters from outside the ASCII range (see
-:pep:`3131`). For these characters, the classification uses the version of the
-Unicode Character Database as included in the :mod:`unicodedata` module.
 
-Identifiers are unlimited in length. Case is significant.
+Names must contain at least one character, but have no upper length limit.
+Case is significant.
+
+Besides ``A-Z`` and ``0-9``, names can also use "letter-like" and "number-like"
+characters from outside the ASCII range. For these characters, the
+classification uses the version of the Unicode Character Database as included
+in the :mod:`unicodedata` module.
+
+The exact definition of "letter-like" and "number-like" characters is based on
+the Unicode standard annex `UAX-31`_, with elaboration and changes as
+defined below. See also :pep:`3131` for further details.
+
+All identifiers are converted into the normal form NFKC while parsing;
+comparison of identifiers is based on NFKC.
+
+Formally, names are described by the following lexical definitions.
 
 .. productionlist:: python-grammar
-   identifier: `xid_start` `xid_continue`*
+   NAME: `xid_start` `xid_continue`*
    id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
    id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
    xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
    xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">
+   identifier: <`NAME`, except keywords>
 
 The Unicode category codes mentioned above stand for:
 
@@ -318,14 +327,14 @@ The Unicode category codes mentioned above stand for:
   compatibility
 * *Other_ID_Continue* - likewise
 
-All identifiers are converted into the normal form NFKC while parsing; comparison
-of identifiers is based on NFKC.
-
 A non-normative HTML file listing all valid identifier characters for Unicode
 16.0.0 can be found at
 https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
 
 
+.. _UAX-31: https://www.unicode.org/reports/tr31/
+
+
 .. _keywords:
 
 Keywords

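The distinction the new text draws between ``NAME`` tokens, identifiers, and keywords, along with the NFKC rule, can be checked from Python itself. The following is a minimal sketch, not part of the commit. It assumes Python 3.10 or later (so that ``match`` is listed as a soft keyword), and the helpers ``classify_name`` and ``name_tokens`` are hypothetical names introduced only for illustration.

```python
import io
import keyword
import token
import tokenize
import unicodedata


def classify_name(text: str) -> str:
    """Classify a candidate NAME as keyword, soft keyword, or identifier.

    Hypothetical helper for illustration; not part of the documentation.
    """
    # str.isidentifier() only checks the lexical shape of a name;
    # it also returns True for keywords such as "if".
    if not text.isidentifier():
        return "not a NAME"
    if keyword.iskeyword(text):
        return "keyword"
    if keyword.issoftkeyword(text):
        return "soft keyword"
    return "identifier"


def name_tokens(source: str) -> list[str]:
    """Return the text of every NAME token the tokenizer emits for source."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == token.NAME]


# Keywords and identifiers both arrive as NAME tokens.
print(name_tokens("if x: pass\n"))

for candidate in ["spam_1", "if", "match", "2abc", ""]:
    print(repr(candidate), "->", classify_name(candidate))

# U+FB01 (LATIN SMALL LIGATURE FI) normalizes to "fi" under NFKC,
# so a name written with the ligature is stored under its plain spelling.
ligature = "\ufb01le"
namespace = {}
exec(ligature + " = 1", namespace)
print(unicodedata.normalize("NFKC", ligature), "file" in namespace)
```

The empty string and ``2abc`` are rejected because a name needs at least one character and cannot start with a digit. ``if`` passes ``str.isidentifier()`` but is still a keyword, which is why the grammar now defines ``identifier`` as ``NAME`` minus keywords.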