@@ -277,29 +277,38 @@ Identifiers and keywords

 .. index:: identifier, name

-Identifiers (also referred to as *names*) are described by the following lexical
-definitions.
+:data:`~token.NAME` tokens represent *identifiers*, *keywords*, and
+*soft keywords*.

-The syntax of identifiers in Python is based on the Unicode standard annex
-UAX-31, with elaboration and changes as defined below; see also :pep:`3131` for
-further details.
-
-Within the ASCII range (U+0001..U+007F), the valid characters for identifiers
-include the uppercase and lowercase letters ``A`` through
-``Z``, the underscore ``_`` and, except for the first character, the digits
+Within the ASCII range (U+0001..U+007F), the valid characters for names
+include the uppercase and lowercase letters (``A`` through
+``Z``), the underscore ``_`` and, except for the first character, the digits
 ``0`` through ``9``.
-Python 3.0 introduced additional characters from outside the ASCII range (see
-:pep:`3131`).  For these characters, the classification uses the version of the
-Unicode Character Database as included in the :mod:`unicodedata` module.

-Identifiers are unlimited in length.  Case is significant.
+Names must contain at least one character, but have no upper length limit.
+Case is significant.
+
+Besides ``A-Z`` and ``0-9``, names can also use "letter-like" and "number-like"
+characters from outside the ASCII range.  For these characters, the
+classification uses the version of the Unicode Character Database as included
+in the :mod:`unicodedata` module.
+
+The exact definition of "letter-like" and "number-like" characters is based on
+the Unicode standard annex `UAX-31`_, with elaboration and changes as
+defined below. See also :pep:`3131` for further details.
+
+All identifiers are converted into the normal form NFKC while parsing;
+comparison of identifiers is based on NFKC.
+
+Formally, names are described by the following lexical definitions.

 .. productionlist:: python-grammar
-   identifier: `xid_start` `xid_continue`*
+   NAME: `xid_start` `xid_continue`*
    id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
    id_continue: <all characters in `id_start`, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
    xid_start: <all characters in `id_start` whose NFKC normalization is in "id_start xid_continue*">
    xid_continue: <all characters in `id_continue` whose NFKC normalization is in "id_continue*">
+   identifier: <`NAME`, except keywords>

 The Unicode category codes mentioned above stand for:

@@ -318,14 +327,14 @@ The Unicode category codes mentioned above stand for:
   compatibility
 * *Other_ID_Continue* - likewise

-All identifiers are converted into the normal form NFKC while parsing; comparison
-of identifiers is based on NFKC.
-
 A non-normative HTML file listing all valid identifier characters for Unicode
 16.0.0 can be found at
 https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt


+.. _UAX-31: https://www.unicode.org/reports/tr31/
+
+
 .. _keywords:

 Keywords