@@ -272,67 +272,80 @@ possible string that forms a legal token, when read from left to right.
272272
273273.. _identifiers :
274274
275- Identifiers and keywords
276- ========================
275+ Names (identifiers and keywords)
276+ ================================
277277
278278.. index :: identifier, name
279279
280280:data: `~token.NAME ` tokens represent *identifiers *, *keywords *, and
281281*soft keywords *.
282282
283283Within the ASCII range (U+0001..U+007F), the valid characters for names
284- include the uppercase and lowercase letters (``A `` through
285- ``Z ``), the underscore ``_ `` and, except for the first character, the digits
284+ include the uppercase and lowercase letters (``A`` through ``Z`` and ``a`` through
285+ ``z``), the underscore ``_`` and, except for the first character, the digits
286286``0`` through ``9``.
287287
288288Names must contain at least one character, but have no upper length limit.
289289Case is significant.
290290
291- Besizes ``A-Z `` and ``0-9 ``, names can also use "letter-like" and "number-like"
292- characters from outside the ASCII range. For these characters, the
293- classification uses the version of the Unicode Character Database as included
294- in the :mod: `unicodedata ` module.
291+ Besides ``A-Z ``, ``a-z ``, ``_ `` and ``0-9 ``, names can also use "letter-like"
292+ and "number-like" characters from outside the ASCII range, as detailed below.
295293
296- The exact definition of "letter-like" and "number-like" characters is based on
297- the Unicode standard annex `UAX-31 `_, with elaboration and changes as
298- defined below. See also :pep: `3131 ` for further details.
294+ All identifiers are converted into the `normalization form `_ NFKC while
295+ parsing; comparison of identifiers is based on NFKC.
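As a rough illustration of the normalization step, the sketch below uses ``unicodedata.normalize`` on a name spelled with a compatibility ligature and then checks which name an assignment actually binds. The sample name, the ``namespace`` dictionary and the variable names are illustrative choices, not part of the reference text.

```python
import unicodedata

# Illustrative name: U+FB01 (LATIN SMALL LIGATURE FI) followed by "le".
source_name = "\ufb01le"

# NFKC replaces the compatibility ligature with the two letters "fi".
normalized = unicodedata.normalize("NFKC", source_name)

# Assign through the ligature spelling, then see which key was bound.
namespace = {}
exec(source_name + " = 1", namespace)
bound_names = [key for key in namespace if not key.startswith("__")]

print(repr(normalized), bound_names)
```

Because identifiers are compared after normalization, the ligature spelling and the plain ASCII spelling refer to the same name.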
299296
300- All identifiers are converted into the normal form NFKC while parsing;
301- comparison of identifiers is based on NFKC.
297+ Formally, the first character of a normalized identifier must belong to the
298+ set `` id_start ``, which is the union of:
302299
303- Formally, names are described by the following lexical definitions.
300+ * Unicode category ``<Lu> `` - uppercase letters (includes ``A `` to ``Z ``)
301+ * Unicode category ``<Ll> `` - lowercase letters (includes ``a `` to ``z ``)
302+ * Unicode category ``<Lt> `` - titlecase letters
303+ * Unicode category ``<Lm> `` - modifier letters
304+ * Unicode category ``<Lo> `` - other letters
305+ * Unicode category ``<Nl> `` - letter numbers
306+ * {``"_" ``} - the underscore
307+ * ``<Other_ID_Start> `` - an explicit set of characters in `PropList.txt `_
308+ to support backwards compatibility
304309
305- .. productionlist :: python-grammar
306- NAME: `xid_start ` `xid_continue`*
307- id_start: <all characters in general categories Lu, Ll, Lt, Lm, Lo, Nl, the underscore, and characters with the Other_ID_Start property>
308- id_continue: <all characters in `id_start `, plus characters in the categories Mn, Mc, Nd, Pc and others with the Other_ID_Continue property>
309- xid_start: <all characters in `id_start ` whose NFKC normalization is in "id_start xid_continue*">
310- xid_continue: <all characters in `id_continue ` whose NFKC normalization is in "id_continue*">
311- identifier: <`NAME `, except keywords>
312-
313- The Unicode category codes mentioned above stand for:
314-
315- * *Lu * - uppercase letters
316- * *Ll * - lowercase letters
317- * *Lt * - titlecase letters
318- * *Lm * - modifier letters
319- * *Lo * - other letters
320- * *Nl * - letter numbers
321- * *Mn * - nonspacing marks
322- * *Mc * - spacing combining marks
323- * *Nd * - decimal numbers
324- * *Pc * - connector punctuations
325- * *Other_ID_Start * - explicit list of characters in `PropList.txt
326- <https://www.unicode.org/Public/16.0.0/ucd/PropList.txt> `_ to support backwards
327- compatibility
328- * *Other_ID_Continue * - likewise
310+ The remaining characters must belong to the set ``id_continue ``, which is the
311+ union of:
312+
313+ * all characters in ``id_start ``
314+ * Unicode category ``<Nd> `` - decimal numbers (includes ``0 `` to ``9 ``)
315+ * Unicode category ``<Pc> `` - connector punctuations
316+ * Unicode category ``<Mn> `` - nonspacing marks
317+ * Unicode category ``<Mc> `` - spacing combining marks
318+ * ``<Other_ID_Continue> `` - another explicit set of characters in
319+ `PropList.txt `_ to support backwards compatibility
320+
321+ Unicode categories use the version of the Unicode Character Database as
322+ included in the :mod: `unicodedata ` module.
323+
324+ These sets are based on the Unicode standard annex `UAX-31 `_.
325+ See also :pep: `3131 ` for further details.
326+
327+ Even more formally, names are described by the following lexical definitions:
328+
329+ .. grammar-snippet ::
330+ :group: python-grammar
331+
332+ NAME: `xid_start ` `xid_continue`*
333+ id_start: <Lu> | <Ll> | <Lt> | <Lm> | <Lo> | <Nl> | "_" | <Other_ID_Start>
334+ id_continue: `id_start ` | <Nd> | <Pc> | <Mn> | <Mc> | <Other_ID_Continue>
335+ xid_start: <all characters in `id_start` whose NFKC normalization is
336+ in (`id_start` `xid_continue`*)>
337+ xid_continue: <all characters in `id_continue` whose NFKC normalization is
338+ in (`id_continue`*)>
339+ identifier: <`NAME `, except keywords>
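The category-based part of these definitions can be approximated with the :mod:`unicodedata` module. The following is a minimal sketch under stated assumptions, not the tokenizer's actual implementation: it ignores the ``Other_ID_Start`` and ``Other_ID_Continue`` properties (which ``unicodedata`` does not expose), and it normalizes the whole string first instead of applying the per-character ``xid_start``/``xid_continue`` rules. The helper names ``is_id_start``, ``is_id_continue`` and ``looks_like_name`` are hypothetical.

```python
import unicodedata

# General categories taken from the id_start and id_continue definitions.
ID_START_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
ID_CONTINUE_CATEGORIES = {"Nd", "Pc", "Mn", "Mc"}


def is_id_start(char):
    """Approximate id_start (Other_ID_Start is not handled)."""
    return char == "_" or unicodedata.category(char) in ID_START_CATEGORIES


def is_id_continue(char):
    """Approximate id_continue (Other_ID_Continue is not handled)."""
    return is_id_start(char) or unicodedata.category(char) in ID_CONTINUE_CATEGORIES


def looks_like_name(text):
    """Check NFKC-normalized text against the approximate sets."""
    normalized = unicodedata.normalize("NFKC", text)
    if not normalized:
        return False  # names must contain at least one character
    return is_id_start(normalized[0]) and all(
        is_id_continue(char) for char in normalized[1:]
    )


# A few sample strings, compared with the built-in check.
samples = ["spam", "_x1", "1abc", "a-b", "", "h\u00e9llo", "x\u0301"]
results = {text: looks_like_name(text) for text in samples}
for text, verdict in results.items():
    print(repr(text), verdict, text.isidentifier())
```

For everyday checks, ``str.isidentifier()`` is the simpler choice; the sketch only makes the set membership rules explicit. Like ``NAME``, both checks accept keywords.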
329340
330341A non-normative file listing all valid identifier characters for Unicode
33134216.0.0 can be found at
332343https://www.unicode.org/Public/16.0.0/ucd/DerivedCoreProperties.txt
333344
334345
335346.. _UAX-31 : https://www.unicode.org/reports/tr31/
347+ .. _PropList.txt : https://www.unicode.org/Public/16.0.0/ucd/PropList.txt
348+ .. _normalization form : https://www.unicode.org/reports/tr15/#Norm_Forms
336349
337350
338351.. _keywords :
@@ -344,7 +357,7 @@ Keywords
344357 single: keyword
345358 single: reserved word
346359
347- The following identifiers are used as reserved words, or *keywords * of the
360+ The following names are used as reserved words, or *keywords * of the
348361language, and cannot be used as ordinary identifiers. They must be spelled
349362exactly as written here:
350363
@@ -368,18 +381,19 @@ Soft Keywords
368381
369382.. versionadded :: 3.10
370383
371- Some identifiers are only reserved under specific contexts. These are known as
372- *soft keywords *. The identifiers ``match ``, ``case ``, ``type `` and ``_ `` can
373- syntactically act as keywords in certain contexts,
384+ Some names are only reserved under specific contexts. These are known as
385+ *soft keywords *:
386+
387+ - ``match ``, ``case ``, and ``_ ``, when used in the :keyword: `match ` statement.
388+ - ``type ``, when used in the :keyword: `type ` statement.
389+
390+ These syntactically act as keywords in their specific contexts,
374391but this distinction is done at the parser level, not when tokenizing.
375392
376393As soft keywords, their use in the grammar is possible while still
377394preserving compatibility with existing code that uses these names as
378395identifier names.
379396
380- ``match ``, ``case ``, and ``_ `` are used in the :keyword: `match ` statement.
381- ``type `` is used in the :keyword: `type ` statement.
382-
383397.. versionchanged :: 3.12
384398 ``type `` is now a soft keyword.
385399