From d4381aa9bfc80fe0f3d9530bc32aba8df47caa07 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 4 Jun 2025 18:01:25 +0200
Subject: [PATCH 01/17] WIP: String literals

Co-authored-by: Blaise Pabon <blaise@gmail.com>
---
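Note for reviewers: a minimal sketch of the rules the new text states
for literal concatenation, raw literals and prefixes, assuming a
CPython 3 interpreter. The helper ``compiles`` and all variable names
are illustrative only and do not come from the patch.

```python
# Adjacent literals are joined by the compiler; quoting styles may differ.
joined = "hello" 'world'

# A raw literal keeps the backslash that escapes a quote.
raw_quote = r"\""


def compiles(source):
    """Return True if *source* is accepted as a Python expression."""
    try:
        compile(source, "<sketch>", "eval")
    except SyntaxError:
        return False
    return True


# str and bytes literals cannot be concatenated with each other.
mixed_ok = compiles('"text" b"bytes"')

# The source text here is r"\" : the backslash escapes the closing quote,
# so the literal is never terminated.
raw_odd_ok = compiles('r"\\"')

# Prefixes are case-insensitive, and "u" has no effect.
prefix_same = (B"data" == b"data") and (Rb"\n" == b"\\n") and (u"x" == "x")
```

Using ``compile()`` keeps the rejected forms out of the module's own
source, so the sketch itself stays importable.
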
 Doc/reference/expressions.rst      |  50 +++++-
 Doc/reference/lexical_analysis.rst | 255 ++++++++++++++++++-----------
 2 files changed, 207 insertions(+), 98 deletions(-)
diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
index 17f39aaf5f57cd..743d43b1c9c1b1 100644
--- a/Doc/reference/expressions.rst
+++ b/Doc/reference/expressions.rst
@@ -133,13 +133,18 @@ Literals
 
 Python supports string and bytes literals and various numeric literals:
 
-.. productionlist:: python-grammar
-   literal: `stringliteral` | `bytesliteral` | `NUMBER`
+.. grammar-snippet::
+   :group: python-grammar
+
+   literal: `strings` | `NUMBER`
 
 Evaluation of a literal yields an object of the given type (string, bytes,
 integer, floating-point number, complex number) with the given value.  The value
 may be approximated in the case of floating-point and imaginary (complex)
-literals.  See section :ref:`literals` for details.
+literals.
+See section :ref:`literals` for details.
+See section :ref:`string-concatenation` for details on ``strings``.
+
 
 .. index::
    triple: immutable; data; type
@@ -152,6 +157,45 @@ occurrence) may obtain the same object or a different object with the same
 value.
 
 
+.. _string-concatenation:
+
+String literal concatenation
+............................
+
+Multiple adjacent string or bytes literals (delimited by whitespace), possibly
+using different quoting conventions, are allowed, and their meaning is the same
+as their concatenation.  Thus, ``"hello" 'world'`` is equivalent to
+``"helloworld"``.
+
+Formally:
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   strings: ( `STRING` | `fstring` | `tstring`)+
+
+Note that this feature is defined at the syntactical level, so it only works
+with literals.
+To concatenate string expressions at run time, the ``+`` operator may be used::
+
+   greeting = "Hello"
+   space = " "
+   name = "Blaise"
+   print(greeting + space + name)   # not: print(greeting space name)
+
+Also note that literal concatenation can freely mix raw strings,
+triple-quoted strings, and formatted or template string literals.
+However, bytes literals may not be combined with string literals of any kind.
+
+This feature can be used to reduce the number of backslashes
+needed, to split long strings conveniently across long lines, or even to add
+comments to parts of strings, for example::
+
+   re.compile("[A-Za-z_]"       # letter or underscore
+              "[A-Za-z0-9_]*"   # letter, digit or underscore
+             )
+
+
 .. _parenthesized:
 
 Parenthesized forms
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 567c70111c20ec..58c8b15cfe5499 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -106,6 +106,16 @@ If an encoding is declared, the encoding name must be recognized by Python
 encoding is used for all lexical analysis, including string literals, comments
 and identifiers.
 
+All lexical analysis, including string literals, comments
+and identifiers, works on Unicode text decoded using the source encoding.
+Any Unicode code point, except the NUL control character, can appear in
+Python source.
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   source_character:  <any Unicode code point, except NUL>
+
 
 .. _explicit-joining:
 
@@ -478,66 +488,104 @@ Literals are notations for constant values of some built-in types.
 .. index:: string literal, bytes literal, ASCII
    single: ' (single quote); string literal
    single: " (double quote); string literal
-   single: u'; string literal
-   single: u"; string literal
 .. _strings:
 
 String and Bytes literals
 -------------------------
 
-String literals are described by the following lexical definitions:
+String literals are text enclosed in single quotes (``'``) or double
+quotes (``"``). For example:
 
-.. productionlist:: python-grammar
-   stringliteral: [`stringprefix`](`shortstring` | `longstring`)
-   stringprefix: "r" | "u" | "R" | "U" | "f" | "F" | "t" | "T"
-               : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF"
-               : | "tr" | "Tr" | "tR" | "TR" | "rt" | "rT" | "Rt" | "RT"
-   shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"'
-   longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""'
-   shortstringitem: `shortstringchar` | `stringescapeseq`
-   longstringitem: `longstringchar` | `stringescapeseq`
-   shortstringchar: <any source character except "\" or newline or the quote>
-   longstringchar: <any source character except "\">
-   stringescapeseq: "\" <any source character>
+.. code-block:: plain
 
-.. productionlist:: python-grammar
-   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
-   bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB"
-   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
-   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
-   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
-   longbytesitem: `longbyteschar` | `bytesescapeseq`
-   shortbyteschar: <any ASCII character except "\" or newline or the quote>
-   longbyteschar: <any ASCII character except "\">
-   bytesescapeseq: "\" <any ASCII character>
+   "spam"
+   'eggs'
+
+The quote used to start the literal also terminates it, so a string literal
+can only contain the other quote (except with escape sequences, see below).
+For example:
+
+.. code-block:: plain
 
-One syntactic restriction not indicated by these productions is that whitespace
-is not allowed between the :token:`~python-grammar:stringprefix` or
-:token:`~python-grammar:bytesprefix` and the rest of the literal. The source
-character set is defined by the encoding declaration; it is UTF-8 if no encoding
-declaration is given in the source file; see section :ref:`encodings`.
+   'Say "Hello", please.'
+   "Don't do that!"
 
-.. index:: triple-quoted string, Unicode Consortium, raw string
+Except for this limitation, the choice of quote character (``'`` or ``"``)
+does not affect how the literal is parsed.
+
+.. index:: triple-quoted string
    single: """; string literal
    single: '''; string literal
 
-In plain English: Both types of literals can be enclosed in matching single quotes
-(``'``) or double quotes (``"``).  They can also be enclosed in matching groups
-of three single or double quotes (these are generally referred to as
-*triple-quoted strings*). The backslash (``\``) character is used to give special
-meaning to otherwise ordinary characters like ``n``, which means 'newline' when
-escaped (``\n``). It can also be used to escape characters that otherwise have a
-special meaning, such as newline, backslash itself, or the quote character.
-See :ref:`escape sequences <escape-sequences>` below for examples.
+Triple-quoted strings
+---------------------
+
+Strings can also be enclosed in matching groups of three single or double
+quotes.
+These are generally referred to as :dfn:`triple-quoted strings`.
+
+In triple-quoted literals, unescaped newlines and quotes are allowed (and are
+retained), except that three unescaped quotes in a row terminate the literal.
+(Here, a *quote* is the character used to open the literal, that is,
+either ``'`` or ``"``.)
+
+For example:
+
+.. code-block:: plain
+
+   """This is a triple-quoted string with "quotes" inside."""
+
+   '''Another triple-quoted string. This one continues
+   on the next line.'''
+
+Escape sequences
+----------------
+
+Inside a string literal, the backslash (``\``) character introduces an
+:dfn:`escape sequence`, which has special meaning depending on the character
+after the backslash.
+For example, ``\n`` denotes the 'newline' character, rather the two characters
+``\`` and ``n``.
+See :ref:`escape sequences <escape-sequences>` below for a full list of such
+sequences, and more details.
+
+
+.. index::
+   single: u'; string literal
+   single: u"; string literal
+
+String prefixes
+---------------
+
+String literals can have an optional :dfn:`prefix` that influences how the literal
+is parsed, for example:
+
+.. code-block:: plain
+
+   b"data"
+   f'{result=}'
+
+* ``r``: Raw string
+* ``f``: "F-string"
+* ``t``: "T-string"
+* ``b``: Byte literal
+* ``u``: No effect (allowed for backwards compatibility)
+
+Prefixes are case-insensitive (for example, ``B`` works the same as ``b``).
+The ``r`` prefix can be combined with ``f``, ``t`` or ``b``, so ``fr``,
+``rf``, ``tr``, ``rt``, ``br`` and ``rb`` are also valid prefixes.
+
 
 .. index::
    single: b'; bytes literal
    single: b"; bytes literal
 
-Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an
-instance of the :class:`bytes` type instead of the :class:`str` type.  They
-may only contain ASCII characters; bytes with a numeric value of 128 or greater
-must be expressed with escapes.
+:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an
+instance of the :class:`bytes` type instead of the :class:`str` type.
+They may only contain ASCII characters; bytes with a numeric value of 128
+or greater must be expressed with escape sequences.
+Similarly, a zero byte must be expressed using an escape sequence.
+
 
 .. index::
    single: r'; raw string literal
@@ -546,9 +594,33 @@ must be expressed with escapes.
 Both string and bytes literals may optionally be prefixed with a letter ``'r'``
 or ``'R'``; such constructs are called :dfn:`raw string literals`
 and :dfn:`raw bytes literals` respectively and treat backslashes as
-literal characters.  As a result, in raw string literals, ``'\U'`` and ``'\u'``
+literal characters.
+As a result, in raw string literals, :ref:`escape sequences <escape-sequences>`
 escapes are not treated specially.
 
+Even in a raw literal, quotes can be escaped with a backslash, but the
+backslash remains in the result; for example, ``r"\""`` is a valid string
+literal consisting of two characters: a backslash and a double quote; ``r"\"``
+is not a valid string literal (even a raw string cannot end in an odd number of
+backslashes).  Specifically, *a raw literal cannot end in a single backslash*
+(since the backslash would escape the following quote character).  Note also
+that a single backslash followed by a newline is interpreted as those two
+characters as part of the literal, *not* as a line continuation.
+
+
+.. index::
+   single: f'; formatted string literal
+   single: f"; formatted string literal
+
+A string literal with ``'f'`` or ``'F'`` in its prefix is a
+:dfn:`formatted string literal`; see :ref:`f-strings`.
+Similarly, a string literal with ``'t'`` or ``'T'`` in its prefix is a
+:dfn:`template string literal`; see :ref:`t-strings`.
+
+The ``'f'`` or ``'t'`` may be combined with ``'r'`` to create a
+:dfn:`raw formatted string` or :dfn:`raw template string`.
+They may not be combined with ``'b'``, ``'u'``, or each other.
+
 .. versionadded:: 3.3
    The ``'rb'`` prefix of raw bytes literals has been added as a synonym
    of ``'br'``.
@@ -557,18 +629,46 @@ escapes are not treated specially.
    to simplify the maintenance of dual Python 2.x and 3.x codebases.
    See :pep:`414` for more information.
 
-.. index::
-   single: f'; formatted string literal
-   single: f"; formatted string literal
 
-A string literal with ``'f'`` or ``'F'`` in its prefix is a
-:dfn:`formatted string literal`; see :ref:`f-strings`.  The ``'f'`` may be
-combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw
-formatted strings are possible, but formatted bytes literals are not.
+String literals, except "F-strings" and "T-strings", are described by the
+following lexical definitions:
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   STRING: stringliteral | bytesliteral | fstring | tstring
+
+   stringliteral:   [`stringprefix`](`stringcontent`)
+   stringprefix:    <("r" | "u"), case-insensitive>
+   stringcontent:   `quote` `stringitem`* <matching `quote`>
+   quote:           "'" | '"' |  "'''"  | '"""'
+   stringitem:      `stringchar` | `stringescapeseq`
+   stringchar:      <any `source_character`, except as listed below>
+   stringescapeseq: "\" <any `source_character`>
+
+``stringchar`` can not include:
+
+- the backslash, ``\``;
+- in triple-quoted strings (quoted by ``'''`` or ``"""``), the newline;
+- the quote character.
+
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
+   bytesprefix: <("b" | "br" | "rb" ), case-insensitive>
+   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
+   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
+   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
+   longbytesitem: `longbyteschar` | `bytesescapeseq`
+   shortbyteschar: <any ASCII `source_character` except "\" or newline or the quote>
+   longbyteschar: <any ASCII `source_character` except "\">
+   bytesescapeseq: "\" <any ASCII `source_character`>
+
+Note that as in all lexical definitions, whitespace is significant.
+The prefix, if any, must be followed immediately by the quoted string content.
 
-In triple-quoted literals, unescaped newlines and quotes are allowed (and are
-retained), except that three unescaped quotes in a row terminate the literal.  (A
-"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.)
 
 .. index:: physical line, escape sequence, Standard C, C
    single: \ (backslash); escape sequence
@@ -587,7 +687,6 @@ retained), except that three unescaped quotes in a row terminate the literal.  (
 
 .. _escape-sequences:
 
-
 Escape sequences
 ^^^^^^^^^^^^^^^^
 
@@ -655,14 +754,14 @@ Notes:
 
 
 (2)
-   As in Standard C, up to three octal digits are accepted.
+   As in Standard C, up to three octal digits (0 through 7) are accepted.
 
    .. versionchanged:: 3.11
-      Octal escapes with value larger than ``0o377`` produce a
+      Octal escapes with value larger than ``0o377`` (255) produce a
       :exc:`DeprecationWarning`.
 
    .. versionchanged:: 3.12
-      Octal escapes with value larger than ``0o377`` produce a
+      Octal escapes with value larger than ``0o377`` (255) produce a
       :exc:`SyntaxWarning`. In a future Python version they will be eventually
       a :exc:`SyntaxError`.
 
@@ -689,11 +788,9 @@ Notes:
 .. index:: unrecognized escape sequence
 
 Unlike Standard C, all unrecognized escape sequences are left in the string
-unchanged, i.e., *the backslash is left in the result*.  (This behavior is
-useful when debugging: if an escape sequence is mistyped, the resulting output
-is more easily recognized as broken.)  It is also important to note that the
-escape sequences only recognized in string literals fall into the category of
-unrecognized escapes for bytes literals.
+unchanged, i.e., *the backslash is left in the result*.
+Note that for bytes literals, the escape sequences only recognized in string
+literals fall into the category of unrecognized escapes.
 
 .. versionchanged:: 3.6
    Unrecognized escape sequences produce a :exc:`DeprecationWarning`.
@@ -702,38 +799,6 @@ unrecognized escapes for bytes literals.
    Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future
    Python version they will be eventually a :exc:`SyntaxError`.
 
-Even in a raw literal, quotes can be escaped with a backslash, but the
-backslash remains in the result; for example, ``r"\""`` is a valid string
-literal consisting of two characters: a backslash and a double quote; ``r"\"``
-is not a valid string literal (even a raw string cannot end in an odd number of
-backslashes).  Specifically, *a raw literal cannot end in a single backslash*
-(since the backslash would escape the following quote character).  Note also
-that a single backslash followed by a newline is interpreted as those two
-characters as part of the literal, *not* as a line continuation.
-
-
-.. _string-concatenation:
-
-String literal concatenation
-----------------------------
-
-Multiple adjacent string or bytes literals (delimited by whitespace), possibly
-using different quoting conventions, are allowed, and their meaning is the same
-as their concatenation.  Thus, ``"hello" 'world'`` is equivalent to
-``"helloworld"``.  This feature can be used to reduce the number of backslashes
-needed, to split long strings conveniently across long lines, or even to add
-comments to parts of strings, for example::
-
-   re.compile("[A-Za-z_]"       # letter or underscore
-              "[A-Za-z0-9_]*"   # letter, digit or underscore
-             )
-
-Note that this feature is defined at the syntactical level, but implemented at
-compile time.  The '+' operator must be used to concatenate string expressions
-at run time.  Also note that literal concatenation can use different quoting
-styles for each component (even mixing raw strings and triple quoted strings),
-and formatted string literals may be concatenated with plain string literals.
-
 
 .. index::
    single: formatted string literal

From 80ad85cc286f04a4ac19d03c5f99a9158d15231b Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 11 Jun 2025 16:22:08 +0200
Subject: [PATCH 02/17] Use correct Pygments lexer for plain text

---
 Doc/reference/lexical_analysis.rst | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 58c8b15cfe5499..6f3d90f89b98d3 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -496,7 +496,11 @@ String and Bytes literals
 String literals are text enclosed in single quotes (``'``) or double
 quotes (``"``). For example:
 
-.. code-block:: plain
+.. This is Python code, but we turn off highlighting because as of this
+   writing, highlighted strings don't look good when there's no code
+   surrounding them.
+
+.. code-block:: text
 
    "spam"
    'eggs'
@@ -505,7 +509,7 @@ The quote used to start the literal also terminates it, so a string literal
 can only contain the other quote (except with escape sequences, see below).
 For example:
 
-.. code-block:: plain
+.. code-block:: text
 
    'Say "Hello", please.'
    "Don't do that!"
@@ -531,7 +535,7 @@ either ``'`` or ``"``.)
 
 For example:
 
-.. code-block:: plain
+.. code-block:: text
 
    """This is a triple-quoted string with "quotes" inside."""
 
@@ -560,7 +564,7 @@ String prefixes
 String literals can have an optional :dfn:`prefix` that influences how the literal
 is parsed, for example:
 
-.. code-block:: plain
+.. code-block:: python
 
    b"data"
    f'{result=}'

From e44fa66cf2da63763a3ed37f7d59da28e95c785c Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 11 Jun 2025 17:59:01 +0200
Subject: [PATCH 03/17] WIP

---
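Note for reviewers: a minimal sketch of the end-of-line normalization
described in the "Physical lines" hunk, assuming CPython 3 and source
passed to ``compile()`` as a ``str``. The ``sources`` snippets and the
variable names are hypothetical.

```python
# Hypothetical source snippets that differ only in their line endings.
sources = {
    "LF": "s = '''a\nb'''\n",
    "CRLF": "s = '''a\r\nb'''\r\n",
    "CR": "s = '''a\rb'''\r",
}

values = {}
for name, source in sources.items():
    namespace = {}
    # Compile and run each snippet, then read back the string it defines.
    exec(compile(source, "<sketch>", "exec"), namespace)
    values[name] = namespace["s"]
```

Each entry in ``values`` can then be compared against the LF form to
check that the line ending inside the triple-quoted literal was
replaced as well.
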
 Doc/reference/grammar.rst          |   5 +-
 Doc/reference/introduction.rst     |  16 +++--
 Doc/reference/lexical_analysis.rst | 110 +++++++++++++++++------------
 3 files changed, 76 insertions(+), 55 deletions(-)

diff --git a/Doc/reference/grammar.rst b/Doc/reference/grammar.rst
index 55c148801d8559..1037feb691f6bc 100644
--- a/Doc/reference/grammar.rst
+++ b/Doc/reference/grammar.rst
@@ -10,11 +10,8 @@ error recovery.
 
 The notation used here is the same as in the preceding docs,
 and is described in the :ref:`notation <notation>` section,
-except for a few extra complications:
+except for an extra complication:
 
-* ``&e``: a positive lookahead (that is, ``e`` is required to match but
-  not consumed)
-* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
 * ``~`` ("cut"): commit to the current alternative and fail the rule
   even if this fails to parse
 
diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst
index 444acac374a690..c62240b18cfe55 100644
--- a/Doc/reference/introduction.rst
+++ b/Doc/reference/introduction.rst
@@ -145,15 +145,23 @@ The definition to the right of the colon uses the following syntax elements:
 * ``e?``: A question mark has exactly the same meaning as square brackets:
   the preceding item is optional.
 * ``(e)``: Parentheses are used for grouping.
+
+The following notation is only used in
+:ref:`lexical definitions <notation-lexical-vs-syntactic>`.
+
 * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
   of any single character in the given (inclusive) range of ASCII characters.
-  This notation is only used in
-  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
 * ``<...>``: A phrase between angular brackets gives an informal description
   of the matched symbol (for example, ``<any ASCII character except "\">``),
   or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
-  This notation is only used in
-  :ref:`lexical definitions <notation-lexical-vs-syntactic>`.
+
+.. _lexical-lookaheads:
+
+Some definitions also use *lookaheads*, which indicate that an element
+must (or must not) match at a given position, but without consuming any input:
+
+* ``&e``: a positive lookahead (that is, ``e`` is required to match)
+* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match)
 
 The unary operators (``*``, ``+``, ``?``) bind as tightly as possible;
 the vertical bar (``|``) binds most loosely.
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 6f3d90f89b98d3..67cc9bd8fc7bac 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -39,7 +39,8 @@ The end of a logical line is represented by the token :data:`~token.NEWLINE`.
 Statements cannot cross logical line boundaries except where :data:`!NEWLINE`
 is allowed by the syntax (e.g., between statements in compound statements).
 A logical line is constructed from one or more *physical lines* by following
-the explicit or implicit *line joining* rules.
+the :ref:`explicit <explicit-joining>` or :ref:`implicit <implicit-joining>`
+*line joining* rules.
 
 
 .. _physical-lines:
@@ -47,17 +48,28 @@ the explicit or implicit *line joining* rules.
 Physical lines
 --------------
 
-A physical line is a sequence of characters terminated by an end-of-line
-sequence.  In source files and strings, any of the standard platform line
-termination sequences can be used - the Unix form using ASCII LF (linefeed),
-the Windows form using the ASCII sequence CR LF (return followed by linefeed),
-or the old Macintosh form using the ASCII CR (return) character.  All of these
-forms can be used equally, regardless of platform. The end of input also serves
-as an implicit terminator for the final physical line.
+A physical line is a sequence of characters terminated by one of the following
+end-of-line sequences:
 
-When embedding Python, source code strings should be passed to Python APIs using
-the standard C conventions for newline characters (the ``\n`` character,
-representing ASCII LF, is the line terminator).
+* the Unix form using ASCII LF (linefeed),
+* the Windows form using the ASCII sequence CR LF (return followed by linefeed),
+* the old Macintosh form using the ASCII CR (return) character.
+
+Regardless of platform, each of these sequences is replaced by a single
+ASCII LF (linefeed) character.
+(This is done even inside :ref:`string literals <strings>`.)
+Each line can use any of the sequences; they do not need to be consistent
+within a file.
+
+The end of input also serves as an implicit terminator for the final
+physical line.
+
+Formally:
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   newline: <ASCII LF> | <ASCII CR> <ASCII LF> | <ASCII CR>
 
 
 .. _comments:
@@ -484,6 +496,13 @@ Literals
 
 Literals are notations for constant values of some built-in types.
 
+In terms of lexical analysis, Python has :ref:`string, bytes <strings>`
+and :ref:`numeric <numbers>` literals.
+
+Other “literals” are lexically denoted using :ref:`keywords <keywords>`
+(``None``, ``True``, ``False``) and the special
+:ref:`ellipsis token <lexical-ellipsis>` (``...``):
+
 
 .. index:: string literal, bytes literal, ASCII
    single: ' (single quote); string literal
@@ -491,7 +510,7 @@ Literals are notations for constant values of some built-in types.
 .. _strings:
 
 String and Bytes literals
--------------------------
+=========================
 
 String literals are text enclosed in single quotes (``'``) or double
 quotes (``"``). For example:
@@ -635,41 +654,26 @@ They may not be combined with ``'b'``, ``'u'``, or each other.
 
 
 String literals, except "F-strings" and "T-strings", are described by the
-following lexical definitions:
+following lexical definitions.
+
+These definitions use :ref:`negative lookaheads <lexical-lookaheads>` (``!``)
+to indicate that an ending quote ends the literal.
 
 .. grammar-snippet::
    :group: python-grammar
 
-   STRING: stringliteral | bytesliteral | fstring | tstring
-
-   stringliteral:   [`stringprefix`](`stringcontent`)
-   stringprefix:    <("r" | "u"), case-insensitive>
-   stringcontent:   `quote` `stringitem`* <matching `quote`>
-   quote:           "'" | '"' |  "'''"  | '"""'
+   STRING:          [`stringprefix`] (`stringcontent`)
+   stringprefix:    <("r" | "u" | "b" | "br" | "rb"), case-insensitive>
+   stringcontent:
+      | "'" ( !"'" `stringitem`)* "'"
+      | '"' ( !'"' `stringitem`)* '"'
+      | "'''" ( !"'''" `longstringitem`)* "'''"
+      | '"""' ( !'"""' `longstringitem`)* '"""'
    stringitem:      `stringchar` | `stringescapeseq`
-   stringchar:      <any `source_character`, except as listed below>
+   stringchar:      <any `source_character`, except backslash and newline>
+   longstringitem:  `stringitem` | newline
    stringescapeseq: "\" <any `source_character`>
 
-``stringchar`` can not include:
-
-- the backslash, ``\``;
-- in triple-quoted strings (quoted by ``'''`` or ``"""``), the newline;
-- the quote character.
-
-
-.. grammar-snippet::
-   :group: python-grammar
-
-   bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`)
-   bytesprefix: <("b" | "br" | "rb" ), case-insensitive>
-   shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"'
-   longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""'
-   shortbytesitem: `shortbyteschar` | `bytesescapeseq`
-   longbytesitem: `longbyteschar` | `bytesescapeseq`
-   shortbyteschar: <any ASCII `source_character` except "\" or newline or the quote>
-   longbyteschar: <any ASCII `source_character` except "\">
-   bytesescapeseq: "\" <any ASCII `source_character`>
-
 Note that as in all lexical definitions, whitespace is significant.
 The prefix, if any, must be followed immediately by the quoted string content.
 
@@ -692,7 +696,7 @@ The prefix, if any, must be followed immediately by the quoted string content.
 .. _escape-sequences:
 
 Escape sequences
-^^^^^^^^^^^^^^^^
+----------------
 
 Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
 bytes literals are interpreted according to rules similar to those used by
@@ -985,7 +989,7 @@ and :meth:`str.format`, which uses a related format string mechanism.
 .. _numbers:
 
 Numeric literals
-----------------
+================
 
 .. index:: number, numeric literal, integer literal
    floating-point literal, hexadecimal literal
@@ -1241,14 +1245,26 @@ The following tokens serve as delimiters in the grammar:
 
    (       )       [       ]       {       }
    ,       :       !       .       ;       @       =
+
+The period can also occur in floating-point and imaginary literals.
+
+.. _lexical-ellipsis:
+
+A sequence of three periods has a special meaning as an
+:py:data:`Ellipsis` literal:
+
+.. code-block:: none
+
+   ...
+
+The following *augmented assignment operators* serve
+lexically as delimiters, but also perform an operation:
+
+.. code-block:: none
+
    ->      +=      -=      *=      /=      //=     %=
    @=      &=      |=      ^=      >>=     <<=     **=
 
-The period can also occur in floating-point and imaginary literals.  A sequence
-of three periods has a special meaning as an ellipsis literal. The second half
-of the list, the augmented assignment operators, serve lexically as delimiters,
-but also perform an operation.
-
 The following printing ASCII characters have special meaning as part of other
 tokens or are otherwise significant to the lexical analyzer:
 

From 86bf94b0f4cc9f9eaa63728610d7bb71fc4f3107 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 18 Jun 2025 18:05:31 +0200
Subject: [PATCH 04/17] More WIP

---
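Note for reviewers: a minimal sketch of the quoting rules reworded in
this patch, reusing the example strings from the hunks below. The
variable names are illustrative only.

```python
# A backslash-escaped quote does not end the literal.
escaped = "Say \"Hello\" to everyone!"

# Inside a triple-quoted literal, single unescaped quotes are retained.
with_quotes = """This string has "quotes" inside."""

# Unescaped newlines are allowed and retained as well.
multi_line = '''This triple-quoted string
continues on the next line.'''
```
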
 Doc/reference/lexical_analysis.rst | 424 +++++++++++++++++------------
 1 file changed, 251 insertions(+), 173 deletions(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 67cc9bd8fc7bac..36abfa31c093c9 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -501,7 +501,7 @@ and :ref:`numeric <numbers>` literals.
 
 Other “literals” are lexically denoted using :ref:`keywords <keywords>`
 (``None``, ``True``, ``False``) and the special
-:ref:`ellipsis token <lexical-ellipsis>` (``...``):
+:ref:`ellipsis token <lexical-ellipsis>` (``...``).
 
 
 .. index:: string literal, bytes literal, ASCII
@@ -519,7 +519,7 @@ quotes (``"``). For example:
    writing, highlighted strings don't look good when there's no code
    surrounding them.
 
-.. code-block:: text
+.. code-block:: python
 
    "spam"
    'eggs'
@@ -528,7 +528,7 @@ The quote used to start the literal also terminates it, so a string literal
 can only contain the other quote (except with escape sequences, see below).
 For example:
 
-.. code-block:: text
+.. code-block:: python
 
    'Say "Hello", please.'
    "Don't do that!"
@@ -536,6 +536,21 @@ For example:
 Except for this limitation, the choice of quote character (``'`` or ``"``)
 does not affect how the literal is parsed.
 
+Inside a string literal, the backslash (``\``) character introduces an
+:dfn:`escape sequence`, which has special meaning depending on the character
+after the backslash.
+For example, ``\"`` denotes the double quote character, and does *not* end
+the string:
+
+.. code-block:: python
+
+   >>> print("Say \"Hello\" to everyone!")
+   Say "Hello" to everyone!
+
+See :ref:`escape sequences <escape-sequences>` below for a full list of such
+sequences, and more details.
+
+
 .. index:: triple-quoted string
    single: """; string literal
    single: '''; string literal
@@ -545,32 +560,20 @@ Triple-quoted strings
 
 Strings can also be enclosed in matching groups of three single or double
 quotes.
-These are generally referred to as :dfn:`triple-quoted strings`.
+These are generally referred to as :dfn:`triple-quoted strings`::
 
-In triple-quoted literals, unescaped newlines and quotes are allowed (and are
-retained), except that three unescaped quotes in a row terminate the literal.
-(Here, a *quote* is the character used to open the literal, that is,
-either ``'`` or ``"``.)
+   """This is a triple-quoted string."""
 
-For example:
+In triple-quoted literals, unescaped quotes are allowed (and are
+retained), except that three unescaped quotes in a row terminate the literal,
+if they are of the same kind (``'`` or ``"``) used at the start::
 
-.. code-block:: text
+   """This string has "quotes" inside."""
 
-   """This is a triple-quoted string with "quotes" inside."""
+Unescaped newlines are also allowed and retained::
 
-   '''Another triple-quoted string. This one continues
-   on the next line.'''
-
-Escape sequences
-----------------
-
-Inside a string literal, the backslash (``\``) character introduces an
-:dfn:`escape sequence`, which has special meaning depending on the character
-after the backslash.
-For example, ``\n`` denotes the 'newline' character, rather the two characters
-``\`` and ``n``.
-See :ref:`escape sequences <escape-sequences>` below for a full list of such
-sequences, and more details.
+   '''This triple-quoted string
+   continues on the next line.'''
 
 
 .. index::
@@ -580,70 +583,28 @@ sequences, and more details.
 String prefixes
 ---------------
 
-String literals can have an optional :dfn:`prefix` that influences how the literal
-is parsed, for example:
+String literals can have an optional :dfn:`prefix` that influences how the
+content of the literal is parsed, for example:
 
 .. code-block:: python
 
    b"data"
    f'{result=}'
 
-* ``r``: Raw string
-* ``f``: "F-string"
-* ``t``: "T-string"
-* ``b``: Byte literal
+The allowed prefixes are:
+
+* ``b``: :ref:`Bytes literal <bytes-literal>`
+* ``r``: :ref:`Raw string <raw-strings>`
+* ``f``: :ref:`Formatted string literal <f-strings>` ("f-string")
+* ``t``: :ref:`Template string literal <t-strings>` ("t-string")
 * ``u``: No effect (allowed for backwards compatibility)
 
+See the linked sections for details on each type.
+
 Prefixes are case-insensitive (for example, ``B`` works the same as ``b``).
 The ``r`` prefix can be combined with ``f``, ``t`` or ``b``, so ``fr``,
 ``rf``, ``tr``, ``rt``, ``br`` and ``rb`` are also valid prefixes.
 
-
-.. index::
-   single: b'; bytes literal
-   single: b"; bytes literal
-
-:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an
-instance of the :class:`bytes` type instead of the :class:`str` type.
-They may only contain ASCII characters; bytes with a numeric value of 128
-or greater must be expressed with escape sequences.
-Similarly, a zero byte must be expressed using an escape sequence.
-
-
-.. index::
-   single: r'; raw string literal
-   single: r"; raw string literal
-
-Both string and bytes literals may optionally be prefixed with a letter ``'r'``
-or ``'R'``; such constructs are called :dfn:`raw string literals`
-and :dfn:`raw bytes literals` respectively and treat backslashes as
-literal characters.
-As a result, in raw string literals, :ref:`escape sequences <escape-sequences>`
-escapes are not treated specially.
-
-Even in a raw literal, quotes can be escaped with a backslash, but the
-backslash remains in the result; for example, ``r"\""`` is a valid string
-literal consisting of two characters: a backslash and a double quote; ``r"\"``
-is not a valid string literal (even a raw string cannot end in an odd number of
-backslashes).  Specifically, *a raw literal cannot end in a single backslash*
-(since the backslash would escape the following quote character).  Note also
-that a single backslash followed by a newline is interpreted as those two
-characters as part of the literal, *not* as a line continuation.
-
-
-.. index::
-   single: f'; formatted string literal
-   single: f"; formatted string literal
-
-A string literal with ``'f'`` or ``'F'`` in its prefix is a
-:dfn:`formatted string literal`; see :ref:`f-strings`.
-Similarly, string literal with ``'t'`` or ``'T'`` in its prefix is a
-:dfn:`template string literal`; see :ref:`t-strings`.
-
-The ``'f'`` or ``t`` may be combined with ``'r'`` to create a
-:dfn:`raw formatted string` or :dfn:`raw template string`.
-They may not be combined with ``'b'``, ``'u'``, or each other.
-
 .. versionadded:: 3.3
    The ``'rb'`` prefix of raw bytes literals has been added as a synonym
    of ``'br'``.
@@ -653,7 +614,11 @@ They may not be combined with ``'b'``, ``'u'``, or each other.
    See :pep:`414` for more information.
 
 
-String literals, except "F-strings" and "T-strings", are described by the
+Formal grammar
+--------------
+
+String literals, except :ref:`"F-strings" <f-strings>` and
+:ref:`"T-strings" <t-strings>`, are described by the
 following lexical definitions.
 
 These definitions use :ref:`negative lookaheads <lexical-lookaheads>` (``!``)
@@ -675,23 +640,8 @@ to indicate that an ending quote ends the literal.
    stringescapeseq: "\" <any `source_character`>
 
 Note that as in all lexical definitions, whitespace is significant.
-The prefix, if any, must be followed immediately by the quoted string content.
-
-
-.. index:: physical line, escape sequence, Standard C, C
-   single: \ (backslash); escape sequence
-   single: \\; escape sequence
-   single: \a; escape sequence
-   single: \b; escape sequence
-   single: \f; escape sequence
-   single: \n; escape sequence
-   single: \r; escape sequence
-   single: \t; escape sequence
-   single: \v; escape sequence
-   single: \x; escape sequence
-   single: \N; escape sequence
-   single: \u; escape sequence
-   single: \U; escape sequence
+In particular, the prefix (if any) must be immediately followed by the starting
+quote.
 
 .. _escape-sequences:
 
@@ -702,55 +652,50 @@ Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and
 bytes literals are interpreted according to rules similar to those used by
 Standard C.  The recognized escape sequences are:
 
-+-------------------------+---------------------------------+-------+
-| Escape Sequence         | Meaning                         | Notes |
-+=========================+=================================+=======+
-| ``\``\ <newline>        | Backslash and newline ignored   | \(1)  |
-+-------------------------+---------------------------------+-------+
-| ``\\``                  | Backslash (``\``)               |       |
-+-------------------------+---------------------------------+-------+
-| ``\'``                  | Single quote (``'``)            |       |
-+-------------------------+---------------------------------+-------+
-| ``\"``                  | Double quote (``"``)            |       |
-+-------------------------+---------------------------------+-------+
-| ``\a``                  | ASCII Bell (BEL)                |       |
-+-------------------------+---------------------------------+-------+
-| ``\b``                  | ASCII Backspace (BS)            |       |
-+-------------------------+---------------------------------+-------+
-| ``\f``                  | ASCII Formfeed (FF)             |       |
-+-------------------------+---------------------------------+-------+
-| ``\n``                  | ASCII Linefeed (LF)             |       |
-+-------------------------+---------------------------------+-------+
-| ``\r``                  | ASCII Carriage Return (CR)      |       |
-+-------------------------+---------------------------------+-------+
-| ``\t``                  | ASCII Horizontal Tab (TAB)      |       |
-+-------------------------+---------------------------------+-------+
-| ``\v``                  | ASCII Vertical Tab (VT)         |       |
-+-------------------------+---------------------------------+-------+
-| :samp:`\\\\{ooo}`       | Character with octal value      | (2,4) |
-|                         | *ooo*                           |       |
-+-------------------------+---------------------------------+-------+
-| :samp:`\\x{hh}`         | Character with hex value *hh*   | (3,4) |
-+-------------------------+---------------------------------+-------+
-
-Escape sequences only recognized in string literals are:
-
-+-------------------------+---------------------------------+-------+
-| Escape Sequence         | Meaning                         | Notes |
-+=========================+=================================+=======+
-| :samp:`\\N\\{{name}\\}` | Character named *name* in the   | \(5)  |
-|                         | Unicode database                |       |
-+-------------------------+---------------------------------+-------+
-| :samp:`\\u{xxxx}`       | Character with 16-bit hex value | \(6)  |
-|                         | *xxxx*                          |       |
-+-------------------------+---------------------------------+-------+
-| :samp:`\\U{xxxxxxxx}`   | Character with 32-bit hex value | \(7)  |
-|                         | *xxxxxxxx*                      |       |
-+-------------------------+---------------------------------+-------+
-
-Notes:
-
-(1)
+.. list-table::
+   :widths: auto
+   :header-rows: 1
+
+   * * Escape Sequence
+     * Meaning
+   * * ``\``\ <newline>
+     * :ref:`string-escape-ignore`
+   * * ``\\``
+     * :ref:`Backslash <string-escape-escaped-char>`
+   * * ``\'``
+     * :ref:`Single quote <string-escape-escaped-char>`
+   * * ``\"``
+     * :ref:`Double quote <string-escape-escaped-char>`
+   * * ``\a``
+     * ASCII Bell (BEL)
+   * * ``\b``
+     * ASCII Backspace (BS)
+   * * ``\f``
+     * ASCII Formfeed (FF)
+   * * ``\n``
+     * ASCII Linefeed (LF)
+   * * ``\r``
+     * ASCII Carriage Return (CR)
+   * * ``\t``
+     * ASCII Horizontal Tab (TAB)
+   * * ``\v``
+     * ASCII Vertical Tab (VT)
+   * * :samp:`\\\\{ooo}`
+     * :ref:`string-escape-oct`
+   * * :samp:`\\x{hh}`
+     * :ref:`string-escape-hex`
+   * * :samp:`\\N\\{{name}\\}`
+     * :ref:`string-escape-named`
+   * * :samp:`\\u{xxxx}`
+     * :ref:`Hexadecimal Unicode character <string-escape-long-hex>`
+   * * :samp:`\\U{xxxxxxxx}`
+     * :ref:`Hexadecimal Unicode character <string-escape-long-hex>`
+
+.. _string-escape-ignore:
+
+Ignored end of line
+^^^^^^^^^^^^^^^^^^^
+
    A backslash can be added at the end of a line to ignore the newline::
 
       >>> 'This string will not include \
@@ -760,9 +705,39 @@ Notes:
    The same result can be achieved using :ref:`triple-quoted strings <strings>`,
    or parentheses and :ref:`string literal concatenation <string-concatenation>`.
 
+.. _string-escape-escaped-char:
+
+Escaped characters
+^^^^^^^^^^^^^^^^^^
 
-(2)
-   As in Standard C, up to three octal digits (0 through 7) are accepted.
+  To include a backslash in a non-:ref:`raw <raw-strings>` Python string
+  literal, it must be doubled. The ``\\`` escape sequence denotes a single
+  backslash character::
+
+      >>> print('C:\\Program Files')
+      C:\Program Files
+
+  Similarly, the ``\'`` and ``\"`` sequences denote the single and double
+  quote character, respectively::
+
+      >>> print('\' and \"')
+      ' and "
+
+.. _string-escape-oct:
+
+Octal character
+^^^^^^^^^^^^^^^
+
+  The sequence :samp:`\\\\{ooo}` denotes a *character* with the octal (base 8)
+  value *ooo*::
+
+     >>> '\120'
+     'P'
+
+  Up to three octal digits (0 through 7) are accepted.
+
+  In a bytes literal, *character* means a *byte* with the given value.
+  In a string literal, it means a Unicode character with the given value.
 
    .. versionchanged:: 3.11
       Octal escapes with value larger than ``0o377`` (255) produce a
@@ -770,42 +745,147 @@ Notes:
 
    .. versionchanged:: 3.12
       Octal escapes with value larger than ``0o377`` (255) produce a
-      :exc:`SyntaxWarning`. In a future Python version they will be eventually
-      a :exc:`SyntaxError`.
+      :exc:`SyntaxWarning`.
+      In a future Python version they will raise a :exc:`SyntaxError`.
+
+.. _string-escape-hex:
+
+Hexadecimal character
+^^^^^^^^^^^^^^^^^^^^^
+
+  The sequence :samp:`\\x{hh}` denotes a *character* with the hex (base 16)
+  value *hh*::
+
+     >>> '\x50'
+     'P'
+
+  Unlike in Standard C, exactly two hex digits are required.
+
+  In a bytes literal, *character* means a *byte* with the given value.
+  In a string literal, it means a Unicode character with the given value.
+
+.. _string-escape-named:
+
+Named Unicode character
+^^^^^^^^^^^^^^^^^^^^^^^
+
+  The sequence :samp:`\\N\\{{name}\\}` denotes a Unicode character
+  with the given *name*::
+
+     >>> '\N{LATIN CAPITAL LETTER P}'
+     'P'
+     >>> '\N{SNAKE}'
+     '🐍'
+
+  This sequence cannot appear in :ref:`bytes literals <bytes-literal>`.
+
+  .. versionchanged:: 3.3
+      Support for `name aliases <https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt>`__
+      has been added.
 
-(3)
-   Unlike in Standard C, exactly two hex digits are required.
+.. _string-escape-long-hex:
 
-(4)
-   In a bytes literal, hexadecimal and octal escapes denote the byte with the
-   given value. In a string literal, these escapes denote a Unicode character
-   with the given value.
+Hexadecimal Unicode characters
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-(5)
-   .. versionchanged:: 3.3
-      Support for name aliases [#]_ has been added.
+  The sequences :samp:`\\u{xxxx}` and :samp:`\\U{xxxxxxxx}` denote the
+  Unicode character with the given hex (base 16) value.
+  Exactly four digits are required for ``\u``; exactly eight digits are
+  required for ``\U``.
+  The latter can encode any Unicode character.
 
-(6)
-   Exactly four hex digits are required.
+  .. code-block:: python
 
-(7)
-   Any Unicode character can be encoded this way.  Exactly eight hex digits
-   are required.
+      >>> '\u1234'
+      'ሴ'
+      >>> '\U0001f40d'
+      '🐍'
+
+  These sequences cannot appear in :ref:`bytes literals <bytes-literal>`.
 
 
 .. index:: unrecognized escape sequence
 
-Unlike Standard C, all unrecognized escape sequences are left in the string
-unchanged, i.e., *the backslash is left in the result*.
+Unrecognized escape sequences
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Unlike in Standard C, all unrecognized escape sequences are left in the string
+unchanged, that is, *the backslash is left in the result*::
+
+   >>> print('\q')
+   \q
+   >>> list('\q')
+   ['\\', 'q']
+
 Note that for bytes literals, the escape sequences only recognized in string
-literals fall into the category of unrecognized escapes.
+literals (``\N...``, ``\u...``, ``\U...``) fall into the category of
+unrecognized escapes.
 
 .. versionchanged:: 3.6
    Unrecognized escape sequences produce a :exc:`DeprecationWarning`.
 
 .. versionchanged:: 3.12
-   Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future
-   Python version they will be eventually a :exc:`SyntaxError`.
+   Unrecognized escape sequences produce a :exc:`SyntaxWarning`.
+   In a future Python version they will raise a :exc:`SyntaxError`.
+
+
+.. index::
+   single: b'; bytes literal
+   single: b"; bytes literal
+
+
+.. _bytes-literal:
+
+Bytes literals
+--------------
+
+:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an
+instance of the :class:`bytes` type instead of the :class:`str` type.
+They may only contain ASCII characters; bytes with a numeric value of 128
+or greater must be expressed with escape sequences.
+Similarly, a zero byte must be expressed using an escape sequence.
+
+
+.. index::
+   single: r'; raw string literal
+   single: r"; raw string literal
+
+.. _raw-strings:
+
+Raw string literals
+-------------------
+
+Both string and bytes literals may optionally be prefixed with a letter ``'r'``
+or ``'R'``; such constructs are called :dfn:`raw string literals`
+and :dfn:`raw bytes literals` respectively and treat backslashes as
+literal characters.
+As a result, in raw string literals, :ref:`escape sequences <escape-sequences>`
+escapes are not treated specially.
+
+Even in a raw literal, quotes can be escaped with a backslash, but the
+backslash remains in the result; for example, ``r"\""`` is a valid string
+literal consisting of two characters: a backslash and a double quote; ``r"\"``
+is not a valid string literal (even a raw string cannot end in an odd number of
+backslashes).  Specifically, *a raw literal cannot end in a single backslash*
+(since the backslash would escape the following quote character).  Note also
+that a single backslash followed by a newline is interpreted as those two
+characters as part of the literal, *not* as a line continuation.
+
+
+.. index:: physical line, escape sequence, Standard C, C
+   single: \ (backslash); escape sequence
+   single: \\; escape sequence
+   single: \a; escape sequence
+   single: \b; escape sequence
+   single: \f; escape sequence
+   single: \n; escape sequence
+   single: \r; escape sequence
+   single: \t; escape sequence
+   single: \v; escape sequence
+   single: \x; escape sequence
+   single: \N; escape sequence
+   single: \u; escape sequence
+   single: \U; escape sequence
 
 
 .. index::
@@ -815,6 +895,8 @@ literals fall into the category of unrecognized escapes.
    single: string; interpolated literal
    single: f-string
    single: fstring
+   single: f'; formatted string literal
+   single: f"; formatted string literal
    single: {} (curly brackets); in formatted string literal
    single: ! (exclamation); in formatted string literal
    single: : (colon); in formatted string literal
@@ -1022,7 +1104,7 @@ actually an expression composed of the unary operator '``-``' and the literal
 .. _integers:
 
 Integer literals
-^^^^^^^^^^^^^^^^
+----------------
 
 Integer literals denote whole numbers. For example::
 
@@ -1095,7 +1177,7 @@ Formally, integer literals are described by the following lexical definitions:
 .. _floating:
 
 Floating-point literals
-^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------
 
 Floating-point (float) literals, such as ``3.14`` or ``1.5``, denote
 :ref:`approximations of real numbers <datamodel-float>`.
@@ -1157,7 +1239,7 @@ lexical definitions:
 .. _imaginary:
 
 Imaginary literals
-^^^^^^^^^^^^^^^^^^
+------------------
 
 Python has :ref:`complex number <typesnumeric>` objects, but no complex
 literals.
@@ -1279,7 +1361,3 @@ occurrence outside string literals and comments is an unconditional error:
 
    $       ?       `
 
-
-.. rubric:: Footnotes
-
-.. [#] https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt

From faf05a192ed7ec80ab26e803544ce9585b59d583 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 25 Jun 2025 16:26:37 +0200
Subject: [PATCH 05/17] Byte strings, raw strings; f-string stub

---
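Reviewer note (not part of the commit): the doctest outputs added for bytes
and raw literals in this patch can be spot-checked with a short script.
This is a minimal sketch; the variable names ``png_signature`` and
``date_pattern`` are only illustrative and do not appear in the docs.

```python
# Bytes literal: non-ASCII bytes are written with hex escapes.
png_signature = b'\x89PNG\r\n\x1a\n'
print(list(png_signature))

# A zero byte can be written as an octal or a hex escape.
zero_octal = b'\0'
zero_hex = b'\x00'
print(zero_octal == zero_hex)

# Raw string literal: backslashes are kept as literal characters.
date_pattern = r'\d{4}-\d{2}-\d{2}'
print(date_pattern == '\\d{4}-\\d{2}-\\d{2}')

# Even in a raw literal, \" keeps both the backslash and the quote.
escaped_quote = r"\""
print(len(escaped_quote))
```

If the examples in the patch are accurate, the script prints the byte values
shown in the new ``list(...)`` doctest, then ``True``, ``True`` and ``2``.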
 Doc/reference/lexical_analysis.rst | 65 +++++++++++++++++++++---------
 1 file changed, 46 insertions(+), 19 deletions(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 36abfa31c093c9..2c6ae9a16d0d08 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -643,6 +643,21 @@ Note that as in all lexical definitions, whitespace is significant.
 In particular, the prefix (if any) must be immediately followed by the starting
 quote.
 
+.. index:: physical line, escape sequence, Standard C, C
+   single: \ (backslash); escape sequence
+   single: \\; escape sequence
+   single: \a; escape sequence
+   single: \b; escape sequence
+   single: \f; escape sequence
+   single: \n; escape sequence
+   single: \r; escape sequence
+   single: \t; escape sequence
+   single: \v; escape sequence
+   single: \x; escape sequence
+   single: \N; escape sequence
+   single: \u; escape sequence
+   single: \U; escape sequence
+
 .. _escape-sequences:
 
 Escape sequences
@@ -842,8 +857,18 @@ Bytes literals
 :dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an
 instance of the :class:`bytes` type instead of the :class:`str` type.
 They may only contain ASCII characters; bytes with a numeric value of 128
-or greater must be expressed with escape sequences.
-Similarly, a zero byte must be expressed using an escape sequence.
+or greater must be expressed with escape sequences (typically
+:ref:`string-escape-hex` or :ref:`string-escape-oct`):
+
+.. code-block:: python
+
+   >>> b'\x89PNG\r\n\x1a\n'
+   b'\x89PNG\r\n\x1a\n'
+   >>> list(b'\x89PNG\r\n\x1a\n')
+   [137, 80, 78, 71, 13, 10, 26, 10]
+
+Similarly, a zero byte must be expressed using an escape sequence (typically
+``\0`` or ``\x00``).
 
 
 .. index::
@@ -860,7 +885,12 @@ or ``'R'``; such constructs are called :dfn:`raw string literals`
 and :dfn:`raw bytes literals` respectively and treat backslashes as
 literal characters.
 As a result, in raw string literals, :ref:`escape sequences <escape-sequences>`
-escapes are not treated specially.
+are not treated specially:
+
+.. code-block:: python
+
+   >>> r'\d{4}-\d{2}-\d{2}'
+   '\\d{4}-\\d{2}-\\d{2}'
 
 Even in a raw literal, quotes can be escaped with a backslash, but the
 backslash remains in the result; for example, ``r"\""`` is a valid string
@@ -872,22 +902,6 @@ that a single backslash followed by a newline is interpreted as those two
 characters as part of the literal, *not* as a line continuation.
 
 
-.. index:: physical line, escape sequence, Standard C, C
-   single: \ (backslash); escape sequence
-   single: \\; escape sequence
-   single: \a; escape sequence
-   single: \b; escape sequence
-   single: \f; escape sequence
-   single: \n; escape sequence
-   single: \r; escape sequence
-   single: \t; escape sequence
-   single: \v; escape sequence
-   single: \x; escape sequence
-   single: \N; escape sequence
-   single: \u; escape sequence
-   single: \U; escape sequence
-
-
 .. index::
    single: formatted string literal
    single: interpolated string literal
@@ -1067,6 +1081,19 @@ include expressions.
 See also :pep:`498` for the proposal that added formatted string literals,
 and :meth:`str.format`, which uses a related format string mechanism.
 
+.. _t-strings:
+.. _template-string-literals:
+
+t-strings
+---------
+
+A :dfn:`template string literal` or :dfn:`t-string` is a string literal that
+is prefixed with ``'t'`` or ``'T'``.
+These strings have internal structure similar to :ref:`f-strings`,
+but evaluate to :class:`~string.templatelib.Template` objects, not strings.
+
+.. versionadded:: 3.14
+
 
 .. _numbers:
 

From 687fe5830318ca89a5541703bae3e62b3c8a7b5e Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 25 Jun 2025 16:38:09 +0200
Subject: [PATCH 06/17] Remove outdated comment

---
 Doc/reference/lexical_analysis.rst | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 2c6ae9a16d0d08..e3d0bab8942ced 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -515,10 +515,6 @@ String and Bytes literals
 String literals are text enclosed in single quotes (``'``) or double
 quotes (``"``). For example:
 
-.. This is Python code, but we turn off highlighting because as of this
-   writing, highlighted strings don't look good when there's no code
-   surrounding them.
-
 .. code-block:: python
 
    "spam"

From 9f9d29ccab8a5c25aa9433a90bd03d2a5521c36b Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 25 Jun 2025 16:50:18 +0200
Subject: [PATCH 07/17] Fix ReST errors

---
 Doc/reference/expressions.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
index 743d43b1c9c1b1..c1f046388c3d1b 100644
--- a/Doc/reference/expressions.rst
+++ b/Doc/reference/expressions.rst
@@ -160,7 +160,7 @@ value.
 .. _string-concatenation:
 
 String literal concatenation
-............................
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 Multiple adjacent string or bytes literals (delimited by whitespace), possibly
 using different quoting conventions, are allowed, and their meaning is the same
@@ -172,7 +172,7 @@ Formally:
 .. grammar-snippet::
    :group: python-grammar
 
-   strings: ( `STRING` | `fstring` | `tstring`)+
+   strings: ( `STRING` | fstring | tstring)+
 
 Note that this feature is defined at the syntactical level, so it only works
 with literals.

From 1e0c84a0357207348d66e16dc51590cf6169dcd9 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 25 Jun 2025 18:04:08 +0200
Subject: [PATCH 08/17] TMP

---
 Doc/reference/lexical_analysis.rst | 72 ++++++++++++++++++++++++------
 1 file changed, 58 insertions(+), 14 deletions(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index e3d0bab8942ced..a9aeee965ad257 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -277,7 +277,15 @@ Whitespace between tokens
 
 Except at the beginning of a logical line or in string literals, the whitespace
 characters space, tab and formfeed can be used interchangeably to separate
-tokens.  Whitespace is needed between two tokens only if their concatenation
+tokens:
+
+.. grammar-snippet::
+   :group: python-grammar
+
+   whitespace:  ' ' | tab | formfeed
+
+
+Whitespace is needed between two tokens only if their concatenation
 could otherwise be interpreted as a different token. For example, ``ab`` is one
 token, but ``a b`` is two tokens. However, ``+a`` and ``+ a`` both produce
 two tokens, ``+`` and ``a``, as ``+a`` is not a valid token.
@@ -921,24 +929,60 @@ f-strings
 .. versionadded:: 3.6
 
 A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
-that is prefixed with ``'f'`` or ``'F'``.  These strings may contain
-replacement fields, which are expressions delimited by curly braces ``{}``.
-While other string literals always have a constant value, formatted strings
-are really expressions evaluated at run time.
+that is prefixed with ``'f'`` or ``'F'``.
+Unlike other string literals, f-strings do not have a constant value.
+They may contain *replacement fields*, which are expressions delimited by
+curly braces ``{}``, which are evaluated at run time.
+For example::
+
+   >>> f'One plus one is {1 + 1}.'
+   'One plus one is 2.'
+
 
 Escape sequences are decoded like in ordinary string literals (except when
 a literal is also marked as a raw string).  After decoding, the grammar
 for the contents of the string is:
 
-.. productionlist:: python-grammar
-   f_string: (`literal_char` | "{{" | "}}" | `replacement_field`)*
-   replacement_field: "{" `f_expression` ["="] ["!" `conversion`] [":" `format_spec`] "}"
-   f_expression: (`conditional_expression` | "*" `or_expr`)
-               :   ("," `conditional_expression` | "," "*" `or_expr`)* [","]
-               : | `yield_expression`
-   conversion: "s" | "r" | "a"
-   format_spec: (`literal_char` | `replacement_field`)*
-   literal_char: <any code point except "{", "}" or NULL>
+.. grammar-snippet:: python-grammar
+   :group: python-grammar
+
+   FSTRING_START:      `fstringprefix` ("'" | '"' | "'''" | '"""')
+   FSTRING_MIDDLE:
+      | <any `source_character`, except backslash, newline, '{' and '}'>
+      | `stringescapeseq`
+      | "{{"
+      | "}}"
+      | <newline, in triple-quoted f-strings only>
+   FSTRING_END:        ("'" | '"' | "'''" | '"""')
+   fstringprefix:      <("f" | "fr" | "rf"), case-insensitive>
+   f_debug_specifier:  whitespace* '=' whitespace*
+
+.. grammar-snippet:: python-grammar
+   :group: python-grammar
+
+   fstring:    `FSTRING_START` `fstring_middle`* `FSTRING_END`
+   fstring_middle:
+      | `fstring_replacement_field`
+      | `FSTRING_MIDDLE`
+   fstring_replacement_field:
+      | '{' `f_expression` [`f_debug_specifier`] [`fstring_conversion`]
+            [`fstring_full_format_spec`] '}'
+   fstring_conversion:
+      | "!" ("s" | "r" | "a")
+   fstring_full_format_spec:
+      | ':' `fstring_format_spec`*
+   fstring_format_spec:
+      | `FSTRING_MIDDLE`
+      | `fstring_replacement_field`
+   f_expression:
+      | ','.(`conditional_expression` | "*" `or_expr`)+ [","]
+      | `yield_expression`
+
+
+---------------
+
+
+
 
 The parts of the string outside curly braces are treated literally,
 except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced

From e7b57b582b8296b12b5e8a86f008b58b5a00590d Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 2 Jul 2025 16:56:27 +0200
Subject: [PATCH 09/17] Continue with f-strings

---
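Reviewer note (not part of the commit): the f-string examples added in this
patch can be checked against a running interpreter. The sketch below assumes
Python 3.8 or later (needed for the ``=`` debug specifier) and reuses the
``name`` and ``favorite_color`` values from the new doctests; the result
variable names are made up for illustration.

```python
name = 'Galahad'
favorite_color = 'blue'

# Doubled braces produce single literal braces.
braces = f'{{...}}'

# Escape sequences are decoded in non-raw f-strings...
tabbed = f'{name}:\t{favorite_color}'

# ...but not in raw f-strings, where backslashes stay literal.
raw_path = rf'C:\Users\{name}'

# Whitespace around the debug specifier is retained in the result.
debug_tight = f'{name=}'
debug_spaced = f'{name = }'

for value in (braces, tabbed, raw_path, debug_tight, debug_spaced):
    print(repr(value))
```

Printing the ``repr`` avoids relying on how a terminal expands the tab
character in the ``Galahad:\tblue`` example.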
 Doc/library/stdtypes.rst           |  2 +
 Doc/reference/lexical_analysis.rst | 90 ++++++++++++++++++------------
 2 files changed, 56 insertions(+), 36 deletions(-)

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 394c302fd354b9..6976838eceb03e 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -2526,6 +2526,8 @@ expression support in the :mod:`re` module).
    single: : (colon); in formatted string literal
    single: = (equals); for help in debugging using string literals
 
+.. _stdtypes-fstrings:
+
 Formatted String Literals (f-strings)
 -------------------------------------
 
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index a9aeee965ad257..82b0f711afd071 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -931,17 +931,62 @@ f-strings
 A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
 that is prefixed with ``'f'`` or ``'F'``.
 Unlike other string literals, f-strings do not have a constant value.
-They may contain *replacement fields*, which are expressions delimited by
-curly braces ``{}``, which are evaluated at run time.
+They may contain *replacement fields* delimited by curly braces ``{}``.
+Replacement fields contain expressions which are evaluated at run time.
 For example::
 
    >>> f'One plus one is {1 + 1}.'
    'One plus one is 2.'
 
+The parts of the string outside curly braces are treated literally,
+except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
+with the corresponding single curly brace::
+
+   >>> print(f'{{...}}')
+   {...}
 
 Escape sequences are decoded like in ordinary string literals (except when
-a literal is also marked as a raw string).  After decoding, the grammar
-for the contents of the string is:
+a literal is also marked as a raw string)::
+
+   >>> name = 'Galahad'
+   >>> favorite_color = 'blue'
+   >>> print(f'{name}:\t{favorite_color}')
+   Galahad:       blue
+   >>> print(rf'C:\Users\{name}')
+   C:\Users\Galahad
+
+In addition to the expression, replacement fields may contain:
+
+* a *debug specifier* -- an equal sign (``=``);
+* a *conversion specifier* -- ``!s``, ``!r`` or ``!a``; and/or
+* a *format specifier* prefixed with a colon (``:``).
+
+See :ref:`stdtypes-fstrings` for how these specifiers are interpreted.
+
+Note that whitespace on both sides of a debug specifier (``=``) is
+significant --- it is retained in the result::
+
+   >>> print(f'{name=}')
+   name='Galahad'
+   >>> print(f'{name = }')
+   name = 'Galahad'
+
+Expressions in formatted string literals are treated like regular
+Python expressions surrounded by parentheses, with a few exceptions.
+An empty expression is not allowed, and both :keyword:`lambda`  and
+assignment expressions ``:=`` must be surrounded by explicit parentheses.
+Each expression is evaluated in the context where the formatted string literal
+appears, in order from left to right.  Replacement expressions can contain
+newlines in both single-quoted and triple-quoted f-strings and they can contain
+comments.  Everything that comes after a ``#`` inside a replacement field
+is a comment (even closing braces and quotes). In that case, replacement fields
+must be closed in a different line.
+
+.. code-block:: text
+
+   >>> f"abc{a # This is a comment }"
+   ... + 3}"
+   'abc5'
 
 .. grammar-snippet:: python-grammar
    :group: python-grammar
@@ -979,38 +1024,6 @@ for the contents of the string is:
       | `yield_expression`
 
 
----------------
-
-
-
-
-The parts of the string outside curly braces are treated literally,
-except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
-with the corresponding single curly brace.  A single opening curly
-bracket ``'{'`` marks a replacement field, which starts with a
-Python expression. To display both the expression text and its value after
-evaluation, (useful in debugging), an equal sign ``'='`` may be added after the
-expression. A conversion field, introduced by an exclamation point ``'!'`` may
-follow.  A format specifier may also be appended, introduced by a colon ``':'``.
-A replacement field ends with a closing curly bracket ``'}'``.
-
-Expressions in formatted string literals are treated like regular
-Python expressions surrounded by parentheses, with a few exceptions.
-An empty expression is not allowed, and both :keyword:`lambda`  and
-assignment expressions ``:=`` must be surrounded by explicit parentheses.
-Each expression is evaluated in the context where the formatted string literal
-appears, in order from left to right.  Replacement expressions can contain
-newlines in both single-quoted and triple-quoted f-strings and they can contain
-comments.  Everything that comes after a ``#`` inside a replacement field
-is a comment (even closing braces and quotes). In that case, replacement fields
-must be closed in a different line.
-
-.. code-block:: text
-
-   >>> f"abc{a # This is a comment }"
-   ... + 3}"
-   'abc5'
-
 .. versionchanged:: 3.7
    Prior to Python 3.7, an :keyword:`await` expression and comprehensions
    containing an :keyword:`async for` clause were illegal in the expressions
@@ -1020,6 +1033,11 @@ must be closed in a different line.
    Prior to Python 3.12, comments were not allowed inside f-string replacement
    fields.
 
+---------------
+
+
+
+
 When the equal sign ``'='`` is provided, the output will have the expression
 text, the ``'='`` and the evaluated value. Spaces after the opening brace
 ``'{'``, within the expression and after the ``'='`` are all retained in the

From d593940fb290105481b5b8dfc01170c08d455e82 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 9 Jul 2025 17:38:34 +0200
Subject: [PATCH 10/17] Work on the f-string semantics

---
 Doc/reference/lexical_analysis.rst | 132 +++++++++++++++++++++++------
 1 file changed, 106 insertions(+), 26 deletions(-)
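
Reviewer note (not part of the patch): a minimal sketch of the evaluation
order the new text describes, namely an optional conversion followed by a
call to format(). The names `third` and `value` are illustrative only, and
the `=` specifier assumes Python 3.8 or later.

```python
from fractions import Fraction

third = Fraction(1, 3)
value = 2.5

# No specifiers: format() with an empty spec, which matches str() here.
plain = f'{third}'
# An explicit conversion runs first; its result is what format() receives.
converted = f'{third!r:_^20}'
# A bare debug specifier shows the expression text, '=', then repr().
debug = f'{third = }'
# With a format spec, the debug specifier no longer implies repr().
debug_spec = f'{value = :>8}'

assert plain == format(third, '') == str(third)
assert converted == format(repr(third), '_^20')
assert debug == 'third = ' + repr(third)
assert debug_spec == 'value = ' + format(value, '>8')
```

The last assertion is the case the "Conversion specifier" text singles out:
once a format specifier is present, the value goes to format() unconverted.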

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 82b0f711afd071..954ebbed9ba1be 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -935,58 +935,138 @@ They may contain *replacement fields* delimited by curly braces ``{}``.
 Replacement fields contain expressions which are evaluated at run time.
 For example::
 
-   >>> f'One plus one is {1 + 1}.'
-   'One plus one is 2.'
+   >>> who = 'nobody'
+   >>> nationality = 'Spanish'
+   >>> f'{who.title()} expects the {nationality} Inquisition!'
+   'Nobody expects the Spanish Inquisition!'
 
-The parts of the string outside curly braces are treated literally,
-except that any doubled curly braces ``'{{'`` or ``'}}'`` are replaced
-with the corresponding single curly brace::
+Any doubled curly braces (``{{`` or ``}}``) outside replacement fields
+are replaced with the corresponding single curly brace::
 
    >>> print(f'{{...}}')
    {...}
 
-Escape sequences are decoded like in ordinary string literals (except when
-a literal is also marked as a raw string)::
+Other characters outside replacement fields are treated as in ordinary
+string literals.
+This means that escape sequences are decoded (except when a literal is
+also marked as a raw string), and newlines are possible in triple-quoted
+f-strings::
 
    >>> name = 'Galahad'
    >>> favorite_color = 'blue'
    >>> print(f'{name}:\t{favorite_color}')
    Galahad:       blue
-   >>> print(rf'C:\Users\{name}')
+   >>> print(rf"C:\Users\{name}")
    C:\Users\Galahad
+   >>> print(f'''Three shall be the number of the counting
+   ... and the number of the counting shall be three.''')
+   Three shall be the number of the counting
+   and the number of the counting shall be three.
 
-In addition to the expression, replacement fields may contain:
+Expressions in formatted string literals are treated like regular
+Python expressions.
+Each expression is evaluated in the context where the formatted string literal
+appears, in order from left to right.
+An empty expression is not allowed, and both :keyword:`lambda` and
+assignment expressions ``:=`` must be surrounded by explicit parentheses::
+
+   >>> f'{(half := 1/2)}, {half * 42}'
+   '0.5, 21.0'
+
+Replacement expressions can contain newlines in both single-quoted and
+triple-quoted f-strings and they can contain comments.
+Everything that comes after a ``#`` inside a replacement field
+is a comment (even closing braces and quotes).
+This means that replacement fields with comments must be closed in a
+different line:
+
+.. code-block:: text
+
+   >>> a = 2
+   >>> f"abc{a  # This comment  }"  continues until the end of the line
+   ...       + 3}"
+   'abc5'
+
+After the expression, replacement fields may optionally contain:
 
 * a *debug specifier* -- an equal sign (``=``);
 * a *conversion specifier* -- ``!s``, ``!r`` or ``!a``; and/or
 * a *format specifier* prefixed with a colon (``:``).
 
-See :ref:`stdtypes-fstrings` for how these specifiers are interpreted.
+Debug specifier
+^^^^^^^^^^^^^^^
 
-Note that whitespace on both sides of a debug specifier (``=``) is
-significant --- it is retained in the result::
+If a debug specifier -- an equal sign (``=``) -- appears after the replacement
+field expression, the resulting f-string will contain the expression's source,
+the equal sign, and the value of the expression.
+This is often useful for debugging::
 
    >>> print(f'{name=}')
    name='Galahad'
+
+Whitespace on both sides of the equal sign is significant --- it is retained
+in the result::
+
    >>> print(f'{name = }')
    name = 'Galahad'
 
-Expressions in formatted string literals are treated like regular
-Python expressions surrounded by parentheses, with a few exceptions.
-An empty expression is not allowed, and both :keyword:`lambda`  and
-assignment expressions ``:=`` must be surrounded by explicit parentheses.
-Each expression is evaluated in the context where the formatted string literal
-appears, in order from left to right.  Replacement expressions can contain
-newlines in both single-quoted and triple-quoted f-strings and they can contain
-comments.  Everything that comes after a ``#`` inside a replacement field
-is a comment (even closing braces and quotes). In that case, replacement fields
-must be closed in a different line.
 
-.. code-block:: text
+Conversion specifier
+^^^^^^^^^^^^^^^^^^^^
 
-   >>> f"abc{a # This is a comment }"
-   ... + 3}"
-   'abc5'
+By default, the value of a replacement field expression is converted to
+string using :func:`str`::
+
+   >>> from fractions import Fraction
+   >>> one_third = Fraction(1, 3)
+   >>> f'{one_third}'
+   '1/3'
+
+When a debug specifier but no format specifier is used, the default conversion
+instead uses :func:`repr`::
+
+   >>> f'{one_third = }'
+   'one_third = Fraction(1, 3)'
+
+The conversion can be specified explicitly using one of these specifiers:
+
+* ``!s`` for :func:`str`
+* ``!r`` for :func:`repr`
+* ``!a`` for :func:`ascii`
+
+For example::
+
+   >>> f'{one_third!r} is {one_third!s}'
+   'Fraction(1, 3) is 1/3'
+
+   >>> string = "¡kočka 😸!"
+   >>> f'{string = !a}'
+   "string = '\\xa1ko\\u010dka \\U0001f638!'"
+
+
+Format specifier
+^^^^^^^^^^^^^^^^
+
+After the expression has been evaluated, and possibly converted using an
+explicit conversion specifier, it is formatted using the :func:`format` function.
+If the replacement field includes a *format specifier*, an arbitrary string
+introduced by a colon (``:``), the specifier is passed to :func:`!format`
+as the second argument.
+The result of :func:`!format` is then used as the final value for the
+replacement field. For example::
+
+   >>> f'{one_third:.6f}'
+   '0.333333'
+   >>> f'{one_third:_^+10}'
+   '___+1/3___'
+   >>> >>> f'{one_third!r:_^20}'
+   '___Fraction(1, 3)___'
+   >>> f'{one_third = :~>10}~'
+   'one_third = ~~~~~~~1/3~'
+
+
+Formal grammar
+^^^^^^^^^^^^^^
 
 .. grammar-snippet:: python-grammar
    :group: python-grammar

From 0d8a91789283892d6b8248034cf9e2f394f3b1a8 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 9 Jul 2025 18:08:57 +0200
Subject: [PATCH 11/17] Work on f-strings

---
 Doc/library/stdtypes.rst           | 149 ++++++++++++-----------------
 Doc/reference/lexical_analysis.rst | 132 ++++++++-----------------
 2 files changed, 101 insertions(+), 180 deletions(-)
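
Reviewer note (not part of the patch): a small sketch of how nested
replacement fields in a format specifier can be read, assuming they are
evaluated first and their results spliced into the specifier text. The
variable names `spec`, `inner_spec`, `nested` and `padded` are made up for
illustration.

```python
number = 14.3
field_size = 20
precision = 7

# The nested fields are evaluated first and spliced into the spec text.
spec = f'{field_size}.{precision}f'
nested = f'{number:{field_size}.{precision}f}'

# A nested field may carry its own format spec: 20 becomes '00020'.
inner_spec = format(field_size, '05')
padded = f'{3:{field_size:05}}'

assert spec == '20.7f'
assert nested == format(number, spec)
assert padded == format(3, inner_spec)
```

Both results are 20 characters wide, matching the examples in the hunk.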

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 6976838eceb03e..59fbb07ccf512c 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -2536,123 +2536,98 @@ Formatted String Literals (f-strings)
    The :keyword:`await` and :keyword:`async for` can be used in expressions
    within f-strings.
 .. versionchanged:: 3.8
-   Added the debugging operator (``=``)
+   Added the debug specifier (``=``).
 .. versionchanged:: 3.12
    Many restrictions on expressions within f-strings have been removed.
    Notably, nested strings, comments, and backslashes are now permitted.
 
 An :dfn:`f-string` (formally a :dfn:`formatted string literal`) is
 a string literal that is prefixed with ``f`` or ``F``.
-This type of string literal allows embedding arbitrary Python expressions
-within *replacement fields*, which are delimited by curly brackets (``{}``).
-These expressions are evaluated at runtime, similarly to :meth:`str.format`,
-and are converted into regular :class:`str` objects.
-For example:
-
-.. doctest::
-
-   >>> who = 'nobody'
-   >>> nationality = 'Spanish'
-   >>> f'{who.title()} expects the {nationality} Inquisition!'
-   'Nobody expects the Spanish Inquisition!'
-
-It is also possible to use a multi line f-string:
-
-.. doctest::
-
-   >>> f'''This is a string
-   ... on two lines'''
-   'This is a string\non two lines'
+This type of string literal allows embedding the results of arbitrary Python
+expressions within *replacement fields*, which are delimited by curly
+brackets (``{}``).
+Each replacement field must contain an expression, optionally followed by:
 
-A single opening curly bracket, ``'{'``, marks a *replacement field* that
-can contain any Python expression:
+* a *debug specifier* -- an equal sign (``=``);
+* a *conversion specifier* -- ``!s``, ``!r`` or ``!a``; and/or
+* a *format specifier* prefixed with a colon (``:``).
 
-.. doctest::
+See the :ref:`Lexical Analysis section on f-strings <f-strings>` for details
+on the syntax of these fields.
 
-   >>> nationality = 'Spanish'
-   >>> f'The {nationality} Inquisition!'
-   'The Spanish Inquisition!'
+Debug specifier
+^^^^^^^^^^^^^^^
 
-To include a literal ``{`` or ``}``, use a double bracket:
+.. versionadded:: 3.8
 
-.. doctest::
+If a debug specifier -- an equal sign (``=``) -- appears after the replacement
+field expression, the resulting f-string will contain the expression's source,
+the equal sign, and the value of the expression.
+This is often useful for debugging::
 
-   >>> x = 42
-   >>> f'{{x}} is {x}'
-   '{x} is 42'
+   >>> print(f'{name=}')
+   name='Galahad'
 
-Functions can also be used, and :ref:`format specifiers <formatstrings>`:
+Whitespace on both sides of the equal sign is significant --- it is retained
+in the result::
 
-.. doctest::
+   >>> print(f'{name = }')
+   name = 'Galahad'
 
-   >>> from math import sqrt
-   >>> f'√2 \N{ALMOST EQUAL TO} {sqrt(2):.5f}'
-   '√2 ≈ 1.41421'
 
-Any non-string expression is converted using :func:`str`, by default:
+Conversion specifier
+^^^^^^^^^^^^^^^^^^^^
 
-.. doctest::
+By default, the value of a replacement field expression is converted to
+string using :func:`str`::
 
    >>> from fractions import Fraction
-   >>> f'{Fraction(1, 3)}'
+   >>> one_third = Fraction(1, 3)
+   >>> f'{one_third}'
    '1/3'
 
-To use an explicit conversion, use the ``!`` (exclamation mark) operator,
-followed by any of the valid formats, which are:
+When a debug specifier but no format specifier is used, the default conversion
+instead uses :func:`repr`::
 
-========== ==============
-Conversion  Meaning
-========== ==============
-``!a``      :func:`ascii`
-``!r``      :func:`repr`
-``!s``      :func:`str`
-========== ==============
+   >>> f'{one_third = }'
+   'one_third = Fraction(1, 3)'
 
-For example:
+The conversion can be specified explicitly using one of these specifiers:
 
-.. doctest::
+* ``!s`` for :func:`str`
+* ``!r`` for :func:`repr`
+* ``!a`` for :func:`ascii`
 
-   >>> from fractions import Fraction
-   >>> f'{Fraction(1, 3)!s}'
-   '1/3'
-   >>> f'{Fraction(1, 3)!r}'
-   'Fraction(1, 3)'
-   >>> question = '¿Dónde está el Presidente?'
-   >>> print(f'{question!a}')
-   '\xbfD\xf3nde est\xe1 el Presidente?'
-
-While debugging it may be helpful to see both the expression and its value,
-by using the equals sign (``=``) after the expression.
-This preserves spaces within the brackets, and can be used with a converter.
-By default, the debugging operator uses the :func:`repr` (``!r``) conversion.
-For example:
+For example::
 
-.. doctest::
+   >>> f'{one_third!r} is {one_third!s}'
+   'Fraction(1, 3) is 1/3'
 
-   >>> from fractions import Fraction
-   >>> calculation = Fraction(1, 3)
-   >>> f'{calculation=}'
-   'calculation=Fraction(1, 3)'
-   >>> f'{calculation = }'
-   'calculation = Fraction(1, 3)'
-   >>> f'{calculation = !s}'
-   'calculation = 1/3'
-
-Once the output has been evaluated, it can be formatted using a
-:ref:`format specifier <formatstrings>` following a colon (``':'``).
-After the expression has been evaluated, and possibly converted to a string,
-the :meth:`!__format__` method of the result is called with the format specifier,
-or the empty string if no format specifier is given.
-The formatted result is then used as the final value for the replacement field.
-For example:
+   >>> string = "¡kočka 😸!"
+   >>> f'{string = !a}'
+   "string = '\\xa1ko\\u010dka \\U0001f638!'"
 
-.. doctest::
+
+Format specifier
+^^^^^^^^^^^^^^^^
+
+After the expression has been evaluated, and possibly converted using an
+explicit conversion specifier, it is formatted using the :func:`format` function.
+If the replacement field includes a *format specifier* introduced by a colon
+(``:``), the specifier is passed to :func:`!format` as the second argument.
+The result of :func:`!format` is then used as the final value for the
+replacement field. For example::
 
    >>> from fractions import Fraction
-   >>> f'{Fraction(1, 7):.6f}'
-   '0.142857'
-   >>> f'{Fraction(1, 7):_^+10}'
-   '___+1/7___'
+   >>> one_third = Fraction(1, 3)
+   >>> f'{one_third:.6f}'
+   '0.333333'
+   >>> f'{one_third:_^+10}'
+   '___+1/3___'
+   >>> f'{one_third!r:_^20}'
+   '___Fraction(1, 3)___'
+   >>> f'{one_third = :~>10}~'
+   'one_third = ~~~~~~~1/3~'
 
 
 .. _old-string-formatting:
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 954ebbed9ba1be..1d50eeca0b92e8 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -927,6 +927,14 @@ f-strings
 ---------
 
 .. versionadded:: 3.6
+.. versionchanged:: 3.7
+   The :keyword:`await` and :keyword:`async for` can be used in expressions
+   within f-strings.
+.. versionchanged:: 3.8
+   Added the debug specifier (``=``).
+.. versionchanged:: 3.12
+   Many restrictions on expressions within f-strings have been removed.
+   Notably, nested strings, comments, and backslashes are now permitted.
 
 A :dfn:`formatted string literal` or :dfn:`f-string` is a string literal
 that is prefixed with ``'f'`` or ``'F'``.
@@ -989,80 +997,49 @@ different line:
 
 After the expression, replacement fields may optionally contain:
 
-* a *debug specifier* -- an equal sign (``=``);
+* a *debug specifier* -- an equal sign (``=``), optionally surrounded by
+  whitespace on one or both sides;
 * a *conversion specifier* -- ``!s``, ``!r`` or ``!a``; and/or
 * a *format specifier* prefixed with a colon (``:``).
 
-Debug specifier
-^^^^^^^^^^^^^^^
-
-If a debug specifier -- an equal sign (``=``) -- appears after the replacement
-field expression, the resulting f-string will contain the expression's source,
-the equal sign, and the value of the expression.
-This is often useful for debugging::
-
-   >>> print(f'{name=}')
-   name='Galahad'
-
-Whitespace on both sides of the equal sign is significant --- it is retained
-in the result::
-
-   >>> print(f'{name = }')
-   name = 'Galahad'
+See the :ref:`Standard Library section on f-strings <stdtypes-fstrings>`
+for details on how these fields are evaluated.
 
+As that section explains, *format specifiers* are passed as the second argument
+to the :func:`format` function to format a replacement field value.
+For example, they can be used to specify a field width and padding characters
+using the :ref:`Format Specification Mini-Language <formatspec>`::
 
-Conversion specifier
-^^^^^^^^^^^^^^^^^^^^
+   >>> color = 'blue'
+   >>> f'{color:-^20s}'
+   '--------blue--------'
 
-By default, the value of a replacement field expression is converted to
-string using :func:`str`::
+Top-level format specifiers may include nested replacement fields::
 
-   >>> from fractions import Fraction
-   >>> one_third = Fraction(1, 3)
-   >>> f'{one_third}'
-   '1/3'
-
-When a debug specifier but no format specifier is used, the default conversion
-instead uses :func:`repr`::
-
-   >>> f'{one_third = }'
-   'one_third = Fraction(1, 3)'
-
-The conversion can be specified explicitly using one of these specifiers:
-
-* ``!s`` for :func:`str`
-* ``!r`` for :func:`repr`
-* ``!a`` for :func:`ascii`
-
-For example::
+   >>> field_size = 20
+   >>> f'{color:-^{field_size}s}'
+   '--------blue--------'
 
-   >>> f'{one_third!r} is {one_third!s}'
-   'Fraction(1, 3) is 1/3'
+These nested fields may include their own conversion fields and
+:ref:`format specifiers <formatspec>`::
 
-   >>> string = "¡kočka 😸!"
-   >>> f'{string = !a}'
-   "string = '\\xa1ko\\u010dka \\U0001f638!'"
+   >>> number = 3
+   >>> f'{number:{field_size}}'
+   '                   3'
+   >>> f'{number:{field_size:05}}'
+   '00000000000000000003'
 
+However, these nested fields may not include more deeply nested replacement
+fields.
 
-Format specifier
-^^^^^^^^^^^^^^^^
+Formatted string literals may be concatenated, but replacement fields
+cannot be split across literals.
+For example, the following is a single f-string::
 
-After the expression has been evaluated, and possibly converted using an
-explicit conversion specifier, it is formatted using the :func:`format` function.
-If the replacement field includes a *format specifier*, an arbitrary string
-introduced by a colon (``:``), the specifier is passed to :func:`!format`
-as the second argument.
-The result of :func:`!format` is then used as the final value for the
-replacement field. For example::
+   >>> f'{' '}'
+   ' '
 
-   >>> f'{one_third:.6f}'
-   '0.333333'
-   >>> f'{one_third:_^+10}'
-   '___+1/3___'
-   >>> >>> f'{one_third!r:_^20}'
-   '___Fraction(1, 3)___'
-   >>> f'{one_third = :~>10}~'
-   'one_third = ~~~~~~~1/3~'
+It is equivalent to ``f'{" "}'``, rather than ``f'{' "}"``.
 
 
 Formal grammar
@@ -1116,38 +1093,6 @@ Formal grammar
 ---------------
 
 
-
-
-When the equal sign ``'='`` is provided, the output will have the expression
-text, the ``'='`` and the evaluated value. Spaces after the opening brace
-``'{'``, within the expression and after the ``'='`` are all retained in the
-output. By default, the ``'='`` causes the :func:`repr` of the expression to be
-provided, unless there is a format specified. When a format is specified it
-defaults to the :func:`str` of the expression unless a conversion ``'!r'`` is
-declared.
-
-.. versionadded:: 3.8
-   The equal sign ``'='``.
-
-If a conversion is specified, the result of evaluating the expression
-is converted before formatting.  Conversion ``'!s'`` calls :func:`str` on
-the result, ``'!r'`` calls :func:`repr`, and ``'!a'`` calls :func:`ascii`.
-
-The result is then formatted using the :func:`format` protocol.  The
-format specifier is passed to the :meth:`~object.__format__` method of the
-expression or conversion result.  An empty string is passed when the
-format specifier is omitted.  The formatted result is then included in
-the final value of the whole string.
-
-Top-level format specifiers may include nested replacement fields. These nested
-fields may include their own conversion fields and :ref:`format specifiers
-<formatspec>`, but may not include more deeply nested replacement fields. The
-:ref:`format specifier mini-language <formatspec>` is the same as that used by
-the :meth:`str.format` method.
-
-Formatted string literals may be concatenated, but replacement fields
-cannot be split across literals.
-
 Some examples of formatted string literals::
 
    >>> name = "Fred"
@@ -1219,6 +1164,7 @@ include expressions.
 See also :pep:`498` for the proposal that added formatted string literals,
 and :meth:`str.format`, which uses a related format string mechanism.
 
+
 .. _t-strings:
 .. _template-string-literals:
 

From 5fdb129c28ef1236bb0424748dfac6796d76a4e1 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 23 Jul 2025 17:22:40 +0200
Subject: [PATCH 12/17] Details & start on the formal grammar

Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Blaise Pabon <blaise@gmail.com>
---
 Doc/library/stdtypes.rst           |  25 ++--
 Doc/reference/lexical_analysis.rst | 189 ++++++++++++-----------------
 2 files changed, 97 insertions(+), 117 deletions(-)
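
Reviewer note (not part of the patch): a quick check of the docstring claim
in the lexical_analysis.rst hunk, written as a plain script. The function
names `plain` and `formatted` are placeholders. It assumes the interpreter
is not run with -OO, which strips docstrings.

```python
def plain():
    "A docstring"


def formatted():
    f"Not a docstring"


# An ordinary string literal in first position becomes __doc__ ...
plain_doc = plain.__doc__
# ... but an f-string does not, even without replacement fields.
formatted_doc = formatted.__doc__

assert plain_doc == "A docstring"
assert formatted_doc is None
```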

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 63526d0165fe33..dc601dce294243 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -2567,14 +2567,15 @@ field expression, the resulting f-string will contain the expression's source,
 the equal sign, and the value of the expression.
 This is often useful for debugging::
 
-   >>> print(f'{name=}')
-   name='Galahad'
+   >>> number = 14.3
+   >>> f'{number=}'
+   'number=14.3'
 
-Whitespace on both sides of the equal sign is significant --- it is retained
-in the result::
+Whitespace before, inside and after the expression, as well as whitespace
+after the equal sign, is significant --- it is retained in the result::
 
-   >>> print(f'{name = }')
-   name = 'Galahad'
+   >>> f'{ number  -  4  = }'
+   ' number  -  4  = 10.3'
 
 
 Conversion specifier
@@ -2602,10 +2603,18 @@ The conversion can be specified explicitly using one of these specifiers:
 
 For example::
 
-   >>> f'{one_third!r} is {one_third!s}'
-   'Fraction(1, 3) is 1/3'
+   >>> str(one_third)
+   '1/3'
+   >>> repr(one_third)
+   'Fraction(1, 3)'
+
+   >>> f'{one_third!s} is {one_third!r}'
+   '1/3 is Fraction(1, 3)'
 
    >>> string = "¡kočka 😸!"
+   >>> ascii(string)
+   "'\\xa1ko\\u010dka \\U0001f638!'"
+
    >>> f'{string = !a}'
    "string = '\\xa1ko\\u010dka \\U0001f638!'"
 
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index c4dc34f30d3cdf..88b5295a590ab8 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -981,6 +981,35 @@ assignment expressions ``:=`` must be surrounded by explicit parentheses::
    >>> f'{(half := 1/2)}, {half * 42}'
    '0.5, 21.0'
 
+Reusing the outer f-string quoting type inside a replacement field is
+permitted::
+
+   >>> a = dict(x=2)
+   >>> f"abc {a["x"]} def"
+   'abc 2 def'
+
+Backslashes are also allowed in replacement fields and are evaluated the same
+way as in any other context::
+
+   >>> a = ["a", "b", "c"]
+   >>> print(f"List a contains:\n{"\n".join(a)}")
+   List a contains:
+   a
+   b
+   c
+
+It is possible to nest f-strings::
+
+   >>> name = 'world'
+   >>> f'Repeated:{f' hello {name}' * 3}'
+   'Repeated: hello world hello world hello world'
+
+Portable Python programs should not use more than 5 levels of nesting.
+
+.. impl-detail::
+
+   CPython does not limit nesting of f-strings.
+
 Replacement expressions can contain newlines in both single-quoted and
 triple-quoted f-strings and they can contain comments.
 Everything that comes after a ``#`` inside a replacement field
@@ -1010,15 +1039,16 @@ to the :func:`format` function to format a replacement field value.
 For example, they can be used to specify a field width and padding characters
 using the :ref:`Format Specification Mini-Language <formatspec>`::
 
-   >>> color = 'blue'
-   >>> f'{color:-^20s}'
-   '--------blue--------'
+   >>> number = 14.3
+   >>> f'{number:20.7f}'
+   '          14.3000000'
 
 Top-level format specifiers may include nested replacement fields::
 
    >>> field_size = 20
-   >>> f'{color:-^{field_size}s}'
-   '--------blue--------'
+   >>> precision = 7
+   >>> f'{number:{field_size}.{precision}f}'
+   '          14.3000000'
 
 These nested fields may include their own conversion fields and
 :ref:`format specifiers <formatspec>`::
@@ -1032,40 +1062,65 @@ These nested fields may include their own conversion fields and
 However, these nested fields may not include more deeply nested replacement
 fields.
 
-Formatted string literals may be concatenated, but replacement fields
-cannot be split across literals.
-For example, the following is a single f-string::
+Formatted string literals cannot be used as :term:`docstrings <docstring>`,
+even if they do not include expressions::
 
-   >>> f'{' '}'
-   ' '
+   >>> def foo():
+   ...     f"Not a docstring"
+   ...
+   >>> print(foo.__doc__)
+   None
 
-It is equivalent to ``f'{" "}'``, rather than ``f'{' "}"``.
+.. seealso::
 
+   * :pep:`498` -- Literal String Interpolation
+   * :pep:`701` -- Syntactic formalization of f-strings
+   * :meth:`str.format`, which uses a related format string mechanism.
 
-Formal grammar
-^^^^^^^^^^^^^^
 
-.. grammar-snippet:: python-grammar
-   :group: python-grammar
+Formal grammar for f-strings
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-   FSTRING_START:      `fstringprefix` ("'" | '"' | "'''" | '"""')
-   FSTRING_MIDDLE:
-      | <any `source_character`, except backslash, newline, '{' and '}'>
-      | `stringescapeseq`
-      | "{{"
-      | "}}"
-      | <newline, in triple-quoted f-strings only>
-   FSTRING_END:        ("'" | '"' | "'''" | '"""')
-   fstringprefix:      <("f" | "fr" | "rf"), case-insensitive>
-   f_debug_specifier:  whitespace* '=' whitespace*
+F-strings are handled partly by the :term:`lexical analyzer`, which produces the
+tokens :py:data:`~token.FSTRING_START`, :py:data:`~token.FSTRING_MIDDLE`
+and :py:data:`~token.FSTRING_END`, and the parser, which handles expressions
+in the replacement field.
+The exact way the work is split is a CPython implementation detail.
+
+Correspondingly, the f-string grammar is a mix of
+:ref:`lexical and syntactic definitions <notation-lexical-vs-syntactic>`.
+
+Whitespace is significant in these situations:
+
+* There may be no whitespace in :py:data:`~token.FSTRING_START` (between
+  the prefix and quote).
+* Whitespace in :py:data:`~token.FSTRING_MIDDLE` is part of the literal
+  string contents.
+* In ``fstring_replacement_field``, if ``f_debug_specifier`` is present,
+  all whitespace after the opening brace up to the ``!`` of
+  ``fstring_conversion``, ``:`` of ``fstring_full_format_spec``,
+  or the closing brace, is retained as part of the expression.
 
 .. grammar-snippet:: python-grammar
    :group: python-grammar
 
    fstring:    `FSTRING_START` `fstring_middle`* `FSTRING_END`
+
+   FSTRING_START:      `fstringprefix` ("'" | '"' | "'''" | '"""')
+   FSTRING_END:        `f_quote`
+   fstringprefix:      <("f" | "fr" | "rf"), case-insensitive>
+   f_debug_specifier:  '='
+   f_quote:            <the quote character(s) used in FSTRING_START>
+
    fstring_middle:
       | `fstring_replacement_field`
       | `FSTRING_MIDDLE`
+   FSTRING_MIDDLE:
+      | (!"\" !`newline` !'{' !'}' !`f_quote`) `source_character`
+      | `stringescapeseq`
+      | "{{"
+      | "}}"
+      | <newline, in triple-quoted f-strings only>
    fstring_replacement_field:
       | '{' `f_expression` [`f_debug_specifier`] [`fstring_conversion`]
             [`fstring_full_format_spec`] '}'
@@ -1081,90 +1136,6 @@ Formal grammar
       | `yield_expression`
 
 
-.. versionchanged:: 3.7
-   Prior to Python 3.7, an :keyword:`await` expression and comprehensions
-   containing an :keyword:`async for` clause were illegal in the expressions
-   in formatted string literals due to a problem with the implementation.
-
-.. versionchanged:: 3.12
-   Prior to Python 3.12, comments were not allowed inside f-string replacement
-   fields.
-
----------------
-
-
-Some examples of formatted string literals::
-
-   >>> name = "Fred"
-   >>> f"He said his name is {name!r}."
-   "He said his name is 'Fred'."
-   >>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
-   "He said his name is 'Fred'."
-   >>> width = 10
-   >>> precision = 4
-   >>> value = decimal.Decimal("12.34567")
-   >>> f"result: {value:{width}.{precision}}"  # nested fields
-   'result:      12.35'
-   >>> today = datetime(year=2017, month=1, day=27)
-   >>> f"{today:%B %d, %Y}"  # using date format specifier
-   'January 27, 2017'
-   >>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
-   'today=January 27, 2017'
-   >>> number = 1024
-   >>> f"{number:#0x}"  # using integer format specifier
-   '0x400'
-   >>> foo = "bar"
-   >>> f"{ foo = }" # preserves whitespace
-   " foo = 'bar'"
-   >>> line = "The mill's closed"
-   >>> f"{line = }"
-   'line = "The mill\'s closed"'
-   >>> f"{line = :20}"
-   "line = The mill's closed   "
-   >>> f"{line = !r:20}"
-   'line = "The mill\'s closed" '
-
-
-Reusing the outer f-string quoting type inside a replacement field is
-permitted::
-
-   >>> a = dict(x=2)
-   >>> f"abc {a["x"]} def"
-   'abc 2 def'
-
-.. versionchanged:: 3.12
-   Prior to Python 3.12, reuse of the same quoting type of the outer f-string
-   inside a replacement field was not possible.
-
-Backslashes are also allowed in replacement fields and are evaluated the same
-way as in any other context::
-
-   >>> a = ["a", "b", "c"]
-   >>> print(f"List a contains:\n{"\n".join(a)}")
-   List a contains:
-   a
-   b
-   c
-
-.. versionchanged:: 3.12
-   Prior to Python 3.12, backslashes were not permitted inside an f-string
-   replacement field.
-
-Formatted string literals cannot be used as docstrings, even if they do not
-include expressions.
-
-::
-
-   >>> def foo():
-   ...     f"Not a docstring"
-   ...
-   >>> foo.__doc__ is None
-   True
-
-See also :pep:`498` for the proposal that added formatted string literals,
-and :meth:`str.format`, which uses a related format string mechanism.
-
-
 .. _t-strings:
 .. _template-string-literals:
 

From e88843c23054caa7c22231d8ac485facb7d942c5 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 23 Jul 2025 18:05:51 +0200
Subject: [PATCH 13/17] Improve text on whitespace in f-string debug
 expressions

---
 Doc/reference/lexical_analysis.rst | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)
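
Reviewer note (not part of the patch): a short sketch of the whitespace
rule being reworded here, assuming Python 3.8 or later. Whitespace between
the opening brace and `=`, and directly after `=`, shows up verbatim in the
result. The names `tight` and `spaced` are illustrative.

```python
number = 14.3

# No whitespace anywhere in the field.
tight = f'{number=}'
# Whitespace before, inside and after the expression, and after '='.
spaced = f'{ number  -  4  = }'

assert tight == 'number=' + repr(number)
assert spaced == ' number  -  4  = ' + repr(number - 4)
```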

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index 88b5295a590ab8..bd508399e6fcb4 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -1097,9 +1097,15 @@ Whitespace is significant in these situations:
 * Whitespace in :py:data:`~token.FSTRING_MIDDLE` is part of the literal
   string contents.
 * In ``fstring_replacement_field``, if ``f_debug_specifier`` is present,
-  all whitespace after the opening brace up to the ``!`` of
-  ``fstring_conversion``, ``:`` of ``fstring_full_format_spec``,
-  or the closing brace, is retained as part of the expression.
+  all whitespace after the opening brace until the ``f_debug_specifier``,
+  as well as whitespace immediately following ``f_debug_specifier``,
+  is retained as part of the expression.
+
+  .. impl-detail::
+
+     The expression is not handled in the tokenization phase; it is
+     retrieved from the source code using locations of the ``{`` token
+     and the token after ``=``.
 
 .. grammar-snippet:: python-grammar
    :group: python-grammar

From f2db8f9b660454700d32bf6aa516755d099c9855 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 6 Aug 2025 16:51:03 +0200
Subject: [PATCH 14/17] Comment on the funkiness of the t-string grammar

---
 Doc/reference/lexical_analysis.rst | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)
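
Reviewer note (not part of the patch): one way to observe the lexer/parser
split that the reworded paragraph describes, using the standard `tokenize`
module. This is a sketch: it assumes that Python 3.12 or later reports the
dedicated f-string tokens, and that earlier versions report the whole
literal as a single STRING token, so it branches on that.

```python
import io
import token
import tokenize

source = "f'a{x}b'\n"

# Collect the token type names for a one-line f-string.
names = [
    token.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

if hasattr(token, "FSTRING_START"):
    # Python 3.12+: the literal is split into dedicated f-string tokens,
    # with the replacement field expression tokenized in between.
    assert names[0] == "FSTRING_START"
    assert "FSTRING_MIDDLE" in names
    assert "FSTRING_END" in names
else:
    # Older versions: the whole f-string is one STRING token.
    assert names[0] == "STRING"
```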

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index b97b7bbc492712..a86a3521bedfe0 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -1085,8 +1085,8 @@ Formal grammar for f-strings
 
 F-strings are handled partly by the :term:`lexical analyzer`, which produces the
 tokens :py:data:`~token.FSTRING_START`, :py:data:`~token.FSTRING_MIDDLE`
-and :py:data:`~token.FSTRING_END`, and the parser, which handles expressions
-in the replacement field.
+and :py:data:`~token.FSTRING_END`, and partly by the parser, which handles
+expressions in the replacement field.
 The exact way the work is split is a CPython implementation detail.
 
 Correspondingly, the f-string grammar is a mix of
@@ -1109,6 +1109,12 @@ Whitespace is significant in these situations:
      retrieved from the source code using locations of the ``{`` token
      and the token after ``=``.
 
+
+The ``FSTRING_MIDDLE`` definition uses
+:ref:`negative lookaheads <lexical-lookaheads>` (``!``)
+to indicate special characters (backslash, newline, ``{``, ``}``) and
+sequences (``f_quote``).
+
 .. grammar-snippet:: python-grammar
    :group: python-grammar
 
@@ -1143,6 +1149,15 @@ Whitespace is significant in these situations:
       | ','.(`conditional_expression` | "*" `or_expr`)+ [","]
       | `yield_expression`
 
+.. note::
+
+   In the above grammar snippet, the ``f_quote`` and ``FSTRING_MIDDLE`` rules
+   are context-sensitive -- they depend on the contents of ``FSTRING_START``
+   of the nearest enclosing ``fstring``.
+
+   Constructing a more traditional formal grammar from this template is left
+   as an exercise for the reader.
+
 
 .. _t-strings:
 .. _template-string-literals:

From 6468a97fb2d8ee497760e9ebc22e52a96d9aab7c Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 6 Aug 2025 17:15:37 +0200
Subject: [PATCH 15/17] Adjust t-string docs: move evaluation rules out, add a
 note on grammar

---
 Doc/library/stdtypes.rst           | 40 +++++++++++++++++++
 Doc/reference/expressions.rst      |  2 +-
 Doc/reference/lexical_analysis.rst | 64 +++++++++++-------------------
 3 files changed, 65 insertions(+), 41 deletions(-)

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index 281592e2508f92..e9eda86a18adcd 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -2640,6 +2640,46 @@ replacement field. For example::
    >>> f'{one_third = :~>10}~'
    'one_third = ~~~~~~~1/3~'
 
+.. _stdtypes-tstrings:
+
+Template String Literals (t-strings)
+------------------------------------
+
+A :dfn:`t-string` (formally a :dfn:`template string literal`) is
+a string literal that is prefixed with ``t`` or ``T``.
+
+These strings follow the same syntax and evaluation rules as
+:ref:`formatted string literals <stdtypes-fstrings>`,
+with the following differences:
+
+* Rather than evaluating to a ``str`` object, template string literals evaluate
+  to a :class:`string.templatelib.Template` object.
+
+* The :func:`format` protocol is not used.
+  Instead, the format specifier and conversions (if any) are passed to
+  a new :class:`~string.templatelib.Interpolation` object that is created
+  for each evaluated expression.
+  It is up to code that processes the resulting :class:`~string.templatelib.Template`
+  object to decide how to handle format specifiers and conversions.
+
+* Format specifiers containing nested replacement fields are evaluated eagerly,
+  prior to being passed to the :class:`~string.templatelib.Interpolation` object.
+  For instance, an interpolation of the form ``{amount:.{precision}f}`` will
+  evaluate the inner expression ``{precision}`` to determine the value of the
+  ``format_spec`` attribute.
+  If ``precision`` were to be ``2``, the resulting format specifier
+  would be ``'.2f'``.
+
+* When the equals sign ``'='`` is provided in an interpolation expression,
+  the text of the expression is appended to the literal string that precedes
+  the relevant interpolation.
+  This includes the equals sign and any surrounding whitespace.
+  The :class:`!Interpolation` instance for the expression will be created as
+  normal, except that :attr:`~string.templatelib.Interpolation.conversion` will
+  be set to '``r``' (:func:`repr`) by default.
+  If an explicit conversion or format specifier is provided,
+  this will override the default behaviour.
+
 
 .. _old-string-formatting:
 
diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst
index 9aca25e3214a16..20100e6617f10d 100644
--- a/Doc/reference/expressions.rst
+++ b/Doc/reference/expressions.rst
@@ -174,7 +174,7 @@ Formally:
 .. grammar-snippet::
    :group: python-grammar
 
-   strings: ( `STRING` | fstring)+ | tstring+
+   strings: ( `STRING` | `fstring`)+ | `tstring`+
 
 This feature is defined at the syntactical level, so it only works with literals.
 To concatenate string expressions at run time, the '+' operator may be used::
diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index a86a3521bedfe0..f9615299bf5e42 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -1080,8 +1080,24 @@ even if they do not include expressions::
    * :meth:`str.format`, which uses a related format string mechanism.
 
 
+.. _t-strings:
+.. _template-string-literals:
+
+t-strings
+---------
+
+.. versionadded:: 3.14
+
+A :dfn:`template string literal` or :dfn:`t-string` is a string literal
+that is prefixed with '``t``' or '``T``'.
+These strings follow the same syntax rules as
+:ref:`formatted string literals <f-strings>`.
+For differences in evaluation rules, see the
+:ref:`Standard Library section on t-strings <stdtypes-tstrings>`.
+
+
 Formal grammar for f-strings
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------
 
 F-strings are handled partly by the :term:`lexical analyzer`, which produces the
 tokens :py:data:`~token.FSTRING_START`, :py:data:`~token.FSTRING_MIDDLE`
@@ -1115,7 +1131,7 @@ The ``FSTRING_MIDDLE`` definition uses
 to indicate special characters (backslash, newline, ``{``, ``}``) and
 sequences (``f_quote``).
 
-.. grammar-snippet:: python-grammar
+.. grammar-snippet::
    :group: python-grammar
 
    fstring:    `FSTRING_START` `fstring_middle`* `FSTRING_END`
@@ -1158,47 +1174,15 @@ sequences (``f_quote``).
    Constructing a more traditional formal grammar from this template is left
    as an exercise for the reader.
 
+The grammar for t-strings is identical to the one for f-strings, with *t*
+instead of *f* at the beginning of rule and token names and in the prefix.
 
-.. _t-strings:
-.. _template-string-literals:
-
-t-strings
----------
+.. grammar-snippet::
+   :group: python-grammar
 
-.. versionadded:: 3.14
+   tstring:    `TSTRING_START` `tstring_middle`* `TSTRING_END`
 
-A :dfn:`template string literal` or :dfn:`t-string` is a string literal
-that is prefixed with '``t``' or '``T``'.
-These strings follow the same syntax and evaluation rules as
-:ref:`formatted string literals <f-strings>`, with the following differences:
-
-* Rather than evaluating to a ``str`` object, template string literals evaluate
-  to a :class:`string.templatelib.Template` object.
-
-* The :func:`format` protocol is not used.
-  Instead, the format specifier and conversions (if any) are passed to
-  a new :class:`~string.templatelib.Interpolation` object that is created
-  for each evaluated expression.
-  It is up to code that processes the resulting :class:`~string.templatelib.Template`
-  object to decide how to handle format specifiers and conversions.
-
-* Format specifiers containing nested replacement fields are evaluated eagerly,
-  prior to being passed to the :class:`~string.templatelib.Interpolation` object.
-  For instance, an interpolation of the form ``{amount:.{precision}f}`` will
-  evaluate the inner expression ``{precision}`` to determine the value of the
-  ``format_spec`` attribute.
-  If ``precision`` were to be ``2``, the resulting format specifier
-  would be ``'.2f'``.
-
-* When the equals sign ``'='`` is provided in an interpolation expression,
-  the text of the expression is appended to the literal string that precedes
-  the relevant interpolation.
-  This includes the equals sign and any surrounding whitespace.
-  The :class:`!Interpolation` instance for the expression will be created as
-  normal, except that :attr:`~string.templatelib.Interpolation.conversion` will
-  be set to '``r``' (:func:`repr`) by default.
-  If an explicit conversion or format specifier are provided,
-  this will override the default behaviour.
+   <rest of the t-string grammar is omitted; see above>
 
 
 .. _numbers:

From 681112d9ecf00c8f1b81dc10f93b87dd49a67281 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 13 Aug 2025 16:10:39 +0200
Subject: [PATCH 16/17] Apply suggestions from code review

Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
---
 Doc/library/stdtypes.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Doc/library/stdtypes.rst b/Doc/library/stdtypes.rst
index e9eda86a18adcd..9df41d380b5e79 100644
--- a/Doc/library/stdtypes.rst
+++ b/Doc/library/stdtypes.rst
@@ -2568,8 +2568,8 @@ the equal sign, and the value of the expression.
 This is often useful for debugging::
 
    >>> number = 14.3
-   >>> 'number=14.3'
-   number=14.3
+   >>> f'{number=}'
+   'number=14.3'
 
 Whitespace before, inside and after the expression, as well as whitespace
 after the equal sign, is significant --- it is retained in the result::
@@ -2582,7 +2582,7 @@ Conversion specifier
 ^^^^^^^^^^^^^^^^^^^^
 
 By default, the value of a replacement field expression is converted to
-string using :func:`str`::
+a string using :func:`str`::
 
    >>> from fractions import Fraction
    >>> one_third = Fraction(1, 3)

From 9e1290fa18533a819942e49730317dd0be9f3f67 Mon Sep 17 00:00:00 2001
From: Petr Viktorin <encukou@gmail.com>
Date: Wed, 13 Aug 2025 16:12:26 +0200
Subject: [PATCH 17/17] Don't link TSTRING_START &c. since we don't define them

---
 Doc/reference/lexical_analysis.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst
index f9615299bf5e42..082b770ede749c 100644
--- a/Doc/reference/lexical_analysis.rst
+++ b/Doc/reference/lexical_analysis.rst
@@ -1180,7 +1180,7 @@ instead of *f* at the beginning of rule and token names and in the prefix.
 .. grammar-snippet::
    :group: python-grammar
 
-   tstring:    `TSTRING_START` `tstring_middle`* `TSTRING_END`
+   tstring:    TSTRING_START tstring_middle* TSTRING_END
 
    <rest of the t-string grammar is omitted; see above>