Changes from 6 commits
34 commits
817b3f3
Doc: Fix the array.fromfile method doc
adorilson Sep 1, 2020
6b53456
gh-106320: Remove private _PyInterpreterState functions (#106335)
vstinner Jul 2, 2023
1b4d152
[Doc] Divide RE Syntax in subsections
adorilson Jan 20, 2024
6ad009c
[DOC] Add crasis surrounding some RE-matched words
adorilson Jan 20, 2024
94f765f
[DOC] Make clearer what will be matched with a RE
adorilson Jan 20, 2024
292672b
Doc: minor change
adorilson Dec 30, 2023
65b4278
Merge branch 'python:main' into re_improvements
adorilson Feb 3, 2024
fe7389a
Merge branch 'python:main' into re_improvements
adorilson Feb 4, 2024
8394cd3
Merge branch 'python:main' into re_improvements
adorilson Feb 5, 2024
e2023e0
Doc: Put PatternError's attributes inside a table instead of regular …
adorilson Feb 5, 2024
cdaa9ae
Doc: Fix PatternError's attributes
adorilson Feb 5, 2024
bb98dad
Doc: fix lint issue
adorilson Feb 5, 2024
22ffed7
Merge branch 'main' into re_improvements
adorilson Feb 25, 2024
6a1e74e
Merge branch 'python:main' into re_improvements
adorilson Sep 25, 2024
6b357af
Doc: Add extension notation header
adorilson Sep 25, 2024
8f7356d
Doc: Add some more backticks
adorilson Sep 25, 2024
6ed5109
Merge branch 'python:main' into re_improvements
adorilson Sep 26, 2024
9c17aa8
Doc: Fix malformed hyperlink target
adorilson Sep 26, 2024
acb2e38
Merge branch 'main' into re_improvements
adorilson Sep 26, 2024
4d3b8dd
Merge branch 'python:main' into re_improvements
adorilson Oct 1, 2024
643070c
Merge branch 'main' into re_improvements
adorilson Oct 3, 2024
17baf98
Docs: add a 'also' for $ special character and RE examples reference …
adorilson Oct 3, 2024
4e12f7c
Docs: add some RE raw string notation references
adorilson Oct 3, 2024
a09a187
Merge branch 'python:main' into re_improvements
adorilson Oct 20, 2024
625a5cf
Revert "[DOC] Make clearer what will be matched with a RE"
adorilson Oct 20, 2024
12ecb3a
Doc: Put some subheadings at Special Character section
adorilson Oct 20, 2024
f576282
Doc: Fix raw string notation reference
adorilson Oct 20, 2024
337e4b4
Merge branch 'python:main' into re_improvements
adorilson Oct 28, 2024
0e0e082
Doc: Include "Python's" to a link text in RE module
adorilson Oct 28, 2024
f094a90
Doc: Add some backticks in re.IGNORECASE section
adorilson Oct 28, 2024
fd24e0f
Merge branch 'main' into re_improvements
adorilson Nov 2, 2024
a8c44e1
Merge branch 'main' into re_improvements
adorilson Nov 21, 2024
f970235
Doc: rename some heading in RE
adorilson Mar 15, 2025
8d52469
Doc: Connect some s in RE
adorilson Mar 15, 2025
44 changes: 30 additions & 14 deletions Doc/library/re.rst
@@ -83,6 +83,12 @@ characters, so ``last`` matches the string ``'last'``. (In the rest of this
section, we'll write RE's in ``this special style``, usually without quotes, and
strings to be matched ``'in single quotes'``.)


.. _re-special-characters:

Special characters
^^^^^^^^^^^^^^^^^^

Some characters, like ``'|'`` or ``'('``, are special. Special
characters either stand for classes of ordinary characters, or affect
how the regular expressions around them are interpreted.
@@ -93,7 +99,6 @@ directly nested. This avoids ambiguity with the non-greedy modifier suffix
repetition to an inner repetition, parentheses may be used. For example,
the expression ``(?:a{6})*`` matches any multiple of six ``'a'`` characters.
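
As a rough illustration of the nested-repetition rule in the context lines above, the sketch below checks which string lengths ``(?:a{6})*`` accepts and shows that the unparenthesized form is rejected. The names ``six_as``, ``lengths_matched`` and ``nested_ok`` are made up for this example and are not part of the patch.

```python
import re

# Parentheses let an outer repetition apply to an inner one.
six_as = re.compile(r"(?:a{6})*")

# fullmatch() succeeds only when the whole string is a multiple of six 'a's
# (including zero of them, i.e. the empty string).
lengths_matched = [n for n in range(19) if six_as.fullmatch("a" * n)]
print(lengths_matched)

# Writing the two repetitions back to back, without parentheses, is rejected.
try:
    re.compile(r"a{6}*")
    nested_ok = True
except re.error as exc:
    nested_ok = False
    print("rejected:", exc)
```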


The special characters are:

.. index:: single: . (dot); in regular expressions
@@ -114,31 +119,33 @@ The special characters are:
``$``
Matches the end of the string or just before the newline at the end of the
string, and in :const:`MULTILINE` mode also matches before a newline. ``foo``
-matches both 'foo' and 'foobar', while the regular expression ``foo$`` matches
+matches both ``'foo'`` and ``'foobar'``, while the regular expression ``foo$``
+matches
only 'foo'. More interestingly, searching for ``foo.$`` in ``'foo1\nfoo2\n'``
-matches 'foo2' normally, but 'foo1' in :const:`MULTILINE` mode; searching for
+matches 'foo2' normally, but ``'foo1'`` in :const:`MULTILINE` mode; searching
Member:
To be consistent with other additions, 'foo' above and 'foo2' here should be backticked. But see review summary.

Contributor Author:
Done.

+for
a single ``$`` in ``'foo\n'`` will find two (empty) matches: one just before
the newline, and one at the end of the string.
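
The ``$`` behaviour described in this entry can be checked with a short sketch like the one below. It reuses the strings from the documentation text; the variable names are only illustrative.

```python
import re

text = "foo1\nfoo2\n"

# Without MULTILINE, $ matches only at the end of the string or just before
# the final newline, so the search lands on the last line.
normal_match = re.search(r"foo.$", text).group()

# With MULTILINE, $ also matches before every newline, so the first line wins.
multiline_match = re.search(r"foo.$", text, re.MULTILINE).group()

# A lone $ in 'foo\n' yields two empty matches: one just before the newline
# and one at the very end of the string.
dollar_positions = [m.start() for m in re.finditer(r"$", "foo\n")]

print(normal_match, multiline_match, dollar_positions)
```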

.. index:: single: * (asterisk); in regular expressions

``*``
Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
-many repetitions as are possible. ``ab*`` will match 'a', 'ab', or 'a' followed
-by any number of 'b's.
+many repetitions as are possible. ``ab*`` will match ``'a'``, ``'ab'``, or
+``'a'`` followed by any number of ``'b'`` s.
Member:
The -s signifying plural has become disconnected from the b

Contributor Author:
Waiting for a decision on: #114357 (comment)

Member:
Not that I think it is great formatting, but looking at the markup under *+ you might join the s on with

Suggested change
-``'a'`` followed by any number of ``'b'`` s.
+``'a'`` followed by any number of ``'b'``\ s.

Contributor Author:
Done


.. index:: single: + (plus); in regular expressions

``+``
Causes the resulting RE to match 1 or more repetitions of the preceding RE.
-``ab+`` will match 'a' followed by any non-zero number of 'b's; it will not
-match just 'a'.
+``ab+`` will match ``'a'`` followed by any non-zero number of ``'b'`` s; it
Member:
-s disconnected again

Contributor Author:
Done

+will not match just ``'a'``.

.. index:: single: ? (question mark); in regular expressions

``?``
Causes the resulting RE to match 0 or 1 repetitions of the preceding RE.
-``ab?`` will match either 'a' or 'ab'.
+``ab?`` will match either ``'a'`` or ``'ab'``.
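
To compare the three qualifiers ``*``, ``+`` and ``?`` side by side, here is a minimal sketch. It uses ``re.fullmatch`` so that each pattern must account for the whole candidate string. The ``candidates`` list and ``results`` dict are assumptions made for this example.

```python
import re

candidates = ["a", "ab", "abbb"]

# For each pattern, collect the candidates it matches in full.
results = {
    pattern: [s for s in candidates if re.fullmatch(pattern, s)]
    for pattern in (r"ab*", r"ab+", r"ab?")
}

for pattern, matched in results.items():
    print(pattern, matched)
```

``ab*`` accepts all three strings, ``ab+`` needs at least one ``'b'``, and ``ab?`` allows at most one.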

.. index::
single: *?; in regular expressions
@@ -514,6 +521,9 @@ The special characters are:

.. _re-special-sequences:

Special sequences
^^^^^^^^^^^^^^^^^
Member:
Can we call them all escape sequences? Differentiates better from the multi-character “special character” sequences above.

Contributor Author:
Are there other subtypes of special characters? What about using subsections?


Member:
I suggest changing the next heading to something like String literal escapes, and changing this heading from Special sequences to Escape sequences.

These are the types of the special characters I can think of for REs:

  • The single-character metacharacters: $, *, [, ], \, etc, as listed in the how-to https://cpython-previews--114357.org.readthedocs.build/en/114357/howto/regex.html#matching-characters
  • Multicharacter syntax built with the metacharacters, like *?, {m,n} and the bracketed extension notation (?. . .)
  • “Special sequences” a.k.a. escape sequences, which begin with a backslash. These could be subdivided into
    • Non-alphanumeric, for escaping metacharacters and other syntax: \$, \*, \\, \', \", etc
    • Group references \1–\99
    • Alphanumeric sequences that specify locations to match, or categories of characters: \A, \b, \d, etc
    • String literal escapes: \n, \\, \N{. . .}, \0–\777, etc. Excludes \b and \<newline>.
  • Characters only special in “verbose” expressions: whitespace and #
  • Additional backslash sequence for re.sub templates: \g<. . .>
  • Special characters inside square-bracketed classes/sets [. . .], especially -, ^, ], \b, and reserved [, &&, etc


The special sequences consist of ``'\'`` and a character from the list below.
If the ordinary character is not an ASCII digit or an ASCII letter, then the
resulting RE will match the second character. For example, ``\$`` matches the
@@ -580,7 +590,7 @@ character ``'$'``.
(that is, any character in Unicode character category `[Nd]`__).
This includes ``[0-9]``, and also many other digit characters.

-Matches ``[0-9]`` if the :py:const:`~re.ASCII` flag is used.
+Matches only ``[0-9]`` if the :py:const:`~re.ASCII` flag is used.

__ https://www.unicode.org/versions/Unicode15.0.0/ch04.pdf#G134153
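
The difference that the word "only" is meant to stress can be seen in a small sketch. It assumes a sample string that mixes ASCII digits with Arabic-Indic digits (U+0664 and U+0662), which belong to Unicode category Nd.

```python
import re

# "42" followed by the Arabic-Indic digits four and two.
text = "42 \u0664\u0662"

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_digits = re.findall(r"\d", text)

# With re.ASCII, \d is restricted to [0-9].
ascii_digits = re.findall(r"\d", text, re.ASCII)

print(unicode_digits, ascii_digits)
```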

@@ -594,7 +604,7 @@ character ``'$'``.
Matches any character which is not a decimal digit.
This is the opposite of ``\d``.

-Matches ``[^0-9]`` if the :py:const:`~re.ASCII` flag is used.
+Matches only ``[^0-9]`` if the :py:const:`~re.ASCII` flag is used.

.. index:: single: \s; in regular expressions

@@ -605,7 +615,7 @@ character ``'$'``.
non-breaking spaces mandated by typography rules in many
languages).

-Matches ``[ \t\n\r\f\v]`` if the :py:const:`~re.ASCII` flag is used.
+Matches only ``[ \t\n\r\f\v]`` if the :py:const:`~re.ASCII` flag is used.
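
A quick way to see the effect on ``\s`` is to split a string that contains a non-breaking space (U+00A0), the kind of character mentioned in this entry. The sample string is an assumption made for this sketch.

```python
import re

# 'a', a non-breaking space, 'b', an ordinary space, 'c'.
text = "a\u00a0b c"

# By default, \s in a str pattern treats the non-breaking space as whitespace.
unicode_parts = re.split(r"\s", text)

# With re.ASCII, only [ \t\n\r\f\v] count, so the non-breaking space stays put.
ascii_parts = re.split(r"\s", text, flags=re.ASCII)

print(unicode_parts, ascii_parts)
```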

For 8-bit (bytes) patterns:
Matches characters considered whitespace in the ASCII character set;
@@ -617,7 +627,7 @@ character ``'$'``.
Matches any character which is not a whitespace character. This is
the opposite of ``\s``.

-Matches ``[^ \t\n\r\f\v]`` if the :py:const:`~re.ASCII` flag is used.
+Matches only ``[^ \t\n\r\f\v]`` if the :py:const:`~re.ASCII` flag is used.

.. index:: single: \w; in regular expressions

@@ -628,7 +638,7 @@ character ``'$'``.
(as defined by :py:meth:`str.isalnum`),
as well as the underscore (``_``).

-Matches ``[a-zA-Z0-9_]`` if the :py:const:`~re.ASCII` flag is used.
+Matches only ``[a-zA-Z0-9_]`` if the :py:const:`~re.ASCII` flag is used.
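
For ``\w``, a sketch with accented letters shows the same narrowing. The sample text (``naïve café_1``, written with escapes) is an assumption chosen for illustration.

```python
import re

# "naïve café_1", with the non-ASCII letters written as escapes.
text = "na\u00efve caf\u00e9_1"

# By default, \w covers Unicode alphanumerics plus the underscore.
unicode_words = re.findall(r"\w+", text)

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so accented letters split words.
ascii_words = re.findall(r"\w+", text, re.ASCII)

print(unicode_words, ascii_words)
```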

For 8-bit (bytes) patterns:
Matches characters considered alphanumeric in the ASCII character set;
@@ -644,7 +654,7 @@ character ``'$'``.
By default, matches non-underscore (``_``) characters
for which :py:meth:`str.isalnum` returns ``False``.

-Matches ``[^a-zA-Z0-9_]`` if the :py:const:`~re.ASCII` flag is used.
+Matches only ``[^a-zA-Z0-9_]`` if the :py:const:`~re.ASCII` flag is used.

If the :py:const:`~re.LOCALE` flag is used,
matches characters which are neither alphanumeric in the current locale
@@ -655,6 +665,12 @@ character ``'$'``.
``\Z``
Matches only at the end of the string.
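
Since ``\Z`` is easy to confuse with ``$``, the following sketch contrasts the two on a string that ends with a newline. The variable names are illustrative only.

```python
import re

text = "foo\n"

# $ also matches just before a newline at the end of the string.
dollar_found = re.search(r"foo$", text) is not None

# \Z matches only at the very end, so the trailing newline blocks the match.
z_found_with_newline = re.search(r"foo\Z", text) is not None
z_found_without_newline = re.search(r"foo\Z", "foo") is not None

print(dollar_found, z_found_with_newline, z_found_without_newline)
```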


.. _re-escape-sequences:

Escape sequences
^^^^^^^^^^^^^^^^^

.. index::
single: \a; in regular expressions
single: \b; in regular expressions