Improvements in regular expression doc #114357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
@@ -83,6 +83,12 @@ characters, so ``last`` matches the string ``'last'``. (In the rest of this
 section, we'll write RE's in ``this special style``, usually without quotes, and
 strings to be matched ``'in single quotes'``.)
 
+
+.. _re-special-characters:
+
+Special characters
+^^^^^^^^^^^^^^^^^^
+
 Some characters, like ``'|'`` or ``'('``, are special. Special
 characters either stand for classes of ordinary characters, or affect
 how the regular expressions around them are interpreted.
@@ -93,7 +99,6 @@ directly nested. This avoids ambiguity with the non-greedy modifier suffix
 repetition to an inner repetition, parentheses may be used. For example,
 the expression ``(?:a{6})*`` matches any multiple of six ``'a'`` characters.
 
-
 The special characters are:
 
 .. index:: single: . (dot); in regular expressions
@@ -114,31 +119,33 @@ The special characters are:
 ``$``
 Matches the end of the string or just before the newline at the end of the
 string, and in :const:`MULTILINE` mode also matches before a newline. ``foo``
-matches both 'foo' and 'foobar', while the regular expression ``foo$`` matches
+matches both ``'foo'`` and ``'foobar'``, while the regular expression ``foo$``
+matches
 only 'foo'. More interestingly, searching for ``foo.$`` in ``'foo1\nfoo2\n'``
-matches 'foo2' normally, but 'foo1' in :const:`MULTILINE` mode; searching for
+matches 'foo2' normally, but ``'foo1'`` in :const:`MULTILINE` mode; searching
+for
 a single ``$`` in ``'foo\n'`` will find two (empty) matches: one just before
 the newline, and one at the end of the string.
 
 .. index:: single: * (asterisk); in regular expressions
 
 ``*``
 Causes the resulting RE to match 0 or more repetitions of the preceding RE, as
-many repetitions as are possible. ``ab*`` will match 'a', 'ab', or 'a' followed
-by any number of 'b's.
+many repetitions as are possible. ``ab*`` will match ``'a'``, ``'ab'``, or
+``'a'`` followed by any number of ``'b'`` s.
-``'a'`` followed by any number of ``'b'`` s.
+``'a'`` followed by any number of ``'b'``\ s.
Done
-s disconnected again
Done
Can we call them all escape sequences? Differentiates better from the multi-character “special character” sequences above.
I suggest changing the next heading to something like String literal escapes, and changing this heading from Special sequences to Escape sequences.
These are the types of special characters I can think of for REs:
- The single-character metacharacters: $, *, [, ], \, etc., as listed in the how-to https://cpython-previews--114357.org.readthedocs.build/en/114357/howto/regex.html#matching-characters
- Multicharacter syntax built with the metacharacters, like *?, {m,n} and the bracketed extension notation (?. . .)
- “Special sequences” a.k.a. escape sequences, which begin with a backslash. These could be subdivided into
  - Non-alphanumeric, for escaping metacharacters and other syntax: \$, \*, \\, \', \", etc.
  - Group references \1–\99
  - Alphanumeric sequences that specify locations to match, or categories of characters: \A, \b, \d, etc.
  - String literal escapes: \n, \\, \N{. . .}, \0–\777, etc. Excludes \b and \<newline>.
- Characters only special in “verbose” expressions: whitespace and #
- Additional backslash sequence for re.sub templates: \g<. . .>
- Special characters inside square-bracketed classes/sets [. . .], especially -, ^, ], \b, and reserved [, &&, etc.
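
As a rough illustration of several categories in the list above, the sketch below runs one pattern per category through the standard `re` module. The sample strings and variable names are invented for illustration and do not come from the PR.

```python
import re

# Escaped metacharacter: \$ matches a literal dollar sign.
escaped = re.search(r'\$\d+', 'cost: $42').group()

# Group reference: \1 must repeat whatever group 1 matched.
doubled = re.search(r'(\w)\1', 'hello').group()

# Alphanumeric sequence: \b is a word boundary outside a character class...
words = re.findall(r'\bcat\b', 'cat concat cat')

# ...but stands for a backspace character inside one.
backspace = re.search(r'[\b]', 'a\x08c').group()

# Verbose mode: unescaped whitespace and # comments are ignored.
verbose = re.fullmatch(r'\d+  # one or more digits', '123', re.VERBOSE)

# re.sub template: \g<name> refers to a named group.
wrapped = re.sub(r'(?P<word>\w+)', r'<\g<word>>', 'hi there')
```

The two `\b` cases show why the comment treats square-bracketed sets as a separate category: the same sequence means different things inside and outside a set.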
To be consistent with other additions, 'foo' above and 'foo2' here should be backticked. But see review summary.
Done.
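
For anyone checking the wording of the `$` and `*` entries touched by this diff, the claims can be reproduced with a short script. This is a minimal sketch using only the standard `re` module. The strings are the ones quoted in the documentation text, except the non-matching `'b'`, which is added here for contrast. The variable names are arbitrary.

```python
import re

text = 'foo1\nfoo2\n'

# Without MULTILINE, $ matches only at the end or just before a final newline.
default_match = re.search(r'foo.$', text).group()

# With MULTILINE, $ also matches before every newline.
multiline_match = re.search(r'foo.$', text, re.MULTILINE).group()

# A bare $ finds two empty matches in 'foo\n'.
dollar_spans = [m.span() for m in re.finditer(r'$', 'foo\n')]

# ab* accepts 'a' followed by zero or more 'b' characters.
star_ok = [s for s in ('a', 'ab', 'abbb', 'b') if re.fullmatch(r'ab*', s)]
```

Here `default_match` is `'foo2'`, `multiline_match` is `'foo1'`, and `dollar_spans` holds one empty span just before the newline and one at the end of the string.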