docs/Correctly-Using-Regular-Expressions-Rationale.md (15 additions, 7 deletions)
@@ -146,6 +146,8 @@ In both BRE and ERE notation, by default “^” means beginning-of-string and
The [regcomp function (which compiles regular expressions)](https://pubs.opengroup.org/onlinepubs/9699919799.2018edition/) accepts a “REG_NEWLINE” flag to help text editors search many lines. If REG_NEWLINE is set, the interpretation changes: “^” matches the zero-length string immediately after a <newline> in the string, and “$” matches the zero-length string immediately before a <newline> in the string. When REG_NEWLINE is enabled, the POSIX specification provides no separate way to match only the beginning or only the end of the whole string, which is why \A, \Z, and \z were later created by Perl. When validating input from untrusted users, the REG_NEWLINE option is normally not used.
+ In 2025 the Austin Group (which maintains the POSIX specification) [added \A and \z to POSIX for EREs](https://www.austingroupbugs.net/view.php?id=1919), and it recommends that BREs also implement them.
+
### Perl
[Perl documentation for perlre (perl regular expressions)](https://perldoc.perl.org/perlre) describes its support for regular expressions. Version 5.38.2 documents the following, where “/m” is the “multiple lines” modifier (the multiple lines modifier is _not_ enabled by default):
@@ -185,6 +187,8 @@ Python3’s regular expression library “re” has the method “fullmatch” w
As of 2024-03-24, [Tutorialspoint incorrectly claims that “$ matches the end of a string” in Python](https://www.tutorialspoint.com/How-to-match-at-the-end-of-string-in-python-using-Regular-Expression#). StackOverflow answer [12187839](https://stackoverflow.com/a/12187839) is also incorrect.
+ In 2025 Python decided to add support for [\z as end-of-string](https://github.com/python/cpython/issues/133306) and modified various libraries to use it.
+
### RE2
[RE2](https://github.com/google/re2) is a regular expression library using a non-backtracking implementation approach. Such implementations don’t have catastrophic cases and are sometimes orders of magnitude faster, but they’re less featureful (e.g., they don’t support backreferences). RE2’s speed is compelling in many cases, so RE2 is now used in many places.
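The hunks above contrast Python’s “$” with true end-of-string anchors and mention POSIX REG_NEWLINE and Perl’s /m. A small standard-library sketch makes the difference concrete. Python is chosen only for illustration, the `check` helper and sample inputs are hypothetical, and `re.MULTILINE` is treated here as Python’s rough counterpart of REG_NEWLINE and /m:

```python
import re


def check(candidate: str) -> dict:
    """Report which anchoring styles accept candidate as exactly ab or de."""
    return {
        # Without flags, "$" matches at the end OR just before one final newline.
        "dollar": re.search(r"^(ab|de)$", candidate) is not None,
        # re.MULTILINE (compare POSIX REG_NEWLINE or Perl's /m) lets "^" and "$"
        # match at every line boundary, not just at the ends of the string.
        "dollar_multiline": re.search(r"^(ab|de)$", candidate, re.MULTILINE) is not None,
        # Python's \Z matches only at the true end of the string.
        "backslash_Z": re.search(r"\A(ab|de)\Z", candidate) is not None,
        # fullmatch requires the whole pattern to match the whole string.
        "fullmatch": re.fullmatch(r"ab|de", candidate) is not None,
    }


for sample in ["ab", "ab\n", "evil\nab"]:
    print(repr(sample), check(sample))
```

With these inputs, "ab" is accepted by every style, "ab\n" is accepted only by the two “$” variants, and "evil\nab" is accepted only by the MULTILINE variant. That is why validation of untrusted input should not rely on per-line or newline-tolerant anchoring.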
@@ -506,15 +510,19 @@ be nearly universal:
[Regular Expression Buffer Boundaries for ECMAScript](https://github.com/tc39/proposal-regexp-buffer-boundaries)
to add \A and \z to ECMAScript/JavaScript, and it advanced to stage 2,
but it seems to be stuck there. We intend to see if we can help it advance.
- * Python: Python supports \A, but it uses the unique \Z instead of the
- \z used everywhere else for end-of-string.
- We'll ask to see if \z could be supported in addition to \Z for end-of-string.
- We'll probably start with a minor git request (as this is a really
- small change), otherwise we'll create a PEP, depending on the desires
- of the Python community.
+ * Python: Python supports \A, but historically it uses
+ the rare \Z instead of the \z used almost everywhere else for end-of-string.
In current versions of Python3 a \z in a regex raises an exception, so
adding \z for end-of-string would be a backwards-compatible addition.
- See [CPython issue 133306](https://github.com/python/cpython/issues/133306).
+ In [CPython issue 133306](https://github.com/python/cpython/issues/133306)
+ it was agreed to add \z in addition to \Z to match end-of-string,
docs/Correctly-Using-Regular-Expressions.md (1 addition, 1 deletion)
@@ -34,7 +34,7 @@ When using regexes for secure validation of untrusted input, do the following so
| Python | “^” or “\A” | “\Z” (not “$” nor “\z”) | Yes |
| Ruby | “\A” (not “^”) | “\z” (not “$”) | Yes |
- For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “<tt>^(ab|de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab|de)\Z</tt>” or “<tt>\A(ab|de)\Z</tt>”. Note that the “$” anchor has different meanings among platforms and is often misunderstood; on many platforms it’s permissive by default and doesn’t match only the end of the input. Instead of using “$” on a platform if $ is permissive, consider using an explicit form instead (e.g., “`\n?\z`”). Consider preferring “\A” and “\z” where it’s supported (this is necessary when using Ruby).
+ For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “<tt>^(ab|de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab|de)\Z</tt>” or “<tt>\A(ab|de)\Z</tt>”. Note that the “$” anchor has different meanings among platforms and is often misunderstood; on many platforms it’s permissive by default and doesn’t match only the end of the input. Instead of using “$” on a platform if $ is permissive, consider using an explicit form instead (e.g., “`\n?\z`”). Consider preferring “\A” and “\z” where it’s supported (this is necessary when using Ruby). [POSIX EREs](https://www.austingroupbugs.net/view.php?id=1919) and [Python](https://github.com/python/cpython/issues/133306) are being changed to support `\A`...`\z`.
In addition, ensure your regex is not vulnerable to a Regular Expression Denial of Service (ReDoS) attack. A ReDoS “[is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size)](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)”. Many regex implementations are “backtracking” implementations, that is, they try all possible matches. In these implementations, a poorly-written regular expression can be exploited by an attacker to take a vast amount of time.
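The changed line above notes that Python is being changed to support `\A`...`\z`, and the rationale document notes that older Python 3 versions raise an exception for `\z`. Code that must run on both older and newer interpreters could therefore probe for `\z` and fall back to Python’s own `\Z`. This is a minimal sketch; `end_anchor` and `is_ab_or_de` are hypothetical helper names, not part of any library, and the exact Python version that first accepts `\z` is deliberately not assumed:

```python
import re


def end_anchor() -> str:
    """Pick an end-of-string anchor that this Python's re module supports."""
    try:
        probe = re.compile(r"x\z")
    except re.error:
        # Older Python 3 versions reject the unknown escape, so fall back.
        # Python's \Z already means end-of-string (unlike \Z in Perl/PCRE).
        return r"\Z"
    # Confirm the accepted \z really behaves as end-of-string before using it.
    if probe.search("x") and not probe.search("x\n"):
        return r"\z"
    return r"\Z"


# Hypothetical validator: accept exactly "ab" or "de", nothing else.
AB_OR_DE = re.compile(r"\A(ab|de)" + end_anchor())


def is_ab_or_de(candidate: str) -> bool:
    return AB_OR_DE.search(candidate) is not None


print(end_anchor())
print(is_ab_or_de("de"), is_ab_or_de("de\n"), is_ab_or_de("xde"))  # True False False
```

The probe-and-fallback design keeps one pattern definition working across interpreter versions. Once only Python versions that accept `\z` need to be supported, writing `\A(ab|de)\z` directly is simpler.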