Merged
2 changes: 1 addition & 1 deletion docs/Correctly-Using-Regular-Expressions-Rationale.md
@@ -175,7 +175,7 @@ Setting both PCRE2_ANCHORED and PCRE2_ENDANCHORED forces a full-string match, bu
The [Python3 language documentation on re](https://docs.python.org/3/library/re.html) notes that its operations are “similar to those found in Perl” - but note that they are _similar_ not _identical_. In this library:

* ^ (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.
-* $ Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline.
+* $ Matches the end of the string or just before the newline at the end of the string (it is _permissive_), and in MULTILINE mode it also matches before a newline.
* \A Matches only at the start of the string.
* \Z Matches only at the end of the string. Note that this is spelled \Z not \z, and there is no \z.
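The anchor list above can be checked with a short sketch. It needs Python 3.9+ and uses only the standard `re` module. `anchor_positions` is a hypothetical helper that lists every index where a zero-width anchor matches. The positions in the comments follow from the behavior described in the list.

```python
import re


def anchor_positions(pattern: str, text: str, flags: int = 0) -> list[int]:
    """Return every index where the zero-width `pattern` matches in `text`.

    Hypothetical helper, written only for this illustration.
    """
    return [m.start() for m in re.finditer(pattern, text, flags)]


text = "a\nb\n"  # indexes: 0='a', 1='\n', 2='b', 3='\n', 4=end of string

# "$" is permissive: end of string, or just before a newline at the very end.
print(anchor_positions(r"$", text))                   # [3, 4]
# In MULTILINE mode "$" also matches before every newline.
print(anchor_positions(r"$", text, re.MULTILINE))     # [1, 3, 4]
# "\Z" ignores MULTILINE and matches only at the end of the string.
print(anchor_positions(r"\Z", text, re.MULTILINE))    # [4]
# "\A" ignores MULTILINE and matches only at the start of the string.
print(anchor_positions(r"\A", text, re.MULTILINE))    # [0]
# In MULTILINE mode "^" also matches right after each newline.
print(anchor_positions(r"^", "a\nb", re.MULTILINE))   # [0, 2]
```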

8 changes: 4 additions & 4 deletions docs/Correctly-Using-Regular-Expressions.md
@@ -102,7 +102,7 @@ Platform
</td>
<td>“\Z” (not “$” nor “\z”)
</td>
-<td>No
+<td>Yes
</td>
</tr>
<tr>
@@ -112,18 +112,18 @@ Platform
</td>
<td>“\z” (not “$”)
</td>
-<td>No
+<td>Yes
</td>
</tr>
</table>

-For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “<tt>^(ab&#x7c;de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab&#x7c;de)\Z</tt>” or “<tt>\A(ab&#x7c;de)\Z</tt>”. Note that the “$” anchor has different meanings among platforms and is often misunderstood; on many platforms it’s permissive and doesn’t match only the end of the input. Instead of using “$” on a platform if $ is permissive, consider using an explicit form instead (e.g., “`\n?\z`”). Consider preferring “\A” and “\z” where it’s supported (this is necessary when using Ruby).
+For example, to validate in JavaScript that the input is only “ab” or “de”, use the regex “<tt>^(ab&#x7c;de)$</tt>”. To validate the same thing in Python, use “<tt>^(ab&#x7c;de)\Z</tt>” or “<tt>\A(ab&#x7c;de)\Z</tt>”. Note that the “$” anchor has different meanings on different platforms and is often misunderstood; on many platforms it’s permissive by default and doesn’t match only the end of the input. On a platform where “$” is permissive, consider using an explicit form instead (e.g., “`\n?\z`”). Consider preferring “\A” and “\z” where they’re supported (this is necessary when using Ruby).
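Here is a minimal Python sketch of that validation advice, using the “ab”/“de” example. It assumes only the standard `re` module. Python spells the strict end anchor “\Z”, so the explicit “\n?\z” form becomes “\n?\Z” here. The names `PERMISSIVE`, `STRICT`, `EXPLICIT_NEWLINE`, and `is_valid` are hypothetical.

```python
import re

# Without MULTILINE, "$" also matches just before a final "\n" (permissive).
PERMISSIVE = re.compile(r"^(ab|de)$")
# "\A" and "\Z" match only at the very start and very end of the input.
STRICT = re.compile(r"\A(ab|de)\Z")
# Allow one trailing newline, but say so explicitly instead of relying on "$".
EXPLICIT_NEWLINE = re.compile(r"\A(ab|de)\n?\Z")


def is_valid(pattern: re.Pattern, text: str) -> bool:
    """Hypothetical helper: True if `pattern` accepts `text`."""
    return pattern.search(text) is not None


for candidate in ["ab", "de", "ab\n", "ab\nde", "abc"]:
    print(
        repr(candidate),
        is_valid(PERMISSIVE, candidate),
        is_valid(STRICT, candidate),
        is_valid(EXPLICIT_NEWLINE, candidate),
    )
```

The `"ab\n"` row is the one that matters. The permissive pattern quietly accepts a trailing newline that the strict pattern rejects, and the explicit pattern accepts it only because it asks for it.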

In addition, ensure your regex is not vulnerable to a Regular Expression Denial of Service (ReDoS) attack. A ReDoS “[is a Denial of Service attack, that exploits the fact that most Regular Expression implementations may reach extreme situations that cause them to work very slowly (exponentially related to input size)](https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS)”. Many regex implementations are “backtracking” implementations: when one way of matching fails, they back up and try another, so in the worst case they may try a vast number of possible matches. In these implementations, a poorly-written regular expression can be exploited by an attacker to take a vast amount of time.

1. One solution is to use a regex implementation that does not have this vulnerability because it never backtracks. E.g., use Go’s default regex system, RE2, or on .NET enable the RegexOptions.NonBacktracking option. Non-backtracking implementations can sometimes be orders of magnitude faster, but they also omit some features (e.g., backreferences).
2. Alternatively, create regexes that require no or little backtracking. Where a branch (“&#x7c;”) occurs, the next character should select one branch. Where there is optional repetition (e.g., “&#x2a;”), the next character should determine if there is a repetition or not. One common cause of unnecessary backtracking is a poorly-written regex with repetitions inside repetitions, e.g., “(a+)&#x2a;”. Some tools can help find these defects.
-3. A partial countermeasure is to greatly limit the length of the untrusted input. This can limit the impact of a vulnerability.
+3. A partial countermeasure is to greatly limit the length of the untrusted input and/or the number of repetitions. This can limit the impact of a vulnerability. For example, in a regex, use “{0,4}” (0 through 4 repetitions inclusive) instead of “*” (0 or more repetitions, with no maximum).
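Countermeasures 2 and 3 can be sketched in Python as follows. This is illustrative only. `MAX_INPUT_LEN`, `NESTED`, `FLAT`, `BOUNDED`, and `validate` are hypothetical names, and the 100-character limit is an arbitrary assumption. Python's built-in `re` is itself a backtracking engine, so the sketch does not cover item 1.

```python
import re

# Hypothetical limit for this sketch; pick one that fits your real data.
MAX_INPUT_LEN = 100

# Item 2: "(a+)*" nests a repetition inside a repetition. It accepts the same
# strings as "a*", but on a failing input a backtracking engine (such as
# Python's re) may try a huge number of ways to split a run of "a"s.
NESTED = re.compile(r"\A(a+)*b\Z")
FLAT = re.compile(r"\Aa*b\Z")  # same language, no nested repetition

# Item 3: a bounded repetition, "{0,4}" instead of "*", caps the work one
# quantifier can do. The branches start with different characters ("a" vs
# "d"), so the next character always selects the branch (item 2).
BOUNDED = re.compile(r"\A(ab|de){0,4}\Z")


def validate(pattern: re.Pattern, text: str) -> bool:
    """Reject over-long untrusted input before running the regex at all."""
    if len(text) > MAX_INPUT_LEN:
        return False
    return pattern.search(text) is not None


# Only short inputs here: NESTED is included for comparison, not for use.
for s in ["", "b", "ab", "aaab", "aaa", "ba"]:
    print(repr(s), validate(NESTED, s), validate(FLAT, s))
print(validate(BOUNDED, "abde"), validate(BOUNDED, "ab" * 5))
```

The length check runs first, so an over-long input never reaches the regex engine. The flat and bounded patterns stay cheap on short inputs.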

## Detailed Rationale

9 changes: 9 additions & 0 deletions docs/src/regex.py
@@ -0,0 +1,9 @@
#!/usr/bin/env python3

import re

print('Test Python regex')
print("Must be false: ", bool(re.search(r'^wrong$', "hello")))
print("Must be true: ", bool(re.search(r'^hello$', "hello")))
print("True if permissive: ", bool(re.search(r'^hello$', "hello\n")))
print("Should be false: ", bool(re.search(r'^hello$', "hello\nthere")))
7 changes: 7 additions & 0 deletions docs/src/regex.rb
@@ -0,0 +1,7 @@
#!/usr/bin/env ruby

puts('Test Ruby regex')
puts("Must be false: #{/^wrong$/.match?("hello")}")
puts("Must be true: #{/^hello$/.match?("hello")}")
puts("True if permissive: #{/^hello$/.match?("hello\n")}")
puts("Should be true ($ always multi): #{/^hello$/.match?("hello\nthere")}")
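The two scripts above disagree on the last case because Ruby's “^” and “$” are always line anchors. The minimal Python sketch below gets each behavior explicitly. It assumes only the standard `re` module, and the helper `matches` is hypothetical. `re.MULTILINE` reproduces Ruby's default, while “\A…\Z” stays strict regardless of flags. In Ruby the strict pair is “\A…\z”, because Ruby's “\Z”, like Python's “$”, also matches before a final newline.

```python
import re


def matches(pattern: str, text: str, flags: int = 0) -> bool:
    """Hypothetical helper: True if `pattern` matches anywhere in `text`."""
    return re.search(pattern, text, flags) is not None


text = "hello\nthere"

# Python default: "$" is not a line anchor, so this is False
# (the "Should be false" case in regex.py).
print("Default:  ", matches(r"^hello$", text))
# re.MULTILINE makes "^" and "$" line anchors, like Ruby's default,
# so this is True (the "Should be true" case in regex.rb).
print("MULTILINE:", matches(r"^hello$", text, re.MULTILINE))
# "\A" and "\Z" ignore MULTILINE, so the strict check stays False.
print("Strict:   ", matches(r"\Ahello\Z", text, re.MULTILINE))
```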