+ <!DOCTYPE qhelp PUBLIC "-//Semmle//qhelp//EN" "qhelp.dtd">
+ <qhelp >
+ <overview >
+ <p >
+ Some regular expressions take a long time to match certain input strings,
+ to the point where the time it takes to match a string of length <i >n</i >
+ is proportional to <i >n<sup >k</sup ></i > or even <i >2<sup >n</sup ></i >.
+ Such regular expressions can degrade performance, or even allow
+ a malicious user to perform a denial-of-service (DoS) attack by crafting
+ an input string that is expensive for the regular expression to match.
+ </p >
+ <p >
+ The regular expression engine used by the Ruby interpreter (MRI)
+ implements matching with backtracking non-deterministic finite
+ automata. While this approach is space-efficient and supports
+ advanced features like capture groups, it is not time-efficient
+ in general. The worst-case time complexity of such an automaton can be
+ polynomial or even exponential. In the exponential case, for strings of a
+ certain shape, increasing the input length by ten characters may make
+ matching about 1000 times slower (2<sup >10</sup > = 1024).
+ </p >
+ <p >
+ Note that Ruby 3.2 and later implement a caching mechanism that
+ avoids the polynomial and exponential worst cases for the regular
+ expressions flagged by this query. These regular expressions
+ are therefore only problematic on Ruby versions prior to 3.2.
+ </p >
+ <p >
+ Typically, a regular expression is affected by this problem if it contains
+ a repetition of the form <code >r*</code > or <code >r+</code > where the
+ sub-expression <code >r</code > is ambiguous in the sense that it can match
+ some string in multiple ways. More information about the precise
+ circumstances can be found in the references.
+ </p >
+ </overview >
+ <recommendation >
+ <p >
+ Modify the regular expression to remove the ambiguity, or ensure that the
+ strings matched against the regular expression are short enough that the
+ time complexity does not matter.
+ </p >
+ </recommendation >
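+ <example >
+ <p >
+ As a minimal sketch of the two options in the recommendation, the Ruby
+ snippet below pairs an ambiguous pattern with an unambiguous equivalent
+ and adds a length check before matching. The patterns, the constant
+ <code >MAX_INPUT_LENGTH</code >, its value of 64, and the helper
+ <code >safe_match?</code > are hypothetical names and values chosen for
+ illustration; they are not taken from the query.
+ </p >

```ruby
# Hypothetical patterns for illustration; neither comes from the query itself.
# AMBIGUOUS: the inner a+ and the outer + can split a run of "a"s in many ways.
AMBIGUOUS = /\A(a+)+\z/
# UNAMBIGUOUS: accepts the same strings, but each "a" is consumed in only one way.
UNAMBIGUOUS = /\Aa+\z/

# Assumed limit; pick a value that suits the application.
MAX_INPUT_LENGTH = 64

# Rejects over-long input before matching, so that the cost of any
# remaining backtracking stays bounded.
def safe_match?(input)
  return false if input.length > MAX_INPUT_LENGTH
  UNAMBIGUOUS.match?(input)
end

# Short sample inputs only, so the ambiguous pattern stays cheap to evaluate.
samples = ["a", "aaaa", "aaab", "", "b"]
samples.each do |s|
  # Both patterns are expected to agree on every sample string.
  raise "mismatch on #{s.inspect}" unless AMBIGUOUS.match?(s) == UNAMBIGUOUS.match?(s)
end
```

+ <p >
+ Removing the nested repetition keeps the set of accepted strings the same
+ while leaving the engine only one way to match each input. The length
+ check is an independent safeguard for cases where a pattern cannot easily
+ be rewritten.
+ </p >
+ </example >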
+ </qhelp >