Skip to content

Commit 22aa4c9

Browse files
authored
Python: Fix performance issue in charSet
Observed on `mozilla/bugbug` on the 2.8.0 CLI branch, we had the following line in the timing report: ``` FullServerSideRequestForgery.ql-17:regex::RegexString::charSet_dispred#fff#antijoin_rhs ............... 1m13s ``` Inspecting the logs, we see the following join: ``` (644s) Tuple counts for regex::RegexString::charSet_dispred#fff#antijoin_rhs/5@f295d1bk after 1m13s: 1 ~0% {1} r1 = CONSTANT(unique string)["]"] 2389 ~4% {3} r2 = JOIN r1 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_201#join_rhs ON FIRST 1 OUTPUT Rhs.1 'arg0', Rhs.2 'arg1', (Rhs.2 'arg1' + 1) 668873 ~0% {6} r3 = JOIN r2 WITH regex::RegexString::char_set_start_dispred#fff ON FIRST 1 OUTPUT Lhs.0 'arg0', "]", Lhs.1 'arg1', Lhs.2 'arg2', Rhs.1 'arg3', Rhs.2 'arg4' 537501371 ~4% {7} r4 = JOIN r3 WITH regex::RegexString::nonEscapedCharAt_dispred#fff_021#join_rhs ON FIRST 2 OUTPUT Lhs.0 'arg0', Lhs.2 'arg1', Lhs.3 'arg2', Lhs.4 'arg3', Lhs.5 'arg4', "]", Rhs.2 269085087 ~0% {7} r5 = SELECT r4 ON In.6 > In.4 'arg4' 89583155 ~3% {7} r6 = SELECT r5 ON In.6 < In.1 'arg1' 89583155 ~26634% {5} r7 = SCAN r6 OUTPUT In.0 'arg0', In.1 'arg1', In.2 'arg2', In.3 'arg3', In.4 'arg4' return r7 ``` Now, this is problematic not just because of the large intermediary join but also because of the large number of tuples being materialised at the end. The culprit in this case turns out to be this bit of `charSet`: ``` not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end) ``` Rewriting this to instead look for the minimum index at which a `]` appears resulted in a much nicer join. I also fixed up a similar issue surrounding the `\N` unicode escape. Not that I think this will necessarily be relevant, but the `min`-based solution is more robust either way.
1 parent e93c46a commit 22aa4c9

File tree

1 file changed

+2
-4
lines changed

1 file changed

+2
-4
lines changed

python/ql/lib/semmle/python/regex.qll

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -204,8 +204,7 @@ abstract class RegexString extends Expr {
204204
|
205205
end = inner_end + 1 and
206206
inner_end > inner_start and
207-
this.nonEscapedCharAt(inner_end) = "]" and
208-
not exists(int mid | this.nonEscapedCharAt(mid) = "]" | mid > inner_start and mid < inner_end)
207+
inner_end = min(int i | this.nonEscapedCharAt(i) = "]" and inner_start < i)
209208
)
210209
}
211210

@@ -344,9 +343,8 @@ abstract class RegexString extends Expr {
344343
this.escapingChar(start) and
345344
this.getChar(start + 1) = "N" and
346345
this.getChar(start + 2) = "{" and
347-
this.getChar(end - 1) = "}" and
348346
end > start and
349-
not exists(int i | start + 2 < i and i < end - 1 | this.getChar(i) = "}")
347+
end - 1 = min(int i | start + 2 < i and this.getChar(i) = "}")
350348
}
351349

352350
/**

0 commit comments

Comments
 (0)