Commit 48a162c
Fix longstanding regex interpreter bug around lazy loops with empty matches (#120872)
This has been an issue in the interpreter forever, as far as I can tell.
We've had multiple issues over the years all flagging problems with
different symptoms that stem from the same core problem.
Fixes #43314
Fixes #58786
Fixes #63385
Fixes #111051
Fixes #114626
The problem came down to how the regex interpreter handled lazy
quantifiers over expressions that can match the empty string. When the
interpreter reaches one of these lazy loops, it uses an internal
instruction called `Lazybranchmark` to manage entering and potentially
looping the subexpression. To keep track of loop state and capture
boundaries, the interpreter uses two internal stacks: a grouping stack,
which tracks positions relevant to capturing groups (e.g. where a group
started), and a backtracking stack, which tracks the states needed if
the engine has to go back and try a different match.
The bug occurred when the subpattern inside the lazy loop matched
nothing. In that case, the interpreter unconditionally pushed a
placeholder onto the grouping stack. If the rest of the pattern then
succeeded without backtracking through this loop, that extra placeholder
remained on the grouping stack. This polluted the capture bookkeeping:
later parts of the pattern popped that placeholder, treated it as a real
start position, and shifted captures to the wrong place.
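The failure mode described above can be illustrated with a minimal Python sketch. This is a hypothetical toy model, not the .NET regex interpreter's code. It models only the grouping stack, and the positions (0, 2, 5) are made-up examples. Representing the placeholder as the loop's current position is also an assumption.

```python
# Hypothetical toy model of the pre-fix bug; NOT the .NET interpreter source.
# Only the grouping stack is modeled, and positions are made-up examples.


def capture_after_empty_lazy_loop(push_placeholder: bool) -> tuple[tuple[int, int], list[int]]:
    grouping_stack: list[int] = []

    # An enclosing capture group opens at position 0 and records its start.
    grouping_stack.append(0)

    # A lazy loop at position 2 runs its body, which matches the empty string.
    if push_placeholder:
        # Pre-fix behavior: a placeholder is pushed unconditionally.
        # Modeling it as the current position (2) is an assumption.
        grouping_stack.append(2)

    # The rest of the pattern succeeds without backtracking into the loop, so
    # nothing removes the placeholder. The enclosing group closes at position 5
    # and pops what it believes is its own start position.
    start = grouping_stack.pop()
    return (start, 5), grouping_stack


print(capture_after_empty_lazy_loop(push_placeholder=True))   # ((2, 5), [0])
print(capture_after_empty_lazy_loop(push_placeholder=False))  # ((0, 5), [])
```

With the placeholder, the enclosing group reports `(2, 5)` instead of `(0, 5)`. Its real start position is left stranded on the stack.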
The fix is to stop pushing onto the grouping stack when the loop matches
empty. Instead, the interpreter records two things on the backtracking
stack: the old group boundary and a flag indicating whether the grouping
stack needs to be popped later. If the interpreter ends up backtracking
through this lazy loop, it checks the flag: if a grouping stack entry
was added earlier, it pops it; if not, it leaves the grouping stack
untouched. This keeps the grouping stack and backtracking stack in sync
in both forward and backtracking paths. As a result, empty lazy loops no
longer leave stray entries on the grouping stack. This also prevents the
unbounded stack growth that previously caused overflows or hangs on some
patterns involving nested lazy quantifiers.
1 parent cd66519
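The fix from the commit message can be sketched in Python as well. This is a minimal sketch under stated assumptions, not the actual .NET implementation. `ToyStacks`, `lazy_loop_iteration`, and `backtrack_lazy_loop_iteration` are hypothetical names. Modeling the "old group boundary" as a single integer `loop_boundary` is also an assumption. The sketch shows the described idea: the forward path pushes to the grouping stack only for a non-empty match and records (old boundary, pushed flag) on the backtracking stack. The backtracking path then pops the grouping stack only when that flag is set.

```python
from dataclasses import dataclass, field

# Hypothetical toy model of the fixed bookkeeping; NOT the .NET interpreter
# source. Names and the integer "boundary" representation are assumptions.


@dataclass
class ToyStacks:
    grouping: list[int] = field(default_factory=list)
    # Each entry: (old loop boundary, whether the grouping stack was pushed).
    backtracking: list[tuple[int, bool]] = field(default_factory=list)
    loop_boundary: int = -1


def lazy_loop_iteration(s: ToyStacks, pos: int, matched_empty: bool) -> None:
    """Forward path: push to the grouping stack only for a non-empty match."""
    pushed = not matched_empty
    if pushed:
        s.grouping.append(pos)
    # Save enough state for backtracking to undo exactly what happened here.
    s.backtracking.append((s.loop_boundary, pushed))
    s.loop_boundary = pos


def backtrack_lazy_loop_iteration(s: ToyStacks) -> None:
    """Backtracking path: pop the grouping stack only if the flag says we pushed."""
    old_boundary, pushed = s.backtracking.pop()
    if pushed:
        s.grouping.pop()
    s.loop_boundary = old_boundary


s = ToyStacks(grouping=[0])  # an enclosing group started at position 0
lazy_loop_iteration(s, pos=2, matched_empty=True)
print(s.grouping)  # [0]: no stray placeholder on the forward path
backtrack_lazy_loop_iteration(s)
print(s.grouping, s.loop_boundary)  # [0] -1: nothing popped that was not pushed
```

For an empty iteration, the forward path leaves the grouping stack unchanged. Backtracking restores the old boundary without popping an entry that was never pushed. For a non-empty iteration, backtracking pops exactly the one entry that was pushed, so the two stacks stay in sync on both paths.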
File tree
3 files changed, +60 -11 lines
- src/libraries/System.Text.RegularExpressions
  - src/System/Text/RegularExpressions
  - tests/FunctionalTests
Lines changed: 20 additions & 10 deletions
Lines changed: 20 additions & 0 deletions
Lines changed: 20 additions & 1 deletion