Fix catastrophic regex backtracking in shell framework#4658
thrix wants to merge 4 commits into teemtee:main
Conversation
Code Review
The catastrophic regex backtracking is fixed by replacing re.findall with a line-by-line re.search. The added unit tests cover the fix and include a performance regression test. Two suggestions are provided to improve the new test code, focusing on type correctness and assertion specificity.
@gemini-code-assist please review again
Code Review
This pull request addresses a catastrophic backtracking issue in _extract_failures by using a line-by-line search. The added unit tests cover the changes, including a performance regression test. I have provided two suggestions for minor improvements.
Force-pushed: 6884a58 to 4063a35
@gemini-code-assist please review again
Code Review
This pull request effectively resolves a critical performance issue related to catastrophic regex backtracking by implementing a more efficient line-by-line search strategy. The accompanying unit tests are thorough and include a valuable performance regression test to prevent this issue from recurring. My feedback includes one suggestion to enhance the maintainability of the new performance test.
tests/unit/test_shell.py (outdated)

```python
# Must complete in under 5 seconds (the old regex took 5+ seconds
# on just 10k characters, and would never complete on 1M characters)
assert elapsed < 5.0, (
```
5 seconds is a very arbitrary cut-off. If the old implementation finished in 5.001 seconds and this change brings it down to 4.999, this test does not really tell us anything. Similarly, if later changes gradually increase the runtime to 4.999 seconds, the test still passes.
If the change is log-scale, confirm that the intended improvement is at least an order of magnitude, and account for buffers that realistically reflect the time scale expected.
Changed to 30 seconds: a generous ceiling that still catches catastrophic backtracking (the old regex never completed on 1M characters even after tens of minutes).
```python
# The old regex would never complete on lines this long — it took 5+
# seconds on just 10k characters.
```

Unless we know the scaling, it is unclear whether the timing on 10k characters is relevant for 1M.
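The scaling can be checked empirically. Here is a small timing sketch (not from the PR; `time_findall` is a made-up helper, and the sizes are kept small so it runs quickly):

```python
import re
import time

# The old pattern from _extract_failures, before the fix.
OLD_PATTERN = r'.*\b(?:error|fail)\b.*'

def time_findall(n: int) -> float:
    """Time re.findall() of the old pattern on an n-character non-matching line."""
    text = 'A' * n
    start = time.monotonic()
    re.findall(OLD_PATTERN, text)
    return time.monotonic() - start

# Each failed attempt backtracks through the rest of the line, and the
# scanner then retries at the next position, so the behaviour is roughly
# O(n^2): doubling the input should roughly quadruple the time.
t1 = time_findall(2_000)
t2 = time_findall(4_000)
print(f'{t1:.4f}s vs {t2:.4f}s')
```

If the quadratic model holds, a 1M-character line should take on the order of (1,000,000 / 10,000)² = 10,000 times longer than the 5+ seconds observed at 10k characters, which matches the "hangs for hours" symptom in the bug report.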
tests/unit/test_shell.py (outdated)

```python
# Build a log with a 1M-character line (similar to base64 attestation data)
long_line = "A" * 1_000_000
log_content = f"start\n{long_line}\nsome error here\nend\n"
```
What if the long line does not have a newline separating it from the "some error" part? What if it is comprised of multiple words, which would match the `\b`?
Added 3 parametrized long-line cases: error on a separate line, error embedded within the long line (no newline separator), and a word-boundary false positive within the long line.
> what if it is comprised of multiple words which would match the `\b`?

This is not addressed, and the time difference between the cases is also unclear.
Force-pushed: 8e59206 to 7aab157
@gemini-code-assist please review
Code Review
The pull request effectively addresses the catastrophic regex backtracking issue in _extract_failures by switching to a line-by-line search with a pre-compiled regex. This significantly improves performance and stability when processing large log files. The new unit tests, especially test_extract_failures_long_lines, provide excellent coverage for the regression and include a crucial performance assertion to prevent future issues. No further issues or improvement opportunities of medium, high, or critical severity were identified in the changes.
Replace `re.findall()` with a line-by-line `re.search()` approach in `_extract_failures()` to avoid catastrophic backtracking on long lines. The old regex `r'.*\b(?:error|fail)\b.*'` caused tmt processes to hang for hours when test output contained lines over 1M characters (e.g. base64-encoded in-toto attestation payloads from container builds).

Fixes teemtee#4657

Assisted-by: Claude Code
- Use Path("dummy.log") instead of MagicMock() for log_path parameter
- Make test assertions exact lists instead of length/membership checks
Assisted-by: Claude Code
- Remove TestExtractFailures class, use plain test functions (happz)
- Move `import tmt.utils` to top of file (gemini-code-assist)
- Pre-compile regex as module-level FAILURE_PATTERN constant (gemini-code-assist)

Assisted-by: Claude Code
- Convert `_make_invocation` helper to a pytest factory fixture
- Parametrize matching tests into a single `test_extract_failures`
with descriptive IDs instead of separate test functions
- Add long-line edge cases: error embedded within a long line
(no newline separator), and word-boundary false positive
- Use 30s timeout ceiling instead of 5s to avoid CI flakiness
while still catching catastrophic backtracking
- Add comments explaining `time.monotonic()` choice
- Minor: fix comment typo ("per line" -> "per call")
Assisted-by: Claude Code
Signed-off-by: Miroslav Vadkerti <mvadkert@redhat.com>
Force-pushed: 7aab157 to 493d9a7
```python
assert _extract_failures(invocation, Path('dummy.log')) == []


_LONG_LINE_CASES: list[tuple[str, str, list[str]]] = [
```
Please, no list of tuples of lists of ... Make it structured with a dataclass.
```python
('log_template', 'expected_template'),
[(log, expected) for _, log, expected in _LONG_LINE_CASES],
ids=[name for name, _, _ in _LONG_LINE_CASES],
```
Use pytest.param to put the id next to the value. Maybe also store it in the dataclass
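A sketch of the suggested structure, combining a dataclass (id stored alongside the data) with `pytest.param` (names and cases are illustrative, not the PR's final code):

```python
from dataclasses import dataclass

import pytest

@dataclass
class LongLineCase:
    """One long-line scenario: a readable test id, the log template, expected failures."""
    id: str
    log_template: str
    expected: list[str]

LONG_LINE_CASES = [
    LongLineCase(
        id='error on separate line',
        log_template='start\n{long}\nsome error here\nend\n',
        expected=['some error here'],
    ),
    LongLineCase(
        id='error embedded in long line',
        log_template='start\n{long} error in the middle {long}\nend\n',
        expected=['{long} error in the middle {long}'],
    ),
    LongLineCase(
        id='word boundary - no false positive',
        log_template='start\n{long}errorless{long}\nend\n',
        expected=[],
    ),
]

# pytest.param keeps the id right next to the value it labels.
@pytest.mark.parametrize(
    'case',
    [pytest.param(case, id=case.id) for case in LONG_LINE_CASES],
)
def test_extract_failures_long_lines(case: LongLineCase) -> None:
    ...  # build the log from case.log_template and assert against case.expected
```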
```python
@pytest.fixture
def make_invocation():
    """Factory fixture creating a mock TestInvocation returning given log content."""

    def _factory(log_content: str) -> MagicMock:
        mock = MagicMock()
        mock.phase.step.plan.execute.read.return_value = log_content
        return mock

    return _factory
```
Why the whole factory? You do not need it: let the fixture provide just the mock object and stub the relevant part in the tests. That would make it more consistent with all other instances.
```python
# The old regex would never complete on lines this long — it took 5+
# seconds on just 10k characters. The splitlines approach finishes in
# well under 1 second. Use 30 seconds as a generous ceiling to avoid
# flakiness on slow CI while still catching catastrophic backtracking.
```
I don't like this. The difference is well within a single order of magnitude. It seems that the test is not catching anything significant.
Well, here are the measurements that I got using the 50000 length:

Old method:
- Case 1: 11.335 s
- Case 2: 0.001 s
- Case 3: 49.290 s

New method: all cases 0.001 s

What I would propose is documenting this difference explicitly. Choose whatever size you want, run the cases fully, and document explicitly how much they (can) differ. In this case, for example, all of them are expected to be instantaneous with the new method, so cutting in the middle with a threshold of 0.1 s guarantees that the old method will always fail and the new one will always pass. Also mention how much the times are expected to differ, e.g.:

- at the time of writing, with the old method: expected XX s
- at the time of writing, with the current method: expected below a measurable 0.001 ms

Running these does highlight one weird quirk between Case 1 and Case 2: I would not have expected the latter to be that much faster with the old method, given the documentation in the test.
I ran the test locally without the code fix (to make sure it fails) and it hung until I pressed Ctrl+C. Gemini suggested adding `@pytest.mark.timeout(5)`, which force-fails the test after 5 (or however many) seconds so the run can move on to the next test. We would need pytest-timeout installed. It worked when I added it, and it should hopefully prevent our CI from getting stuck if anyone breaks the regex again.
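With the pytest-timeout plugin installed, the marker would look roughly like this (a sketch; the 30-second budget mirrors the ceiling discussed above):

```python
import pytest

@pytest.mark.timeout(30)
def test_extract_failures_long_lines() -> None:
    # Without the regex fix this test hangs; pytest-timeout kills it after
    # 30 seconds and reports a failure instead of blocking the whole CI run.
    ...
```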
```python
(
    'long line without match followed by error line',
    'start\n{long}\nsome error here\nend\n',
    ['some error here'],
),
(
    'error embedded in long line without newline separator',
    'start\n{long} error in the middle {long}\nend\n',
    ['{long} error in the middle {long}'],
),
(
    'long line with word boundary - no false positive',
    'start\n{long}errorless{long}\nend\n',
    [],
),
```
This just shows that what we are doing in `_extract_failures` is likely wrong; e.g. the user could have something like:

```python
def test_accuracy():
    value = get_value()
    assert abs(value - reference) < 1e-10
    print("All is good, the value is within the acceptable error bar")
```

Why are we catching any word "error" and concluding that, if it is present, it must be relevant? It is a very poor man's guess, and the question is whether anyone actually relies on this behaviour.
I would consider adding a deprecation note and seeing who actually complains and/or cares about it. A 0-second evaluation is faster still.
@LecrisUT sure, but that is not for this issue, I will file a new one
```python
    The current implementation uses ``str.splitlines()`` and per-line
    ``re.search()`` which processes each line in linear time.
    """
    long_segment = 'A' * 1_000_000
```
Move this construction outside of the test, and make it more lorem-ipsum-like with more word separators and variations that it could randomly hit.
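One possible way to follow this suggestion, sketched at module level (the word list and target length are illustrative; none of the words contain "error" or "fail"):

```python
import itertools

# Words with \b boundaries between them, none containing the trigger words.
_WORDS = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur']

def make_long_segment(target_len: int = 1_000_000) -> str:
    """Build a space-separated segment of at least target_len characters."""
    parts: list[str] = []
    length = 0
    for word in itertools.cycle(_WORDS):
        parts.append(word)
        length += len(word) + 1  # account for the joining space
        if length > target_len:
            break
    return ' '.join(parts)

LONG_SEGMENT = make_long_segment()
```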
Summary

Fix `_extract_failures()` in `tmt/frameworks/shell.py`, which causes tmt processes to hang for hours when test output contains very long lines:

- Replace `re.findall(r'.*\b(?:error|fail)\b.*', ...)` with a line-by-line `re.search()` approach that avoids catastrophic regex backtracking
- Add unit tests for `_extract_failures()`, including a regression test with 1M-character lines

Fixes #4657
Context
Testing Farm jobs from the Hummingbird project were getting stuck in "running" state for hours. The test output files contained base64-encoded in-toto attestation payloads — single lines of 1M+ characters — causing the regex engine to backtrack indefinitely.
Detailed write-up: https://vadkerti.net/posts/debugging-stuck-tmt-processes-with-claude-code/
Test plan

- Run the unit tests: `hatch -e dev run -- python -m pytest tests/unit/test_shell.py -v`