Skip to content

Fix catastrophic regex backtracking in shell framework#4658

Open
thrix wants to merge 4 commits intoteemtee:mainfrom
thrix:fix-shell-regex-backtracking
Open

Fix catastrophic regex backtracking in shell framework#4658
thrix wants to merge 4 commits intoteemtee:mainfrom
thrix:fix-shell-regex-backtracking

Conversation

@thrix
Copy link
Contributor

@thrix thrix commented Mar 6, 2026

Summary

Fix _extract_failures() in tmt/frameworks/shell.py which causes tmt processes to hang for hours when test output contains very long lines.

  • Replace re.findall(r'.*\b(?:error|fail)\b.*', ...) with a line-by-line re.search() approach that avoids catastrophic regex backtracking
  • Add unit tests for _extract_failures() including a regression test with 1M-character lines

Fixes #4657

Context

Testing Farm jobs from the Hummingbird project were getting stuck in "running" state for hours. The test output files contained base64-encoded in-toto attestation payloads — single lines of 1M+ characters — causing the regex engine to backtrack indefinitely.

Detailed write-up: https://vadkerti.net/posts/debugging-stuck-tmt-processes-with-claude-code/

Test plan

  • Unit tests pass (hatch -e dev run -- python -m pytest tests/unit/test_shell.py -v)
  • Existing shell framework integration tests still pass

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The catastrophic regex backtracking is fixed by replacing re.findall with a line-by-line re.search. The added unit tests cover the fix and include a performance regression test. Two suggestions are provided to improve the new test code, focusing on type correctness and assertion specificity.

@thrix
Copy link
Contributor Author

thrix commented Mar 6, 2026

@gemini-code-assist please review again

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses a catastrophic backtracking issue in _extract_failures by using a line-by-line search. The added unit tests cover the changes, including a performance regression test. I have provided two suggestions for minor improvements.

@thrix thrix added the ci | full test Pull request is ready for the full test execution label Mar 6, 2026
@thrix thrix requested a review from happz March 9, 2026 13:19
@thrix thrix force-pushed the fix-shell-regex-backtracking branch from 6884a58 to 4063a35 Compare March 9, 2026 13:19
@thrix
Copy link
Contributor Author

thrix commented Mar 9, 2026

@gemini-code-assist please review again

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request effectively resolves a critical performance issue related to catastrophic regex backtracking by implementing a more efficient line-by-line search strategy. The accompanying unit tests are thorough and include a valuable performance regression test to prevent this issue from recurring. My feedback includes one suggestion to enhance the maintainability of the new performance test.

@thrix thrix requested review from LecrisUT and psss March 9, 2026 15:00
@happz happz added this to planning Mar 11, 2026
@github-project-automation github-project-automation bot moved this to backlog in planning Mar 11, 2026
@happz happz moved this from backlog to review in planning Mar 11, 2026
@happz happz moved this from review to implement in planning Mar 11, 2026
Comment on lines +77 to +79
# Must complete in under 5 seconds (the old regex took 5+ seconds
# on just 10k characters, and would never complete on 1M characters)
assert elapsed < 5.0, (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5 seconds is a very arbitrary cut-off. If the old one would have finished these in 5.001 seconds, and this change brings it down to 4.999, this test does not really tell anything. Similarly if we make changes to the code and it gradually increases the test time to the point that it takes 4.999 tests still pass.

If there is log-scale changes then confirm the intended change are at least an order of magnitude difference and account more realistically buffers of what time scale it is expected

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to 30 seconds — generous ceiling that still catches catastrophic backtracking (old regex never completed on 1M chars in 10s of minutes)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    # The old regex would never complete on lines this long — it took 5+
    # seconds on just 10k characters.

Unless we know the scaling, how long it takes on 10k characters is unclear if it's relevant for 1M.

Comment on lines +67 to +69
# Build a log with a 1M-character line (similar to base64 attestation data)
long_line = "A" * 1_000_000
log_content = f"start\n{long_line}\nsome error here\nend\n"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the long line does not have a new-line that separates from the some error part, what if it is comprised of multiple words which would match the \b?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checking

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 3 parametrized long-line cases: error on separate line, error embedded within the long line (no newline separator), and word-boundary false positive withing long line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it is comprised of multiple words which would match the \b?

Is not addressed, but also the time difference between them is unclear

@psss psss added this to the 1.70 milestone Mar 16, 2026
@thrix thrix force-pushed the fix-shell-regex-backtracking branch from 8e59206 to 7aab157 Compare March 20, 2026 14:54
@thrix thrix requested a review from LecrisUT March 20, 2026 14:57
@thrix
Copy link
Contributor Author

thrix commented Mar 20, 2026

@gemini-code-assist please review

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

The pull request effectively addresses the catastrophic regex backtracking issue in _extract_failures by switching to a line-by-line search with a pre-compiled regex. This significantly improves performance and stability when processing large log files. The new unit tests, especially test_extract_failures_long_lines, provide excellent coverage for the regression and include a crucial performance assertion to prevent future issues. No further issues or improvement opportunities of medium, high, or critical severity were identified in the changes.

thrix added 4 commits March 20, 2026 16:23
Replace re.findall() with a line-by-line re.search() approach in
_extract_failures() to avoid catastrophic backtracking on long lines.

The old regex r'.*\b(?:error|fail)\b.*' caused tmt processes to hang
for hours when test output contained lines over 1M characters (e.g.
base64-encoded in-toto attestation payloads from container builds).

Fixes teemtee#4657

Assisted-by Claude Code
- Use Path("dummy.log") instead of MagicMock() for log_path parameter
- Make test assertions exact lists instead of length/membership checks

Assisted-by Claude Code
- Remove TestExtractFailures class, use plain test functions (happz)
- Move import tmt.utils to top of file (gemini-code-assist)
- Pre-compile regex as module-level FAILURE_PATTERN constant (gemini-code-assist)

Assisted-by Claude Code
- Convert `_make_invocation` helper to a pytest factory fixture
- Parametrize matching tests into a single `test_extract_failures`
  with descriptive IDs instead of separate test functions
- Add long-line edge cases: error embedded within a long line
  (no newline separator), and word-boundary false positive
- Use 30s timeout ceiling instead of 5s to avoid CI flakiness
  while still catching catastrophic backtracking
- Add comments explaining `time.monotonic()` choice
- Minor: fix comment typo ("per line" -> "per call")

Assisted-by: Claude Code
Signed-off-by: Miroslav Vadkerti <mvadkert@redhat.com>
@thrix thrix force-pushed the fix-shell-regex-backtracking branch from 7aab157 to 493d9a7 Compare March 20, 2026 15:23
assert _extract_failures(invocation, Path('dummy.log')) == []


_LONG_LINE_CASES: list[tuple[str, str, list[str]]] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please no list of tuple of list of ... Make it structured in a dataclass

Comment on lines +79 to +81
('log_template', 'expected_template'),
[(log, expected) for _, log, expected in _LONG_LINE_CASES],
ids=[name for name, _, _ in _LONG_LINE_CASES],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use pytest.param to put the id next to the value. Maybe also store it in the dataclass

Comment on lines +11 to +20
@pytest.fixture
def make_invocation():
"""Factory fixture creating a mock TestInvocation returning given log content."""

def _factory(log_content: str) -> MagicMock:
mock = MagicMock()
mock.phase.step.plan.execute.read.return_value = log_content
return mock

return _factory
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the whole factory? You do not need it, let it provide just the mock object and mock the relevant part in the tests. That would make it more consistent with all other instances

Comment on lines +113 to +116
# The old regex would never complete on lines this long — it took 5+
# seconds on just 10k characters. The splitlines approach finishes in
# well under 1 second. Use 30 seconds as a generous ceiling to avoid
# flakiness on slow CI while still catching catastrophic backtracking.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this. The difference is well within a single order of magnitude. It seems that the test is not catching anything significant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, here are the measurements that I got using the 50000 length:

  • Old method
    • Case 1: 11.335 s
    • Case 2: 0.001 s
    • Case 3: 49.290 s
  • New method: All 0.001 s

What I would propose is documenting this difference explicitly. Choose whatever size you want and run them fully and document explicitly how much they (can) differ. E.g. in this case all of them are expected to be instantaneous in the new case so cutting in the middle, having a threshold of 0.1 s guarantees that on either side will for sure fail and for sure pass, mentioning how much the time difference is expected:

  • at the time of writing the test with the old method expecting XX s
  • at the time of writing the test with the current method expected below measurable 0.001 ms

Running these does highlight one weird quirk between Case 1 and Case 2. I would not have expected that the latter to be that much faster in the old method given the documentation in the test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran the test locally without the code fix (to make sure it fails) and it hung until I did Ctrl+C. Gemini suggested I add @pytest.mark.timeout(5) which force fails the test after 5 (or however many) sec and it can move onto the next test. We would need pytest-timeout installed. It worked when I added it, and it should hopefully prevent our CI from getting stuck if anyone breaks the regex again.

Comment on lines +60 to +74
(
'long line without match followed by error line',
'start\n{long}\nsome error here\nend\n',
['some error here'],
),
(
'error embedded in long line without newline separator',
'start\n{long} error in the middle {long}\nend\n',
['{long} error in the middle {long}'],
),
(
'long line with word boundary - no false positive',
'start\n{long}errorless{long}\nend\n',
[],
),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This just shows that what we are doing in _extract_failures is likely wrong, e.g. the user could have something

def test_accuracy()
    value = get_value()
    assert abs(value - refrence) < 1E-10
    print("All is good, the value is within the acceptable error bar") 

Why are we catching any word error and concluding that if it is present it must be relevant. It is a very poorman's guess and the question is if anyone is actually relying on these.

Would consider adding a deprecation note and see who actually complains and/or cares about them. A 0 second evaluation is faster still

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@LecrisUT sure, but that is not for this issue, I will file a new one

The current implementation uses ``str.splitlines()`` and per-line
``re.search()`` which processes each line in linear time.
"""
long_segment = 'A' * 1_000_000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this construction outside of the test, and make it more lorem-ipsum-like with more word separators and variations that it could randomly hit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci | full test Pull request is ready for the full test execution

Projects

Status: implement

Development

Successfully merging this pull request may close these issues.

Catastrophic regex backtracking in shell framework _extract_failures on long lines

5 participants