Fix catastrophic regex backtracking in shell framework#4658
thrix wants to merge 4 commits into teemtee:main
Conversation
Code Review
The catastrophic regex backtracking is fixed by replacing re.findall with a line-by-line re.search. The added unit tests cover the fix and include a performance regression test. Two suggestions are provided to improve the new test code, focusing on type correctness and assertion specificity.
@gemini-code-assist please review again
Code Review
This pull request addresses a catastrophic backtracking issue in _extract_failures by using a line-by-line search. The added unit tests cover the changes, including a performance regression test. I have provided two suggestions for minor improvements.
Force-pushed: 6884a58 to 4063a35
@gemini-code-assist please review again
Code Review
This pull request effectively resolves a critical performance issue related to catastrophic regex backtracking by implementing a more efficient line-by-line search strategy. The accompanying unit tests are thorough and include a valuable performance regression test to prevent this issue from recurring. My feedback includes one suggestion to enhance the maintainability of the new performance test.
tests/unit/test_shell.py (outdated)

```python
# Must complete in under 5 seconds (the old regex took 5+ seconds
# on just 10k characters, and would never complete on 1M characters)
assert elapsed < 5.0, (
```
5 seconds is a very arbitrary cut-off. If the old implementation finished in 5.001 seconds and this change brings it down to 4.999, this test does not really tell us anything. Similarly, if later changes gradually increase the runtime to 4.999 seconds, the test still passes.
If the change is log-scale, confirm that the intended improvement is at least an order of magnitude, and account for buffers that realistically reflect the time scale expected.
Changed to 30 seconds: a generous ceiling that still catches catastrophic backtracking (the old regex never completed on 1M characters even after tens of minutes).
```python
# The old regex would never complete on lines this long — it took 5+
# seconds on just 10k characters.
```

Unless we know the scaling, it is unclear whether the timing on 10k characters is relevant for 1M.
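The scaling can be checked empirically. Here is a small timing sketch (not from the PR; `time_findall` is a made-up helper, and the sizes are kept small so it runs quickly):

```python
import re
import time

# The old pattern from _extract_failures, before the fix.
OLD_PATTERN = r'.*\b(?:error|fail)\b.*'

def time_findall(n: int) -> float:
    """Time re.findall() of the old pattern on an n-character non-matching line."""
    text = 'A' * n
    start = time.monotonic()
    re.findall(OLD_PATTERN, text)
    return time.monotonic() - start

# Each failed attempt backtracks through the rest of the line, and the
# scanner then retries at the next position, so the behaviour is roughly
# O(n^2): doubling the input should roughly quadruple the time.
t1 = time_findall(2_000)
t2 = time_findall(4_000)
print(f'{t1:.4f}s vs {t2:.4f}s')
```

If the quadratic model holds, a 1M-character line should take on the order of (1,000,000 / 10,000)² = 10,000 times longer than the 5+ seconds observed at 10k characters, which matches the "hangs for hours" symptom in the bug report.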
tests/unit/test_shell.py (outdated)

```python
# Build a log with a 1M-character line (similar to base64 attestation data)
long_line = "A" * 1_000_000
log_content = f"start\n{long_line}\nsome error here\nend\n"
```
What if the long line does not have a newline separating it from the "some error" part? What if it is comprised of multiple words, which would match the `\b`?
Added 3 parametrized long-line cases: error on a separate line, error embedded within the long line (no newline separator), and a word-boundary false positive within the long line.
> what if it is comprised of multiple words which would match the `\b`?

This is not addressed, and the time difference between the cases is also unclear.
Force-pushed: 8e59206 to 7aab157
@gemini-code-assist please review
Code Review
The pull request effectively addresses the catastrophic regex backtracking issue in _extract_failures by switching to a line-by-line search with a pre-compiled regex. This significantly improves performance and stability when processing large log files. The new unit tests, especially test_extract_failures_long_lines, provide excellent coverage for the regression and include a crucial performance assertion to prevent future issues. No further issues or improvement opportunities of medium, high, or critical severity were identified in the changes.
Replace `re.findall()` with a line-by-line `re.search()` approach in `_extract_failures()` to avoid catastrophic backtracking on long lines. The old regex `r'.*\b(?:error|fail)\b.*'` caused tmt processes to hang for hours when test output contained lines over 1M characters (e.g. base64-encoded in-toto attestation payloads from container builds).

Fixes teemtee#4657

Assisted-by: Claude Code
- Use Path("dummy.log") instead of MagicMock() for log_path parameter
- Make test assertions exact lists instead of length/membership checks
Assisted-by: Claude Code
- Remove TestExtractFailures class, use plain test functions (happz)
- Move `import tmt.utils` to top of file (gemini-code-assist)
- Pre-compile regex as module-level FAILURE_PATTERN constant (gemini-code-assist)

Assisted-by: Claude Code
- Convert `_make_invocation` helper to a pytest factory fixture
- Parametrize matching tests into a single `test_extract_failures`
with descriptive IDs instead of separate test functions
- Add long-line edge cases: error embedded within a long line
(no newline separator), and word-boundary false positive
- Use 30s timeout ceiling instead of 5s to avoid CI flakiness
while still catching catastrophic backtracking
- Add comments explaining `time.monotonic()` choice
- Minor: fix comment typo ("per line" -> "per call")
Assisted-by: Claude Code
Signed-off-by: Miroslav Vadkerti <mvadkert@redhat.com>
Force-pushed: 7aab157 to 493d9a7
```python
assert _extract_failures(invocation, Path('dummy.log')) == []


_LONG_LINE_CASES: list[tuple[str, str, list[str]]] = [
```
Please, no list of tuples of lists of ... Make it structured with a dataclass.
```python
('log_template', 'expected_template'),
[(log, expected) for _, log, expected in _LONG_LINE_CASES],
ids=[name for name, _, _ in _LONG_LINE_CASES],
```
Use pytest.param to put the id next to the value. Maybe also store it in the dataclass
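A sketch of the suggested structure, combining a dataclass (id stored alongside the data) with `pytest.param` (names and cases are illustrative, not the PR's final code):

```python
from dataclasses import dataclass

import pytest

@dataclass
class LongLineCase:
    """One long-line scenario: a readable test id, the log template, expected failures."""
    id: str
    log_template: str
    expected: list[str]

LONG_LINE_CASES = [
    LongLineCase(
        id='error on separate line',
        log_template='start\n{long}\nsome error here\nend\n',
        expected=['some error here'],
    ),
    LongLineCase(
        id='error embedded in long line',
        log_template='start\n{long} error in the middle {long}\nend\n',
        expected=['{long} error in the middle {long}'],
    ),
    LongLineCase(
        id='word boundary - no false positive',
        log_template='start\n{long}errorless{long}\nend\n',
        expected=[],
    ),
]

# pytest.param keeps the id right next to the value it labels.
@pytest.mark.parametrize(
    'case',
    [pytest.param(case, id=case.id) for case in LONG_LINE_CASES],
)
def test_extract_failures_long_lines(case: LongLineCase) -> None:
    ...  # build the log from case.log_template and assert against case.expected
```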
```python
@pytest.fixture
def make_invocation():
    """Factory fixture creating a mock TestInvocation returning given log content."""

    def _factory(log_content: str) -> MagicMock:
        mock = MagicMock()
        mock.phase.step.plan.execute.read.return_value = log_content
        return mock

    return _factory
```
Why the whole factory? You do not need it: let the fixture provide just the mock object and stub the relevant part in the tests. That would make it more consistent with all other instances.
```python
# The old regex would never complete on lines this long — it took 5+
# seconds on just 10k characters. The splitlines approach finishes in
# well under 1 second. Use 30 seconds as a generous ceiling to avoid
# flakiness on slow CI while still catching catastrophic backtracking.
```
I don't like this. The difference is well within a single order of magnitude. It seems that the test is not catching anything significant.
Well, here are the measurements that I got using the 50000 length:

Old method:
- Case 1: 11.335 s
- Case 2: 0.001 s
- Case 3: 49.290 s

New method: all cases 0.001 s

What I would propose is documenting this difference explicitly. Choose whatever size you want, run the cases fully, and document explicitly how much they (can) differ. In this case, for example, all of them are expected to be instantaneous with the new method, so cutting in the middle with a threshold of 0.1 s guarantees that the old method will always fail and the new one will always pass. Also mention how much the times are expected to differ, e.g.:

- at the time of writing, with the old method: expected XX s
- at the time of writing, with the current method: expected below a measurable 0.001 ms

Running these does highlight one weird quirk between Case 1 and Case 2: I would not have expected the latter to be that much faster with the old method, given the documentation in the test.
I ran the test locally without the code fix (to make sure it fails) and it hung until I pressed Ctrl+C. Gemini suggested adding `@pytest.mark.timeout(5)`, which force-fails the test after 5 (or however many) seconds so the run can move on to the next test. We would need pytest-timeout installed. It worked when I added it, and it should hopefully prevent our CI from getting stuck if anyone breaks the regex again.
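With the pytest-timeout plugin installed, the marker would look roughly like this (a sketch; the 30-second budget mirrors the ceiling discussed above):

```python
import pytest

@pytest.mark.timeout(30)
def test_extract_failures_long_lines() -> None:
    # Without the regex fix this test hangs; pytest-timeout kills it after
    # 30 seconds and reports a failure instead of blocking the whole CI run.
    ...
```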
```python
(
    'long line without match followed by error line',
    'start\n{long}\nsome error here\nend\n',
    ['some error here'],
),
(
    'error embedded in long line without newline separator',
    'start\n{long} error in the middle {long}\nend\n',
    ['{long} error in the middle {long}'],
),
(
    'long line with word boundary - no false positive',
    'start\n{long}errorless{long}\nend\n',
    [],
),
```
This just shows that what we are doing in `_extract_failures` is likely wrong; e.g. the user could have something like:

```python
def test_accuracy():
    value = get_value()
    assert abs(value - reference) < 1e-10
    print("All is good, the value is within the acceptable error bar")
```

Why are we catching any word "error" and concluding that, if it is present, it must be relevant? It is a very poor man's guess, and the question is whether anyone actually relies on this behaviour.
I would consider adding a deprecation note and seeing who actually complains and/or cares about it. A 0-second evaluation is faster still.
@LecrisUT sure, but that is not for this issue, I will file a new one
```python
    The current implementation uses ``str.splitlines()`` and per-line
    ``re.search()`` which processes each line in linear time.
    """
    long_segment = 'A' * 1_000_000
```
Move this construction outside of the test, and make it more lorem-ipsum-like with more word separators and variations that it could randomly hit.
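One possible way to follow this suggestion, sketched at module level (the word list and target length are illustrative; none of the words contain "error" or "fail"):

```python
import itertools

# Words with \b boundaries between them, none containing the trigger words.
_WORDS = ['lorem', 'ipsum', 'dolor', 'sit', 'amet', 'consectetur']

def make_long_segment(target_len: int = 1_000_000) -> str:
    """Build a space-separated segment of at least target_len characters."""
    parts: list[str] = []
    length = 0
    for word in itertools.cycle(_WORDS):
        parts.append(word)
        length += len(word) + 1  # account for the joining space
        if length > target_len:
            break
    return ' '.join(parts)

LONG_SEGMENT = make_long_segment()
```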
Summary

Fix `_extract_failures()` in `tmt/frameworks/shell.py`, which causes tmt processes to hang for hours when test output contains very long lines:

- Replace `re.findall(r'.*\b(?:error|fail)\b.*', ...)` with a line-by-line `re.search()` approach that avoids catastrophic regex backtracking
- Add unit tests for `_extract_failures()`, including a regression test with 1M-character lines

Fixes #4657
Context
Testing Farm jobs from the Hummingbird project were getting stuck in "running" state for hours. The test output files contained base64-encoded in-toto attestation payloads — single lines of 1M+ characters — causing the regex engine to backtrack indefinitely.
Detailed write-up: https://vadkerti.net/posts/debugging-stuck-tmt-processes-with-claude-code/
Test plan

- Run the unit tests: `hatch -e dev run -- python -m pytest tests/unit/test_shell.py -v`