[autorevert] implement autorevert and fix detection logic #6983

Merged
merged 7 commits into main from autorevert-do-autorevert on Aug 12, 2025

Conversation

@izaitsevfb (Contributor) commented Aug 8, 2025

Summary

  • Implemented revert detection/recording
  • Implemented failure-only rule matching in the autorevert detector to prevent “success” jobs with a classification label from contaminating pattern detection
  • Added a unit test

Bug Fixed

  • Cause: The detector previously matched on classification_rule regardless of
    job conclusion. Baseline commit 33ec6e3 had multiple “success” shards labeled
    with rule='pytest failure', which the detector misread as “older commit already
    has the same failure,” suppressing the pattern for bbc0df1/4fd5fab.
  • Fix: Require conclusion == 'failure' wherever the detector compares rules (both
    for newer-commit confirmation and older-baseline exclusion). This prevents noise
    from success+rule rows and correctly flags commit-caused failures like the
    ROCm case. A minimal sketch of this check follows below.
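
A minimal sketch of the failure-only matching described above, assuming a hypothetical job-row shape and helper name (not the PR's actual code):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JobRow:
    # Hypothetical shape of a job record; the field names are assumptions.
    name: str
    conclusion: str                      # e.g. 'success' or 'failure'
    classification_rule: Optional[str]   # e.g. 'pytest failure'


def has_matching_failure(jobs: List[JobRow], rule: str) -> bool:
    """Count a rule match only when the job actually failed.

    Requiring conclusion == 'failure' keeps 'success' shards that still carry a
    classification_rule label from being treated as a matching failure.
    """
    return any(
        j.conclusion == "failure" and j.classification_rule == rule for j in jobs
    )


# With a rule-only check, a baseline like 33ec6e3 (success shards labeled
# rule='pytest failure') appears to "already have" the failure and suppresses
# the pattern; with the failure-only check it no longer does.
baseline = [JobRow("rocm / test (shard 1)", "success", "pytest failure")]
suspect = [JobRow("rocm / test (shard 1)", "failure", "pytest failure")]
assert not has_matching_failure(baseline, "pytest failure")
assert has_matching_failure(suspect, "pytest failure")
```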

Testing

python -m pytorch_auto_revert autorevert-checker rocm --hours 82 --do-restart --dry-run
Fetching workflow data for 1 workflows since 2025-08-04T08:56:25.851470...
Found 161 commits with job data for workflow 'rocm'
✓ 3 AUTOREVERT PATTERNS DETECTED

Pattern #1:
Failure rule: 'pytest failure'
Recent commits with failure: bdb07a2b 8085edc8
Older commit without failure: 41081276
✗ NOT REVERTED: 8085edc8f9c98f670f585586b4286a942927537a was not reverted
  ⟳ DRY RUN: Would restart rocm for 8085edc8
  ⟳ DRY RUN: Would restart rocm for 41081276

Pattern #2:
Failure rule: 'pytest failure'
Recent commits with failure: 908c5cc4 b6c53383
Older commit without failure: 33ec6e3e
✗ NOT REVERTED: b6c53383fe2f29e6ed35430e90867dbeb8980d42 was not reverted
  ⟳ DRY RUN: Would restart rocm for b6c53383
  ⟳ DRY RUN: Would restart rocm for 33ec6e3e

Pattern #3:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours

==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): rocm
Timeframe: 82 hours
Commits checked: 161
Auto revert patterns detected: 3
Actual reverts inside auto revert patterns detected (precision): 1 (33.3%)
Total revert commits in period: 9

Revert categories:
  nosignal: 5 (55.6%)
  ignoredsignal: 2 (22.2%)
  ghfirst: 2 (22.2%)

Total reverts excluding ghfirst: 7
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (recall): 6 (85.7%)
Per workflow precision:
  rocm: 1 reverts out of 3 patterns (33.3%) [excluding ghfirst: 1 (33.3%)]

Reverted patterns:
  - pytest failure: bbc0df10 (nosignal)

Restarted workflows: 4
  - rocm for 8085edc8
  - rocm for 41081276
  - rocm for b6c53383
  - rocm for 33ec6e3e

The actual culprit was correctly identified:

Pattern #7:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours

There are multiple patterns detected because the failure was jumping across workflows: rocm and rocm-mi300.

@pytorch-bot bot added the ci-no-td label on Aug 8, 2025

@meta-cla bot added the CLA Signed label on Aug 8, 2025
@izaitsevfb force-pushed the autorevert-do-autorevert branch from 2891ee8 to 1508f12 on August 8, 2025 01:57
@@ -230,6 +256,25 @@ def _find_last_commit_with_job(
)
return None, None

def _find_last_commit_with_rule(
Contributor

this function is not called anywhere, what is its purpose?

@@ -74,13 +84,24 @@ def __init__(
self._workflow_commits_cache: Dict[str, List[CommitJobs]] = {}
self._commit_history = None
self._ignore_classification_rules = ignore_classification_rules or set()
# Controls whether queries target restarted runs only (workflow_dispatch/tagged trunk/<sha>)
self._use_restarted_runs_only = False
Contributor

Why is this a variable, if it never changes?

query = """
base_where = (
"workflow_event = 'workflow_dispatch' AND head_branch LIKE 'trunk/%'"
if self._use_restarted_runs_only
Contributor

condition is never true

@@ -311,25 +359,61 @@ def detect_autorevert_pattern_workflow(self, workflow_name: str) -> List[Dict]:
)

if not last_commit_with_same_job or not last_same_jobs:
# No older commit with the same job found
# No older commit with any jobs found
Contributor

The function _find_last_commit_with_job is expected to only return the commit with the given job, not any job.

If a commit ran some jobs, but not the specific one being checked via job_name, it should be skipped in favor of finding the next one.

So, I believe the fix in the comment here is misplaced.

If there is a bug, maybe we should fix it in the function itself? But I re-read it and could not pinpoint a problem.

https://github.com/pytorch/test-infra/pull/6983/files#diff-7174b1a731f38e3efb2765fac87a65ce7835d26a68cccd9c1e329e0b2070f1e2R234

continue

# Ensure there is some overlap in job coverage between suspected and older commit
older_coverage = list(
Contributor

this can only happen because you removed the check in https://github.com/pytorch/test-infra/pull/6983/files#diff-7174b1a731f38e3efb2765fac87a65ce7835d26a68cccd9c1e329e0b2070f1e2L285

If you keep the check, it is guaranteed that the job_name is present in both commits. This is due to checks in _find_last_commit_with_job.

Contributor Author

the check is different (older commit vs newer commit), but your point is correct

self, workflow_name: str, sha: str
) -> Optional[CommitJobs]:
"""Return CommitJobs for a workflow and head_sha if present in cache."""
for cj in self.get_workflow_commits(workflow_name):
Contributor

This loop happens inside another loop; I suspect that when running it for a long period, like 2 years for evaluation, this would be a significant bottleneck in the code due to its quadratic nature.

Maybe the search here could leverage a map...?
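
For illustration, a hedged sketch of the map-based lookup suggested here; the function and variable names are assumptions, and it assumes full-sha (not prefix) lookups:

```python
from typing import Dict, Iterable


def build_sha_index(commits: Iterable["CommitJobs"]) -> Dict[str, "CommitJobs"]:
    """Hypothetical one-time index keyed by head_sha.

    Built once per workflow and reused, it replaces the linear scan inside the
    outer commit loop with an O(1) dict lookup.
    """
    return {cj.head_sha: cj for cj in commits}


# Sketch of usage at the call site (names are assumptions):
#   index = build_sha_index(self.get_workflow_commits(workflow_name))
#   commit_jobs = index.get(sha)
```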

Contributor Author

trie would be even better for prefix search, but I don't think that would be a bottleneck

Contributor

I'll remove this function; it is not being used anywhere.

# No overlapping jobs -> insufficient comparable signal
continue

# Cross-workflow baseline check: if multiple workflows were provided,
Contributor

can you post stats with and without this? I suspect this might be overkill and removing lots of possible commits from the list.

But need to do other fixes before running it...

@izaitsevfb (Contributor Author) Aug 11, 2025

no point, it's slop, removed

# Secondary verification: compare first failing vs previous on restarted runs.
if do_revert:
# Best-effort; skip if query fails or restarted runs not yet present
with suppress(Exception):
Contributor

OMG

I do not believe we're suppressing all exceptions and not printing anything when one occurs, in production code.

Smells like vibe coding, and using old versions of llama :P
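
For comparison, a hedged sketch of keeping the best-effort behavior while still surfacing the error; the function below is a placeholder, not the PR's actual call:

```python
import logging

logger = logging.getLogger(__name__)


def verify_on_restarted_runs() -> None:
    """Placeholder for the secondary verification step (hypothetical)."""
    raise RuntimeError("restarted runs not yet present")


# Instead of `with suppress(Exception): ...`, record what went wrong and move on:
try:
    verify_on_restarted_runs()
except Exception:
    logger.exception("Secondary verification on restarted runs failed; continuing")
```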

Contributor Author

gpt-5 actually!

@jeanschmidt (Contributor)

Left a few comments; they are major IMO. Let's work on those and evaluate the stats changes before merging this change.

Overall great catch, seems like a problematic bug :P

# Conflicts:
#	aws/lambda/pytorch-auto-revert/pytorch_auto_revert/autorevert_checker.py
@izaitsevfb (Contributor Author)

@jeanschmidt, thanks for the review! I removed all unnecessary changes, preserving the fix and the "do revert" functionality. Please take a look.

@jeanschmidt (Contributor)

I double-checked the stats and they are misleading. Not that the numbers are not relevant, but the names and values displayed don't match.

So I recalculated precision, recall, and F1 scores (what we want).

And kept the other values that use non-ghfirst signals only for inverse recall.

stats are now:

==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): Lint, trunk, pull, inductor, linux-binary-manywheel
Timeframe: 4380 hours
Commits checked: 34040
Auto revert patterns detected: 664
Actual reverts inside auto revert patterns detected (%): 184 (27.7%)
Total revert commits in period: 589

Revert categories:
  nosignal: 209 (35.5%)
  ghfirst: 153 (26.0%)
  uncategorized: 101 (17.1%)
  ignoredsignal: 70 (11.9%)
  weird: 45 (7.6%)
  landrace: 11 (1.9%)

Total reverts excluding ghfirst: 436
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (%): (296) (64.0%)

*********************************************************************
STATS SUMMARY:
 PRECISION: 27.7%
 RECALL: 31.2%
 F1: 29.4%
*********************************************************************

Per workflow precision:
  Lint: 47 reverts out of 67 patterns (70.1%) [excluding ghfirst: 43 (64.2%)]
  trunk: 30 reverts out of 79 patterns (38.0%) [excluding ghfirst: 29 (36.7%)]
  pull: 74 reverts out of 359 patterns (20.6%) [excluding ghfirst: 67 (18.7%)]
  inductor: 31 reverts out of 151 patterns (20.5%) [excluding ghfirst: 29 (19.2%)]
  linux-binary-manywheel: 2 reverts out of 8 patterns (25.0%) [excluding ghfirst: 1 (12.5%)]
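
For reference, the F1 line follows directly from the printed precision and recall; a quick check (not part of the tool's output):

```python
# Rounded values from the summary above; the tool's 29.4% comes from unrounded inputs.
precision, recall = 0.277, 0.312
f1 = 2 * precision * recall / (precision + recall)
print(f"F1 ≈ {f1:.1%}")  # ≈ 29.3% with the rounded inputs
```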

@jeanschmidt merged commit e9bc36e into main on Aug 12, 2025
5 checks passed
@jeanschmidt deleted the autorevert-do-autorevert branch on August 12, 2025 12:18