[autorevert] implement autorevert and fix detection logic (#6983)
### Summary
- Implemented revert detection/recording
- Implemented failure-only rule matching in the autorevert detector to
prevent “success” jobs with a classification label from contaminating
pattern detection
- Added a unit test
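The commit does not show how reverts are detected or recorded. One common approach, assumed here, is to parse the `This reverts commit <sha>.` line that `git revert` writes into its default message, then record which commit reverted which and how long it took. `CommitInfo`, `RevertRecord`, and `find_reverts` are hypothetical names, and revert categories (`nosignal`, `ghfirst`, ...) are not parsed in this sketch.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional

# Assumption: reverts carry git's default "This reverts commit <full sha>." line.
REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{40})")


@dataclass
class CommitInfo:  # hypothetical container for one commit
    sha: str
    message: str
    timestamp: datetime


@dataclass
class RevertRecord:  # hypothetical record of "X was reverted by Y after N hours"
    reverted_sha: str
    revert_sha: str
    hours_after: Optional[float]  # None if the reverted commit is outside the window


def find_reverts(commits: List[CommitInfo]) -> Dict[str, RevertRecord]:
    """Map each reverted sha to the commit that reverted it."""
    by_sha = {c.sha: c for c in commits}
    records: Dict[str, RevertRecord] = {}
    for c in commits:
        match = REVERT_RE.search(c.message)
        if not match:
            continue
        target = match.group(1)
        original = by_sha.get(target)
        hours = None
        if original is not None:
            hours = (c.timestamp - original.timestamp).total_seconds() / 3600
        records[target] = RevertRecord(target, c.sha, hours)
    return records
```

A checker could then look up a pattern's suspect commit in the returned dict to decide between "REVERTED" and "NOT REVERTED".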
### Bug Fixed
- Cause: The detector previously matched on `classification_rule` regardless of the job's `conclusion`. Baseline commit `33ec6e3` had multiple “success” shards labeled with `rule='pytest failure'`, which the detector misread as “the older commit already has the same failure,” suppressing the pattern for `bbc0df1`/`4fd5fab`.
- Fix: Require `conclusion == 'failure'` wherever the detector compares rules (both when confirming the failure on newer commits and when excluding it on the older baseline). This filters out noise from successful jobs that still carry a rule and correctly flags commit-caused failures such as the ROCm case.
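A minimal sketch of the failure-only comparison, assuming each commit is reduced to a list of jobs with a `conclusion` and an optional `classification_rule`, and that a pattern means "two recent commits fail with the same rule, the older baseline does not". The `Job`, `Commit`, and `detect` names are hypothetical, not the detector's real API; the old and fixed rule extractors are shown side by side.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Job:
    name: str
    conclusion: str                     # e.g. "success" or "failure"
    classification_rule: Optional[str]  # may be set even on successful jobs


@dataclass
class Commit:
    sha: str
    jobs: List[Job] = field(default_factory=list)


def rules_any_conclusion(commit: Commit) -> Set[str]:
    # Old behaviour: any labelled job counts, whatever its conclusion.
    return {j.classification_rule for j in commit.jobs if j.classification_rule}


def rules_failed_only(commit: Commit) -> Set[str]:
    # Fixed behaviour: only jobs that actually failed contribute a rule.
    return {
        j.classification_rule
        for j in commit.jobs
        if j.conclusion == "failure" and j.classification_rule
    }


def detect(newer: Commit, suspect: Commit, baseline: Commit,
           rules=rules_failed_only) -> List[str]:
    # Rules failing on both recent commits but absent from the older baseline.
    shared = rules(newer) & rules(suspect)
    return sorted(shared - rules(baseline))


if __name__ == "__main__":
    label = "pytest failure"
    newer = Commit("newer", [Job("test (1)", "failure", label)])
    suspect = Commit("suspect", [Job("test (1)", "failure", label)])
    # The baseline shard succeeded but still carries a classification label.
    baseline = Commit("baseline", [Job("test (1)", "success", label)])

    print(detect(newer, suspect, baseline, rules=rules_any_conclusion))  # []
    print(detect(newer, suspect, baseline))  # ['pytest failure']
```

With the old extractor the labelled success row on the baseline hides the pattern; filtering on `conclusion == "failure"` restores it.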
### Testing
<details>
<summary>python -m pytorch_auto_revert autorevert-checker rocm --hours
82 --do-restart --dry-run</summary>
```
python -m pytorch_auto_revert autorevert-checker rocm --hours 82 --do-restart --dry-run
Fetching workflow data for 1 workflows since 2025-08-04T08:56:25.851470...
Found 161 commits with job data for workflow 'rocm'
✓ 3 AUTOREVERT PATTERNS DETECTED
Pattern #1:
Failure rule: 'pytest failure'
Recent commits with failure: bdb07a2b 8085edc8
Older commit without failure: 41081276
✗ NOT REVERTED: 8085edc8f9c98f670f585586b4286a942927537a was not reverted
⟳ DRY RUN: Would restart rocm for 8085edc8
⟳ DRY RUN: Would restart rocm for 41081276
Pattern #2:
Failure rule: 'pytest failure'
Recent commits with failure: 908c5cc4 b6c53383
Older commit without failure: 33ec6e3e
✗ NOT REVERTED: b6c53383fe2f29e6ed35430e90867dbeb8980d42 was not reverted
⟳ DRY RUN: Would restart rocm for b6c53383
⟳ DRY RUN: Would restart rocm for 33ec6e3e
Pattern #3:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours
==================================================
SUMMARY STATISTICS
==================================================
Workflow(s): rocm
Timeframe: 82 hours
Commits checked: 161
Auto revert patterns detected: 3
Actual reverts inside auto revert patterns detected (precision): 1 (33.3%)
Total revert commits in period: 9
Revert categories:
nosignal: 5 (55.6%)
ignoredsignal: 2 (22.2%)
ghfirst: 2 (22.2%)
Total reverts excluding ghfirst: 7
Reverts (excluding ghfirst) that dont match any auto revert pattern detected (recall): 6 (85.7%)
Per workflow precision:
rocm: 1 reverts out of 3 patterns (33.3%) [excluding ghfirst: 1 (33.3%)]
Reverted patterns:
- pytest failure: bbc0df10 (nosignal)
Restarted workflows: 4
- rocm for 8085edc8
- rocm for 41081276
- rocm for b6c53383
- rocm for 33ec6e3e
```
</details>
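The percentages in the summary statistics are plain ratios of the printed counts. This sketch recomputes them from those counts only (it does not re-run the checker), assuming the one in-pattern revert is not a `ghfirst` revert, which holds here since it is `nosignal`. Note that the figure labelled recall is the unmatched share, 6/7; the matched share of non-ghfirst reverts is 1/7.

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a percentage with one decimal, like the checker output."""
    return f"{100 * part / whole:.1f}%"


def summary(patterns: int, reverted_in_patterns: int, categories: dict) -> dict:
    total = sum(categories.values())
    # Assumption: reverts inside patterns are never ghfirst reverts.
    non_ghfirst = total - categories.get("ghfirst", 0)
    unmatched = non_ghfirst - reverted_in_patterns
    return {
        "precision": pct(reverted_in_patterns, patterns),
        "categories": {k: pct(v, total) for k, v in categories.items()},
        "unmatched_share": pct(unmatched, non_ghfirst),
        "matched_share": pct(reverted_in_patterns, non_ghfirst),
    }


if __name__ == "__main__":
    # Counts taken from the dry-run output above.
    print(summary(3, 1, {"nosignal": 5, "ignoredsignal": 2, "ghfirst": 2}))
```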
The actual culprit was correctly identified:
```
Pattern #7:
Failure rule: 'pytest failure'
Recent commits with failure: 4fd5fabe bbc0df10
Older commit without failure: efc4b460
✓ REVERTED (nosignal): bbc0df1094b5a4dcd2cce83f8402127b07913231 was reverted by 41081276 after 18.5 hours
```
Multiple patterns are detected because the failure jumped across **workflows** (rocm and rocm-mi300).
---------
Co-authored-by: Jean Schmidt <[email protected]>