[autorevert] implement autorevert and fix detection logic #6983
Conversation
Force-pushed from 2891ee8 to 1508f12
@@ -230,6 +256,25 @@ def _find_last_commit_with_job(
            )
        return None, None

+   def _find_last_commit_with_rule(
this function is not called anywhere, what is its purpose?
@@ -74,13 +84,24 @@ def __init__(
        self._workflow_commits_cache: Dict[str, List[CommitJobs]] = {}
        self._commit_history = None
        self._ignore_classification_rules = ignore_classification_rules or set()
+       # Controls whether queries target restarted runs only (workflow_dispatch/tagged trunk/<sha>)
+       self._use_restarted_runs_only = False
Why is this a variable, if it never changes?
-       query = """
+       base_where = (
+           "workflow_event = 'workflow_dispatch' AND head_branch LIKE 'trunk/%'"
+           if self._use_restarted_runs_only
condition is never true
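(A minimal sketch tying both comments together, with names taken from the diff; the class name and the else-branch filter are assumed for illustration. Since _use_restarted_runs_only is hard-coded to False, the restarted-runs branch of base_where is dead; promoting the flag to a constructor parameter would make the condition reachable.)

```python
class Checker:  # hypothetical stand-in for the checker class in the diff
    def __init__(self, use_restarted_runs_only: bool = False):
        # Caller decides whether queries target restarted runs
        # (workflow_dispatch runs on trunk/<sha> branches).
        self._use_restarted_runs_only = use_restarted_runs_only

    def _base_where(self) -> str:
        # With the flag hard-coded to False, the first branch could never run.
        return (
            "workflow_event = 'workflow_dispatch' AND head_branch LIKE 'trunk/%'"
            if self._use_restarted_runs_only
            else "head_branch = 'main'"  # assumed default filter, illustration only
        )
```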
@@ -311,25 +359,61 @@ def detect_autorevert_pattern_workflow(self, workflow_name: str) -> List[Dict]:
            )

            if not last_commit_with_same_job or not last_same_jobs:
-               # No older commit with the same job found
+               # No older commit with any jobs found
The function _find_last_commit_with_job
is expected to return only the commit with the given job, not any job.
If a commit ran some jobs, but not the specific one being checked via job_name,
it should be skipped in favor of finding the next one.
So, I believe the fix in the comment here is misplaced.
If there is a bug, maybe we should fix it in the function itself? But I re-read it and could not pinpoint a problem.
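(A minimal sketch of the semantics described above, with CommitJobs fields assumed for illustration: walk commits from newest to oldest and return the first one that actually ran job_name, skipping commits that ran other jobs.)

```python
from typing import List, Optional, Tuple

def find_last_commit_with_job(
    commits: List["CommitJobs"], job_name: str
) -> Tuple[Optional["CommitJobs"], Optional[list]]:
    """Return the first (newest-to-oldest) commit that ran job_name."""
    for commit in commits:
        matching = [j for j in commit.jobs if j.name == job_name]
        if matching:
            return commit, matching
        # Commit ran some jobs, but not this one: skip it and keep looking.
    return None, None
```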
                continue

            # Ensure there is some overlap in job coverage between suspected and older commit
            older_coverage = list(
this can only happen because you removed the check in https://github.com/pytorch/test-infra/pull/6983/files#diff-7174b1a731f38e3efb2765fac87a65ce7835d26a68cccd9c1e329e0b2070f1e2L285
If you keep the check, it is guaranteed that job_name
is present in both commits, due to the checks in _find_last_commit_with_job.
the check is different (older commit vs newer commit), but your point is correct
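(For reference, a hedged sketch of the overlap check being discussed; the j.name attribute is assumed from the diff context.)

```python
def has_comparable_signal(newer_jobs, older_jobs) -> bool:
    """True when the two commits share at least one job name, so their
    failure classifications can be meaningfully compared."""
    return bool({j.name for j in newer_jobs} & {j.name for j in older_jobs})
```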
        self, workflow_name: str, sha: str
    ) -> Optional[CommitJobs]:
        """Return CommitJobs for a workflow and head_sha if present in cache."""
        for cj in self.get_workflow_commits(workflow_name):
this loop happens inside another loop; I suspect that when running it for a long period, like 2 years for evaluation, this would be a significant bottleneck due to the quadratic nature of the search.
Maybe the search here could leverage a map...?
a trie would be even better for prefix search, but I don't think this would be a bottleneck
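(A minimal sketch of the map-based lookup suggested above, assuming the head_sha attribute from the diff: build the index once per workflow, then each lookup inside the outer loop is O(1) instead of a linear scan.)

```python
from typing import Dict, Optional

def build_sha_index(commits) -> Dict[str, "CommitJobs"]:
    """Build once per workflow; turns each sha lookup from O(n) into O(1)."""
    return {cj.head_sha: cj for cj in commits}

def commit_jobs_for_sha(index: Dict[str, "CommitJobs"], sha: str) -> Optional["CommitJobs"]:
    return index.get(sha)
```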
I'll remove this function, it is not being used anywhere
                # No overlapping jobs -> insufficient comparable signal
                continue

            # Cross-workflow baseline check: if multiple workflows were provided,
can you post stats with and without this? I suspect this might be overkill and might remove lots of possible commits from the list.
But we need to do the other fixes before running it...
no point, it's slop, removed
            # Secondary verification: compare first failing vs previous on restarted runs.
            if do_revert:
                # Best-effort; skip if query fails or restarted runs not yet present
                with suppress(Exception):
OMG
I can't believe we're suppressing all exceptions and not printing anything when one occurs, in production code.
Smells like vibe coding, and using old versions of llama :P
gpt-5 actually!
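(For what it's worth, a sketch of the pattern being asked for: keep the best-effort semantics, but log the failure instead of silently swallowing it. The verify callable stands in for the actual restarted-runs query.)

```python
import logging

logger = logging.getLogger(__name__)

def secondary_verification(do_revert: bool, verify) -> None:
    """Best-effort check: on failure, log the traceback and continue
    instead of suppressing the exception silently."""
    if not do_revert:
        return
    try:
        verify()
    except Exception:
        logger.exception("secondary verification on restarted runs failed; continuing")
```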
aws/lambda/pytorch-auto-revert/pytorch_auto_revert/tests/test_autorevert_detector.py
left a few comments, and they are major IMO. Let's work on those and evaluate the stats changes before merging. Overall a great catch, this seems like a problematic bug :P
# Conflicts:
#   aws/lambda/pytorch-auto-revert/pytorch_auto_revert/autorevert_checker.py
@jeanschmidt, thanks for the review! I removed all the unnecessary changes, preserving the fix and the "do revert" functionality. Please take a look.
I double-checked the stats and they are misleading. Not that the numbers are irrelevant, but the names and values displayed don't match. So I recalculated the precision, recall, and F1 scores (what we want), and kept the other values, which use non-ghf signals, only for inverse-recall. Stats are now:
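(The recalculated scores follow the standard definitions; a minimal sketch, with the actual TP/FP/FN counts coming from the evaluation run and not reproduced here.)

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple:
    """Standard definitions behind the recalculated stats."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```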
Summary

Bug fixed: the detector compared classification_rule regardless of job conclusion. Baseline commit 33ec6e3 had multiple "success" shards labeled with rule='pytest failure', which the detector misread as "older commit already has the same failure," suppressing the pattern for bbc0df1/4fd5fab.

Fix: require conclusion == 'failure' wherever the detector compares rules (both for newer-commit confirmation and older-baseline exclusion). This prevents noise from success+rule rows and correctly flags commit-caused failures like the ROCm case.
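(A hedged sketch of the comparison the fix enforces, with field names assumed from the diff: only failure-conclusion rows contribute rules, so "success" shards carrying rule='pytest failure' no longer match.)

```python
def failure_rules(jobs) -> set:
    """Collect classification rules only from jobs that actually failed.

    A baseline commit whose shards all succeeded contributes no rules here,
    so it can no longer mask a newly introduced failure."""
    return {j.classification_rule for j in jobs if j.conclusion == "failure"}
```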
Testing

python -m pytorch_auto_revert autorevert-checker rocm --hours 82 --do-restart --dry-run

The actual culprit was correctly identified; there are multiple patterns detected because the failure was jumping across workflows: rocm and rocm-mi300.