Skip to content

Commit e8dfdb6

Browse files
authored
[autorevert] Fix pacing query logic (#7274)
`Any` has an unexpected semantics in CH, it returns [first value](https://clickhouse.com/docs/sql-reference/aggregate-functions/reference/any), the correct way to check if any value is true is to use `countIf`. The effect of this bug was that pacing was not working in some rare cases when there are multiple events for commit and some were not matching the condition. Basically, when the first event goes out of the window, and second event is added, we get two rows: 0 and 1, and depending on the random order either would be returned by `any`. The correct way (among many) would use `countIf` instead. Testing: ``` SELECT (countIf(failed = 0 AND ts > now() - toIntervalSecond(5200)) > 0) AS has_success_within_window, any(failed = 0 AND ts > now() - toIntervalSecond(5200)) AS has_success_within_window_old FROM misc.autorevert_events_v2 WHERE repo = 'pytorch/pytorch' AND action = 'restart' AND dry_run = 0 AND commit_sha = 'b5c4f46bb9ede8dc6adf11975c93b9f285d9ed67' ``` result: ``` "has_success_within_window","has_success_within_window_old" "1","0" ``` more testing: ``` python -m pytorch_auto_revert --dry-run autorevert-checker Lint trunk pull inductor rocm rocm-mi300 --hours 18 --hud-html ```
1 parent 8baa877 commit e8dfdb6

File tree

1 file changed

+1
-1
lines changed

1 file changed

+1
-1
lines changed

aws/lambda/pytorch-auto-revert/pytorch_auto_revert/signal_actions.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -113,7 +113,7 @@ def restart_stats(
113113
" SELECT\n"
114114
" count() AS total_restarts,\n"
115115
" maxIf(ts, failed = 1) AS last_failure_ts,\n"
116-
" any(failed = 0 AND ts > (now() - toIntervalSecond({pacing_sec:UInt32}))) "
116+
" (countIf(failed = 0 AND ts > (now() - toIntervalSecond({pacing_sec:UInt32}))) > 0) "
117117
" AS has_success_within_window\n"
118118
" FROM rows\n"
119119
" )\n"

0 commit comments

Comments
 (0)