Commit d90d6af
fix: deflake //rs/state_manager:state_manager_integration (#9173)
## Root Cause
All flaky failures across 8 recent CI runs share the same root cause:
the `wait_for_checkpoint` helper in
`rs/state_manager/tests/common/mod.rs` has a 100-second timeout that is
too tight when many tests (~158) run concurrently on CI. Under parallel
execution, the background hash-computation threads get starved for
CPU/IO, causing `wait_for_checkpoint` to time out with:
```
Checkpoint @n didn't complete in 100s
```
This affects many different tests non-deterministically — some runs had
2 failures, others had up to 21 — because any test calling
`wait_for_checkpoint` can be affected depending on system load.
## Fix
Increase the timeout in `wait_for_checkpoint` from 100s to 300s. The
Bazel test target already has `timeout = "long"` (900s), so 300s is well
within the overall test timeout while providing much more headroom for
slow CI environments.
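
As a rough illustration of the pattern involved, here is a minimal sketch of a polling helper with a deadline. It is not the code from `rs/state_manager/tests/common/mod.rs`: the names `wait_until` and `CHECKPOINT_TIMEOUT`, the poll interval parameter, and the error text are all hypothetical, chosen only to show where a timeout like this one sits.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the raised timeout (300s, up from 100s).
/// Assumption: the real helper may express its timeout differently.
const CHECKPOINT_TIMEOUT: Duration = Duration::from_secs(300);

/// Polls `is_done` until it returns true or `timeout` has elapsed.
/// Returns the time waited on success, or an error message on timeout.
fn wait_until(
    mut is_done: impl FnMut() -> bool,
    timeout: Duration,
    poll_interval: Duration,
) -> Result<Duration, String> {
    let start = Instant::now();
    loop {
        if is_done() {
            return Ok(start.elapsed());
        }
        // Under heavy CPU/IO contention the background work can take far
        // longer than usual, so the deadline needs generous headroom.
        if start.elapsed() >= timeout {
            return Err(format!(
                "condition didn't complete in {}s",
                timeout.as_secs()
            ));
        }
        sleep(poll_interval);
    }
}
```

A caller would pass `CHECKPOINT_TIMEOUT` as the `timeout` argument. Because the loop returns as soon as the condition holds, a larger timeout only costs extra time in runs that would otherwise have failed.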
---
This PR was created following the steps in
`.claude/skills/fix-flaky-tests/SKILL.md`.

1 parent 494d617 commit d90d6af
1 file changed, +1 -1 lines changed (line 332)