
Commit d90d6af

fix: deflake //rs/state_manager:state_manager_integration (#9173)
## Root Cause

All flaky failures across 8 recent CI runs share the same root cause: the `wait_for_checkpoint` helper in `rs/state_manager/tests/common/mod.rs` has a 100-second timeout that is too tight when many tests (~158) run concurrently on CI. Under parallel execution, the background hash-computation threads are starved of CPU and I/O, so `wait_for_checkpoint` times out with:

```
Checkpoint @n didn't complete in 100s
```

The failures hit different tests non-deterministically (some runs had 2 failures, others up to 21), because any test that calls `wait_for_checkpoint` can be affected, depending on system load.

## Fix

Increase the timeout in `wait_for_checkpoint` from 100s to 300s. The Bazel test target already has `timeout = "long"` (900s), so 300s stays well within the overall test timeout while tripling the per-checkpoint wait budget for slow CI environments.

---

This PR was created following the steps in `.claude/skills/fix-flaky-tests/SKILL.md`.
1 parent 494d617 commit d90d6af

1 file changed (+1, -1 lines)

rs/state_manager/tests/common/mod.rs

Lines changed: 1 addition & 1 deletion
```diff
@@ -329,7 +329,7 @@ pub fn modify_encoded_stream_helper<F: FnOnce(StreamSlice) -> Stream>(
 pub fn wait_for_checkpoint(state_manager: &impl StateManager, h: Height) -> CryptoHashOfState {
     use std::time::{Duration, Instant};
 
-    let timeout = Duration::from_secs(100);
+    let timeout = Duration::from_secs(300);
     let started = Instant::now();
     while started.elapsed() < timeout {
         match state_manager.get_state_hash_at(h) {
```
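The hunk shows that `wait_for_checkpoint` follows a poll-until-deadline shape: record `Instant::now()`, then loop while `started.elapsed() < timeout`. The rest of the loop body is not part of this diff, so the following is only a minimal, self-contained sketch of that general pattern, not the repository's implementation. The helper `poll_until`, the shared `slot`, and the channel used to stall the worker are all hypothetical stand-ins: the stalled worker models a hash-computation thread that cannot run yet, which is why a short deadline expires while a longer one succeeds.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper, not taken from the IC repository: calls `probe`
/// repeatedly until it returns `Some`, or gives up once `timeout` has elapsed.
/// On failure it returns the time actually spent waiting.
fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Result<T, Duration> {
    let started = Instant::now();
    while started.elapsed() < timeout {
        if let Some(value) = probe() {
            return Ok(value);
        }
        // Sleep between probes instead of busy-waiting.
        thread::sleep(interval);
    }
    Err(started.elapsed())
}

fn main() {
    // Hypothetical shared slot standing in for "state hash at height h".
    let slot: Arc<Mutex<Option<u64>>> = Arc::new(Mutex::new(None));
    let (go_tx, go_rx) = mpsc::channel::<()>();

    // Worker that cannot publish until signalled, modelling a
    // background thread starved of CPU by concurrently running tests.
    let writer = Arc::clone(&slot);
    let worker = thread::spawn(move || {
        go_rx.recv().unwrap();
        *writer.lock().unwrap() = Some(42);
    });

    // Captures `slot` by shared reference, so the closure is Copy
    // and can be passed to `poll_until` more than once.
    let read = || *slot.lock().unwrap();

    // While the worker is stalled, a short deadline expires.
    let stalled = poll_until(Duration::from_millis(20), Duration::from_millis(1), read);
    assert!(stalled.is_err());

    // Once the worker can run, a generous deadline observes the result.
    go_tx.send(()).unwrap();
    let done = poll_until(Duration::from_secs(5), Duration::from_millis(1), read);
    assert_eq!(done, Ok(42));

    worker.join().unwrap();
    println!("ok");
}
```

Returning the elapsed time on failure lets a caller produce a message in the style of `Checkpoint @n didn't complete in 100s`; raising the deadline, as this commit does, only changes how long the loop tolerates a slow producer, not the polling logic itself.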
