Commit e5fe380
Flake Fix: In Reactivation Cache tests, wait for appropriate delays when confirming expected drainage status (#9352)
## What changed and why?
We were overriding the drainage grace period and refresh interval to 10s and 30s, respectively, in these tests. However, the helper function that waits out those delays before checking drainage status always used the default test grace period and test refresh interval, so it was unaware of the larger overrides.

Depending on the assertion, the delay before verifying the status should have been either 0s or 10s+30s, but the helper always waited exactly 3s+3s, which was causing the flakes.
I believe I have corrected this at every call site.

I expect this to fix 5 of the top 10 flaky tests in the most recent
[report](https://github.com/temporalio/temporal/actions/runs/22150195772):
- 34 failures: `TestDeploymentVersionSuiteV0/TestReactivationSignalCache_Deduplication_SignalWithStart` [1](https://github.com/temporalio/temporal/actions/runs/22147042860/job/64027488135) [2](https://github.com/temporalio/temporal/actions/runs/22147042860/job/64027488134) [3](https://github.com/temporalio/temporal/actions/runs/22123832710/job/63949687796)
- 31 failures: `TestDeploymentVersionSuiteV0/TestSignalWithStartWorkflowExecution_ReactivateVersionOnPinned` [1](https://github.com/temporalio/temporal/actions/runs/22147042860/job/64027488110) [2](https://github.com/temporalio/temporal/actions/runs/22147042860/job/64027488003) [3](https://github.com/temporalio/temporal/actions/runs/22123832710/job/63949687826)
- 27 failures: `TestDeploymentVersionSuiteV0/TestStartWorkflowExecution_ReactivateVersionOnPinned` [1](https://github.com/temporalio/temporal/actions/runs/22123832710/job/63949687766) [2](https://github.com/temporalio/temporal/actions/runs/22112576512/job/63912535363) [3](https://github.com/temporalio/temporal/actions/runs/22106207941/job/63889814713)
- 25 failures: `TestDeploymentVersionSuiteV0/TestReactivationSignalCache_Deduplication_UpdateOptions` [1](https://github.com/temporalio/temporal/actions/runs/22123832710/job/63949687826) [2](https://github.com/temporalio/temporal/actions/runs/22123832710/job/63949687796) [3](https://github.com/temporalio/temporal/actions/runs/22112576512/job/63912535362)
- 22 failures: `TestDeploymentVersionSuiteV0/TestReactivationSignalCache_Deduplication_StartWorkflow` [1](https://github.com/temporalio/temporal/actions/runs/22147042860/job/64027488160) [2](https://github.com/temporalio/temporal/actions/runs/22147042860/job/64027488073) [3](https://github.com/temporalio/temporal/actions/runs/22123832710/job/63949687769)
## How did you test it?
- [x] built
- [x] run locally and tested manually
- [x] covered by existing tests
- [ ] added new unit test(s)
- [ ] added new functional test(s)
<!-- CURSOR_SUMMARY -->
---
> [!NOTE]
> **Low Risk**
> Test-only timing/constant refactors; no production logic changes, with the main risk being longer test runtimes or missed waits in remaining call sites.
>
> **Overview**
> Fixes flakes in `worker_deployment_version_test.go` by making drainage-status assertions wait for the *actual* configured grace/refresh intervals rather than always using the default test values.
>
> This refactors `checkVersionDrainageAndVersionStatus` to accept an explicit `waitFor` duration (and updates call sites accordingly), introduces named constants for short/long/extra-long intervals and cache TTLs, and applies these longer waits in reactivation-cache deduplication tests that override the drainage timings.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit c863f99. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

1 parent 0fbc386 · 1 file changed (+73, -74 lines)