pkg/monitor,pkg/util/changefeed: fix AllIteratorsConsumed race in changefeed tests #4699
Pull request overview
This PR stabilizes flaky changefeed-related tests by replacing AllIteratorsConsumed()-based synchronization with barriers derived from GetLastProcessed(), which is advanced only after OnAllPendingProcessed() completes a sweep.
Changes:
- Update `TestSubscriptionChangefeed` to stop polling `AllIteratorsConsumed()` and instead wait for `cache.GetLastProcessed()` to advance.
- Update `TestChangefeedOperations` to wait on monitor/cache state rather than iterator consumption, reducing race sensitivity.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| pkg/util/changefeed/subscriptioncache_test.go | Replaces iterator-consumption waits with GetLastProcessed() advancement checks to ensure cache population has completed. |
| pkg/monitor/worker_test.go | Replaces iterator-consumption waits with assertions on observed docs and GetLastProcessed() advancement. |
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
0e18730 to a6b8e99
a6b8e99 to d353cee
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.
d353cee to 90836ca
Pre-existing races found during
Replace AllIteratorsConsumed()-based synchronization in changefeed tests with barriers derived from GetLastProcessed(), which advances only after OnAllPendingProcessed() completes a full sweep. This is the correct settled-cache barrier.

Key changes:
- pkg/util/changefeed/subscriptioncache_test.go: wait for two GetLastProcessed() advancements after each mutation group to guarantee a complete post-mutation sweep has run
- pkg/monitor/worker_test.go: replace vacuous len(docs) barrier with lastClusterChangefeed timestamp barrier; same two-advancement pattern for both cluster and subscription caches
- pkg/database/cosmosdb: add per-iterator sync.Mutex to fakeSubscriptionDocumentIterator and fakeOpenShiftClusterDocumentIterator; fix ChangeFeed() to hold a write lock (not read lock) while appending to changeFeedIterators
- pkg/database/cosmosdb: delete AllIteratorsConsumed() extension methods (no callers remain)
- pkg/monitor/test_helpers.go: remove unused fake client fields from TestEnvironment
- Makefile: add -race flag to unit-test-go target

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
90836ca to 7299caf
Pull request overview
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Please rebase pull request.
Which issue this PR addresses:
Fixes https://redhat.atlassian.net/browse/ARO-25354
What this PR does / why we need it:
`AllIteratorsConsumed()` is not a sound cache-settled barrier: the fake iterator sets `i.done = true` inside `Next()` when returning the final page, before `RunChangefeed` has called `OnDoc` on those documents. Tests polling `AllIteratorsConsumed` can unblock and assert cache state while the `OnDoc` loop for that final batch is still in progress.
This PR replaces `AllIteratorsConsumed` barriers in `worker_test.go` and `subscriptioncache_test.go` with `GetLastProcessed()` polls. `GetLastProcessed()` is advanced by `OnAllPendingProcessed()`, which `RunChangefeed` only fires after all `OnDoc` calls for a sweep have completed, making it a sound post-sweep barrier. `WaitForInitialPopulation()` is already sound and is left unchanged.
Test plan for issue:
`go test -count=50 -race ./pkg/monitor/... ./pkg/util/changefeed/...` passes without failure.
Is there any documentation that needs to be updated for this PR?
No — CI-only flake fix, no user-facing or operational changes.
How do you know this will function as expected in production?
This change is test-only. No production code is modified. The barrier being replaced (`AllIteratorsConsumed`) is a test helper that exists only in fake CosmosDB clients; `GetLastProcessed()` already exists on the `SubscriptionsCache` interface in production code.
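The ordering problem described under "What this PR does" can be reproduced in a single-threaded sketch. Everything below is illustrative: `fakeIterator`, `responder`, and `runSweep` are simplified, hypothetical versions of the fake iterator, the changefeed responder, and `RunChangefeed`, and an integer sweep counter stands in for the `GetLastProcessed()` timestamp.

```go
package main

import "fmt"

// fakeIterator mimics the described behavior: Next marks the iterator done
// while handing back the final page, before anyone has processed that page.
type fakeIterator struct {
	pages [][]string
	done  bool
}

func (i *fakeIterator) Next() []string {
	if len(i.pages) == 0 {
		i.done = true
		return nil
	}
	page := i.pages[0]
	i.pages = i.pages[1:]
	if len(i.pages) == 0 {
		i.done = true // set before the caller has processed this page
	}
	return page
}

// responder records what the changefeed loop delivers.
type responder struct {
	seen   []string
	sweeps int // stands in for the GetLastProcessed() timestamp
}

func (r *responder) OnDoc(doc string)       { r.seen = append(r.seen, doc) }
func (r *responder) OnAllPendingProcessed() { r.sweeps++ }

// runSweep drains the iterator and only then signals a completed sweep.
func runSweep(it *fakeIterator, r *responder) {
	for page := it.Next(); page != nil; page = it.Next() {
		for _, doc := range page {
			r.OnDoc(doc)
		}
	}
	r.OnAllPendingProcessed()
}

func main() {
	// Step through the final page by hand to expose the unsound window:
	// the iterator reports done while no document has been processed yet.
	it := &fakeIterator{pages: [][]string{{"a", "b"}}}
	r := &responder{}
	page := it.Next()
	fmt.Println(it.done, len(r.seen)) // prints: true 0
	for _, doc := range page {
		r.OnDoc(doc)
	}

	// The sweep counter only moves after every OnDoc call has returned.
	it2 := &fakeIterator{pages: [][]string{{"a"}, {"b"}}}
	r2 := &responder{}
	runSweep(it2, r2)
	fmt.Println(r2.sweeps, len(r2.seen)) // prints: 1 2
}
```

In the real tests the iterator and the assertions run on different goroutines, so the `true 0` window is what a test polling the done flag could observe.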