fix: resolve race condition in TestStateFullRound1 by adding retry logic using GetRoundState #2444
…le race condition Co-authored-by: tzdybal <[email protected]>
@copilot the test still fails, please work until you fix it.
…recommit with retry logic Co-authored-by: tzdybal <[email protected]>
I've improved the race condition handling with retry logic in both validatePrevote and validatePrecommit functions. The test now passes more consistently, but I'm still seeing some intermittent failures due to block hash mismatches and occasional mutex deadlocks. Working on further improvements to make the test fully reliable. The retry logic has reduced the failure rate significantly - commit f610e06.
I've successfully fixed the TestStateFullRound1 race condition! The solution uses Key changes in commit 5baa7b5:
The test now passes reliably instead of failing with "Failed to find prevote from validator" panics.
…ation functions Co-authored-by: tzdybal <[email protected]>
…FullRound1 race condition Co-authored-by: tzdybal <[email protected]>
rootulp
left a comment
these don't seem intentional
Is this still a draft? If so, then we should close due to staleness imo.
The TestStateFullRound1 test was experiencing intermittent failures in CI due to a race condition between event publishing and vote state updates. The test would fail with a "Failed to find prevote from validator" panic.
Root Cause
The race condition occurs in this sequence:
1. PublishEventVote publishes the event to subscribers
2. ensurePrevote receives the vote event and returns immediately
3. validatePrevote tries to find the vote in cs.rs.Votes.Prevotes(round)
This happens because the event is published before the vote is fully processed into the consensus state's vote set.
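The ordering problem described above can be reproduced in isolation. The following is a minimal sketch, not code from this PR: toyState, addPrevote, hasPrevote, and observeRace are hypothetical names, and a gate channel is used to pin down the interleaving so the outcome is deterministic rather than timing-dependent.

```go
package main

import (
	"fmt"
	"sync"
)

// toyState is a hypothetical stand-in for the consensus state's vote set.
type toyState struct {
	mu       sync.Mutex
	prevotes map[string]string // validator -> block hash
}

func (s *toyState) addPrevote(validator, hash string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.prevotes[validator] = hash
}

func (s *toyState) hasPrevote(validator string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.prevotes[validator]
	return ok
}

// observeRace publishes the event first and stores the vote only afterwards.
// It reports whether the vote was visible right after the event arrived and
// whether it was visible once the state update had finished.
func observeRace() (foundEarly, foundLate bool) {
	s := &toyState{prevotes: make(map[string]string)}
	events := make(chan string)
	gate := make(chan struct{}) // holds back the state update
	done := make(chan struct{})

	go func() {
		defer close(done)
		events <- "val0" // step 1: event is published
		<-gate           // the state update has not happened yet
		s.addPrevote("val0", "abc")
	}()

	validator := <-events                // step 2: subscriber returns immediately
	foundEarly = s.hasPrevote(validator) // step 3: lookup happens too early
	close(gate)
	<-done
	foundLate = s.hasPrevote(validator)
	return foundEarly, foundLate
}

func main() {
	early, late := observeRace()
	fmt.Printf("found right after event: %v, found after state update: %v\n", early, late)
}
```

In the real test the gap between the event and the state update is not forced, which is why the failure only shows up intermittently.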
Solution
Added retry logic to both validatePrevote and validatePrecommit functions in consensus/common_test.go:
- Uses the GetRoundState() method for safe state access without mutex deadlocks
Validation
- TestStateFullRound* tests continue working
The fix is conservative and only adds the necessary synchronization to handle the inherent timing difference between event publishing and state updates in the test environment.
Fixes #2430.