cosim: Fix race-condition in cosim #798
Merged
Giftzwerg02 merged 4 commits into master on Mar 15, 2026
Jozott00 approved these changes on Mar 14, 2026
This PR should (finally) fix the flaky Ppc64Cosim tests, whose failures appeared only rarely.
I found a race-condition, more specifically a lost wake-up, exactly as described here. On the broker side, I forgot to re-acquire (or keep) the mutex lock when changing the dependent variable that tells the client it can write to the ring-buffer, i.e. when sending the signal. In rare cases, the client checked whether it could write (and saw that it couldn't), then the broker sent the signal (which nobody was waiting for yet), and only then did the client start the condition wait, so the wake-up was lost.
Simply keeping the lock held while changing the counter fixes this problem.
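The snippet from the PR is not reproduced here, so the following is only a minimal sketch of the general pattern, assuming a pthread mutex and condition variable guarding a counter. The names `slot_state`, `free_slots`, `broker_release_slot` and `client_acquire_slot` are hypothetical and not taken from the cosim code, and the real broker and client presumably share this state across processes rather than within one.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state; all names are illustrative, not from the PR. */
struct slot_state {
    pthread_mutex_t lock;
    pthread_cond_t  can_write;
    int             free_slots;   /* the "dependent variable" (counter) */
};

static void slot_state_init(struct slot_state *s) {
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->can_write, NULL);
    s->free_slots = 0;
}

/* Broker side: the counter is changed while the lock is held, so the
 * change cannot slip in between the client's check and its wait. */
static void broker_release_slot(struct slot_state *s) {
    pthread_mutex_lock(&s->lock);
    s->free_slots++;
    pthread_cond_signal(&s->can_write);
    pthread_mutex_unlock(&s->lock);
}

/* Client side: check and wait happen under the same lock.
 * pthread_cond_wait releases the lock atomically while blocking and
 * re-acquires it before returning; the loop handles spurious wake-ups. */
static void client_acquire_slot(struct slot_state *s) {
    pthread_mutex_lock(&s->lock);
    while (s->free_slots == 0)
        pthread_cond_wait(&s->can_write, &s->lock);
    assert(s->free_slots > 0);
    s->free_slots--;
    pthread_mutex_unlock(&s->lock);
}
```

With the broker's update made under the lock, the client either sees the new counter value in its check or is already waiting when the signal is sent; the interleaving described above can no longer occur.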
This PR also uses "robust" mutex locks to better detect whether a QEMU client has crashed.
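For readers unfamiliar with robust mutexes, here is a rough sketch of how they are typically set up with POSIX threads, assuming a Linux/glibc target and a process-shared mutex. The helper names `robust_mutex_init` and `robust_mutex_lock` and their return-code convention are hypothetical; the actual cosim code may look different.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Hypothetical helper: initialise a process-shared, robust mutex.
 * Returns 0 on success or a pthread error code. */
static int robust_mutex_init(pthread_mutex_t *m) {
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (rc == 0)
        rc = pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Hypothetical helper: lock and report whether the previous owner died.
 * Returns 0 for a normal lock, 1 if the previous owner terminated while
 * holding the mutex (the lock is held and marked consistent again),
 * and -1 on any other error. */
static int robust_mutex_lock(pthread_mutex_t *m) {
    int rc = pthread_mutex_lock(m);
    if (rc == 0)
        return 0;
    if (rc == EOWNERDEAD) {
        /* The caller now owns the lock; the protected state may be stale. */
        pthread_mutex_consistent(m);
        return 1;
    }
    return -1;
}
```

A return value of 1 from `robust_mutex_lock` is the point where a broker could conclude that the other side crashed and clean up, instead of blocking forever on a lock that will never be released.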