Skip to content

cosim: Fix race-condition in cosim#798

Merged
Giftzwerg02 merged 4 commits intomasterfrom
feature/handle-cosim-client-crash
Mar 15, 2026
Merged

cosim: Fix race-condition in cosim#798
Giftzwerg02 merged 4 commits intomasterfrom
feature/handle-cosim-client-crash

Conversation

@Giftzwerg02
Copy link
Contributor

@Giftzwerg02 Giftzwerg02 commented Mar 9, 2026

This PR should (finally) fix the flaky Ppc64Cosim tests which rarely appeared.

I found a race-condition, more specifically a lost wake-up, exactly as described here. I forgot to re-aquire or keep the mutex lock on the broker-side when sending a signal (or rather, when changing the dependent variable) to the client that it can write to the ring-buffer. In rare cases, the client checked if it can write (and saw it couldn't), then the broker sent a signal (which wasn't yet caught) and only then did the client start the conditional wait.

Simply keeping the lock while changing the counter fixes this problem, i.e.:

pub fn end_read(&mut self) -> Result<()> {
    self.read_idx += 1;
    self.count.fetch_sub(1, Ordering::SeqCst);
    self.notifierpost()?;
    self.notifier.unlock()?; // <- mutex lock is now kept until here
    Ok(())
}

This PR also uses "robust" mutex locks to better check whether a QEMU client has crashed:

PTHREAD_MUTEX_ROBUST
If a mutex is initialized with the PTHREAD_MUTEX_ROBUST attribute and its owner
dies without unlocking it, any future attempts to call pthread_mutex_lock(3) on
this mutex will succeed and return EOWNERDEAD to indicate that the original
owner no longer exists and the mutex is in an inconsistent state. Usually after
EOWNERDEAD is returned, the next owner should call pthread_mutex_consistent(3)
on the acquired mutex to make it consistent again before using it any further.

@github-actions github-actions bot added the enhancement New feature or request label Mar 9, 2026
@Giftzwerg02 Giftzwerg02 force-pushed the feature/handle-cosim-client-crash branch from f6aca3c to 5d4c0c0 Compare March 11, 2026 12:31
@Giftzwerg02 Giftzwerg02 requested review from Jozott00 March 11, 2026 13:25
@Giftzwerg02 Giftzwerg02 marked this pull request as ready for review March 11, 2026 14:32
@Giftzwerg02 Giftzwerg02 force-pushed the feature/handle-cosim-client-crash branch from 5d4c0c0 to 7099315 Compare March 11, 2026 14:38
@Giftzwerg02 Giftzwerg02 enabled auto-merge March 11, 2026 15:42
@Giftzwerg02 Giftzwerg02 force-pushed the feature/handle-cosim-client-crash branch from 7099315 to 74a7e2b Compare March 15, 2026 11:53
@Giftzwerg02 Giftzwerg02 merged commit e8a0c3b into master Mar 15, 2026
7 checks passed
@Giftzwerg02 Giftzwerg02 deleted the feature/handle-cosim-client-crash branch March 15, 2026 12:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants