fix: resolve race condition in quorum data recovery thread management#6787
Conversation
Replace check-then-set pattern with atomic compare-and-swap operation in StartQuorumDataRecoveryThread to prevent multiple threads from being started for the same quorum concurrently. The previous implementation had a race condition where multiple threads could pass the initial check before any of them set the flag, leading to potential resource conflicts and duplicate operations. This change ensures thread-safe access to fQuorumDataRecoveryThreadRunning using compare_exchange_strong, eliminating the race condition window.
✅ No Merge Conflicts DetectedThis PR currently has no conflicts with other open PRs. |
WalkthroughThe change updates the logic in the Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🧰 Additional context used📓 Path-based instructions (1)src/**/*.{cpp,h,cc,cxx,hpp}📄 CodeRabbit Inference Engine (CLAUDE.md)
Files:
🧠 Learnings (2)📓 Common learningssrc/llmq/quorums.cpp (3)Learnt from: kwvg Learnt from: kwvg Learnt from: knst ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
🔇 Additional comments (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
This bug was actually found by Qwen3-Coder 480B-A35B - an open source, open weight model, being used in |
…ry thread management 0e2811f fix: resolve race condition in quorum data recovery thread management (pasta) Pull request description: ## Summary This PR fixes a thread safety issue in the quorum data recovery system where multiple threads could be started for the same quorum due to a race condition. ### Problem The original code used a check-then-set pattern: ```cpp if (pQuorum->fQuorumDataRecoveryThreadRunning) { // Check return; } pQuorum->fQuorumDataRecoveryThreadRunning = true; // Set ``` Even though `fQuorumDataRecoveryThreadRunning` is declared as `std::atomic<bool>`, this pattern creates a race condition window where multiple threads can pass the check before any of them sets the flag. ### Solution Replace the check-then-set pattern with an atomic `compare_exchange_strong` operation: ```cpp bool expected = false; if (\!pQuorum->fQuorumDataRecoveryThreadRunning.compare_exchange_strong(expected, true)) { return; } ``` This ensures thread-safe access by atomically checking the current value and setting it to `true` only if it was previously `false`. ### Impact - Prevents multiple data recovery threads from being started for the same quorum - Eliminates potential resource conflicts and duplicate operations - Maintains the same functional behavior while ensuring thread safety ## Test plan - [x] Code compiles successfully - [x] Change maintains existing API and functionality - [x] Race condition eliminated through atomic operation Generated with [Claude Code](https://claude.ai/code) ACKs for top commit: UdjinM6: utACK 0e2811f Tree-SHA512: eff2798d535ba10d3baacb4b8aab731b6b0090d5f05c77c98beee07d116221184684d27bcacdbbf3cd8f63af952464dff2e7e2737c9c4c19f9fadef92424be81
Summary
This PR fixes a thread safety issue in the quorum data recovery system where multiple threads could be started for the same quorum due to a race condition.
Problem
The original code used a check-then-set pattern:
Even though
fQuorumDataRecoveryThreadRunningis declared asstd::atomic<bool>, this pattern creates a race condition window where multiple threads can pass the check before any of them sets the flag.Solution
Replace the check-then-set pattern with an atomic
compare_exchange_strongoperation:This ensures thread-safe access by atomically checking the current value and setting it to
trueonly if it was previouslyfalse.Impact
Test plan
Generated with Claude Code