fix: resolve race condition in quorum data recovery thread management#6787

Merged
PastaPastaPasta merged 1 commit into dashpay:develop from PastaPastaPasta:fix-quorum-thread-safety
Jul 30, 2025

@PastaPastaPasta
Member

Summary

This PR fixes a thread safety issue in the quorum data recovery system where multiple threads could be started for the same quorum due to a race condition.

Problem

The original code used a check-then-set pattern:

if (pQuorum->fQuorumDataRecoveryThreadRunning) {  // Check
    return;
}
pQuorum->fQuorumDataRecoveryThreadRunning = true;  // Set

Even though fQuorumDataRecoveryThreadRunning is declared as std::atomic<bool>, each load and each store is atomic only on its own. The check and the set are two separate operations, so multiple threads can all read false and pass the check before any of them sets the flag to true.
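
To make the window concrete, here is a minimal single-threaded sketch that replays the problematic interleaving by hand. FakeQuorum and ReplayCheckThenSetInterleaving are hypothetical names used only for illustration; they are not part of the Dash codebase.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the quorum object; only the flag matters here.
struct FakeQuorum {
    std::atomic<bool> fQuorumDataRecoveryThreadRunning{false};
};

// Replays, on one thread, the interleaving in which two callers both perform
// the check before either performs the set. Returns how many callers would
// go on to start a recovery thread.
int ReplayCheckThenSetInterleaving()
{
    FakeQuorum quorum;
    int started = 0;

    // Step 1: callers A and B each load the flag. Each load is atomic,
    // but both observe false.
    const bool seen_by_a = quorum.fQuorumDataRecoveryThreadRunning.load();
    const bool seen_by_b = quorum.fQuorumDataRecoveryThreadRunning.load();

    // Step 2: each caller acts on its own, now stale, observation.
    if (!seen_by_a) {
        quorum.fQuorumDataRecoveryThreadRunning = true;
        ++started;
    }
    if (!seen_by_b) {
        quorum.fQuorumDataRecoveryThreadRunning = true;
        ++started;
    }
    return started;
}
```

In this replay both callers get past the check, so the function returns 2. With real threads the same ordering can occur whenever the scheduler interleaves the two loads ahead of the stores.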

Solution

Replace the check-then-set pattern with an atomic compare_exchange_strong operation:

bool expected = false;
if (!pQuorum->fQuorumDataRecoveryThreadRunning.compare_exchange_strong(expected, true)) {
    return;
}

This ensures thread-safe access by atomically checking the current value and setting it to true only if it was previously false.
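
As a rough illustration of those semantics, the sketch below wraps the same call in two small helpers. TryClaim and ExpectedAfterFailedClaim are assumed names for this example only, and the code uses a bare std::atomic<bool> rather than the real quorum object.

```cpp
#include <atomic>
#include <cassert>

// Returns true only for the caller that flips the flag from false to true.
bool TryClaim(std::atomic<bool>& flag)
{
    bool expected = false;
    // On success: the flag held false and now holds true.
    // On failure: the flag is left unchanged and `expected` is overwritten
    // with the value that was actually observed.
    return flag.compare_exchange_strong(expected, true);
}

// Shows what happens to `expected` when the exchange fails.
bool ExpectedAfterFailedClaim()
{
    std::atomic<bool> flag{true};  // already claimed by someone else
    bool expected = false;
    flag.compare_exchange_strong(expected, true);  // fails: flag != expected
    return expected;  // now holds the observed value
}
```

Because `expected` is overwritten on failure, it has to be reset to false before any retry. The PR's code declares it locally right before the call, which sidesteps that issue.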

Impact

  • Prevents multiple data recovery threads from being started for the same quorum
  • Eliminates potential resource conflicts and duplicate operations
  • Maintains the same functional behavior while ensuring thread safety

Test plan

  • Code compiles successfully
  • Change maintains existing API and functionality
  • Race condition eliminated through atomic operation
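
The test plan does not include a concurrency test. One possible shape for such a check, sketched here under the assumption that a plain std::atomic<bool> stands in for the quorum flag, is to let several threads compete for it and count the winners. CountWinners is a hypothetical helper, not an existing Dash test.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Launches num_threads threads that all try to claim the same flag and
// returns how many of them succeeded.
int CountWinners(int num_threads)
{
    std::atomic<bool> running{false};
    std::atomic<int> winners{0};

    std::vector<std::thread> threads;
    threads.reserve(num_threads);
    for (int i = 0; i < num_threads; ++i) {
        threads.emplace_back([&running, &winners] {
            bool expected = false;
            // Only the thread that makes the false -> true transition wins.
            if (running.compare_exchange_strong(expected, true)) {
                winners.fetch_add(1);
            }
        });
    }
    for (auto& t : threads) {
        t.join();
    }
    return winners.load();
}
```

Since the flag is never reset in this sketch, exactly one thread can win regardless of scheduling, so CountWinners(16) is expected to return 1.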

Generated with Claude Code

Replace check-then-set pattern with atomic compare-and-swap operation
in StartQuorumDataRecoveryThread to prevent multiple threads from being
started for the same quorum concurrently.

The previous implementation had a race condition where multiple threads
could pass the initial check before any of them set the flag, leading
to potential resource conflicts and duplicate operations.

This change ensures thread-safe access to fQuorumDataRecoveryThreadRunning
using compare_exchange_strong, eliminating the race condition window.
@github-actions

✅ No Merge Conflicts Detected

This PR currently has no conflicts with other open PRs.

@coderabbitai

coderabbitai bot commented Jul 29, 2025

Walkthrough

The change updates the logic in the CQuorumManager::StartQuorumDataRecoveryThread method within src/llmq/quorums.cpp. Specifically, it replaces a separate check and assignment on the fQuorumDataRecoveryThreadRunning flag (each atomic individually, but not as a pair) with a single atomic compare_exchange_strong operation. This ensures that the flag is tested and set in one step, preventing a race when multiple callers check and update the thread's running status. No other parts of the method or related public interfaces are modified.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b3169d and 0e2811f.

📒 Files selected for processing (1)
  • src/llmq/quorums.cpp (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.{cpp,h,cc,cxx,hpp}

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.{cpp,h,cc,cxx,hpp}: Dash Core C++ codebase must be written in C++20 and require at least Clang 16 or GCC 11.1
Dash uses unordered_lru_cache for efficient caching with LRU eviction

Files:

  • src/llmq/quorums.cpp
🧠 Learnings (2)
📓 Common learnings
Learnt from: kwvg
PR: dashpay/dash#6543
File: src/wallet/receive.cpp:240-251
Timestamp: 2025-02-06T14:34:30.466Z
Learning: Pull request #6543 is focused on move-only changes and refactoring, specifically backporting from Bitcoin. Behavior changes should be proposed in separate PRs.
Learnt from: kwvg
PR: dashpay/dash#6718
File: test/functional/test_framework/test_framework.py:2102-2102
Timestamp: 2025-06-09T16:43:20.996Z
Learning: In the test framework consolidation PR (#6718), user kwvg prefers to limit functional changes to those directly related to MasternodeInfo, avoiding scope creep even for minor improvements like error handling consistency.
Learnt from: kwvg
PR: dashpay/dash#6504
File: src/llmq/quorums.cpp:224-224
Timestamp: 2024-12-29T17:43:41.755Z
Learning: The `CQuorumManager` is fully initialized by `LLMQContext`, addressing any concerns about the manager’s initialization sequence.
Learnt from: kwvg
PR: dashpay/dash#6504
File: src/llmq/context.cpp:42-43
Timestamp: 2025-01-02T21:50:00.967Z
Learning: LLMQContext manages concurrency for the `CInstantSendManager`. Previously, this was handled globally; now it's handled as a class member in `LLMQContext`, but the concurrency control remains consistent.
Learnt from: knst
PR: dashpay/dash#6691
File: src/test/llmq_params_tests.cpp:148-151
Timestamp: 2025-07-15T14:53:04.819Z
Learning: In the Dash Core LLMQ implementation, signingActiveQuorumCount is never 0 in the actual parameters defined in params.h, making division by zero scenarios unrealistic in the max_cycles() function.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: linux64_sqlite-build / Build source
  • GitHub Check: mac-build / Build source
  • GitHub Check: arm-linux-build / Build source
  • GitHub Check: linux64_nowallet-build / Build source
  • GitHub Check: win64-build / Build source
  • GitHub Check: Lint / Run linters
🔇 Additional comments (1)
src/llmq/quorums.cpp (1)

926-927: LGTM! Excellent fix for the race condition.

The replacement of the check-then-set pattern with compare_exchange_strong correctly addresses the thread safety issue. This atomic operation ensures that only one thread can successfully transition the flag from false to true, preventing multiple data recovery threads from starting concurrently for the same quorum.
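
For a boolean flag, the same single-winner guarantee can also be expressed with exchange. This is an alternative formulation shown only for comparison, not what the PR uses, and TryClaimWithExchange is a hypothetical name.

```cpp
#include <atomic>
#include <cassert>

// Returns true only for the caller that flips the flag from false to true.
bool TryClaimWithExchange(std::atomic<bool>& flag)
{
    // exchange() stores true and returns the previous value in one atomic
    // step. A previous value of false means this caller made the
    // false -> true transition.
    return !flag.exchange(true);
}
```

Unlike compare_exchange_strong, exchange always performs a write, even when the flag is already true. The stored value is the same in that case, so the observable result is identical for this use.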


@PastaPastaPasta
Member Author

This bug was actually found by Qwen3-Coder 480B-A35B, an open-source, open-weight model, being used in opencode. The issue was then validated by Claude Code, and a fix was implemented by Claude.

@UdjinM6 UdjinM6 left a comment

utACK 0e2811f

@UdjinM6 UdjinM6 added this to the 23 milestone Jul 29, 2025
@PastaPastaPasta PastaPastaPasta merged commit b137979 into dashpay:develop Jul 30, 2025
53 of 56 checks passed
knst pushed a commit to knst/dash that referenced this pull request Sep 18, 2025
…ry thread management

0e2811f fix: resolve race condition in quorum data recovery thread management (pasta)

Pull request description:

  ## Summary

  This PR fixes a thread safety issue in the quorum data recovery system where multiple threads could be started for the same quorum due to a race condition.

  ### Problem

  The original code used a check-then-set pattern:
  ```cpp
  if (pQuorum->fQuorumDataRecoveryThreadRunning) {  // Check
      return;
  }
  pQuorum->fQuorumDataRecoveryThreadRunning = true;  // Set
  ```

  Even though `fQuorumDataRecoveryThreadRunning` is declared as `std::atomic<bool>`, this pattern creates a race condition window where multiple threads can pass the check before any of them sets the flag.

  ### Solution

  Replace the check-then-set pattern with an atomic `compare_exchange_strong` operation:
  ```cpp
  bool expected = false;
  if (!pQuorum->fQuorumDataRecoveryThreadRunning.compare_exchange_strong(expected, true)) {
      return;
  }
  ```

  This ensures thread-safe access by atomically checking the current value and setting it to `true` only if it was previously `false`.

  ### Impact

  - Prevents multiple data recovery threads from being started for the same quorum
  - Eliminates potential resource conflicts and duplicate operations
  - Maintains the same functional behavior while ensuring thread safety

  ## Test plan

  - [x] Code compiles successfully
  - [x] Change maintains existing API and functionality
  - [x] Race condition eliminated through atomic operation

  Generated with [Claude Code](https://claude.ai/code)

ACKs for top commit:
  UdjinM6:
    utACK 0e2811f

Tree-SHA512: eff2798d535ba10d3baacb4b8aab731b6b0090d5f05c77c98beee07d116221184684d27bcacdbbf3cd8f63af952464dff2e7e2737c9c4c19f9fadef92424be81