Skip to content

Conversation

@fengjiachun
Copy link
Contributor

@fengjiachun fengjiachun commented Nov 7, 2025

Motivation:

Explain the context, and why you're making that change.
To make others understand what is the problem you're trying to solve.

As the title

Result:

Fixes #1232.

If there is no issue then describe the changes introduced by this PR.

Summary by CodeRabbit

Release Notes

  • New Features

    • Added read operation optimization that intelligently filters followers based on replication status, improving performance when followers have varying sync levels.
    • Added new maxReadIndexLag configuration parameter to control when this optimization is applied (disabled by default with -1).
  • Enhancements

    • Added efficient lock-free method for accessing replication state information.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 7, 2025

Walkthrough

Introduces an optimization to reduce lock contention during read index request handling by filtering followers based on replication lag. Adds a new lock-free method to check replication progress without acquiring locks, and updates configuration documentation to support the new filtering threshold.

Changes

Cohort / File(s) Change Summary
Read Index filtering logic
jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java
Added filterPeersForReadIndex(allPeers) private helper method that filters followers based on configurable replication lag threshold (maxReadIndexLag). Modified ReadOnlySafe path to use filtered peers instead of broadcasting to all followers, with fallback to all peers if quorum cannot be maintained.
Lock-free replication progress check
jraft-core/src/main/java/com/alipay/sofa/jraft/core/Replicator.java
Added public static method getNextIndexUnsafe(ThreadId id) to retrieve next log index without acquiring lock, relying on volatile field visibility to avoid contention.
Configuration documentation
jraft-core/src/main/java/com/alipay/sofa/jraft/option/RaftOptions.java
Updated maxReadIndexLag field documentation to describe the optimization threshold for filtering followers based on replication lag, with -1 value to disable the optimization.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Leader
    participant FollowerA as Follower A<br/>(in-sync)
    participant FollowerB as Follower B<br/>(lagging)
    
    rect rgb(200, 220, 255)
    Note over Leader,FollowerB: Previous behavior: ReadIndex broadcast to all followers
    Client->>Leader: ReadIndex request
    Leader->>Leader: handleReadIndexRequest()
    Leader->>FollowerA: sendHeartbeat() + lock contention
    Leader->>FollowerB: sendHeartbeat() + lock contention
    end
    
    rect rgb(220, 240, 200)
    Note over Leader,FollowerB: New behavior: ReadIndex broadcast to filtered peers only
    Client->>Leader: ReadIndex request
    Leader->>Leader: handleReadIndexRequest()
    Leader->>Leader: filterPeersForReadIndex()<br/>using getNextIndexUnsafe()
    alt Lag within threshold
        Leader->>FollowerA: sendHeartbeat() (reduced contention)
    else Lag exceeds threshold
        Leader->>FollowerB: skip (no lock acquisition)
    end
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • NodeImpl.java filtering logic: Requires careful review of the lag calculation formula (logLag = lastLogIndex - (nextIndex - 1)) and quorum fallback behavior to ensure correctness under various replication states
  • Lock-free method usage: Verify that getNextIndexUnsafe() is appropriately used in contexts where missing the most recent value is acceptable, and confirm volatile visibility guarantees are sufficient
  • Filtering threshold boundary conditions: Review edge cases where maxReadIndexLag < 0, peer count changes, or quorum requirements shift during execution

Poem

🐰 A leader once battled with lock-holding foes,
Till followers lagging were filtered by flows,
With lock-free checks swift and no bitter contests,
The heartbeat now whispers to only the best! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title contains a typo ('sikp' instead of 'skip') and is vague about the optimization strategy, though it does reference the core objective of handling slow followers during read index operations. Correct the typo to 'skip' and consider clarifying the title to better reflect the optimization strategy, such as 'feat: skip slow followers for read index operations' to make the intent clearer.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed The PR implements code optimization to address lock contention by filtering slow followers during read index operations, directly addressing the primary objective raised in issue #1232.
Out of Scope Changes check ✅ Passed All changes are directly related to optimizing read index behavior with followers: new filtering logic in NodeImpl, a helper method in Replicator for lock-free nextIndex access, and documentation updates in RaftOptions.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/improve-readindex

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f56081 and f67736f.

📒 Files selected for processing (3)
  • jraft-core/src/main/java/com/alipay/sofa/jraft/core/NodeImpl.java (2 hunks)
  • jraft-core/src/main/java/com/alipay/sofa/jraft/core/Replicator.java (1 hunks)
  • jraft-core/src/main/java/com/alipay/sofa/jraft/option/RaftOptions.java (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: test_rpc_grpc_impl
  • GitHub Check: test_rheakv_pd
  • GitHub Check: test_jraft_core
  • GitHub Check: test_rheakv_core
  • GitHub Check: sca

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

发布follower导致leader卡住15秒

2 participants