Skip to content

feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue#3397

Open
redwood9 wants to merge 5 commits intoapache:unstablefrom
redwood9:unstable
Open

feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue#3397
redwood9 wants to merge 5 commits intoapache:unstablefrom
redwood9:unstable

Conversation

@redwood9
Copy link
Copy Markdown

Based on the ideas I proposed in issue #3380
#3380

this is the first step of the solution:
introducing a lease mechanism for the Kvrocks master node.
This mechanism ensures that the maximum split-brain window is bounded by the lease TTL. It also comes with flexible configuration options:

  1. No config / disabled — Behaves exactly like the current system. If the lease feature is not enabled, there is zero impact on existing behavior.
  2. Reject writes — When the lease expires, the node rejects all write operations to prevent any data conflicts / divergence.
  3. Log only — On lease expiration, the node only logs an error (no write rejection). This makes it easy to catch via monitoring / alerting systems without disrupting traffic.

To minimize invasion into the existing codebase structure, the implementation follows these strategies:

  1. Controller probing — Add a new HEARTBEAT command (non-conflicting with existing standard Redis commands). The response format remains unchanged to keep controller integration simple and backward-compatible.
  2. Unified lease check — Add the lease validation logic inside Storage::writeToDB(), so all write paths are guarded in one central place.

I will also submit the corresponding HEARTBEAT implementation in the controller at the same time / in parallel.

  Introduce a master lease mechanism that allows a master node to detect
  when it may have lost cluster ownership and optionally block writes to
  prevent split-brain data corruption.

  - Add `master_lease_mode` config option: disabled / log-only / block-write
  - Add lease atomics (`lease_deadline_`, `lease_owner_`) to `Storage`
  - Add `UpdateLease()` and `ResetLease()` methods on `Storage`
  - Check lease expiry in `Storage::Write()` and `writeToDB()` to cover
    both direct writes and `CommitTxn()` paths
  - Add `CLUSTERX HEARTBEAT <election_version> <ttl_ms>` command for
    controller-driven lease renewal
  - Reset master lease automatically on role transition to slave
  - Add C++ unit tests (`tests/cppunit/lease_test.cc`)
  - Add Go integration tests (`tests/gocase/integration/cluster/lease_test.go`)
  Introduce a master lease mechanism that allows a master node to detect
  when it may have lost cluster ownership and optionally block writes to
  prevent split-brain data corruption.

  - Add `master_lease_mode` config option: disabled / log-only / block-write
  - Add lease atomics (`lease_deadline_`, `lease_owner_`) to `Storage`
  - Add `UpdateLease()` and `ResetLease()` methods on `Storage`
  - Check lease expiry in `Storage::Write()` and `writeToDB()` to cover
    both direct writes and `CommitTxn()` paths
  - Add `CLUSTERX HEARTBEAT <election_version> <ttl_ms>` command for
    controller-driven lease renewal
  - Reset master lease automatically on role transition to slave
  - Add C++ unit tests (`tests/cppunit/lease_test.cc`)
  - Add Go integration tests (`tests/gocase/integration/cluster/lease_test.go`)
@git-hulk git-hulk changed the title feat(split-brain): Introduce lease mechanism for Kvrocks master feat(cluster): Introduce lease mechanism for Kvrocks cluster to mitigate the brain-split issue Mar 23, 2026
if (args.size() != 5) return {Status::RedisParseErr, errWrongNumOfArguments};
master_node_id_ = args[2];

auto parse_lease_ms = ParseInt<uint64_t>(args[3], 10);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have a proper range for the lease timeout?

rocksdb::WriteBatch *updates) {
// Master lease check: applied here so it covers both Storage::Write() and CommitTxn().
// Only active when master_lease_mode != disabled. Read mode once to avoid TOCTOU.
auto lease_mode = config_->master_lease_mode;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lease state should be maintained inside the server instead of the storage. And you can refuse the write operation like the slave's read-only mode. And the lease mode should only take effect if the cluster mode was enabled.


const std::vector<ConfigEnum<MasterLeaseMode>> master_lease_modes{
{"disabled", MasterLeaseMode::kDisabled},
{"log-only", MasterLeaseMode::kLogOnly},
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log-only could be dry-run?

@git-hulk
Copy link
Copy Markdown
Member

@redwood9 My another concern is that the Kvrocks will forcefully depends on the availability of the controller once the lease mode was enabled. And it cannot fully resolve the network partition issue, the network partition duration is totally depended on the lease timeout.

@jihuayu
Copy link
Copy Markdown
Member

jihuayu commented Mar 24, 2026

@git-hulk I agree with your point. I think it’s better to make this an optional feature, especially since AP and CP are mutually exclusive. Users can then choose based on their specific requirements.

Copy link
Copy Markdown
Member

@jihuayu jihuayu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, the main issue is that write-protection should be handled at the command level rather than the storage layer.

@git-hulk
Copy link
Copy Markdown
Member

@git-hulk I agree with your point. I think it’s better to make this an optional feature, especially since AP and CP are mutually exclusive. Users can then choose based on their specific requirements.

Kvrocks is now supporting the WAIT command to achieve the majority writes behavior.

@jihuayu
Copy link
Copy Markdown
Member

jihuayu commented Mar 24, 2026

Kvrocks is now supporting the WAIT command to achieve the majority writes behavior.Kvrocks 现在支持等待命令,以实现大多数写入行为。

Thanks for the reminder. I now think that the core value of this PR is introducing a lease mechanism to the server, which helps define the upper bound of the write window for a stale master.
However, depended on a single controller poses a significant risk; I believe we should complete all feature only after supporting multiple controllers.

Or perhaps we should wait until the full design is finalized before considering this PR?

@redwood9
Copy link
Copy Markdown
Author

@redwood9 My another concern is that the Kvrocks will forcefully depends on the availability of the controller once the lease mode was enabled. And it cannot fully resolve the network partition issue, the network partition duration is totally depended on the lease timeout.

@git-hulk @jihuayu
This is a really good question and definitely worth discussing in depth. Here's my take:

  1. The core of the lease mechanism is to allow the master to safely stop accepting writes during network partitions. Different use cases may call for different solutions:

    1. In Redis Sentinel mode, min-replicas-to-write is the primary mechanism. However, if there's only one replica and the master/replica are spread across two regions, it's easy to trigger failures even when min-replicas-to-write==1.

    2. For scenarios that demand extremely strong consistency and have low concurrency, we need to verify the master's identity on every write. This is exactly what step 4 in the proposal aims to solve.

    3. For cases that can tolerate a short partition but not a prolonged one, leases are a great fit and can be flexibly configured based on actual needs. This is the main scenario this PR is targeting.

    4. For situations where partitions are acceptable but we want alerts when they occur so humans can handle them, this PR also covers it. In this mode, the system behavior remains identical to the current implementation — we only add an alert when a lease anomaly is detected.

    5. For the scenario mentioned in Travis ci #1.3, once the lease expires we can fall back to the approach in Travis ci #1.2 (verifying master status on every write). This fallback can be implemented as part of the Travis ci #1.2 work.

  2. The controller is already a strongly consistent, multi-region system with high availability.

  3. Overall, since users deploy Kvrocks in a wide variety of scenarios, I believe the best approach is to provide rich configuration options so they can make the right trade-offs for their specific use case.

  4. Maybe we can dive deeper into the details. Once we settle on a concrete solution, I’d be happy to push a PR implementing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants