Hi @yuzegao, thanks for your detailed solution. I personally preferred solution
I have one question about solution A. The diagram shows it performing a CLUSTERX FAILOVER, but I thought, from the issue this comes from, that the problem is not related to clustering but to master-replica failover performed by Redis Sentinel with the failover command. Does that solution work with the redis-sentinel command?
@git-hulk @ethervoid
This seems more reasonable. I will develop it in cluster mode to implement lossless failover. If you have better suggestions, please let me know.
Understood, sounds good to me: with a kvrocks FAILOVER command we can perform controlled manual failovers without losing data. Thank you for the clarification.
After taking a look at the Redis 'cluster failover' and 'client pause/unpause' commands, I prefer option A.
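@zhixinwen I noticed that you submitted the WAIT command and the ACK mechanism for master-slave replication, and I would like to hear your thoughts on this.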
I have completed development of the failover feature and submitted a pull request: #3295.
Hi All,
In #2848, we plan to support a graceful (no-data-loss) failover command.
Summary
Kvrocks today lacks a production-ready failover workflow that guarantees no data loss when a new master is promoted manually. This proposal compares two feasible approaches: Option A, in-process (node-local) failover, and Option B, controller-based failover.
Both approaches are viable. This document outlines the design and tradeoffs of each, hoping to spark community discussion on which approach to adopt. Note that this proposal covers cluster mode only; non-cluster mode is not considered.
Goals
Primary goal: Provide a failover mechanism to prevent data loss during manual maintenance (node migration, process upgrade).
Option A — In-process (Node-local) Failover
Concept
Enhance the Kvrocks server binary so that a given slave within a shard cooperates with the master to complete the master-slave role swap: check the master-slave replication offset, pause writes on the master, let the slave catch up to the master's offset, swap the master and slave roles, and release the write pause on the old master.
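To make the sequence concrete, here is a rough sketch of those five steps as a small in-memory simulation in Python. It is not Kvrocks code: `Node`, `replicate`, `failover`, and the `max_lag`, `batch`, and `max_rounds` parameters are all hypothetical names made up for illustration, and real replication is of course asynchronous rather than a list copy.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy stand-in for a node in a shard (hypothetical, not the real server)."""
    name: str
    role: str  # "master" or "slave"
    log: list = field(default_factory=list)
    writes_paused: bool = False

    @property
    def offset(self) -> int:
        # The replication offset is modelled as the number of log entries.
        return len(self.log)

    def write(self, entry: str) -> bool:
        # Only an unpaused master accepts client writes.
        if self.role != "master" or self.writes_paused:
            return False
        self.log.append(entry)
        return True


def replicate(master: Node, slave: Node, batch: int) -> None:
    """Copy at most `batch` missing entries from master to slave."""
    start = slave.offset
    slave.log.extend(master.log[start:start + batch])


def failover(master: Node, slave: Node, max_lag: int = 100,
             batch: int = 10, max_rounds: int = 5) -> bool:
    # 1. Check the replication offset gap before touching client traffic.
    if master.offset - slave.offset > max_lag:
        return False
    # 2. Pause writes on the master so its offset stops moving.
    master.writes_paused = True
    # 3. Let the slave catch up, bounded by max_rounds.
    for _ in range(max_rounds):
        if slave.offset == master.offset:
            break
        replicate(master, slave, batch)
    if slave.offset != master.offset:
        # Catch-up timed out: abort and let the master serve writes again.
        master.writes_paused = False
        return False
    # 4. Swap the roles.
    master.role, slave.role = "slave", "master"
    # 5. Release the pause on the old master (now a slave, so it still rejects writes).
    master.writes_paused = False
    return True
```

The point of the sketch is the ordering: the write pause comes before the final offset comparison, so the offsets compared in step 3 cannot drift apart again, and every abort path releases the pause so a failed attempt leaves the original master serving writes.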

Core Components/Steps
Advantages
Disadvantages/Risks
Option B — Controller-Based Failover
Concept
An external, highly available kvrocks controller is responsible for master-slave role swaps and cluster topology updates. The controller compares the replication offsets of the master and slave nodes, pauses writes to the master, updates the master and slave roles and the cluster topology, and resumes writes on the old master. The controller also handles exceptions with fallback procedures such as retries and rollbacks.
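For comparison, a minimal Python sketch of the same flow driven from the outside might look like the following. Everything here is an assumption for illustration only: `FakeNode`, `full_sync`, `controller_failover`, and the shape of the `topology` dict are invented and do not reflect the actual kvrocks controller API.

```python
class FakeNode:
    """In-memory stand-in for a node the controller talks to (hypothetical)."""

    def __init__(self, name, offset):
        self.name = name
        self.offset = offset
        self.paused = False


def full_sync(master, slave):
    """Hypothetical catch-up step: pretend the slave has replicated everything."""
    slave.offset = master.offset


def controller_failover(topology, shard, new_master, catch_up=full_sync, retries=3):
    """Promote `new_master` in `shard`; leave the topology untouched on failure.

    topology is assumed to look like:
        {shard: {"master": FakeNode, "slaves": [FakeNode, ...]}}
    """
    entry = topology[shard]
    old_master = entry["master"]
    if new_master not in entry["slaves"]:
        raise ValueError("target node is not a slave of this shard")

    old_master.paused = True  # pause writes on the current master
    try:
        # Retry the offset check a bounded number of times.
        for _ in range(retries):
            if new_master.offset >= old_master.offset:
                break
            catch_up(old_master, new_master)
        if new_master.offset < old_master.offset:
            # Rollback: nothing was changed yet, so just give up.
            return False
        # Update roles and topology only after the offsets match.
        entry["slaves"] = [s for s in entry["slaves"] if s is not new_master]
        entry["slaves"].append(old_master)
        entry["master"] = new_master
        return True
    finally:
        # Always resume writes on the old master, on success or failure.
        old_master.paused = False
```

In this shape the retry and rollback logic lives entirely in the controller, and the topology is only mutated after the offset check succeeds, so a failed attempt needs no undo step beyond lifting the pause.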

Core Components/Steps
Advantages
Disadvantages/Tradeoffs