Describe the bug
We run Valkey on K8s as a StatefulSet. The StatefulSet has 3 pods, each with a Valkey container and a Sentinel container for managing HA. One of these pods is the master and the other 2 are replicas. Pod IPs are ephemeral, so as pods cycle they come up with a new IP address, which Sentinel detects as a new replica while marking the old one down. This causes stale replicas in the output of sentinel replicas master_name.
To mitigate this we configure K8s Services, one for each pod. An init container on each pod looks up the ClusterIP of its respective Service and configures the pod with replica-announce-ip for Valkey and announce-ip for Sentinel. This keeps the announced replica IPs stable as pods are cycled.
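For context, the init-container logic described above can be sketched roughly as below. All names here (the announce Service naming scheme, config paths, ports, and the fallback IP) are illustrative assumptions for this sketch, not taken verbatim from our manifests:

```shell
#!/bin/sh
# Hypothetical init-container sketch: derive the per-pod announce Service
# name from the StatefulSet pod hostname, resolve its ClusterIP, and write
# announce settings for Valkey and Sentinel.
HOSTNAME="${HOSTNAME:-redis-test-0-server-1}"
ORDINAL="${HOSTNAME##*-}"                      # statefulset ordinal, e.g. "1"
ANNOUNCE_SVC="redis-test-0-announce-${ORDINAL}"

# In-cluster this resolves the Service's ClusterIP via DNS; outside a
# cluster fall back to a placeholder so the sketch stays runnable.
ANNOUNCE_IP="$(getent hosts "${ANNOUNCE_SVC}" 2>/dev/null | awk '{print $1; exit}')"
ANNOUNCE_IP="${ANNOUNCE_IP:-100.70.100.164}"

CONF_DIR="${CONF_DIR:-/tmp/conf}"
mkdir -p "${CONF_DIR}"

# Valkey announce settings, included from the main valkey.conf
cat > "${CONF_DIR}/announce.conf" <<EOF
replica-announce-ip ${ANNOUNCE_IP}
replica-announce-port 6379
EOF

# Sentinel announce settings, included from sentinel.conf
cat > "${CONF_DIR}/sentinel-announce.conf" <<EOF
sentinel announce-ip ${ANNOUNCE_IP}
sentinel announce-port 26379
EOF

echo "${ANNOUNCE_SVC} -> ${ANNOUNCE_IP}"
```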
We recently migrated some of our workloads to Valkey 8.1.1 and are trialling the dual-channel-replication-enabled yes config. With this enabled, during a failover the additional channel of type=rdb-channel appears under the pod IP. If Sentinel happens to poll for replicas during this window, it saves the pod IP as a distinct replica. Once the replica fully resyncs, it uses the configured announce IP address.
This duplication is bad for Valkey backed by Sentinel: during a failover Sentinel elects a new master from its list of replicas, and may then instruct the new master to replicate from itself, which it cannot do.
Pod IPs
NAME READY STATUS RESTARTS AGE IP
redis-test-0-server-0 5/5 Running 0 8m12s 100.104.75.103
redis-test-0-server-1 5/5 Running 0 7m21s 100.111.250.31
redis-test-0-server-2 5/5 Running 0 12m 100.99.64.48
Service Cluster IPs
NAME TYPE CLUSTER-IP
redis-test-0-announce-0 ClusterIP 100.67.128.152
redis-test-0-announce-1 ClusterIP 100.70.100.164
redis-test-0-announce-2 ClusterIP 100.71.105.52
In this scenario, redis-test-0-server-1 has been promoted to a master after a failover. The output of info replication is:
# Replication
role:master
connected_slaves:2
slave0:ip=100.71.105.52,port=6379,state=online,offset=7448294713549,lag=1,type=replica
slave1:ip=100.104.75.103,port=6379,state=wait_bgsave,offset=0,lag=0,type=rdb-channel
- slave0 is the ClusterIP for redis-test-0-announce-2, this maps to pod redis-test-0-server-2.
- slave1 is the Pod IP for redis-test-0-server-0
When slave1 resyncs, the output of info replication is:
# Replication
role:master
connected_slaves:2
slave0:ip=100.71.105.52,port=6379,state=online,offset=7448295098696,lag=1,type=replica
slave1:ip=100.67.128.152,port=6379,state=online,offset=7448295102193,lag=1,type=replica
- slave0 is unchanged from what it was previously
- slave1 is now the ClusterIP for redis-test-0-announce-0, this maps to pod redis-test-0-server-0.
When executing sentinel replicas master_name | grep name -A1, there are 4 replicas.
/data $ redis-cli -p 26379 sentinel replicas master_name | grep name -A1
name
100.99.64.48:6379
--
name
100.104.75.103:6379
--
name
100.67.128.152:6379
--
name
100.71.105.52:6379
In order they are:
- pod IP for redis-test-0-server-2
- pod IP for redis-test-0-server-0
- ClusterIP for redis-test-0-announce-2 (redis-test-0-server-2)
- ClusterIP for redis-test-0-announce-0 (redis-test-0-server-0)
Issuing sentinel reset master_name fixes this list, but it is not an appropriate solution given that pods can cycle for any reason and would resync with the master from a new pod IP.
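For reference, the manual workaround amounts to issuing the reset against every Sentinel so each drops its stale entries and rediscovers the current replicas. The host list and master name below are placeholder assumptions for this sketch:

```shell
#!/bin/sh
# Hypothetical workaround sketch: run SENTINEL RESET on each Sentinel.
# SENTINELS and MASTER_NAME are illustrative values, not from the report.
SENTINELS="${SENTINELS:-127.0.0.1:26379}"
MASTER_NAME="${MASTER_NAME:-master_name}"

for addr in ${SENTINELS}; do
  host="${addr%:*}"
  port="${addr#*:}"
  echo "resetting ${MASTER_NAME} on ${host}:${port}"
  # Skip the actual call when redis-cli is unavailable (e.g. dry run).
  if command -v redis-cli >/dev/null 2>&1; then
    redis-cli -h "${host}" -p "${port}" sentinel reset "${MASTER_NAME}" || true
  fi
done
```

Note that SENTINEL RESET clears all state Sentinel holds for the master (replicas and other sentinels) and forces rediscovery, so there is a short window where Sentinel's view is incomplete.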
To reproduce
- Establish any environment with announce IPs, backed by Sentinel
- Enable dual-channel-replication-enabled yes
- Execute a failover
- Retrieve replicas from Sentinel via sentinel replicas master_name
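The repro steps can be sketched as a small script. The Sentinel host/port and master name are assumptions; without a reachable Sentinel the script only echoes the commands it would run:

```shell
#!/bin/sh
# Hedged repro sketch: trigger a failover, then list the replicas Sentinel
# tracks. With dual-channel-replication-enabled yes, stale pod-IP entries
# can appear in the replica list.
SENTINEL_HOST="${SENTINEL_HOST:-127.0.0.1}"
SENTINEL_PORT="${SENTINEL_PORT:-26379}"
MASTER_NAME="${MASTER_NAME:-master_name}"

run() {
  echo "+ $*"
  if command -v redis-cli >/dev/null 2>&1; then
    redis-cli -h "${SENTINEL_HOST}" -p "${SENTINEL_PORT}" "$@" || true
  fi
}

# 1. Trigger a failover on the monitored master.
run sentinel failover "${MASTER_NAME}"
# 2. Give the promoted master time to start (re)syncing its replicas.
sleep 5
# 3. List the replicas Sentinel now tracks.
run sentinel replicas "${MASTER_NAME}"
```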
Expected behavior
Stale/duplicate replicas are not persisted when a failover happens, or when pods are cycled.
Additional information
N/A