
Commit 61aca75

[Stable/Redis-ha] Add retry mechanism to splitbrain (#336)
* Add retry mechanism to splitbrain script

Signed-off-by: Balázs Varga <balazs.varga@strivacity.com>

* Update readme

Signed-off-by: Balázs Varga <balazs.varga@strivacity.com>

* Bump chart version

Signed-off-by: Balázs Varga <balazs.varga@strivacity.com>

* Incrementing Chart.yaml

Signed-off-by: Aaron Layfield <aaron.layfield@gmail.com>

---------

Signed-off-by: Balázs Varga <balazs.varga@strivacity.com>
Signed-off-by: Aaron Layfield <aaron.layfield@gmail.com>
Co-authored-by: Aaron Layfield <aaron.layfield@gmail.com>
1 parent 2d79d92 commit 61aca75


4 files changed: 25 additions, 12 deletions


charts/redis-ha/Chart.yaml

Lines changed: 1 addition & 1 deletion
@@ -5,7 +5,7 @@ keywords:
 - redis
 - keyvalue
 - database
-version: 4.34.5
+version: 4.34.6
 appVersion: 8.2.1
 description: This Helm chart provides a highly available Redis implementation with a master/slave configuration and uses Sentinel sidecars for failover management
 icon: https://img.icons8.com/external-tal-revivo-shadow-tal-revivo/24/external-redis-an-in-memory-data-structure-project-implementing-a-distributed-logo-shadow-tal-revivo.png

charts/redis-ha/README.md

Lines changed: 5 additions & 8 deletions
@@ -464,14 +464,11 @@ Should your Pod require additional egress rules, define them in a `egressRules`
 Under not entirely known yet circumstances redis sentinel and its corresponding redis server reach a condition that this chart authors call "split brain" (for short). The observed behaviour is the following: the sentinel switches to the new re-elected master, but does not switch its redis server. Majority of original discussion on the problem has happened at the <https://github.com/DandyDeveloper/charts/issues/121>.
 
 The proposed solution is currently implemented as a sidecar container that runs a bash script with the following logic:
-
-1. Every `splitBrainDetection.interval` seconds a master (as known by sentinel) is determined
-1. If it is the current node: ensure the redis server's role is master as well.
-1. If it is not the current node: ensure the redis server also replicates from the same node.
-
-If any of the checks above fails - the redis server reinitialisation happens (it regenerates configs the same way it's done during the pod init), and then the redis server is instructed to shutdown. Then kubernetes restarts the container immediately.
-
-# Change Log
+1. At intervals defined by `splitBrainDetection.interval`, the sidecar checks which node is recognized as master by Sentinel.
+2. If the current pod is the master according to Sentinel, it verifies that the local Redis server is also running as master.
+3. If the current pod is not the master, it ensures the local Redis server is replicating from the correct master node.
+4. If any of these checks fail, the sidecar will retry the check at intervals defined by `splitBrainDetection.retryInterval`.
+5. If the checks continue to fail after the retry attempts, the sidecar triggers a reinitialization: it regenerates the Redis configuration and instructs the Redis server to shut down. Kubernetes will then automatically restart the container.
 
 ## 4.14.9 - ** POTENTIAL BREAKING CHANGE. **
 Introduced the ability to change the Haproxy Deployment container pod
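The consistency rule in steps 2 and 3 of the README can be sketched as a standalone bash function. This is a minimal illustration, not the chart's script: the function name `is_consistent` and its four positional arguments are hypothetical, standing in for the values the sidecar gathers through its own helpers (`identify_master`, `redis_role`, `identify_redis_master` in `_configs.tpl`).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of README steps 2-3: does the local Redis server agree
# with Sentinel? Arguments stand in for values the real script obtains via
# identify_master / redis_role / identify_redis_master.
#
# Usage: is_consistent SENTINEL_MASTER ANNOUNCE_IP LOCAL_ROLE LOCAL_MASTER
# Returns 0 when consistent, 1 when a split brain is suspected.
is_consistent() {
  local sentinel_master="$1"
  local announce_ip="$2"
  local local_role="$3"
  local local_master="$4"

  if [ -z "$sentinel_master" ]; then
    # Sentinel reported no master: nothing to compare against.
    return 0
  fi

  if [ "$sentinel_master" = "$announce_ip" ]; then
    # Step 2: this pod is Sentinel's master, so Redis must report role "master".
    [ "$local_role" = "master" ]
  else
    # Step 3: this pod is a replica, so Redis must replicate from Sentinel's master.
    [ "$local_master" = "$sentinel_master" ]
  fi
}
```

For example, `is_consistent 10.0.0.1 10.0.0.2 slave 10.0.0.1` succeeds (a replica following Sentinel's master), while `is_consistent 10.0.0.1 10.0.0.1 slave 10.0.0.3` fails: Sentinel considers this pod the master, but the local server is still a replica, which is the split-brain case described above.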

charts/redis-ha/templates/_configs.tpl

Lines changed: 18 additions & 3 deletions
@@ -475,12 +475,28 @@
 if [ "$MASTER" = "$ANNOUNCE_IP" ]; then
 redis_role
 if [ "$ROLE" != "master" ]; then
-reinit
+echo "waiting for redis to become master"
+sleep {{ .Values.splitBrainDetection.retryInterval }}
+identify_master
+redis_role
+echo "Redis role is $ROLE, expected role is master. No need to reinitialize."
+if [ "$ROLE" != "master" ]; then
+echo "Redis role is $ROLE, expected role is master, reinitializing"
+reinit
+fi
 fi
 elif [ "${MASTER}" ]; then
 identify_redis_master
 if [ "$REDIS_MASTER" != "$MASTER" ]; then
-reinit
+echo "Redis master and local master are not the same. waiting."
+sleep {{ .Values.splitBrainDetection.retryInterval }}
+identify_master
+identify_redis_master
+echo "Redis master is ${MASTER}, expected master is ${REDIS_MASTER}. No need to reinitialize."
+if [ "${REDIS_MASTER}" != "${MASTER}" ]; then
+echo "Redis master is ${MASTER}, expected master is ${REDIS_MASTER}, reinitializing"
+reinit
+fi
 fi
 fi
 done
@@ -727,4 +743,3 @@
 fi
 echo "response=$response"
 {{- end }}
-
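Both branches of the hunk above follow the same shape: check, wait `retryInterval` seconds, re-query Sentinel and Redis, check again, and call `reinit` only if the second check also fails. A hedged sketch of that pattern as a reusable helper follows; `check_with_retry`, its two callback arguments, and the `RETRY_INTERVAL` variable are hypothetical (the template inlines `{{ .Values.splitBrainDetection.retryInterval }}` instead). Each call to the check callback is expected to re-read state, mirroring the repeated `identify_master` / `redis_role` calls. In this sketch the "no need to reinitialize" message is printed only after the recheck passes; in the committed script that line is echoed before the second comparison runs, so it also appears in the log when `reinit` follows.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the single-retry flow added in this commit.
# RETRY_INTERVAL stands in for {{ .Values.splitBrainDetection.retryInterval }}.
RETRY_INTERVAL=${RETRY_INTERVAL:-10}

# Usage: check_with_retry CHECK_FN REINIT_FN
# Runs CHECK_FN; on failure waits RETRY_INTERVAL seconds and runs it once more.
# REINIT_FN is called only if the second attempt also fails.
check_with_retry() {
  local check_fn="$1"
  local reinit_fn="$2"

  if "$check_fn"; then
    return 0
  fi

  echo "check failed, retrying in ${RETRY_INTERVAL}s"
  sleep "$RETRY_INTERVAL"

  # CHECK_FN is expected to re-query Sentinel and Redis on every call.
  if "$check_fn"; then
    echo "check passed on retry, no need to reinitialize"
    return 0
  fi

  echo "check failed again, reinitializing"
  "$reinit_fn"
}
```

Keeping the retry count at one matches the template; a loop over several attempts would be a straightforward extension, but it is not what this commit implements.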

charts/redis-ha/values.yaml

Lines changed: 1 addition & 0 deletions
@@ -1014,5 +1014,6 @@ networkPolicy:
 splitBrainDetection:
 # -- Interval between redis sentinel and server split brain checks (in seconds)
 interval: 60
+retryInterval: 10
 # -- splitBrainDetection resources
 resources: {}
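The new `retryInterval` value changes detection timing. Assuming the detection loop waits `interval` seconds between checks and performs exactly one retry, as the `_configs.tpl` hunk does, a split brain that persists is reinitialized at most roughly `interval + retryInterval` seconds after it starts, ignoring the time spent running the checks themselves. The function below is a hypothetical back-of-the-envelope calculation under that assumption, not part of the chart.

```bash
#!/usr/bin/env bash
# Hypothetical estimate, ASSUMING the loop sleeps `interval` seconds between
# checks and retries once after `retryInterval` seconds. Time spent querying
# Sentinel/Redis is ignored.
#
# Usage: worst_case_reinit_delay INTERVAL RETRY_INTERVAL
worst_case_reinit_delay() {
  local interval="$1"
  local retry_interval="$2"
  # Up to one full interval until the next check notices the mismatch,
  # plus one retryInterval wait before the confirming recheck.
  echo $(( interval + retry_interval ))
}

worst_case_reinit_delay 60 10   # chart defaults: prints 70
```

With the defaults that is about 70 seconds. The trade-off is up to `retryInterval` extra seconds before a genuine split brain is repaired, in exchange for not reinitializing when a mismatch clears up before the recheck.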
