
Commit a4dd8b1

Fix non-Primary checks in liveness probe
The liveness probe did not parse the non-Primary condition out of the output of the mysql CLI command. Consequently, the liveness probe did not fail when a galera node was disconnected from the primary partition, and the galera pod could not restart automatically, leading to long delays before restart or sometimes full cluster disruption.

Fix the way probes are handled and refactor the script to allow more precise conditions in startup/readiness/liveness probes.

Jira: OSPRH-8862
1 parent 2d771bf commit a4dd8b1

1 file changed: +16 -6 lines changed

templates/galera/bin/mysql_probe.sh

Lines changed: 16 additions & 6 deletions
@@ -1,27 +1,37 @@
 #!/bin/bash
-set -eu
+set -u
 
 # This secret is mounted by k8s and always up to date
 read -s -u 3 3< /var/lib/secrets/dbpassword MYSQL_PWD || true
 export MYSQL_PWD
 
 PROBE_USER=root
+function mysql_status_check {
+    local status=$1
+    local expect=$2
+    set -x
+    mysql -u${PROBE_USER} -sNEe "show status like '${status}';" | tail -1 | grep -w -e "${expect}"
+}
 
 # Consider the pod has "started" once mysql is reachable
+# and is part of the primary partition
 if [ "$1" = "startup" ]; then
-    mysql -u${PROBE_USER} -sNe "select(1);"
+    mysql_status_check wsrep_cluster_status Primary
     exit $?
 fi
 
-set -x
+# readiness and liveness probes are run by k8s only after start probe succeeded
 
 case "$1" in
     readiness)
         # If the node is e.g. a donor, it cannot serve traffic
-        mysql -u${PROBE_USER} -sNe "show status like 'wsrep_local_state_comment';" | grep -w -e Synced;;
+        mysql_status_check wsrep_local_state_comment Synced
+        ;;
     liveness)
-        # If the node is not in the primary partition, restart it
-        mysql -u${PROBE_USER} -sNe "show status like 'wsrep_cluster_status';" | grep -w -e Primary;;
+        # If the node is not in the primary partition, the failed liveness probe
+        # will make k8s restart this pod
+        mysql_status_check wsrep_cluster_status Primary
+        ;;
     *)
         echo "Invalid probe option '$1'"
         exit 1;;
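
Review note: mysql_status_check still filters the value with grep -w. GNU grep counts only letters, digits, and underscore as word characters, so "-w Primary" also accepts the value "non-Primary". The sketch below contrasts the word match with an exact comparison of the last output line. The helper name status_equals is hypothetical and not part of this commit, and the sample values are fed with echo instead of a live mysql call.

```shell
#!/bin/bash
# Hypothetical helper, not part of the commit: succeed only when the last
# line read from stdin is exactly the expected value.
status_equals() {
    local expect=$1
    local value
    value=$(tail -n 1)
    [ "${value}" = "${expect}" ]
}

# Word match: "-" is not a word character, so this also accepts "non-Primary".
echo "non-Primary" | grep -q -w -e Primary && echo "grep -w: accepted"

# Exact match: only the literal value "Primary" passes.
echo "non-Primary" | status_equals Primary || echo "exact: rejected"
echo "Primary" | status_equals Primary && echo "exact: accepted"
```

If a stricter check were wanted, the probe could pipe the mysql output into such a helper in place of the tail and grep stages. Whether that suits the other status values the script checks is left open here.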
