Commit eaf1d17

Improve flaky chainsaw test for service failover
One chainsaw test abruptly cuts one galera node away from the galera cluster and verifies that the active endpoint moves to one of the two remaining galera instances. Until now, the test did this by killing the target mysqld server with kill -9 (SIGKILL). By design, the remaining galera nodes can take up to 15s by default to notice that the node went away and react to it. This is a problem for the test: if the pod comes back online before those 15s have elapsed, the galera cluster does not move the endpoint and the test fails.
To prevent flaky results in the test, send the STOP signal instead of the KILL signal. This does not kill mysqld or the pod; it only suspends the process. By default, galera marks the suspended node as not responding after 3s and switches the endpoint. This achieves the same goal: making sure that an unexpected disconnection still triggers an endpoint switch.
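The fix relies on the difference between SIGKILL and SIGSTOP. The local sketch below assumes a Linux host with procps `ps` and uses `sleep` as a hypothetical stand-in for mysqld (no galera or OpenShift involved). It shows that a STOPped process still exists but is frozen, while a KILLed one is gone. `kill -s STOP "$pid"` delivers the same signal that `killall -s STOP` sends by process name.

```shell
#!/usr/bin/env bash
# Sketch: contrast SIGSTOP (freeze) with SIGKILL (terminate) on a local process.
# "sleep" is a hypothetical stand-in for /usr/libexec/mysqld; no galera involved.
set -euo pipefail

sleep 300 &            # stand-in for mysqld
pid=$!

kill -s STOP "$pid"    # process is suspended, not terminated
sleep 0.2              # give the kernel a moment to apply the stop
# STAT starts with 'T' (stopped by a job-control signal); extra flags such as '+' may follow.
state_after_stop=$(ps -o stat= -p "$pid" | tr -d ' ')
echo "after STOP: state=$state_after_stop"

kill -s CONT "$pid"    # a stopped process can simply be resumed...
kill -s KILL "$pid"    # ...whereas SIGKILL ends it outright
wait "$pid" 2>/dev/null || true   # reap it; wait returns 137 for a SIGKILLed child
if kill -0 "$pid" 2>/dev/null; then alive=yes; else alive=no; fi
echo "after KILL: alive=$alive"
```

Because a stopped process keeps its sockets open but stops answering, peers only notice it through their own failure-detection timeouts, which is the behaviour the test wants to exercise.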
1 parent 0a36c2e commit eaf1d17

1 file changed: +1 / -1 lines

tests/chainsaw/tests/service/chainsaw-test.yaml

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ spec:
   content: |
     oc wait -n $NAMESPACE --for=jsonpath='{.status.readyReplicas}'=3 statefulset openstack-galera
     current=$ENDPOINT
-    oc rsh -n $NAMESPACE $ENDPOINT killall -9 /usr/libexec/mysqld
+    oc rsh -n $NAMESPACE $ENDPOINT killall -s STOP /usr/libexec/mysqld
     while [ "$current" = "$ENDPOINT" ]; do
       echo $(date) "$current" "$ENDPOINT"
       sleep 1
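The visible part of the test polls until `$ENDPOINT` differs from the recorded `current` value; how `$ENDPOINT` is refreshed inside the loop is not shown in this hunk. A bounded variant of the same polling idea might look like the sketch below. `get_endpoint`, `wait_for_endpoint_change`, and the pod names are hypothetical: the stub reports the original pod for two polls and then a new one, mimicking galera switching the endpoint once the stopped node is suspected.

```shell
#!/usr/bin/env bash
# Sketch: wait for the active endpoint to move, but give up after a deadline.
# get_endpoint is a HYPOTHETICAL stub; the real test sets $ENDPOINT differently.
set -euo pipefail

poll_count=0
get_endpoint() {
  # Stub: old pod for the first two polls, then a different one.
  poll_count=$((poll_count + 1))
  if (( poll_count < 3 )); then
    ENDPOINT=openstack-galera-0
  else
    ENDPOINT=openstack-galera-1
  fi
}

# wait_for_endpoint_change OLD TIMEOUT_SECONDS
# Returns 0 once $ENDPOINT differs from OLD, 1 if TIMEOUT_SECONDS elapse first.
wait_for_endpoint_change() {
  local old=$1 timeout=$2 waited=0
  get_endpoint
  while [ "$ENDPOINT" = "$old" ]; do
    if (( waited >= timeout )); then
      echo "endpoint still $old after ${timeout}s" >&2
      return 1
    fi
    echo "$(date) $old $ENDPOINT"
    sleep 1
    waited=$((waited + 1))
    get_endpoint
  done
  return 0
}

get_endpoint
current=$ENDPOINT
wait_for_endpoint_change "$current" 10 && result=switched || result=timeout
echo "result=$result old=$current new=$ENDPOINT"
```

With a 3s detection window, a deadline of a few times that value leaves headroom while still failing fast if the endpoint never moves; the exact value here is an illustrative choice, not taken from the test.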
