
Commit 4beccf3

Do not start galera as joiner with 1-replica cluster
The mariadb operator checks for available pods in the galera statefulset to determine whether to start mysqld as a bootstrap node or as a joiner node on each of the pods that remain to be started.

When galera is deployed as a 1-replica cluster (e.g. in CI), there is a small time window after the statefulset has been probed and galera marked as 'bootstrapped' during which the single pod can crash before it is probed again. If that happens, the operator tries to restart the pod as a 'joiner', which is invalid: there is no other node for it to join.

Add a specific check for 1-replica deployments, so that the operator bails out and requeues the event when the pod is identified as a joiner. This lets the operator re-probe the galera state and restart the pod correctly, and avoids an unnecessary error in the logs.

Jira: OSPRH-7821
1 parent ff694b3 commit 4beccf3

1 file changed: 11 additions, 2 deletions

controllers/galera_controller.go

@@ -691,7 +691,7 @@ func (r *GaleraReconciler) Reconcile(ctx context.Context, req ctrl.Request) (res
 // Note:
 // . A pod is available in the statefulset if the pod's readiness
 // probe returns true (i.e. galera is running in the pod and clustered)
-// . Cluster is bootstrapped if as soon as one pod is available
+// . Cluster is bootstrapped as soon as one pod is available
 instance.Status.Bootstrapped = statefulset.Status.AvailableReplicas > 0
 
 if instance.Status.Bootstrapped {
@@ -708,8 +708,17 @@ func (r *GaleraReconciler) Reconcile(ctx context.Context, req ctrl.Request) (res
 }
 }
 
+runningPods := getRunningPodsMissingGcomm(ctx, podList.Items, instance, helper, r.config)
+// Special case for 1-node deployment: if the statefulset reports 1 node is available
+// but the pod shows up in runningPods (i.e. NotReady), do not consider it a joiner.
+// Wait for the two statuses to re-sync after another k8s probe is run.
+if *instance.Spec.Replicas == 1 && len(runningPods) == 1 {
+log.Info("Galera node no longer running. Requeuing")
+return ctrl.Result{RequeueAfter: time.Duration(3) * time.Second}, nil
+}
+
 // The other 'Running' pods can join the existing cluster.
-for _, pod := range getRunningPodsMissingGcomm(ctx, podList.Items, instance, helper, r.config) {
+for _, pod := range runningPods {
 name := pod.Name
 joinerURI := buildGcommURI(instance)
 log.Info("Pushing gcomm URI to joiner", "pod", name)
