feat(operator): add custom probes in configmaps with override option by mvasilenko · Pull Request #500 · dragonflydb/dragonfly-operator

mvasilenko · 2026-03-29T11:51:46Z

Problem:

Currently the CRD doesn't provide a way to customise the pod readiness, liveness, and startup probes; they are hardcoded in the operator binary.

Following the suggestions in #401 (comment) and dragonflydb/dragonfly#5921 (comment), this PR introduces custom pod readiness, liveness, and startup probes.

Proposed solution:

3 ConfigMaps per Dragonfly instance with the default probe scripts, mounted into pods at /etc/dragonfly/probes/
3 new optional CR fields: customLivenessProbeConfigMap, customReadinessProbeConfigMap, customStartupProbeConfigMap
default scripts are embedded into the operator binary (//go:embed scripts/*.sh) and written into the generated ConfigMaps at reconcile time
RBAC ClusterRole updated to include configmaps CRUD verbs

Example:

custom liveness probe with connect timeout

apiVersion: v1
kind: ConfigMap
metadata:
  name: dragonfly-sample-probes
  namespace: default
data:
  liveness-check.sh: |
    #!/bin/sh
    RESPONSE=$(timeout 4 redis-cli -h localhost -p ${HEALTHCHECK_PORT:-9999} PING 2>/dev/null)
    case "$RESPONSE" in
      PONG|*LOADING*) exit 0 ;;
      *)              exit 1 ;;
    esac
---
apiVersion: dragonflydb.io/v1alpha1
kind: Dragonfly
metadata:
  name: dragonfly-sample
spec:
  replicas: 1
  customLivenessProbeConfigMap:
    name: dragonfly-sample-probes

Copilot

Pull request overview

Adds support for customizable liveness/readiness/startup probe scripts for Dragonfly pods by generating default script ConfigMaps per instance and allowing users to override each probe via new CR fields.

Changes:

Introduces three new optional CRD fields to reference custom probe ConfigMaps (liveness/readiness/startup).
Embeds default probe scripts in the operator and generates per-Dragonfly ConfigMaps, mounting them into pods and wiring probes to execute those scripts (adds a default StartupProbe).
Expands RBAC to allow ConfigMap CRUD and updates unit/e2e tests accordingly.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 6 comments.

File	Description
`api/v1alpha1/dragonfly_types.go`	Adds new `custom*ProbeConfigMap` fields to the Dragonfly spec.
`api/v1alpha1/zz_generated.deepcopy.go`	Updates deepcopy generation for the new spec fields.
`internal/resources/const.go`	Adds constants for probe ConfigMap names, script keys, mount paths, and volume names.
`internal/resources/scripts.go`	Embeds the default probe scripts into the operator binary.
`internal/resources/scripts/*.sh`	Adds default liveness/readiness/startup shell scripts.
`internal/resources/resources.go`	Generates probe ConfigMaps, mounts them, and updates probes to execute mounted scripts; adds StartupProbe and resolves probe port from args.
`internal/resources/resources_test.go`	Adds/extends unit tests covering port resolution, generated ConfigMaps, and probe volume/mount wiring.
`internal/controller/dragonfly_controller.go`	Updates kubebuilder RBAC markers to include ConfigMaps.
`config/rbac/role.yaml`	Adds ConfigMaps permissions to manager role.
`manifests/dragonfly-operator.yaml`	Updates rendered CRD schema and RBAC to include new fields and ConfigMap permissions.
`manifests/crd.yaml` / `config/crd/bases/dragonflydb.io_dragonflies.yaml`	Updates rendered CRD schemas with the new optional fields.
`e2e/util.go`	Adds a helper to wait for master pod existence.
`e2e/dragonfly_controller_test.go`	Adjusts lifecycle waits to use PhaseReady and adds explicit master-election waiting in one scenario.
`internal/controller/util_test.go`	Minor formatting alignment in a test struct.

Copilot · 2026-03-31T14:32:40Z

+	if df.Spec.CustomLivenessProbeConfigMap != nil {
+		livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
+	}
+	readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
+	if df.Spec.CustomReadinessProbeConfigMap != nil {
+		readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
+	}
+	startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
+	if df.Spec.CustomStartupProbeConfigMap != nil {


When a custom probe ConfigMap reference is provided but name is empty (allowed by the CRD default/compatibility), this code will set the mounted ConfigMap name to an empty string, producing an invalid Pod spec (and the pod will stay Pending with a ConfigMap name "" not found). Guard the override with ref != nil && ref.Name != "" (or validate and surface a clear reconcile error/event) and fall back to the generated default name otherwise. Apply the same check for liveness/readiness/startup refs.

Suggested change

-	if df.Spec.CustomLivenessProbeConfigMap != nil {
-		livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
-	}
-	readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
-	if df.Spec.CustomReadinessProbeConfigMap != nil {
-		readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
-	}
-	startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
-	if df.Spec.CustomStartupProbeConfigMap != nil {
+	if df.Spec.CustomLivenessProbeConfigMap != nil && df.Spec.CustomLivenessProbeConfigMap.Name != "" {
+		livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
+	}
+	readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
+	if df.Spec.CustomReadinessProbeConfigMap != nil && df.Spec.CustomReadinessProbeConfigMap.Name != "" {
+		readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
+	}
+	startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
+	if df.Spec.CustomStartupProbeConfigMap != nil && df.Spec.CustomStartupProbeConfigMap.Name != "" {

Copilot · 2026-03-31T14:32:40Z

+		corev1.VolumeMount{
+			Name:      LivenessProbeVolumeName,
+			MountPath: ProbeMountPath + "/" + LivenessScriptKey,
+			SubPath:   LivenessScriptKey,
+		},


These probe scripts are mounted using SubPath. Kubernetes does not propagate ConfigMap updates into subPath mounts, so changing the generated/custom ConfigMap will not update the probe script inside already-running pods (until they restart). If you want ConfigMap updates to take effect, consider using a single projected volume (or mount each ConfigMap as a directory) and reference the scripts from that directory without SubPath.

Copilot · 2026-03-31T14:32:41Z

+								FailureThreshold:    60,
+								InitialDelaySeconds: 0,
+								PeriodSeconds:       5,
+								SuccessThreshold:    1,
+								TimeoutSeconds:      5,


The default StartupProbe here runs a script that currently fails while the server reports LOADING (see startup-check.sh), and with FailureThreshold=60 + PeriodSeconds=5 the pod gets at most ~5 minutes before kubelet restarts it. If snapshot/dataset loading can exceed this, pods may enter a restart loop. Consider increasing the allowed startup window substantially and/or changing the default startup script semantics so LOADING doesn't fail startup (leaving readiness to gate traffic).

Copilot · 2026-03-31T14:32:41Z

+func TestresolveDragonflyPort_Default(t *testing.T) {
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort(nil))
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{}))
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--alsologtostderr"}))
+}
+
+func TestresolveDragonflyPort_CustomPort(t *testing.T) {
+	assert.Equal(t, int32(6380), resolveDragonflyPort([]string{"--port=6380"}))
+}
+
+func TestresolveDragonflyPort_InvalidPort(t *testing.T) {
+	// invalid value falls back to default
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--port=notanumber"}))
+}
+
+func TestresolveDragonflyPort_OutOfRange(t *testing.T) {


These functions won’t run as Go tests because the name after Test must start with an uppercase letter (^Test[A-Z]). Rename to TestResolveDragonflyPort_* (capital R) so the resolveDragonflyPort behavior is actually exercised in CI.

Suggested change

-func TestresolveDragonflyPort_Default(t *testing.T) {
-	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort(nil))
-	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{}))
-	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--alsologtostderr"}))
-}
-
-func TestresolveDragonflyPort_CustomPort(t *testing.T) {
-	assert.Equal(t, int32(6380), resolveDragonflyPort([]string{"--port=6380"}))
-}
-
-func TestresolveDragonflyPort_InvalidPort(t *testing.T) {
-	// invalid value falls back to default
-	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--port=notanumber"}))
-}
-
-func TestresolveDragonflyPort_OutOfRange(t *testing.T) {
+func TestResolveDragonflyPort_Default(t *testing.T) {
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort(nil))
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{}))
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--alsologtostderr"}))
+}
+
+func TestResolveDragonflyPort_CustomPort(t *testing.T) {
+	assert.Equal(t, int32(6380), resolveDragonflyPort([]string{"--port=6380"}))
+}
+
+func TestResolveDragonflyPort_InvalidPort(t *testing.T) {
+	// invalid value falls back to default
+	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--port=notanumber"}))
+}
+
+func TestResolveDragonflyPort_OutOfRange(t *testing.T) {
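For context on the naming rule, here is a small sketch of the check as the `testing` package documents it: the name must begin with `Test`, and whatever follows must not start with a lowercase letter. `isTestName` is a hypothetical helper written for illustration, not an API of the Go toolchain:

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// isTestName reports whether a function name has the shape of a Go test
// function: the "Test" prefix followed by nothing, or by a rune that is
// not a lowercase letter.
func isTestName(name string) bool {
	const prefix = "Test"
	if !strings.HasPrefix(name, prefix) {
		return false
	}
	if len(name) == len(prefix) {
		return true // a bare "Test" is accepted
	}
	r, _ := utf8.DecodeRuneInString(name[len(prefix):])
	return !unicode.IsLower(r)
}

func main() {
	for _, n := range []string{
		"TestresolveDragonflyPort_Default", // lowercase 'r' after Test: skipped
		"TestResolveDragonflyPort_Default", // uppercase 'R': runs
	} {
		fmt.Printf("%s: %v\n", n, isTestName(n))
	}
}
```

Note that the rule is "not lowercase" rather than strictly `[A-Z]`, so names such as `Test_foo` would also be picked up; the rename to `TestResolveDragonflyPort_*` is still the conventional fix.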

Copilot · 2026-03-31T14:32:41Z

+func waitForMasterPod(ctx context.Context, c client.Client, name, namespace string, maxDuration time.Duration) error {
+	ctx, cancel := context.WithTimeout(ctx, maxDuration)
+	defer cancel()
+	for {
+		select {
+		case <-ctx.Done():
+			return fmt.Errorf("timed out waiting for master pod for %s", name)
+		default:
+			var pods corev1.PodList
+			if err := c.List(ctx, &pods, client.InNamespace(namespace), client.MatchingLabels{
+				resources.DragonflyNameLabelKey: name,
+				resources.RoleLabelKey:          resources.Master,
+			}); err == nil && len(pods.Items) > 0 {
+				return nil
+			}
+		}
+	}


waitForMasterPod busy-loops with no sleep/backoff and ignores list errors, which can peg CPU and hammer the API server (especially when the list call is failing). Add a small polling interval (e.g. time.Sleep or wait.PollUntilContextTimeout) and handle/return errors appropriately (or at least log + backoff) while waiting.

Copilot · 2026-03-31T14:32:42Z

+			waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseReady, 3*time.Minute)
+			waitForStatefulSetReady(ctx, k8sClient, name, namespace, 3*time.Minute)


The return values from waitForDragonflyPhase / waitForStatefulSetReady are ignored here, so timeouts/errors won’t fail the test at the point they occur (leading to harder-to-debug downstream failures). Capture the returned error and Expect(err).To(BeNil()) / Expect(...).To(Succeed()), consistent with other call sites in this suite.

Suggested change

-			waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseReady, 3*time.Minute)
-			waitForStatefulSetReady(ctx, k8sClient, name, namespace, 3*time.Minute)
+			err = waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseReady, 3*time.Minute)
+			Expect(err).To(BeNil())
+			err = waitForStatefulSetReady(ctx, k8sClient, name, namespace, 3*time.Minute)
+			Expect(err).To(BeNil())

miledxz

Overall looks good. I left a few comments; @mvasilenko, please take a look when you have time.

Sorry for the late reply, I was very busy this week :)

miledxz · 2026-04-02T08:03:28Z

+	// ConfigMaps are appended before the StatefulSet so they exist when pods start
+	// and can mount the probe script volumes without getting stuck in Pending.
+	resources = append(resources,
+		generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript),
+		generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript),
+		generateProbeConfigMap(df, StartupProbeConfigMapSuffix, StartupScriptKey, defaultStartupScript),
+	)


These ConfigMaps are the first resources without a .Spec field to enter the reconcile loop. The existing resourceSpecsEqual() in dragonfly_instance.go compares objects by FieldByName("Spec") https://github.com/dragonflydb/dragonfly-operator/blob/main/internal/controller/dragonfly_instance.go#L742 — when that field doesn't exist (as with ConfigMaps, which store content in .Data), it returns true ("equal") and skips the update. The patch path has the same issue.

The impact is as follows:
if the operator ships updated probe scripts in a future release, existing ConfigMaps in the cluster will never be patched. Users would have to delete them manually.

This is a pre-existing gap in the reconciler, but this PR is the first to hit it.

I suggest fixing resourceSpecsEqual and the patch block in reconcileResources to handle ConfigMap .Data before merging — otherwise probe script upgrades will silently fail on existing clusters.

Suggestion:

In resourceSpecsEqual in dragonfly_instance.go, add before the FieldByName("Spec") block:

if cmDesired, ok := desired.(*corev1.ConfigMap); ok {
	if cmExisting, ok := existing.(*corev1.ConfigMap); ok {
		return reflect.DeepEqual(cmDesired.Data, cmExisting.Data)
	}
}

or something similar,

WDYT @mvasilenko ?
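A standalone sketch of the comparison idea, assuming only that probe scripts live in the string `Data` map. `configMap`, `dataEqual`, and `needsUpdate` are hypothetical stand-ins so the example runs without Kubernetes types; a real fix might also need to consider `BinaryData`:

```go
package main

import "fmt"

// configMap is a hypothetical stand-in for corev1.ConfigMap, reduced to the
// fields that matter for this comparison.
type configMap struct {
	Name string
	Data map[string]string
}

// dataEqual compares two Data maps key by key. Unlike reflect.DeepEqual it
// treats a nil map and an empty map as equal, which avoids a spurious diff
// when one side was never initialised.
func dataEqual(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, va := range a {
		if vb, ok := b[k]; !ok || va != vb {
			return false
		}
	}
	return true
}

// needsUpdate reports whether the existing object must be patched to match
// the desired one.
func needsUpdate(desired, existing configMap) bool {
	return !dataEqual(desired.Data, existing.Data)
}

func main() {
	// Names and script bodies are illustrative only.
	existing := configMap{Name: "df-liveness", Data: map[string]string{"liveness-check.sh": "#!/bin/sh\nexit 0\n"}}
	desired := configMap{Name: "df-liveness", Data: map[string]string{"liveness-check.sh": "#!/bin/sh\nexit 1\n"}}

	fmt.Println(needsUpdate(desired, existing))  // true
	fmt.Println(needsUpdate(existing, existing)) // false
}
```

With a check like this in place, a changed default script in a new operator release would register as a difference and trigger the patch path.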

also,

these three ConfigMaps are always generated even when the user has set customLivenessProbeConfigMap / customReadinessProbeConfigMap / customStartupProbeConfigMap.

The custom override only changes the volume reference (lines 436-447), but the default ConfigMaps are still created, sit unused in the namespace, and get reconciled every loop.

My suggestion — skip generating the default when a custom override is provided:

if df.Spec.CustomLivenessProbeConfigMap == nil {
	resources = append(resources, generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript))
}
if df.Spec.CustomReadinessProbeConfigMap == nil {
	resources = append(resources, generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript))
}
if df.Spec.CustomStartupProbeConfigMap == nil {
	resources = append(resources, generateProbeConfigMap(df, StartupProbeConfigMapSuffix, StartupScriptKey, defaultStartupScript))
}

miledxz · 2026-04-02T08:19:16Z

-							Env: append(df.Spec.Env, corev1.EnvVar{
-								Name:  "HEALTHCHECK_PORT",
-								Value: fmt.Sprintf("%d", DragonflyAdminPort),


HEALTHCHECK_PORT was removed from the container env here, but the PR description's example for custom probes still references ${HEALTHCHECK_PORT:-6379}. Anyone following the example will have their custom probes silently hit port 6379 instead of the admin port 9999 — which defeats the purpose on TLS clusters.

We can re-add HEALTHCHECK_PORT=9999 as an env var so custom scripts can reference it portably.
This is more user-friendly: it gives custom script authors a stable abstraction rather than requiring them to know internal port numbers.

miledxz · 2026-04-02T08:22:52Z

+HOST="localhost"
+PORT=9999  # Dragonfly admin port — always plain-text, not user-configurable
+
+RESPONSE=$(redis-cli -h "$HOST" -p "$PORT" --no-auth-warning INFO persistence 2>/dev/null)


Liveness and startup scripts pass ${DFLY_requirepass:+-a "$DFLY_requirepass"} to redis-cli, but the readiness script does not:

readiness (no auth)

RESPONSE=$(redis-cli -h "$HOST" -p "$PORT" --no-auth-warning INFO persistence 2>/dev/null)

liveness (has auth)

RESPONSE=$(redis-cli -h "$HOST" -p "$PORT" --no-auth-warning ${DFLY_requirepass:+-a "$DFLY_requirepass"} PING 2>/dev/null)

Even if the admin port currently doesn't require auth, the scripts should be consistent. If admin port auth behavior ever changes, only the readiness probe would break. I suggest adding the same ${DFLY_requirepass:+-a "$DFLY_requirepass"} to the readiness script as well.

miledxz · 2026-04-02T08:24:55Z

+	// Note: AdditionalVolumes takes precedence — a volume named "liveness-probe",
+	// "readiness-probe", or "startup-probe" in spec.additionalVolumes will override
+	// the operator-generated probe volume for that slot.


The comment here correctly notes that additionalVolumes can override the probe volumes by matching the volume name. Combined with the new customXxxProbeConfigMap API fields, users now have two different ways to override probe scripts, and nothing prevents them from using both at once (which could conflict).

It may be worth adding a validation webhook that rejects the CR if both mechanisms target the same probe, or at least documenting clearly in the CRD field descriptions which mechanism takes precedence.

miledxz · 2026-04-02T08:28:08Z

+							// Probe semantics during dataset LOADING (large snapshot restore):
+						//   StartupProbe  — succeeds on any PING response (PONG or LOADING); prevents
+						//                   liveness from firing before the process is up.
+						//   LivenessProbe — succeeds on any PING response (PONG or LOADING); must NOT
+						//                   restart a pod that is mid-restore, as that aborts the load
+						//                   and creates a crash loop (see issues #426, #508).
+						//   ReadinessProbe — fails on LOADING; gates traffic until the dataset is fully
+						//                   loaded and Dragonfly can serve requests.
+						ReadinessProbe: &corev1.Probe{


@mvasilenko worth running go fmt for this file

mvasilenko · 2026-04-03T11:26:17Z

@miledxz addressed comments, appreciate re-review.

Thanks for your patience — this is my first contribution

Abhra303 · 2026-04-08T07:34:28Z

Hey, can you fix the conflicts please?

mvasilenko · 2026-04-08T12:09:35Z

updated branch with new main, could you check please?

Abhra303

Hey, I left some comments. Looks OK overall, but my main concern is that when the operator is upgraded, all existing resources will be updated immediately, because the reconciler will see that the ConfigMaps are missing and that the StatefulSet has different probes than desired. This somewhat breaks compatibility.

Abhra303 · 2026-04-10T09:59:02Z

-			waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseResourcesCreated, 2*time.Minute)
-			waitForStatefulSetReady(ctx, k8sClient, name, namespace, 2*time.Minute)
-
+			// Wait for master election, then for all replicas to be Ready


This e2e fix is already merged. Please update the PR with the latest main branch.

Abhra303 · 2026-04-10T09:59:29Z

+// waitForMasterPod polls until at least one pod with role=master exists. Use this
+// after waitForStatefulSetReady to guarantee the lifecycle controller has finished
+// master election before the test tries to connect.
+func waitForMasterPod(ctx context.Context, c client.Client, name, namespace string, maxDuration time.Duration) error {


These are already merged. Please update the PR.

Abhra303 · 2026-04-10T10:01:11Z

 	dflyUserGroup int64 = 999
 )

+func generateProbeConfigMap(df *resourcesv1.Dragonfly, suffix, key, script string) *corev1.ConfigMap {


pass configmap name instead of suffix and assign it to Name under ObjectMeta.

Abhra303 · 2026-04-10T10:04:30Z

-											"/bin/sh",
-											"/usr/local/bin/healthcheck.sh",
-										},
+										Command: []string{"/bin/sh", ProbeMountPath + "/" + ReadinessScriptKey},


Instead of concatenating strings every time, maybe use a helper function that accepts probe file name.

Abhra303 · 2026-04-10T10:05:17Z

-											"/bin/sh",
-											"/usr/local/bin/healthcheck.sh",
-										},
+										Command: []string{"/bin/sh", ProbeMountPath + "/" + LivenessScriptKey},


Instead of concatenating strings every time, maybe use a helper function that accepts probe file name.
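A possible shape for such a helper, as a sketch only. `probeScriptPath` and `probeCommand` are hypothetical names; the mount directory is taken from the PR description and the script key from its example:

```go
package main

import (
	"fmt"
	"path"
)

// probeMountPath matches the mount directory named in the PR description.
const probeMountPath = "/etc/dragonfly/probes"

// probeScriptPath returns the in-container path of a probe script given its
// ConfigMap key (the script file name).
func probeScriptPath(scriptKey string) string {
	return path.Join(probeMountPath, scriptKey)
}

// probeCommand builds the exec command for a probe that runs the script
// through /bin/sh, as the hunks above do.
func probeCommand(scriptKey string) []string {
	return []string{"/bin/sh", probeScriptPath(scriptKey)}
}

func main() {
	fmt.Println(probeCommand("liveness-check.sh"))
	// Prints: [/bin/sh /etc/dragonfly/probes/liveness-check.sh]
}
```

The same helper could serve both the probe `Command` and the volume `MountPath`, so the path is built in one place.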

Abhra303 · 2026-04-10T10:07:34Z

+	// Probe script volumes — the corresponding ConfigMaps are generated and appended to resources below.
+	livenessConfigMapName := fmt.Sprintf("%s-%s", df.Name, LivenessProbeConfigMapSuffix)
+	if df.Spec.CustomLivenessProbeConfigMap != nil && df.Spec.CustomLivenessProbeConfigMap.Name != "" {
+		livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
+	}
+	readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
+	if df.Spec.CustomReadinessProbeConfigMap != nil && df.Spec.CustomReadinessProbeConfigMap.Name != "" {
+		readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
+	}
+	startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
+	if df.Spec.CustomStartupProbeConfigMap != nil && df.Spec.CustomStartupProbeConfigMap.Name != "" {
+		startupConfigMapName = df.Spec.CustomStartupProbeConfigMap.Name
+	}
+
+	statefulset.Spec.Template.Spec.Volumes = append(statefulset.Spec.Template.Spec.Volumes,
+		corev1.Volume{
+			Name: LivenessProbeVolumeName,
+			VolumeSource: corev1.VolumeSource{
+				ConfigMap: &corev1.ConfigMapVolumeSource{
+					LocalObjectReference: corev1.LocalObjectReference{Name: livenessConfigMapName},
+				},
+			},
+		},
+		corev1.Volume{
+			Name: ReadinessProbeVolumeName,
+			VolumeSource: corev1.VolumeSource{
+				ConfigMap: &corev1.ConfigMapVolumeSource{
+					LocalObjectReference: corev1.LocalObjectReference{Name: readinessConfigMapName},
+				},
+			},
+		},
+		corev1.Volume{
+			Name: StartupProbeVolumeName,
+			VolumeSource: corev1.VolumeSource{
+				ConfigMap: &corev1.ConfigMapVolumeSource{
+					LocalObjectReference: corev1.LocalObjectReference{Name: startupConfigMapName},
+				},
+			},
+		},
+	)
+
+	statefulset.Spec.Template.Spec.Containers[0].VolumeMounts = append(
+		statefulset.Spec.Template.Spec.Containers[0].VolumeMounts,
+		corev1.VolumeMount{
+			Name:      LivenessProbeVolumeName,
+			MountPath: ProbeMountPath + "/" + LivenessScriptKey,
+			SubPath:   LivenessScriptKey,
+		},
+		corev1.VolumeMount{
+			Name:      ReadinessProbeVolumeName,
+			MountPath: ProbeMountPath + "/" + ReadinessScriptKey,
+			SubPath:   ReadinessScriptKey,
+		},
+		corev1.VolumeMount{
+			Name:      StartupProbeVolumeName,
+			MountPath: ProbeMountPath + "/" + StartupScriptKey,
+			SubPath:   StartupScriptKey,
+		},
+	)
+


Could you please move all of this into a helper function? The function is getting too big.
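As a rough illustration of the extraction, the name-resolution part could collapse into one function that is called three times. `probeConfigMapName` and `localRef` are hypothetical; `localRef` stands in for `corev1.LocalObjectReference`, and the `liveness-probe` suffix is an assumed value, not the PR's constant:

```go
package main

import "fmt"

// localRef is a hypothetical stand-in for corev1.LocalObjectReference.
type localRef struct{ Name string }

// probeConfigMapName returns the ConfigMap to mount for one probe and whether
// it is user-supplied. A nil reference or an empty name falls back to the
// generated default "<dragonfly name>-<suffix>".
func probeConfigMapName(dfName, suffix string, custom *localRef) (name string, isCustom bool) {
	if custom != nil && custom.Name != "" {
		return custom.Name, true
	}
	return fmt.Sprintf("%s-%s", dfName, suffix), false
}

func main() {
	// No override: the generated default is used.
	fmt.Println(probeConfigMapName("dragonfly-sample", "liveness-probe", nil))
	// Reference present but empty: still the default.
	fmt.Println(probeConfigMapName("dragonfly-sample", "liveness-probe", &localRef{}))
	// Explicit override.
	fmt.Println(probeConfigMapName("dragonfly-sample", "liveness-probe", &localRef{Name: "dragonfly-sample-probes"}))
}
```

The boolean result could also drive the decision to skip generating the default ConfigMap, so both code paths share one definition of "custom".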

Abhra303 · 2026-04-10T10:13:03Z

+	// and can mount the probe script volumes without getting stuck in Pending.
+	// Skip generating a default when the user has pointed to their own ConfigMap.
+	if df.Spec.CustomLivenessProbeConfigMap == nil || df.Spec.CustomLivenessProbeConfigMap.Name == "" {
+		resources = append(resources, generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript))


pass the exact name i.e. fmt.Sprintf("%s-%s,....) here.

Abhra303 · 2026-04-10T10:13:20Z

+		resources = append(resources, generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript))
+	}
+	if df.Spec.CustomReadinessProbeConfigMap == nil || df.Spec.CustomReadinessProbeConfigMap.Name == "" {
+		resources = append(resources, generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript))


pass the exact name here instead of the suffix.

Abhra303 · 2026-04-10T10:13:34Z

+		resources = append(resources, generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript))
+	}
+	if df.Spec.CustomStartupProbeConfigMap == nil || df.Spec.CustomStartupProbeConfigMap.Name == "" {
+		resources = append(resources, generateProbeConfigMap(df, StartupProbeConfigMapSuffix, StartupScriptKey, defaultStartupScript))


pass the exact name instead of the suffix.

Abhra303 · 2026-04-10T10:20:40Z

 //+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete
 //+kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;create;update;patch;delete
 //+kubebuilder:rbac:groups="",resources=events,verbs=create;patch
+//+kubebuilder:rbac:groups="",resources=configmaps,verbs=get;list;watch;create;update;patch;delete


You also need to add .Owns(&corev1.ConfigMap{},...) in the SetupWithManager function in dragonfly_controller.go. Otherwise the reconciler won't be triggered when ConfigMaps are deleted or updated.

mvasilenko · 2026-04-16T16:27:54Z

Thanks for the review, addressed comments.

As for the main concern, I've updated and tested the code (see gist). It should be a no-op for existing users; pods would be recreated only on a CRD patch that opts in to custom probes.

Another gap I found with the current ConfigMap-based approach: if a probe ConfigMap referenced in the Dragonfly CR spec is accidentally deleted, the probes break and pods start failing.

So we could add finalizers such as dragonflydb.io/probe-configmap-protection to user-provided probe ConfigMaps, or webhooks that prevent their deletion. WDYT?

mvasilenko mentioned this pull request Mar 29, 2026

Custom configmap-based health check probe scripts with override support #499

Open

miledxz requested a review from Copilot March 31, 2026 14:23

Copilot started reviewing on behalf of miledxz March 31, 2026 14:23

Copilot AI reviewed Mar 31, 2026

miledxz reviewed Apr 2, 2026

mvasilenko force-pushed the feat/custom-probe-scripts branch 2 times, most recently from 9839739 to ab7f485 Compare April 3, 2026 09:48

mvasilenko requested a review from miledxz April 3, 2026 11:26

Abhra303 linked an issue Apr 8, 2026 that may be closed by this pull request

Custom configmap-based health check probe scripts with override support #499

Open

Abhra303 requested changes Apr 10, 2026

mvasilenko added 2 commits April 10, 2026 14:44

feat(operator): custom probe scripts from configmaps with override option

a0d4647

refactor as per comments

749ade0

mvasilenko force-pushed the feat/custom-probe-scripts branch from a6f1727 to 749ade0 Compare April 12, 2026 08:50

mvasilenko mentioned this pull request Apr 12, 2026

feat(operator): address Abhra303 review on custom probe configmaps mvasilenko/dragonfly-operator#13

Merged

3 tasks

mvasilenko added 3 commits April 14, 2026 00:16

dont update statefulsets by default

ba72ea4

add tests - dont generate configmaps by operator, if user already created them

0789610

ci: trigger tests

430b033

mvasilenko requested a review from Abhra303 April 16, 2026 16:28
