feat(operator): add custom probes in configmaps with override option #500
mvasilenko wants to merge 5 commits into dragonflydb:main
Conversation
Pull request overview
Adds support for customizable liveness/readiness/startup probe scripts for Dragonfly pods by generating default script ConfigMaps per instance and allowing users to override each probe via new CR fields.
Changes:
- Introduces three new optional CRD fields to reference custom probe ConfigMaps (liveness/readiness/startup).
- Embeds default probe scripts in the operator and generates per-Dragonfly ConfigMaps, mounting them into pods and wiring probes to execute those scripts (adds a default StartupProbe).
- Expands RBAC to allow ConfigMap CRUD and updates unit/e2e tests accordingly.
Reviewed changes
Copilot reviewed 16 out of 17 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| api/v1alpha1/dragonfly_types.go | Adds new custom*ProbeConfigMap fields to the Dragonfly spec. |
| api/v1alpha1/zz_generated.deepcopy.go | Updates deepcopy generation for the new spec fields. |
| internal/resources/const.go | Adds constants for probe ConfigMap names, script keys, mount paths, and volume names. |
| internal/resources/scripts.go | Embeds the default probe scripts into the operator binary. |
| internal/resources/scripts/*.sh | Adds default liveness/readiness/startup shell scripts. |
| internal/resources/resources.go | Generates probe ConfigMaps, mounts them, and updates probes to execute mounted scripts; adds StartupProbe and resolves probe port from args. |
| internal/resources/resources_test.go | Adds/extends unit tests covering port resolution, generated ConfigMaps, and probe volume/mount wiring. |
| internal/controller/dragonfly_controller.go | Updates kubebuilder RBAC markers to include ConfigMaps. |
| config/rbac/role.yaml | Adds ConfigMaps permissions to the manager role. |
| manifests/dragonfly-operator.yaml | Updates the rendered CRD schema and RBAC to include the new fields and ConfigMap permissions. |
| manifests/crd.yaml / config/crd/bases/dragonflydb.io_dragonflies.yaml | Updates rendered CRD schemas with the new optional fields. |
| e2e/util.go | Adds a helper to wait for master pod existence. |
| e2e/dragonfly_controller_test.go | Adjusts lifecycle waits to use PhaseReady and adds explicit master-election waiting in one scenario. |
| internal/controller/util_test.go | Minor formatting alignment in a test struct. |
```go
if df.Spec.CustomLivenessProbeConfigMap != nil {
	livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
}
readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
if df.Spec.CustomReadinessProbeConfigMap != nil {
	readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
}
startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
if df.Spec.CustomStartupProbeConfigMap != nil {
```
When a custom probe ConfigMap reference is provided but name is empty (allowed by the CRD default/compatibility), this code will set the mounted ConfigMap name to an empty string, producing an invalid Pod spec (and the pod will stay Pending with a ConfigMap name "" not found). Guard the override with ref != nil && ref.Name != "" (or validate and surface a clear reconcile error/event) and fall back to the generated default name otherwise. Apply the same check for liveness/readiness/startup refs.
Suggested change:
```diff
-if df.Spec.CustomLivenessProbeConfigMap != nil {
+if df.Spec.CustomLivenessProbeConfigMap != nil && df.Spec.CustomLivenessProbeConfigMap.Name != "" {
 	livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
 }
 readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
-if df.Spec.CustomReadinessProbeConfigMap != nil {
+if df.Spec.CustomReadinessProbeConfigMap != nil && df.Spec.CustomReadinessProbeConfigMap.Name != "" {
 	readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
 }
 startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
-if df.Spec.CustomStartupProbeConfigMap != nil {
+if df.Spec.CustomStartupProbeConfigMap != nil && df.Spec.CustomStartupProbeConfigMap.Name != "" {
```
```go
corev1.VolumeMount{
	Name:      LivenessProbeVolumeName,
	MountPath: ProbeMountPath + "/" + LivenessScriptKey,
	SubPath:   LivenessScriptKey,
},
```
These probe scripts are mounted using SubPath. Kubernetes does not propagate ConfigMap updates into subPath mounts, so changing the generated/custom ConfigMap will not update the probe script inside already-running pods (until they restart). If you want ConfigMap updates to take effect, consider using a single projected volume (or mount each ConfigMap as a directory) and reference the scripts from that directory without SubPath.
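A projected volume along these lines would keep the scripts updatable in place. This is a hypothetical pod-spec fragment: the ConfigMap names mirror the generated per-instance names but are illustrative, and the mount path is the one from the PR description.

```yaml
# Sketch: one projected volume, mounted as a directory (no subPath), so the
# kubelet refreshes the scripts when the underlying ConfigMaps change.
volumes:
  - name: probe-scripts
    projected:
      sources:
        - configMap:
            name: df-sample-liveness-probe    # illustrative name
        - configMap:
            name: df-sample-readiness-probe   # illustrative name
        - configMap:
            name: df-sample-startup-probe     # illustrative name
containers:
  - name: dragonfly
    volumeMounts:
      - name: probe-scripts
        mountPath: /etc/dragonfly/probes   # probes then exec /etc/dragonfly/probes/<script>
```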
```go
FailureThreshold:    60,
InitialDelaySeconds: 0,
PeriodSeconds:       5,
SuccessThreshold:    1,
TimeoutSeconds:      5,
```
The default StartupProbe here runs a script that currently fails while the server reports LOADING (see startup-check.sh), and with FailureThreshold=60 + PeriodSeconds=5 the pod gets at most ~5 minutes before kubelet restarts it. If snapshot/dataset loading can exceed this, pods may enter a restart loop. Consider increasing the allowed startup window substantially and/or changing the default startup script semantics so LOADING doesn't fail startup (leaving readiness to gate traffic).
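One possible shape for a wider startup window (the numbers below are illustrative, not values from this PR):

```yaml
startupProbe:
  periodSeconds: 5
  failureThreshold: 360   # 360 x 5s gives roughly 30 minutes before kubelet restarts the pod
  timeoutSeconds: 5
```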
```go
func TestresolveDragonflyPort_Default(t *testing.T) {
	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort(nil))
	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{}))
	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--alsologtostderr"}))
}

func TestresolveDragonflyPort_CustomPort(t *testing.T) {
	assert.Equal(t, int32(6380), resolveDragonflyPort([]string{"--port=6380"}))
}

func TestresolveDragonflyPort_InvalidPort(t *testing.T) {
	// invalid value falls back to default
	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--port=notanumber"}))
}

func TestresolveDragonflyPort_OutOfRange(t *testing.T) {
```
These functions won’t run as Go tests because the name after Test must start with an uppercase letter (^Test[A-Z]). Rename to TestResolveDragonflyPort_* (capital R) so the resolveDragonflyPort behavior is actually exercised in CI.
Suggested change:
```diff
-func TestresolveDragonflyPort_Default(t *testing.T) {
+func TestResolveDragonflyPort_Default(t *testing.T) {
 	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort(nil))
 	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{}))
 	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--alsologtostderr"}))
 }
-func TestresolveDragonflyPort_CustomPort(t *testing.T) {
+func TestResolveDragonflyPort_CustomPort(t *testing.T) {
 	assert.Equal(t, int32(6380), resolveDragonflyPort([]string{"--port=6380"}))
 }
-func TestresolveDragonflyPort_InvalidPort(t *testing.T) {
+func TestResolveDragonflyPort_InvalidPort(t *testing.T) {
 	// invalid value falls back to default
 	assert.Equal(t, int32(DragonflyPort), resolveDragonflyPort([]string{"--port=notanumber"}))
 }
-func TestresolveDragonflyPort_OutOfRange(t *testing.T) {
+func TestResolveDragonflyPort_OutOfRange(t *testing.T) {
```
```go
func waitForMasterPod(ctx context.Context, c client.Client, name, namespace string, maxDuration time.Duration) error {
	ctx, cancel := context.WithTimeout(ctx, maxDuration)
	defer cancel()
	for {
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for master pod for %s", name)
		default:
			var pods corev1.PodList
			if err := c.List(ctx, &pods, client.InNamespace(namespace), client.MatchingLabels{
				resources.DragonflyNameLabelKey: name,
				resources.RoleLabelKey:          resources.Master,
			}); err == nil && len(pods.Items) > 0 {
				return nil
			}
		}
	}
```
waitForMasterPod busy-loops with no sleep/backoff and ignores list errors, which can peg CPU and hammer the API server (especially when the list call is failing). Add a small polling interval (e.g. time.Sleep or wait.PollUntilContextTimeout) and handle/return errors appropriately (or at least log + backoff) while waiting.
```go
waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseReady, 3*time.Minute)
waitForStatefulSetReady(ctx, k8sClient, name, namespace, 3*time.Minute)
```
The return values from waitForDragonflyPhase / waitForStatefulSetReady are ignored here, so timeouts/errors won’t fail the test at the point they occur (leading to harder-to-debug downstream failures). Capture the returned error and Expect(err).To(BeNil()) / Expect(...).To(Succeed()), consistent with other call sites in this suite.
Suggested change:
```diff
-waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseReady, 3*time.Minute)
-waitForStatefulSetReady(ctx, k8sClient, name, namespace, 3*time.Minute)
+err = waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseReady, 3*time.Minute)
+Expect(err).To(BeNil())
+err = waitForStatefulSetReady(ctx, k8sClient, name, namespace, 3*time.Minute)
+Expect(err).To(BeNil())
```
miledxz left a comment
Overall looks good, left a few comments. @mvasilenko please take a look when you have time,
sorry for the late reply, was over busy this week :)
```go
// ConfigMaps are appended before the StatefulSet so they exist when pods start
// and can mount the probe script volumes without getting stuck in Pending.
resources = append(resources,
	generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript),
	generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript),
	generateProbeConfigMap(df, StartupProbeConfigMapSuffix, StartupScriptKey, defaultStartupScript),
)
```
These ConfigMaps are the first resources without a .Spec field to enter the reconcile loop. The existing resourceSpecsEqual() in dragonfly_instance.go compares objects by FieldByName("Spec") https://github.com/dragonflydb/dragonfly-operator/blob/main/internal/controller/dragonfly_instance.go#L742 — when that field doesn't exist (as with ConfigMaps, which store content in .Data), it returns true ("equal") and skips the update. The patch path has the same issue.
The impact: if the operator ships updated probe scripts in a future release, existing ConfigMaps in the cluster will never be patched; users would have to delete them manually.
This is a pre-existing gap in the reconciler, but this PR is the first to hit it.
I suggest fixing resourceSpecsEqual and the patch block in reconcileResources to handle ConfigMap .Data before merging — otherwise probe script upgrades will silently fail on existing clusters.
Suggestion:
In resourceSpecsEqual in dragonfly_instance.go, add before the FieldByName("Spec") block:
```go
if cmDesired, ok := desired.(*corev1.ConfigMap); ok {
	if cmExisting, ok := existing.(*corev1.ConfigMap); ok {
		return reflect.DeepEqual(cmDesired.Data, cmExisting.Data)
	}
}
```
or something similar,
WDYT @mvasilenko ?
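The gap can be reproduced in isolation with simplified stand-in types. This sketch only mimics the reflect-based comparison described above; it is not the operator's actual function, and the two structs are hypothetical.

```go
package main

import (
	"fmt"
	"reflect"
)

// Simplified stand-ins: a StatefulSet-like type has a Spec field;
// a ConfigMap-like type keeps its content in Data instead.
type withSpec struct{ Spec string }
type configMapLike struct{ Data map[string]string }

// specsEqual mimics the comparison described in the review: for types
// without a Spec field, FieldByName returns an invalid Value and the
// objects are reported "equal" regardless of their content.
func specsEqual(desired, existing any) bool {
	d := reflect.ValueOf(desired).FieldByName("Spec")
	e := reflect.ValueOf(existing).FieldByName("Spec")
	if !d.IsValid() || !e.IsValid() {
		return true // no Spec field: falls through as "equal"
	}
	return reflect.DeepEqual(d.Interface(), e.Interface())
}

func main() {
	a := configMapLike{Data: map[string]string{"k": "old"}}
	b := configMapLike{Data: map[string]string{"k": "new"}}
	fmt.Println(specsEqual(a, b)) // prints: true, despite differing Data
}
```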
also,
these three ConfigMaps are always generated even when the user has set customLivenessProbeConfigMap / customReadinessProbeConfigMap / customStartupProbeConfigMap.
The custom override only changes the volume reference (lines 436-447), but the default ConfigMaps are still created, sit unused in the namespace, and get reconciled every loop.
My suggestion — skip generating the default when a custom override is provided:
```go
if df.Spec.CustomLivenessProbeConfigMap == nil {
	resources = append(resources, generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript))
}
if df.Spec.CustomReadinessProbeConfigMap == nil {
	resources = append(resources, generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript))
}
if df.Spec.CustomStartupProbeConfigMap == nil {
	resources = append(resources, generateProbeConfigMap(df, StartupProbeConfigMapSuffix, StartupScriptKey, defaultStartupScript))
}
```
```go
Env: append(df.Spec.Env, corev1.EnvVar{
	Name:  "HEALTHCHECK_PORT",
	Value: fmt.Sprintf("%d", DragonflyAdminPort),
```
HEALTHCHECK_PORT was removed from the container env here, but the PR description's example for custom probes still references ${HEALTHCHECK_PORT:-6379}. Anyone following the example will have their custom probes silently hit port 6379 instead of the admin port 9999 — which defeats the purpose on TLS clusters.
We can re-add HEALTHCHECK_PORT=9999 as an env var so custom scripts can reference it portably.
It is also more user-friendly: it gives custom script authors a stable abstraction rather than requiring them to know internal port numbers.
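Re-adding the variable would be a one-line env entry on the container (a sketch; the value shown is the admin port discussed above):

```yaml
env:
  - name: HEALTHCHECK_PORT
    value: "9999"   # admin port; custom scripts can then use ${HEALTHCHECK_PORT:-6379}
```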
```sh
HOST="localhost"
PORT=9999  # Dragonfly admin port — always plain-text, not user-configurable

RESPONSE=$(redis-cli -h "$HOST" -p "$PORT" --no-auth-warning INFO persistence 2>/dev/null)
```
Liveness and startup scripts pass `${DFLY_requirepass:+-a "$DFLY_requirepass"}` to redis-cli, but the readiness script does not:

readiness (no auth):
```sh
RESPONSE=$(redis-cli -h "$HOST" -p "$PORT" --no-auth-warning INFO persistence 2>/dev/null)
```

liveness (has auth):
```sh
RESPONSE=$(redis-cli -h "$HOST" -p "$PORT" --no-auth-warning \
    ${DFLY_requirepass:+-a "$DFLY_requirepass"} PING 2>/dev/null)
```

Even if the admin port currently doesn't require auth, the scripts should be consistent. If admin port auth behavior ever changes, only the readiness probe would break. I suggest adding the same `${DFLY_requirepass:+-a "$DFLY_requirepass"}` to the readiness script as well.
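The expansion involved can be seen in isolation. `build_auth_args` is a hypothetical helper for illustration, not part of the PR's scripts.

```sh
#!/bin/sh
# ${VAR:+word} expands to word only when VAR is set and non-empty, so the
# -a flag reaches redis-cli only when a password actually exists.
build_auth_args() {
    printf '%s' "${DFLY_requirepass:+-a $DFLY_requirepass}"
}

unset DFLY_requirepass
echo "no password:   [$(build_auth_args)]"    # prints: no password:   []

DFLY_requirepass="s3cret"
echo "with password: [$(build_auth_args)]"    # prints: with password: [-a s3cret]
```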
```go
// Note: AdditionalVolumes takes precedence — a volume named "liveness-probe",
// "readiness-probe", or "startup-probe" in spec.additionalVolumes will override
// the operator-generated probe volume for that slot.
```
The comment here correctly notes that additionalVolumes can override the probe volumes by matching the volume name. Combined with the new customXxxProbeConfigMap API fields, users now have two different ways to override probe scripts, and nothing prevents them from using both at once (which could conflict).
Maybe worth considering adding a validation webhook that rejects the CR if both mechanisms target the same probe, or just documenting clearly in the CRD field descriptions which mechanism takes precedence.
```go
// Probe semantics during dataset LOADING (large snapshot restore):
//   StartupProbe   — succeeds on any PING response (PONG or LOADING); prevents
//                    liveness from firing before the process is up.
//   LivenessProbe  — succeeds on any PING response (PONG or LOADING); must NOT
//                    restart a pod that is mid-restore, as that aborts the load
//                    and creates a crash loop (see issues #426, #508).
//   ReadinessProbe — fails on LOADING; gates traffic until the dataset is fully
//                    loaded and Dragonfly can serve requests.
ReadinessProbe: &corev1.Probe{
```
Force-pushed 9839739 to ab7f485
@miledxz addressed comments, appreciate re-review. Thanks for your patience — this is my first contribution
Hey, can you fix the conflicts please?

updated branch with new main, could you check please?
Abhra303 left a comment
Hey, see some comments. Looks OK overall. But my main concern is that when the operator is upgraded, all existing resources will be updated immediately, as the reconciler will see the ConfigMaps are missing and the StatefulSet has a different probe than desired. This somewhat breaks compatibility.
```go
waitForDragonflyPhase(ctx, k8sClient, name, namespace, controller.PhaseResourcesCreated, 2*time.Minute)
waitForStatefulSetReady(ctx, k8sClient, name, namespace, 2*time.Minute)

// Wait for master election, then for all replicas to be Ready
```
This e2e fix is already merged. Please update the PR with the latest main branch.
```go
// waitForMasterPod polls until at least one pod with role=master exists. Use this
// after waitForStatefulSetReady to guarantee the lifecycle controller has finished
// master election before the test tries to connect.
func waitForMasterPod(ctx context.Context, c client.Client, name, namespace string, maxDuration time.Duration) error {
```
These are already merged. Please update the PR.
```go
	dflyUserGroup int64 = 999
)

func generateProbeConfigMap(df *resourcesv1.Dragonfly, suffix, key, script string) *corev1.ConfigMap {
```
Pass the ConfigMap name instead of the suffix and assign it to Name under ObjectMeta.
| "/bin/sh", | ||
| "/usr/local/bin/healthcheck.sh", | ||
| }, | ||
| Command: []string{"/bin/sh", ProbeMountPath + "/" + ReadinessScriptKey}, |
Instead of concatenating strings every time, maybe use a helper function that accepts probe file name.
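A helper along these lines would replace the repeated concatenations. `probeCommand` is a hypothetical name, and the constant value mirrors the mount path from the PR description; the script key is illustrative.

```go
package main

import (
	"fmt"
	"path"
)

// Assumed constant mirroring internal/resources/const.go.
const ProbeMountPath = "/etc/dragonfly/probes"

// probeCommand builds the exec command for a mounted probe script,
// replacing the repeated ProbeMountPath + "/" + key concatenations.
func probeCommand(scriptKey string) []string {
	return []string{"/bin/sh", path.Join(ProbeMountPath, scriptKey)}
}

func main() {
	fmt.Println(probeCommand("readiness-check.sh"))
	// prints: [/bin/sh /etc/dragonfly/probes/readiness-check.sh]
}
```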
| "/bin/sh", | ||
| "/usr/local/bin/healthcheck.sh", | ||
| }, | ||
| Command: []string{"/bin/sh", ProbeMountPath + "/" + LivenessScriptKey}, |
Instead of concatenating strings every time, maybe use a helper function that accepts probe file name.
```go
// Probe script volumes — the corresponding ConfigMaps are generated and appended to resources below.
livenessConfigMapName := fmt.Sprintf("%s-%s", df.Name, LivenessProbeConfigMapSuffix)
if df.Spec.CustomLivenessProbeConfigMap != nil && df.Spec.CustomLivenessProbeConfigMap.Name != "" {
	livenessConfigMapName = df.Spec.CustomLivenessProbeConfigMap.Name
}
readinessConfigMapName := fmt.Sprintf("%s-%s", df.Name, ReadinessProbeConfigMapSuffix)
if df.Spec.CustomReadinessProbeConfigMap != nil && df.Spec.CustomReadinessProbeConfigMap.Name != "" {
	readinessConfigMapName = df.Spec.CustomReadinessProbeConfigMap.Name
}
startupConfigMapName := fmt.Sprintf("%s-%s", df.Name, StartupProbeConfigMapSuffix)
if df.Spec.CustomStartupProbeConfigMap != nil && df.Spec.CustomStartupProbeConfigMap.Name != "" {
	startupConfigMapName = df.Spec.CustomStartupProbeConfigMap.Name
}

statefulset.Spec.Template.Spec.Volumes = append(statefulset.Spec.Template.Spec.Volumes,
	corev1.Volume{
		Name: LivenessProbeVolumeName,
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: livenessConfigMapName},
			},
		},
	},
	corev1.Volume{
		Name: ReadinessProbeVolumeName,
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: readinessConfigMapName},
			},
		},
	},
	corev1.Volume{
		Name: StartupProbeVolumeName,
		VolumeSource: corev1.VolumeSource{
			ConfigMap: &corev1.ConfigMapVolumeSource{
				LocalObjectReference: corev1.LocalObjectReference{Name: startupConfigMapName},
			},
		},
	},
)

statefulset.Spec.Template.Spec.Containers[0].VolumeMounts = append(
	statefulset.Spec.Template.Spec.Containers[0].VolumeMounts,
	corev1.VolumeMount{
		Name:      LivenessProbeVolumeName,
		MountPath: ProbeMountPath + "/" + LivenessScriptKey,
		SubPath:   LivenessScriptKey,
	},
	corev1.VolumeMount{
		Name:      ReadinessProbeVolumeName,
		MountPath: ProbeMountPath + "/" + ReadinessScriptKey,
		SubPath:   ReadinessScriptKey,
	},
	corev1.VolumeMount{
		Name:      StartupProbeVolumeName,
		MountPath: ProbeMountPath + "/" + StartupScriptKey,
		SubPath:   StartupScriptKey,
	},
)
```
Could you please move all this into a helper function? The function is getting too big.
```go
// and can mount the probe script volumes without getting stuck in Pending.
// Skip generating a default when the user has pointed to their own ConfigMap.
if df.Spec.CustomLivenessProbeConfigMap == nil || df.Spec.CustomLivenessProbeConfigMap.Name == "" {
	resources = append(resources, generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript))
```
Pass the exact name, i.e. `fmt.Sprintf("%s-%s", ...)`, here.
```go
	resources = append(resources, generateProbeConfigMap(df, LivenessProbeConfigMapSuffix, LivenessScriptKey, defaultLivenessScript))
}
if df.Spec.CustomReadinessProbeConfigMap == nil || df.Spec.CustomReadinessProbeConfigMap.Name == "" {
	resources = append(resources, generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript))
```
Pass the exact name here instead of the suffix.
```go
	resources = append(resources, generateProbeConfigMap(df, ReadinessProbeConfigMapSuffix, ReadinessScriptKey, defaultReadinessScript))
}
if df.Spec.CustomStartupProbeConfigMap == nil || df.Spec.CustomStartupProbeConfigMap.Name == "" {
	resources = append(resources, generateProbeConfigMap(df, StartupProbeConfigMapSuffix, StartupScriptKey, defaultStartupScript))
```
Pass the exact name instead of the suffix.
| //+kubebuilder:rbac:groups="",resources=services,verbs=get;list;watch;create;update;patch;delete | ||
| //+kubebuilder:rbac:groups="",resources=pods,verbs=get;list;watch;create;update;patch;delete | ||
| //+kubebuilder:rbac:groups="",resources=events,verbs=create;patch | ||
| //+kubebuilder:rbac:groups="",resources=configmaps,verbs=get;list;watch;create;update;patch;delete |
You also need to add `.Owns(&corev1.ConfigMap{}, ...)` in the SetupWithManager function in dragonfly_controller.go. Otherwise, the reconciler won't be triggered when ConfigMaps are deleted/updated.
Force-pushed a6f1727 to 749ade0
Thanks for the review, addressed comments. As for the main concern, I've updated and tested the code (see gist): it should be a no-op for existing users, and pods would be recreated only when the CR is patched to opt in to custom probes. Another gap I found with the current ConfigMap-based approach: if a probe ConfigMap referenced in the Dragonfly CR spec is accidentally deleted, the probes break and pods start failing. So we could add finalizers like
Problem:
Currently the CRD doesn't provide a way to customise pod readiness, liveness and startup probes: they are hardcoded in the operator binary.
Following suggestions #401 (comment) and dragonflydb/dragonfly#5921 (comment), opening this PR to introduce custom pod readiness, liveness and startup probes.
Proposed solution:
- Probe scripts are mounted at `/etc/dragonfly/probes/`
- New optional CR fields `customLivenessProbeConfigMap`, `customReadinessProbeConfigMap` and `customStartupProbeConfigMap` override each probe's ConfigMap
- Default scripts are embedded via `//go:embed scripts/*.sh` and written into the generated ConfigMaps at reconcile time
Example: custom liveness probe with a connect timeout
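A sketch of what such a custom liveness script could look like (hypothetical, not the PR's actual example: the HOST/PORT defaults, the `timeout` wrapper, and the LOADING handling are all assumptions):

```sh
#!/bin/sh
# Hypothetical custom liveness probe with a connect timeout.
# Assumes redis-cli and a coreutils/busybox `timeout` exist in the image.
HOST="${HEALTHCHECK_HOST:-localhost}"
PORT="${HEALTHCHECK_PORT:-9999}"
CONNECT_TIMEOUT=2

check_liveness() {
    RESPONSE=$(timeout "$CONNECT_TIMEOUT" redis-cli -h "$HOST" -p "$PORT" --no-auth-warning \
        ${DFLY_requirepass:+-a "$DFLY_requirepass"} PING 2>/dev/null)
    case "$RESPONSE" in
        PONG|*LOADING*) return 0 ;;  # alive, even while restoring a snapshot
        *) return 1 ;;
    esac
}

# The deployed script would end by invoking the check:
# check_liveness
```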