Commit 697229b

jiangzho authored and peter-toth committed
[SPARK-54328] Add configurable startupProbe and enhance liveness/readiness probes in Helm chart
### What changes were proposed in this pull request?

This PR adds a configurable `startupProbe` and enhances the existing `livenessProbe` and `readinessProbe` with additional configurable parameters in the Helm chart for the `spark-kubernetes-operator`.

### Why are the changes needed?

Previously, these values were either using Kubernetes defaults or not configured at all. This change makes them explicitly configurable via Helm values, giving operators more control over pod lifecycle management in different cluster environments (small clusters vs large production clusters).

### Does this PR introduce _any_ user-facing change?

Yes. Users can now configure these probe settings in their `values.yaml`.

### How was this patch tested?

E2E coverage for default value, and local dry-run for value overrides.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #417 from jiangzho/probe_override.

Authored-by: Zhou JIANG <[email protected]>
Signed-off-by: Peter Toth <[email protected]>
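As an illustration of the kind of override the description mentions, a values file for a cluster where the operator starts slowly might look like the sketch below. The key paths are the ones read by the helper templates in this commit; the file name and every number are assumptions chosen for the example, not recommended settings.

```yaml
# my-values.yaml (hypothetical file name; all numbers are illustrative)
operatorDeployment:
  operatorPod:
    operatorContainer:
      probes:
        startupProbe:
          # Allow up to 60 * 10s = 600s of failed checks before the container is restarted
          failureThreshold: 60
          periodSeconds: 10
        livenessProbe:
          # Tolerate a few slow responses before restarting the container
          failureThreshold: 3
          timeoutSeconds: 5
```

Helm merges user-supplied values into the chart's `values.yaml` key by key, so any probe setting omitted here should keep the chart default.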
1 parent b324d82 commit 697229b

3 files changed: +43 -2 lines changed

build-tools/helm/spark-kubernetes-operator/templates/_helpers.tpl

Lines changed: 27 additions & 2 deletions
@@ -125,10 +125,13 @@ spark.kubernetes.operator.watchedNamespaces={{ include "spark-operator.workloadN
 Readiness Probe properties overrides
 */}}
 {{- define "spark-operator.readinessProbe.failureThreshold" -}}
-{{- default 30 .Values.operatorDeployment.operatorPod.operatorContainer.probes.startupProbe.failureThreshold }}
+{{- default 30 .Values.operatorDeployment.operatorPod.operatorContainer.probes.readinessProbe.failureThreshold }}
 {{- end }}
 {{- define "spark-operator.readinessProbe.periodSeconds" -}}
-{{- default 10 .Values.operatorDeployment.operatorPod.operatorContainer.probes.startupProbe.periodSeconds }}
+{{- default 10 .Values.operatorDeployment.operatorPod.operatorContainer.probes.readinessProbe.periodSeconds }}
+{{- end }}
+{{- define "spark-operator.readinessProbe.timeoutSeconds" -}}
+{{- default 1 .Values.operatorDeployment.operatorPod.operatorContainer.probes.readinessProbe.timeoutSeconds }}
 {{- end }}
 
 {{/*
@@ -140,6 +143,28 @@ Liveness Probe properties override
 {{- define "spark-operator.livenessProbe.periodSeconds" -}}
 {{- default 10 .Values.operatorDeployment.operatorPod.operatorContainer.probes.livenessProbe.periodSeconds }}
 {{- end }}
+{{- define "spark-operator.livenessProbe.failureThreshold" -}}
+{{- default 1 .Values.operatorDeployment.operatorPod.operatorContainer.probes.livenessProbe.failureThreshold }}
+{{- end }}
+{{- define "spark-operator.livenessProbe.timeoutSeconds" -}}
+{{- default 1 .Values.operatorDeployment.operatorPod.operatorContainer.probes.livenessProbe.timeoutSeconds }}
+{{- end }}
+
+{{/*
+Startup Probe properties override
+*/}}
+{{- define "spark-operator.startupProbe.initialDelaySeconds" -}}
+{{- default 0 .Values.operatorDeployment.operatorPod.operatorContainer.probes.startupProbe.initialDelaySeconds }}
+{{- end }}
+{{- define "spark-operator.startupProbe.failureThreshold" -}}
+{{- default 30 .Values.operatorDeployment.operatorPod.operatorContainer.probes.startupProbe.failureThreshold }}
+{{- end }}
+{{- define "spark-operator.startupProbe.periodSeconds" -}}
+{{- default 10 .Values.operatorDeployment.operatorPod.operatorContainer.probes.startupProbe.periodSeconds }}
+{{- end }}
+{{- define "spark-operator.startupProbe.timeoutSeconds" -}}
+{{- default 1 .Values.operatorDeployment.operatorPod.operatorContainer.probes.startupProbe.timeoutSeconds }}
+{{- end }}
 
 {{/*
 Readiness Probe property overrides

build-tools/helm/spark-kubernetes-operator/templates/spark-operator.yaml

Lines changed: 11 additions & 0 deletions
@@ -116,18 +116,29 @@ spec:
           resources:
             {{- toYaml . | nindent 12 }}
           {{- end }}
+          startupProbe:
+            httpGet:
+              port: probe-port
+              path: /healthz
+            initialDelaySeconds: {{ include "spark-operator.startupProbe.initialDelaySeconds" . }}
+            failureThreshold: {{ include "spark-operator.startupProbe.failureThreshold" . }}
+            periodSeconds: {{ include "spark-operator.startupProbe.periodSeconds" . }}
+            timeoutSeconds: {{ include "spark-operator.startupProbe.timeoutSeconds" . }}
           readinessProbe:
             httpGet:
               port: probe-port
               path: /readyz
             failureThreshold: {{ include "spark-operator.readinessProbe.failureThreshold" . }}
             periodSeconds: {{ include "spark-operator.readinessProbe.periodSeconds" . }}
+            timeoutSeconds: {{ include "spark-operator.readinessProbe.timeoutSeconds" . }}
           livenessProbe:
             httpGet:
               port: probe-port
               path: /healthz
             initialDelaySeconds: {{ include "spark-operator.livenessProbe.initialDelaySeconds" . }}
             periodSeconds: {{ include "spark-operator.livenessProbe.periodSeconds" . }}
+            failureThreshold: {{ include "spark-operator.livenessProbe.failureThreshold" . }}
+            timeoutSeconds: {{ include "spark-operator.livenessProbe.timeoutSeconds" . }}
           {{- with .Values.operatorDeployment.operatorPod.operatorContainer.securityContext }}
           securityContext:
             {{- toYaml . | nindent 12 }}
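
For orientation, with the chart's default `values.yaml` from this commit and no overrides, the three probes should render roughly as follows. This is a hand-derived sketch, not captured `helm template` output. It assumes the `spark-operator.livenessProbe.initialDelaySeconds` helper, which is defined outside the hunks shown, passes through the `30` set in `values.yaml`. Indentation is shown relative to the container spec.

```yaml
# Hand-derived rendering under the chart defaults (an assumption, not tool output)
startupProbe:
  httpGet:
    port: probe-port
    path: /healthz
  initialDelaySeconds: 0   # helper fallback; not set in values.yaml
  failureThreshold: 30     # from values.yaml
  periodSeconds: 10        # from values.yaml
  timeoutSeconds: 1        # helper fallback; not set in values.yaml
readinessProbe:
  httpGet:
    port: probe-port
    path: /readyz
  failureThreshold: 1      # values.yaml sets 1, so the helper fallback of 30 is not used
  periodSeconds: 10
  timeoutSeconds: 1        # helper fallback; not set in values.yaml
livenessProbe:
  httpGet:
    port: probe-port
    path: /healthz
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 1
  timeoutSeconds: 1
```

Under these values the startup probe allows about 30 × 10 s = 300 s of failed checks before the kubelet restarts the container, and the liveness and readiness probes are held off until the startup probe succeeds.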

build-tools/helm/spark-kubernetes-operator/values.yaml

Lines changed: 5 additions & 0 deletions
@@ -61,9 +61,14 @@ operatorDeployment:
         livenessProbe:
           periodSeconds: 10
           initialDelaySeconds: 30
+          failureThreshold: 1
+          timeoutSeconds: 1
         startupProbe:
           failureThreshold: 30
           periodSeconds: 10
+        readinessProbe:
+          failureThreshold: 1
+          periodSeconds: 10
       metrics:
         port: 19090
       # By default, operator container is configured to comply restricted standard
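
The helper templates in this commit also read three keys that the hunk above does not add to the default `values.yaml`: `startupProbe.initialDelaySeconds`, `startupProbe.timeoutSeconds`, and `readinessProbe.timeoutSeconds`. A minimal sketch of setting them in an override file follows; the numbers are illustrative assumptions only.

```yaml
# Override fragment (illustrative values; key paths taken from _helpers.tpl)
operatorDeployment:
  operatorPod:
    operatorContainer:
      probes:
        startupProbe:
          initialDelaySeconds: 10   # helper fallback when unset: 0
          timeoutSeconds: 3         # helper fallback when unset: 1
        readinessProbe:
          timeoutSeconds: 3         # helper fallback when unset: 1
```

One detail to keep in mind: Helm's `default` function treats `0` as empty, so an explicit `0` for any of these keys resolves to the helper fallback rather than to zero.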
