Commit f2280e9

feat(chart): probe checks for Distributor and Router (#2272)
1 parent dc9336d commit f2280e9

15 files changed (+360 -25 lines changed)

charts/selenium-grid/README.md

Lines changed: 37 additions & 16 deletions
````diff
@@ -23,7 +23,10 @@ This chart enables the creation of a Selenium Grid Server in Kubernetes.
 * [Configuration `global.K8S_PUBLIC_IP`](#configuration-globalk8s_public_ip)
 * [Configuration of Nodes](#configuration-of-nodes)
 * [Container ports and Service ports](#container-ports-and-service-ports)
-* [Probes](#probes)
+* [Configuration of Probes](#configuration-of-probes)
+* [Node Probes](#node-probes)
+* [Distributor Probes](#distributor-probes)
+* [Router Probes](#router-probes)
 * [Configuration extra scripts mount to container](#configuration-extra-scripts-mount-to-container)
 * [Configuration of video recorder and video uploader](#configuration-of-video-recorder-and-video-uploader)
 * [Video recorder](#video-recorder)
````
````diff
@@ -299,20 +302,20 @@ ingress-nginx:
 ### Configuration global
 For now, global configuration supported is:
 
-| Parameter | Default | Description |
-|------------------------------------------------|-------------------------|------------------------------------------|
-| `global.K8S_PUBLIC_IP` | `""` | Public IP of the host running K8s |
-| `global.seleniumGrid.imageRegistry` | `selenium` | Distribution registry to pull images |
-| `global.seleniumGrid.imageTag` | `4.21.0-20240522` | Image tag for all selenium components |
-| `global.seleniumGrid.nodesImageTag` | `4.21.0-20240522` | Image tag for browser's nodes |
-| `global.seleniumGrid.videoImageTag` | `ffmpeg-6.1.1-20240522` | Image tag for browser's video recorder |
-| `global.seleniumGrid.imagePullSecret` | `""` | Pull secret to be used for all images |
-| `global.seleniumGrid.imagePullSecret` | `""` | Pull secret to be used for all images |
-| `global.seleniumGrid.affinity` | `{}` | Affinity assigned globally |
-| `global.seleniumGrid.logLevel` | `INFO` | Set log level for all components |
-| `global.seleniumGrid.defaultNodeStartupProbe` | `exec` | Default startup probe method in Nodes |
-| `global.seleniumGrid.defaultNodeLivenessProbe` | `exec` | Default liveness probe method in Nodes |
-| `global.seleniumGrid.stdoutProbeLog` | `true` | Enable probe logs output in kubectl logs |
+| Parameter | Default | Description |
+|-----------------------------------------------------|-------------------------|---------------------------------------------|
+| `global.K8S_PUBLIC_IP` | `""` | Public IP of the host running K8s |
+| `global.seleniumGrid.imageRegistry` | `selenium` | Distribution registry to pull images |
+| `global.seleniumGrid.imageTag` | `4.21.0-20240522` | Image tag for all selenium components |
+| `global.seleniumGrid.nodesImageTag` | `4.21.0-20240522` | Image tag for browser's nodes |
+| `global.seleniumGrid.videoImageTag` | `ffmpeg-6.1.1-20240522` | Image tag for browser's video recorder |
+| `global.seleniumGrid.imagePullSecret` | `""` | Pull secret to be used for all images |
+| `global.seleniumGrid.affinity` | `{}` | Affinity assigned globally |
+| `global.seleniumGrid.logLevel` | `INFO` | Set log level for all components |
+| `global.seleniumGrid.defaultNodeStartupProbe` | `exec` | Default startup probe method in Nodes |
+| `global.seleniumGrid.defaultNodeLivenessProbe` | `exec` | Default liveness probe method in Nodes |
+| `global.seleniumGrid.defaultComponentLivenessProbe` | `exec` | Default liveness probe method in Components |
+| `global.seleniumGrid.stdoutProbeLog` | `true` | Enable probe logs output in kubectl logs |
 
 #### Configuration `global.K8S_PUBLIC_IP`
 
````
````diff
@@ -379,7 +382,9 @@ edgeNode:
 protocol: TCP
 ```
 
-#### Probes
+### Configuration of Probes
+
+#### Node Probes
 
 By default, `startupProbe` is enabled and `readinessProbe` and `livenessProbe` are disabled. You can enable/disable them via `.startupProbe.enabled` `.readinessProbe.enabled` `.livenessProbe.enabled` in respective node type.
 
````
````diff
@@ -411,6 +416,22 @@ edgeNode:
 periodSeconds: 5
 ```
 
+#### Distributor Probes
+
+By default, `startupProbe`, `readinessProbe` and `livenessProbe` are enabled for this component in both full distributed and Hub-Nodes modes.
+
+The chart script `configs/distributor/distributorProbe.sh` is loaded into a ConfigMap, mounted into the container, and used by `livenessProbe`. You can customize the script via `--set-file distributorConfigMap.extraScripts.distributorProbe\.sh=/path/to/your_script.sh` or set it via YAML values.
+
+Some users have reported a rare, hard-to-reproduce scenario: the Grid UI is accessible, but no nodes can be fetched or registered, or a few requests sit in the session queue and are never accepted. Restarting the Distributor resolves the issue. Based on that, `livenessProbe` takes a proactive approach and restarts the Distributor automatically when it detects this condition. The script queries the GraphQL endpoint for `sessionQueueSize` and `sessionCount`. If `sessionQueueSize` is greater than 0 while `sessionCount` is 0 on consecutive checks until `failureThreshold` is reached, the Distributor is restarted. You can adjust the threshold and the interval via the probe settings.
+
+#### Router Probes
+
+By default, `startupProbe`, `readinessProbe` and `livenessProbe` are enabled for this component in full distributed mode.
+
+The chart script `configs/router/routerProbe.sh` is loaded into a ConfigMap, mounted into the container, and used by `livenessProbe`. You can customize the script via `--set-file routerConfigMap.extraScripts.routerProbe\.sh=/path/to/your_script.sh` or set it via YAML values.
+
+The script checks that the GraphQL endpoint is reachable. If the returned HTTP status code is not `200` on consecutive checks until `failureThreshold` is reached, the Router is restarted. You can adjust the threshold and the interval via the probe settings.
+
 ### Configuration extra scripts mount to container
 
 This is supported for containers of browser node, video recorder and video uploader. By default, in these containers, there are scripts, config files implemented. In case you want to customize or replace them with your own implementation. Instead of forking the chart, use volume mount. Now, from your external files, you can insert them into ConfigMap via Helm CLI `--set-file` or compose them in your own YAML values file and pass to Helm CLI `--values` when deploying chart. Any files name that you defined will be picked up into ConfigMap and mounted to the container.
````
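The `--set-file` keys in the Distributor and Router probe sections escape the dot in the script name (`distributorProbe\.sh`) because Helm otherwise reads an unescaped dot in a `--set` key as a nesting separator. A minimal sketch of composing such an argument follows; `set_file_arg` is a hypothetical helper, not part of the chart, and it assumes the key you override is the script name the chart mounts (`distributorProbe.sh` or `routerProbe.sh`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the chart): print a --set-file argument
# that stores a local file under <configMap>.extraScripts.<target name>.
set_file_arg() {
  local config_map="$1"   # e.g. distributorConfigMap or routerConfigMap
  local target_name="$2"  # key in the ConfigMap, e.g. distributorProbe.sh
  local local_path="$3"   # your replacement script on disk
  local escaped
  # Escape every "." so Helm keeps "distributorProbe.sh" as a single key.
  escaped="$(printf '%s' "$target_name" | sed 's/\./\\./g')"
  printf -- '--set-file %s.extraScripts.%s=%s\n' "$config_map" "$escaped" "$local_path"
}
```

For example, `helm upgrade --install <release> <chart> $(set_file_arg distributorConfigMap distributorProbe.sh ./my_probe.sh)`, where `<release>` and `<chart>` are placeholders; the unquoted substitution assumes the path contains no spaces.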
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@

```shell
#!/bin/bash

max_time=3
retry_time=3
probe_name="Probe.${1:-"Liveness"}"
ts_format=${SE_LOG_TIMESTAMP_FORMAT:-"+%T.%3N"}

if [ -n "${ROUTER_USERNAME}" ] && [ -n "${ROUTER_PASSWORD}" ]; then
  BASIC_AUTH="${ROUTER_USERNAME}:${ROUTER_PASSWORD}@"
fi

if [ -z "${SE_GRID_GRAPHQL_URL}" ] && [ -n "${SE_HUB_HOST:-${SE_ROUTER_HOST}}" ] && [ -n "${SE_HUB_PORT:-${SE_ROUTER_PORT}}" ]; then
  SE_GRID_GRAPHQL_URL="${SE_SERVER_PROTOCOL}://${BASIC_AUTH}${SE_HUB_HOST:-${SE_ROUTER_HOST}}:${SE_HUB_PORT:-${SE_ROUTER_PORT}}${SE_SUB_PATH}/graphql"
elif [ -z "${SE_GRID_GRAPHQL_URL}" ]; then
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - Could not construct GraphQL endpoint, it can be set directly via SE_GRID_GRAPHQL_URL. Bypass the probe checks for now."
  exit 0
fi

GRAPHQL_PRE_CHECK=$(curl --noproxy "*" -m ${max_time} -k -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { sessionCount } }"}' -s -o /dev/null -w "%{http_code}" ${SE_GRID_GRAPHQL_URL})

if [ ${GRAPHQL_PRE_CHECK} -ne 200 ]; then
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - GraphQL endpoint ${SE_GRID_GRAPHQL_URL} is not reachable. Status code: ${GRAPHQL_PRE_CHECK}."
  exit 1
fi

SESSION_QUEUE_SIZE=$(curl --noproxy "*" --retry ${retry_time} -m ${max_time} -k -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { sessionQueueSize } }"}' -s ${SE_GRID_GRAPHQL_URL} | jq -r '.data.grid.sessionQueueSize')

SESSION_COUNT=$(curl --noproxy "*" --retry ${retry_time} -m ${max_time} -k -X POST -H "Content-Type: application/json" --data '{"query": "{ grid { sessionCount } }"}' -s ${SE_GRID_GRAPHQL_URL} | jq -r '.data.grid.sessionCount')

MAX_SESSION=$(curl --noproxy "*" --retry ${retry_time} -m ${max_time} -k -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { maxSession } }"}' -s ${SE_GRID_GRAPHQL_URL} | jq -r '.data.grid.maxSession')

if [ ${SESSION_QUEUE_SIZE} -gt 0 ] && [ ${SESSION_COUNT} -eq 0 ]; then
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - Session Queue Size: ${SESSION_QUEUE_SIZE}, Session Count: ${SESSION_COUNT}, Max Session: ${MAX_SESSION}"
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - It seems the Distributor is delayed in processing a new session in the queue. Probe checks failed."
  exit 1
else
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - Distributor is healthy."
  exit 0
fi
```
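The liveness decision in the Distributor script reduces to one rule: fail only when sessions wait in the queue while none are running. The sketch below isolates that rule as a pure function and, as an alternative that is not what the chart ships, fetches all three fields with a single GraphQL query instead of three requests. Both function names are hypothetical; `fetch_grid_stats` assumes the same tools (`curl`, `jq`) and endpoint as the script.

```shell
#!/usr/bin/env bash
# Sketch based on distributorProbe.sh; helper names are hypothetical.

# Returns 0 ("stuck") when sessions wait in the queue but none are running.
# Non-integer input, such as "null" or "" after a failed request, counts as
# "not stuck": the chart script behaves the same way because `[ null -gt 0 ]`
# errors out and the script falls through to its "healthy" branch.
distributor_is_stuck() {
  local queue_size="$1" session_count="$2"
  [[ "$queue_size" =~ ^[0-9]+$ && "$session_count" =~ ^[0-9]+$ ]] || return 1
  (( 10#$queue_size > 0 && 10#$session_count == 0 ))
}

# Alternative to the three separate requests: one GraphQL query returning
# "<queue size> <session count> <max session>" on a single line.
fetch_grid_stats() {
  local url="$1"
  curl --noproxy "*" -m 3 -k -s -X POST \
    -H "Content-Type: application/json" \
    --data '{"query":"{ grid { sessionQueueSize sessionCount maxSession } }"}' \
    "$url" |
    jq -r '.data.grid | "\(.sessionQueueSize) \(.sessionCount) \(.maxSession)"'
}
```

Used together, `read -r queue count max < <(fetch_grid_stats "$SE_GRID_GRAPHQL_URL")` followed by `distributor_is_stuck "$queue" "$count" && exit 1` would reproduce the chart's decision with one request per probe run.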
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@

```shell
#!/bin/bash

max_time=3
retry_time=3
probe_name="Probe.${1:-"Liveness"}"
ts_format=${SE_LOG_TIMESTAMP_FORMAT:-"+%T.%3N"}

if [ -n "${ROUTER_USERNAME}" ] && [ -n "${ROUTER_PASSWORD}" ]; then
  BASIC_AUTH="${ROUTER_USERNAME}:${ROUTER_PASSWORD}@"
fi

if [ -z "${SE_GRID_GRAPHQL_URL}" ] && [ -n "${SE_HUB_HOST:-${SE_ROUTER_HOST}}" ] && [ -n "${SE_HUB_PORT:-${SE_ROUTER_PORT}}" ]; then
  SE_GRID_GRAPHQL_URL="${SE_SERVER_PROTOCOL}://${BASIC_AUTH}${SE_HUB_HOST:-${SE_ROUTER_HOST}}:${SE_HUB_PORT:-${SE_ROUTER_PORT}}${SE_SUB_PATH}/graphql"
elif [ -z "${SE_GRID_GRAPHQL_URL}" ]; then
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - Could not construct GraphQL endpoint, it can be set directly via SE_GRID_GRAPHQL_URL. Bypass the probe checks for now."
  exit 0
fi

GRAPHQL_PRE_CHECK=$(curl --noproxy "*" -m ${max_time} -k -X POST -H "Content-Type: application/json" --data '{"query":"{ grid { sessionCount } }"}' -s -o /dev/null -w "%{http_code}" ${SE_GRID_GRAPHQL_URL})

if [ ${GRAPHQL_PRE_CHECK} -ne 200 ]; then
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - GraphQL endpoint ${SE_GRID_GRAPHQL_URL} is not reachable. Status code: ${GRAPHQL_PRE_CHECK}."
  exit 1
else
  echo "$(date ${ts_format}) DEBUG [${probe_name}] - GraphQL endpoint is healthy."
  exit 0
fi
```
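The Router script is stateless: each run reports a single pass or fail, and it is the kubelet that restarts the container once `failureThreshold` consecutive liveness checks have failed (a success resets the count). So a restart happens after roughly `failureThreshold × periodSeconds` seconds of continuous failure. The sketch below replays a hypothetical sequence of status codes to make that counting explicit; `liveness_would_restart` is an illustrative name, not part of the chart or of Kubernetes.

```shell
#!/usr/bin/env bash
# Hypothetical simulation (not part of the chart) of how the kubelet turns
# routerProbe.sh results into a restart decision. Each argument after the
# threshold is one HTTP status code the probe observed; anything other
# than 200 is a failed probe (curl reports 000 when no response arrived).
liveness_would_restart() {
  local threshold="$1"
  shift
  local failures=0 code
  for code in "$@"; do
    if [[ "$code" == "200" ]]; then
      failures=0                 # one success resets the counter
    else
      failures=$((failures + 1))
      if (( failures >= threshold )); then
        return 0                 # restart would be triggered here
      fi
    fi
  done
  return 1                       # threshold never reached
}
```

For instance, with a threshold of 3, the sequence `200 000 000 000` triggers a restart, while `000 000 200 000 000` does not, because the `200` resets the count.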

charts/selenium-grid/templates/_nameHelpers.tpl

Lines changed: 14 additions & 0 deletions
````diff
@@ -154,6 +154,20 @@ Service Account fullname
 {{- tpl (default (include "seleniumGrid.component.name" (list "selenium-serviceaccount" $)) .Values.serviceAccount.nameOverride) $ | trunc 63 | trimSuffix "-" -}}
 {{- end -}}
 
+{{/*
+Distributor ConfigMap fullname
+*/}}
+{{- define "seleniumGrid.distributor.configmap.fullname" -}}
+{{- tpl (default (include "seleniumGrid.component.name" (list "selenium-distributor-config" $)) .Values.distributorConfigMap.nameOverride) $ | trunc 63 | trimSuffix "-" -}}
+{{- end -}}
+
+{{/*
+Router ConfigMap fullname
+*/}}
+{{- define "seleniumGrid.router.configmap.fullname" -}}
+{{- tpl (default (include "seleniumGrid.component.name" (list "selenium-router-config" $)) .Values.routerConfigMap.nameOverride) $ | trunc 63 | trimSuffix "-" -}}
+{{- end -}}
+
 {{/*
 Recorder ConfigMap fullname
 */}}
````
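Both new fullname helpers end with `| trunc 63 | trimSuffix "-"`, a common Helm convention because many Kubernetes names and label values are limited to 63 characters. A minimal sketch of just those two steps follows; `k8s_trunc_name` is a hypothetical name, and the prefixing done by `seleniumGrid.component.name` is not reproduced. It assumes ASCII names, since Helm's `trunc` counts bytes while bash substring expansion counts characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring `| trunc 63 | trimSuffix "-"`.
k8s_trunc_name() {
  local name="${1:0:63}"     # trunc 63: keep at most 63 characters
  printf '%s\n' "${name%-}"  # trimSuffix "-": drop one trailing hyphen
}
```

The hyphen trim matters only when truncation happens to cut right after a `-`, which would otherwise leave a name ending in a hyphen.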
Lines changed: 31 additions & 0 deletions
@@ -0,0 +1,31 @@

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ template "seleniumGrid.distributor.configmap.fullname" $ }}
  namespace: {{ .Release.Namespace }}
{{- with .Values.distributorConfigMap.annotations }}
  annotations: {{- toYaml . | nindent 4 }}
{{- end }}
  labels:
    {{- include "seleniumGrid.commonLabels" . | nindent 4 }}
    {{- with .Values.customLabels }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
data:
  SE_GRID_GRAPHQL_URL: '{{ include "seleniumGrid.graphqlURL" $ }}'
{{- $fileProceeded := list -}}
{{- range $path, $_ := .Files.Glob $.Values.distributorConfigMap.extraScriptsImportFrom }}
  {{- $fileName := base $path -}}
  {{- $value := index $.Values.distributorConfigMap.extraScripts $fileName -}}
  {{- if empty $value }}
    {{- $fileName | nindent 2 -}}: {{- toYaml ($.Files.Get $path) | indent 4 }}
  {{- else }}
    {{- $fileName | nindent 2 -}}: {{- toYaml $value | indent 4 }}
  {{- end }}
  {{- $fileProceeded = append $fileProceeded $fileName -}}
{{- end }}
{{- range $fileName, $value := .Values.distributorConfigMap.extraScripts }}
  {{- if not (has $fileName $fileProceeded) }}
    {{- $fileName | nindent 2 -}}: {{- toYaml (default "" $value) | indent 4 }}
  {{- end }}
{{- end }}
```
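The two `range` loops in this ConfigMap template implement a simple precedence: a bundled file matched by `extraScriptsImportFrom` keeps its content unless `distributorConfigMap.extraScripts` provides a non-empty value for the same name, and names that exist only in `extraScripts` are added afterwards. A sketch of that precedence, with bash associative arrays (bash 4 or later) standing in for Helm values; all variable and function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the precedence in the ConfigMap template. Hypothetical names:
#   chart_files - scripts matched by extraScriptsImportFrom (name -> content)
#   overrides   - distributorConfigMap.extraScripts (name -> value)
#   merged      - resulting ConfigMap data entries (name -> content)
declare -A chart_files=() overrides=() merged=()

merge_scripts() {
  local name
  merged=()
  # 1) Bundled files: a non-empty override replaces the file content.
  for name in "${!chart_files[@]}"; do
    if [[ -n "${overrides[$name]:-}" ]]; then
      merged[$name]="${overrides[$name]}"
    else
      merged[$name]="${chart_files[$name]}"
    fi
  done
  # 2) Names that exist only in overrides are added as-is ("" if empty).
  for name in "${!overrides[@]}"; do
    if [[ -z "${merged[$name]+set}" ]]; then
      merged[$name]="${overrides[$name]}"
    fi
  done
}
```

Note that, per the Deployment template in this commit, a `subPath` mount is created only for names listed under `distributorConfigMap.extraScripts`, so a bundled script is mounted only if its name also appears there (for example with an empty value).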

charts/selenium-grid/templates/distributor-deployment.yaml

Lines changed: 63 additions & 0 deletions
````diff
@@ -24,6 +24,7 @@ spec:
         checksum/event-bus-configmap: {{ include (print $.Template.BasePath "/event-bus-configmap.yaml") . | sha256sum }}
         checksum/logging-configmap: {{ include (print $.Template.BasePath "/logging-configmap.yaml") . | sha256sum }}
         checksum/server-configmap: {{ include (print $.Template.BasePath "/server-configmap.yaml") . | sha256sum }}
+        checksum/distributor-configmap: {{ include (print $.Template.BasePath "/distributor-configmap.yaml") . | sha256sum }}
         checksum/secrets: {{ include (print $.Template.BasePath "/secrets.yaml") . | sha256sum }}
       {{- with .Values.components.distributor.annotations }}
         {{- toYaml . | nindent 8 }}
````
````diff
@@ -63,6 +64,8 @@ spec:
             {{- tpl (toYaml .) $ | nindent 12 }}
           {{- end }}
           envFrom:
+            - configMapRef:
+                name: {{ template "seleniumGrid.distributor.configmap.fullname" . }}
             - configMapRef:
                 name: {{ template "seleniumGrid.eventBus.configmap.fullname" . }}
             - configMapRef:
````
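With the new `configMapRef`, the Distributor ConfigMap's `SE_GRID_GRAPHQL_URL` entry (rendered from `seleniumGrid.graphqlURL`) is exposed to the container as an environment variable, so the probe script normally uses that URL directly. The fallback the probe scripts apply when it is missing can be sketched as a function; `graphql_url` is a hypothetical name, and the logic mirrors the scripts' own `if`/`elif` block rather than adding anything new.

```shell
#!/usr/bin/env bash
# Sketch of the endpoint lookup shared by distributorProbe.sh and
# routerProbe.sh (graphql_url is a hypothetical name). Prints the URL, or
# returns 1 when neither SE_GRID_GRAPHQL_URL nor a host/port pair is set
# (the scripts then skip the check with exit 0).
graphql_url() {
  if [[ -n "${SE_GRID_GRAPHQL_URL:-}" ]]; then
    printf '%s\n' "$SE_GRID_GRAPHQL_URL"
    return 0
  fi
  # Hub settings take precedence over Router settings.
  local host="${SE_HUB_HOST:-${SE_ROUTER_HOST:-}}"
  local port="${SE_HUB_PORT:-${SE_ROUTER_PORT:-}}"
  [[ -n "$host" && -n "$port" ]] || return 1
  local auth=""
  if [[ -n "${ROUTER_USERNAME:-}" && -n "${ROUTER_PASSWORD:-}" ]]; then
    auth="${ROUTER_USERNAME}:${ROUTER_PASSWORD}@"
  fi
  # Like the scripts, SE_SERVER_PROTOCOL is used as-is (no default).
  printf '%s://%s%s:%s%s/graphql\n' "${SE_SERVER_PROTOCOL:-}" "$auth" "$host" "$port" "${SE_SUB_PATH:-}"
}
```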
````diff
@@ -75,6 +78,11 @@ spec:
             {{- toYaml . | nindent 12 }}
           {{- end }}
           volumeMounts:
+          {{- range $fileName, $value := $.Values.distributorConfigMap.extraScripts }}
+            - name: {{ tpl (default (include "seleniumGrid.distributor.configmap.fullname" $) $.Values.distributorConfigMap.scriptVolumeMountName) $ | quote }}
+              mountPath: {{ $.Values.distributorConfigMap.extraScriptsDirectory }}/{{ $fileName }}
+              subPath: {{ $fileName }}
+          {{- end }}
           {{- if .Values.tls.enabled }}
             - name: {{ include "seleniumGrid.tls.fullname" . | quote }}
               mountPath: {{ .Values.serverConfigMap.certVolumeMountPath | quote }}
````
````diff
@@ -83,6 +91,57 @@ spec:
           ports:
             - containerPort: {{ .Values.components.distributor.port }}
               protocol: TCP
+          {{- if .Values.components.distributor.startupProbe.enabled }}
+          {{- with .Values.components.distributor.startupProbe }}
+          startupProbe:
+          {{- if (ne (include "seleniumGrid.probe.fromUserDefine" (dict "values" . "root" $)) "{}") }}
+            {{- include "seleniumGrid.probe.fromUserDefine" (dict "values" . "root" $) | nindent 10 }}
+          {{- else }}
+            httpGet:
+              scheme: {{ default (include "seleniumGrid.probe.httpGet.schema" $) .schema }}
+              path: {{ .path }}
+              port: {{ default ($.Values.components.distributor.port) .port }}
+          {{- end }}
+          {{- if (ne (include "seleniumGrid.probe.settings" .) "{}") }}
+            {{- include "seleniumGrid.probe.settings" . | nindent 12 }}
+          {{- end }}
+          {{- end }}
+          {{- end }}
+          {{- if .Values.components.distributor.readinessProbe.enabled }}
+          {{- with .Values.components.distributor.readinessProbe }}
+          readinessProbe:
+          {{- if (ne (include "seleniumGrid.probe.fromUserDefine" (dict "values" . "root" $)) "{}") }}
+            {{- include "seleniumGrid.probe.fromUserDefine" (dict "values" . "root" $) | nindent 10 }}
+          {{- else }}
+            httpGet:
+              scheme: {{ default (include "seleniumGrid.probe.httpGet.schema" $) .schema }}
+              path: {{ .path }}
+              port: {{ default ($.Values.components.distributor.port) .port }}
+          {{- end }}
+          {{- if (ne (include "seleniumGrid.probe.settings" .) "{}") }}
+            {{- include "seleniumGrid.probe.settings" . | nindent 12 }}
+          {{- end }}
+          {{- end }}
+          {{- end }}
+          {{- if .Values.components.distributor.livenessProbe.enabled }}
+          {{- with .Values.components.distributor.livenessProbe }}
+          livenessProbe:
+          {{- if (ne (include "seleniumGrid.probe.fromUserDefine" (dict "values" . "root" $)) "{}") }}
+            {{- include "seleniumGrid.probe.fromUserDefine" (dict "values" . "root" $) | nindent 10 }}
+          {{- else if eq $.Values.global.seleniumGrid.defaultComponentLivenessProbe "exec" }}
+            exec:
+              command: ["bash", "-c", "{{ $.Values.distributorConfigMap.extraScriptsDirectory }}/distributorProbe.sh Liveness {{ include "seleniumGrid.probe.stdout" $ }}"]
+          {{- else }}
+            httpGet:
+              scheme: {{ default (include "seleniumGrid.probe.httpGet.schema" $) .schema }}
+              path: {{ .path }}
+              port: {{ default ($.Values.components.distributor.port) .port }}
+          {{- end }}
+          {{- if (ne (include "seleniumGrid.probe.settings" .) "{}") }}
+            {{- include "seleniumGrid.probe.settings" . | nindent 12 }}
+          {{- end }}
+          {{- end }}
+          {{- end }}
           {{- with .Values.components.distributor.resources }}
           resources: {{- toYaml . | nindent 12 }}
           {{- end }}
````
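Of the three Distributor probes, only `livenessProbe` consults the new `global.seleniumGrid.defaultComponentLivenessProbe`; `startupProbe` and `readinessProbe` choose between a user-defined handler and `httpGet`. The branch order of the liveness template can be sketched as follows; `liveness_handler` and its arguments are hypothetical, and the example directory `/opt/selenium` and port `5553` are illustrative values, not chart defaults taken from this commit.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the branch order in the Distributor livenessProbe
# template. Arguments:
#   $1 rendered user-defined handler ("{}" when none is set)
#   $2 global.seleniumGrid.defaultComponentLivenessProbe value
#   $3 distributorConfigMap.extraScriptsDirectory value
#   $4 distributor port (used by the httpGet fallback)
liveness_handler() {
  local user_defined="$1" default_method="$2" scripts_dir="$3" port="$4"
  if [[ -n "$user_defined" && "$user_defined" != "{}" ]]; then
    printf 'user-defined: %s\n' "$user_defined"   # always wins
  elif [[ "$default_method" == "exec" ]]; then
    # The real command also appends the output of seleniumGrid.probe.stdout.
    printf 'exec: %s/distributorProbe.sh Liveness\n' "$scripts_dir"
  else
    printf 'httpGet: port %s\n' "$port"
  fi
}
```

The `Liveness` argument only sets the `probe_name` used in the script's log lines. To run the same check by hand, something like `kubectl exec deploy/<distributor-deployment> -- bash -c '<extraScriptsDirectory>/distributorProbe.sh Liveness'` should work, with both placeholders depending on your release.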
````diff
@@ -107,6 +166,10 @@ spec:
       priorityClassName: {{ . }}
       {{- end }}
       volumes:
+        - name: {{ tpl (default (include "seleniumGrid.distributor.configmap.fullname" $) $.Values.distributorConfigMap.scriptVolumeMountName) $ | quote }}
+          configMap:
+            name: {{ template "seleniumGrid.distributor.configmap.fullname" $ }}
+            defaultMode: {{ $.Values.distributorConfigMap.defaultMode }}
         {{- if .Values.tls.enabled }}
         - name: {{ include "seleniumGrid.tls.fullname" . | quote }}
           secret:
````
