Skip to content

Make the health probes configurable #302

@dvarrazzo

Description

@dvarrazzo

Hello,

it appears that the LAPI probes are not configurable, as they are hardcoded in the template without conditionals or values apart from the TLS setting:

livenessProbe:
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
httpGet:
path: /health
port: lapi
{{- if .Values.tls.enabled }}
scheme: HTTPS
{{- else }}
scheme: HTTP
{{- end }}
readinessProbe:
failureThreshold: 3
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
httpGet:
path: /health
port: lapi
{{- if .Values.tls.enabled }}
scheme: HTTPS
{{- else }}
scheme: HTTP
{{- end }}
startupProbe:
failureThreshold: 30
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
httpGet:
path: /health
port: lapi
{{- if .Values.tls.enabled }}
scheme: HTTPS
{{- else }}
scheme: HTTP
{{- end }}

The default configuration is also problematic because it doesn't give the time for the api to be considered unavailable and it gets killed immediately. It would be a better practice to have the threshold of the liveness probe higher than the readiness one: from the docs:

A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold. This ensures that the pod is observed as not-ready for some period of time before it is hard killed.

More in general, a liveness probe is not necessarily a good idea, unless an application is known to be prone to deadlock or some other unrecoverable state requiring a restart. Try to DDG for why is a livenessProbe a bad idea for some interesting reading.

In our use case, we need to disable the liveness probe because we are trying to investigate a network issue in the cluster. When the LAPI is killed, it stays for a long time in Terminating state (in the minutes - we will try look into why) and, because it uses ReadWriteOnce PVCs, it stops a new pod to start, causing an unnecessarily long downtime.

So, IMO, the LAPI probes should be configurable - as it is in the agent deployment by the way!

{{ if .Values.agent.livenessProbe }}
livenessProbe:
{{ toYaml .Values.agent.livenessProbe | indent 10 }}
{{ end }}
{{ if .Values.agent.readinessProbe }}
readinessProbe:
{{ toYaml .Values.agent.readinessProbe | indent 10 }}
{{ end }}
{{ if .Values.agent.startupProbe }}
startupProbe:
{{ toYaml .Values.agent.startupProbe | indent 10 }}
{{ end }}
{{ end }}

If you are open to it I can try and propose a PR.

Thank you very much!

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions