Description
Hello,
it appears that the LAPI probes are not configurable, as they are hardcoded in the template with no conditionals or configurable values apart from the TLS scheme setting:
helm-charts/charts/crowdsec/templates/lapi-deployment.yaml
Lines 119 to 157 in 7b556d5
```yaml
livenessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  httpGet:
    path: /health
    port: lapi
    {{- if .Values.tls.enabled }}
    scheme: HTTPS
    {{- else }}
    scheme: HTTP
    {{- end }}
readinessProbe:
  failureThreshold: 3
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  httpGet:
    path: /health
    port: lapi
    {{- if .Values.tls.enabled }}
    scheme: HTTPS
    {{- else }}
    scheme: HTTP
    {{- end }}
startupProbe:
  failureThreshold: 30
  periodSeconds: 10
  successThreshold: 1
  timeoutSeconds: 5
  httpGet:
    path: /health
    port: lapi
    {{- if .Values.tls.enabled }}
    scheme: HTTPS
    {{- else }}
    scheme: HTTP
    {{- end }}
```
The default configuration is also problematic: it doesn't give the API any time to be observed as unavailable before the pod is killed. It would be better practice to set the liveness probe's failureThreshold higher than the readiness probe's. From the Kubernetes docs:
A common pattern for liveness probes is to use the same low-cost HTTP endpoint as for readiness probes, but with a higher failureThreshold. This ensures that the pod is observed as not-ready for some period of time before it is hard killed.
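For illustration, a probe configuration following that pattern could look like the snippet below (the numbers are just an example, not a suggested default):

```yaml
# Both probes hit the same /health endpoint, but the liveness probe
# tolerates more consecutive failures, so the pod is marked not-ready
# (and removed from Service endpoints) well before it is restarted.
readinessProbe:
  httpGet:
    path: /health
    port: lapi
  periodSeconds: 10
  failureThreshold: 3    # not-ready after ~30s of failed checks
livenessProbe:
  httpGet:
    path: /health
    port: lapi
  periodSeconds: 10
  failureThreshold: 10   # restarted only after ~100s of failed checks
```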
More generally, a liveness probe is not necessarily a good idea unless an application is known to be prone to deadlocks or other unrecoverable states that require a restart. Try a DDG search for "why is a livenessProbe a bad idea" for some interesting reading.
In our use case, we need to disable the liveness probe because we are investigating a network issue in the cluster. When the LAPI pod is killed, it stays in the Terminating state for a long time (minutes; we will try to look into why) and, because it uses ReadWriteOnce PVCs, this prevents the new pod from starting, causing unnecessarily long downtime.
So, IMO, the LAPI probes should be made configurable, as they already are in the agent deployment, by the way!
helm-charts/charts/crowdsec/templates/agent-deployment.yaml
Lines 162 to 174 in 7b556d5
```yaml
{{ if .Values.agent.livenessProbe }}
livenessProbe:
{{ toYaml .Values.agent.livenessProbe | indent 10 }}
{{ end }}
{{ if .Values.agent.readinessProbe }}
readinessProbe:
{{ toYaml .Values.agent.readinessProbe | indent 10 }}
{{ end }}
{{ if .Values.agent.startupProbe }}
startupProbe:
{{ toYaml .Values.agent.startupProbe | indent 10 }}
{{ end }}
{{ end }}
```
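For the LAPI deployment, the same pattern could be applied. A rough sketch of what lapi-deployment.yaml could look like, mirroring the agent template (the lapi.* values keys are my assumption; the current hardcoded probes could be moved into values.yaml as defaults so existing behaviour is preserved):

```yaml
{{- if .Values.lapi.livenessProbe }}
        livenessProbe:
{{ toYaml .Values.lapi.livenessProbe | indent 10 }}
{{- end }}
{{- if .Values.lapi.readinessProbe }}
        readinessProbe:
{{ toYaml .Values.lapi.readinessProbe | indent 10 }}
{{- end }}
{{- if .Values.lapi.startupProbe }}
        startupProbe:
{{ toYaml .Values.lapi.startupProbe | indent 10 }}
{{- end }}
```

That way, users could override the probe settings or disable a probe entirely by setting its value to empty/null.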
If you are open to it, I can try to propose a PR.
Thank you very much!