Description
Current behaviour
New restarter-enabled StatefulSets have their replica 0 restarted after the initial rollout is complete.
Expected Behaviour
The initial rollout should be completed "normally" with no extra restarts.
Why does this happen?
There is a race condition between Kubernetes' StatefulSet controller creating the first replica Pod and commons' StatefulSet restart controller adding the restart trigger labels. If the restarter loses the race then the first replica is created without the metadata, triggering a restart once it is added.
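The race can be illustrated with a deliberately simplified model. This is a sketch, not the actual controller logic: the real StatefulSet controller tracks template changes through ControllerRevision objects, whereas the hypothetical `template_revision` helper below simply hashes the pod template. The annotation key and value are the ones observed on the Superset pods in the original ticket.

```python
import hashlib
import json


def template_revision(pod_template: dict) -> str:
    """Stand-in for the controller's revision: a hash of the pod template."""
    canonical = json.dumps(pod_template, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:10]


def needs_restart(pod_revision: str, current_template: dict) -> bool:
    """A Pod is replaced when it was created from an outdated template."""
    return pod_revision != template_revision(current_template)


template = {
    "metadata": {"annotations": {}},
    "spec": {"containers": [{"name": "superset"}]},
}

# The StatefulSet controller wins the race: replica 0 is created from the
# template before the restarter has added its metadata.
pod_0_revision = template_revision(template)

# The restarter then adds its annotation, which changes the template.
template["metadata"]["annotations"][
    "configmap.restarter.stackable.tech/simple-superset-node-default"
] = "cf60300e-0c45-4ee2-b60c-de53b0084182/21998"

print(needs_restart(pod_0_revision, template))  # True: replica 0 is recreated
```

If the metadata were already present when the first Pod is created, the two revisions would match and no restart would follow.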
What can we do about it?
Add a mutating webhook (see the spike) that adds the relevant metadata. The webhook must not replace the existing controller, since webhook delivery is not reliable.
However, webhook delivery requires a bunch of extra infrastructure that we do not currently have, namely:
- We must implement the K8s webhook HTTPS API (https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
- We must generate a TLS certificate and write it into the MutatingWebhookConfiguration (MWC)
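To make the webhook API requirement more concrete, here is a minimal sketch of the response such a webhook might return for a StatefulSet admission request. It assumes the webhook mutates the StatefulSet's pod template with a JSON Patch; the function names (`build_patch`, `admission_response`) are hypothetical, and how the annotation values are computed is left out entirely. The HTTPS server and TLS handling are also not shown.

```python
import base64
import json


def escape_pointer(token: str) -> str:
    """Escape a key for use as a JSON Pointer segment (RFC 6901)."""
    return token.replace("~", "~0").replace("/", "~1")


def build_patch(statefulset: dict, annotations: dict) -> list:
    """Return JSON Patch operations that add annotations to the pod template."""
    template_meta = statefulset.get("spec", {}).get("template", {}).get("metadata")
    if template_meta is None:
        # No template metadata at all: create it together with the annotations.
        return [{"op": "add", "path": "/spec/template/metadata",
                 "value": {"annotations": dict(annotations)}}]
    if template_meta.get("annotations") is None:
        # Metadata exists but has no annotations map yet: add the whole map.
        return [{"op": "add", "path": "/spec/template/metadata/annotations",
                 "value": dict(annotations)}]
    # Annotations map exists: add each key individually, keeping existing ones.
    return [
        {"op": "add",
         "path": "/spec/template/metadata/annotations/" + escape_pointer(key),
         "value": value}
        for key, value in annotations.items()
    ]


def admission_response(review: dict, annotations: dict) -> dict:
    """Build an AdmissionReview response that allows the request and patches it."""
    request = review["request"]
    patch = build_patch(request["object"], annotations)
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the request's uid
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

The keys contain a `/`, so they have to be escaped as `~1` when they appear in a patch path; adding the whole map in one operation avoids that only when no annotations exist yet.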
Definition of done
- Webhook certificate is managed (provisioned and renewed) by the commons operator
- The webhook should apply the same podTemplate annotations as the controller currently does
  - Initial STS rollout does not cause a restart (STS.metadata.generation should stay 1)
- Controller must still apply metadata if the webhook is disabled and/or fails (at the cost of still doing the extra restart in this case)
- The webhook must fail open (failurePolicy: Ignore)
- A Kuttl test should verify the above (maybe minus the failurePolicy)
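As a rough illustration of the fail-open requirement, the following sketch builds a MutatingWebhookConfiguration object of the kind the operator might register, since it has to write the CA bundle into it at runtime. Everything specific here is an assumption made for the example: the object and webhook names, the service name, namespace, path and port, and the choice to intercept only StatefulSet CREATE requests.

```python
import base64
import json


def mutating_webhook_configuration(ca_pem: bytes) -> dict:
    """Build a fail-open MutatingWebhookConfiguration for StatefulSet creation."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "MutatingWebhookConfiguration",
        "metadata": {"name": "restarter-sts-enricher"},  # hypothetical name
        "webhooks": [{
            "name": "restarter-sts-enricher.stackable.tech",  # hypothetical name
            "admissionReviewVersions": ["v1"],
            "sideEffects": "None",
            # Fail open: if the webhook is unreachable, the StatefulSet is still
            # created and the controller adds the metadata afterwards.
            "failurePolicy": "Ignore",
            "clientConfig": {
                # Hypothetical service coordinates for the commons operator.
                "service": {
                    "name": "commons-operator",
                    "namespace": "default",
                    "path": "/mutate",
                    "port": 8443,
                },
                # CA certificate that signed the webhook's TLS certificate.
                "caBundle": base64.b64encode(ca_pem).decode(),
            },
            "rules": [{
                "apiGroups": ["apps"],
                "apiVersions": ["v1"],
                "operations": ["CREATE"],
                "resources": ["statefulsets"],
                "scope": "Namespaced",
            }],
        }],
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real PEM-encoded CA certificate.
    print(json.dumps(mutating_webhook_configuration(b"<ca pem>"), indent=2))
```

Because the certificate is renewed over time, the `caBundle` field would need to be rewritten whenever the CA changes.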
Original ticket
The StatefulSet of a Superset cluster is immediately restarted after its creation. This should not be necessary and should be prevented.
$ kubectl describe statefulset simple-superset-node-default
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulDelete 3m31s statefulset-controller delete Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful
Normal SuccessfulCreate 2m59s (x2 over 3m31s) statefulset-controller create Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful
After the restart, the Superset pods are annotated as follows:
annotations:
  configmap.restarter.stackable.tech/simple-superset-node-default: cf60300e-0c45-4ee2-b60c-de53b0084182/21998
  secret.restarter.stackable.tech/simple-superset-credentials: e0e4b781-46f9-44f4-80c8-a0876b91ed8b/16909
This could be an indication that the restart controller of the commons-operator is involved.
The commons-operator is busy while the StatefulSet is restarted:
2022-09-16T10:02:50.228971Z INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.236144Z INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.239371Z INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.255219Z INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.258511Z INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.266146Z INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.274647Z INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:51.433621Z INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:51.449009Z INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default