StatefulSet restarter always restarts replica 0 immediately after initial rollout #111

@siegfriedweber

Current behaviour

New restarter-enabled StatefulSets have their replica 0 restarted once the initial rollout is complete.

Expected Behaviour

The initial rollout should be completed "normally" with no extra restarts.

Why does this happen?

There is a race condition between Kubernetes' StatefulSet controller creating the first replica Pod and commons' StatefulSet restart controller adding the restart trigger annotations to the Pod template. If the restarter loses the race, the first replica is created without that metadata. Adding it afterwards changes the Pod template, which triggers a restart of the replica that already exists.
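To make the race concrete, here is a minimal simulation. It is a sketch only: `template_revision` is a hypothetical stand-in for however the StatefulSet controller fingerprints the Pod template (it is not the real algorithm), and the annotation key and value are made-up examples.

```python
import hashlib
import json


def template_revision(pod_template: dict) -> str:
    """Hypothetical stand-in for the revision fingerprint of a Pod template."""
    canonical = json.dumps(pod_template, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:10]


def simulate(restarter_wins_race: bool) -> int:
    """Return how often replica 0 is created during the initial rollout."""
    template = {"metadata": {"annotations": {}}, "spec": {"containers": ["app"]}}
    # Made-up key/value in the style of the restarter annotations.
    restart_annotation = {"configmap.restarter.stackable.tech/example": "uid/1"}

    if restarter_wins_race:
        # The restarter annotates the template before any Pod exists.
        template["metadata"]["annotations"].update(restart_annotation)

    # The StatefulSet controller creates replica 0 from the current template.
    pod_revision = template_revision(template)
    creations = 1

    if not restarter_wins_race:
        # The restarter annotates the template after replica 0 already exists.
        template["metadata"]["annotations"].update(restart_annotation)

    # A Pod whose revision no longer matches the template gets replaced.
    if pod_revision != template_revision(template):
        creations += 1
    return creations
```

When the restarter loses the race, `simulate(False)` reports two creations of replica 0, which matches the `SuccessfulCreate ... (x2 ...)` event in the original ticket.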

What can we do about it?

Add a mutating webhook (see the spike) that adds the relevant metadata. The webhook must not replace the existing controller, since webhook delivery is not reliable.

However, a webhook requires extra infrastructure that we do not currently have, namely:

  1. We must implement the K8s webhook HTTPS API (https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/)
  2. We must generate a TLS certificate and write it into the MutatingWebhookConfiguration (MWC)
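For a feel of what item 1 involves, here is a rough sketch of the response side of a mutating webhook: it takes an `AdmissionReview` request for a StatefulSet and answers with a base64-encoded JSON Patch that adds annotations to the Pod template. The function names are hypothetical, the HTTPS server and TLS handling are left out, and this is not the commons-operator implementation.

```python
import base64
import json


def escape_pointer(segment: str) -> str:
    """Escape one JSON Pointer segment (RFC 6901): '~' -> '~0', '/' -> '~1'."""
    return segment.replace("~", "~0").replace("/", "~1")


def build_admission_response(review: dict, annotations: dict) -> dict:
    """Build an AdmissionReview response that adds Pod template annotations."""
    request = review["request"]
    template = request["object"]["spec"]["template"]
    metadata = template.get("metadata") or {}
    base = "/spec/template/metadata"

    if not metadata:
        # No (or empty) template metadata yet: create it with the annotations.
        patch = [{"op": "add", "path": base, "value": {"annotations": annotations}}]
    elif not metadata.get("annotations"):
        # Metadata exists but has no annotations map yet.
        patch = [{"op": "add", "path": base + "/annotations", "value": annotations}]
    else:
        # Add each key individually so existing annotations are preserved.
        patch = [
            {
                "op": "add",
                "path": f"{base}/annotations/{escape_pointer(key)}",
                "value": value,
            }
            for key, value in annotations.items()
        ]

    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": request["uid"],  # must echo the request UID
            "allowed": True,
            "patchType": "JSONPatch",
            "patch": base64.b64encode(json.dumps(patch).encode()).decode(),
        },
    }
```

Note that annotation keys such as `configmap.restarter.stackable.tech/<name>` contain a `/`, so they have to be escaped as `~1` in the patch path.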

Definition of done

  • Webhook certificate is managed (provisioned and renewed) by the commons operator
  • The webhook should apply the same podTemplate annotations as the controller currently does
    • Initial STS rollout does not cause a restart (STS.metadata.generation should stay 1)
  • Controller must still apply metadata if the webhook is disabled and/or fails (at the cost of still doing the extra restart in this case)
  • The webhook must fail open (failurePolicy: Ignore)
  • A Kuttl test should verify the above (maybe minus the failurePolicy)
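
As a sketch of what the fail-open requirement could look like in the registered object, the helper below builds a `MutatingWebhookConfiguration` manifest with `failurePolicy: Ignore`. All names (the function, webhook name, service, path) are hypothetical placeholders, and restricting the rule to `CREATE` is an assumption based on the issue being about the initial rollout.

```python
import base64
import json


def build_mwc(ca_pem: bytes, namespace: str, service: str) -> dict:
    """Build a fail-open MutatingWebhookConfiguration for StatefulSets."""
    return {
        "apiVersion": "admissionregistration.k8s.io/v1",
        "kind": "MutatingWebhookConfiguration",
        "metadata": {"name": "restarter-sts-enricher"},  # hypothetical name
        "webhooks": [
            {
                "name": "restarter-sts-enricher.example.com",  # hypothetical
                "admissionReviewVersions": ["v1"],
                "sideEffects": "None",
                # Fail open: if the webhook is unreachable, the request is
                # admitted unchanged and the controller adds the metadata later.
                "failurePolicy": "Ignore",
                "clientConfig": {
                    # The API server verifies the webhook's TLS cert against this.
                    "caBundle": base64.b64encode(ca_pem).decode(),
                    "service": {
                        "namespace": namespace,
                        "name": service,
                        "path": "/mutate",  # hypothetical endpoint path
                        "port": 443,
                    },
                },
                "rules": [
                    {
                        "apiGroups": ["apps"],
                        "apiVersions": ["v1"],
                        "operations": ["CREATE"],  # assumption, see above
                        "resources": ["statefulsets"],
                    }
                ],
            }
        ],
    }


if __name__ == "__main__":
    manifest = build_mwc(b"<PEM-encoded CA>", "default", "commons-operator")
    print(json.dumps(manifest, indent=2))
```

Since the operator would own the certificate, it would also need to rewrite `caBundle` whenever the CA is renewed.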

Original ticket

The StatefulSet of a Superset cluster is immediately restarted after its creation. This should not be necessary and should be prevented.

$ kubectl describe statefulset simple-superset-node-default
...
Events:
  Type    Reason            Age                    From                    Message
  ----    ------            ----                   ----                    -------
  Normal  SuccessfulDelete  3m31s                  statefulset-controller  delete Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful
  Normal  SuccessfulCreate  2m59s (x2 over 3m31s)  statefulset-controller  create Pod simple-superset-node-default-0 in StatefulSet simple-superset-node-default successful

After the restart, the Superset pods are annotated as follows:

annotations:
  configmap.restarter.stackable.tech/simple-superset-node-default: cf60300e-0c45-4ee2-b60c-de53b0084182/21998
  secret.restarter.stackable.tech/simple-superset-credentials: e0e4b781-46f9-44f4-80c8-a0876b91ed8b/16909

This could be an indication that the restart controller of the commons-operator is involved.

The commons-operator is busy while the StatefulSet is restarted:

2022-09-16T10:02:50.228971Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.236144Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.239371Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.255219Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:50.258511Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.266146Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:50.274647Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
2022-09-16T10:02:51.433621Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="pod.restarter.commons.stackable.tech" object=Pod.v1./simple-superset-node-default-0.default
2022-09-16T10:02:51.449009Z  INFO stackable_operator::logging::controller: Reconciled object controller.name="statefulset.restarter.commons.stackable.tech" object=StatefulSet.v1.apps/simple-superset-node-default.default
