KubeadmConfig changes should be reconciled for machine pools, triggering instance recreation #8858

@AndiDog

What would you like to be added (User Story)?

As an operator, I want changes to KubeadmConfig.spec.{files,preKubeadmCommands,...} to take effect on MachinePool-created nodes, resulting in server instance recreation.

Detailed Description

Discussed in office hours 2023-06-14 (notes, main points copied into this issue below).

Situation: A MachinePool manifest references an AWSMachinePool and a KubeadmConfig (a typical machine pool configuration).

Expectation: Changing KubeadmConfig.spec.* should lead to recreating (“rolling”) nodes. With infra provider CAPA, nothing happens at the moment. Here's why.

  • Problem 1: CAPI’s KubeadmConfigReconciler does not update the bootstrap secret as soon as KubeadmConfig.spec changes. It only does so when it rotates the bootstrap token (the rotation exists so that new machine pool-created nodes can join the cluster later on). A spec change can therefore take several minutes to be reconciled.

    • Suggestion: Simple bug fix. @AndiDog has a draft implementation that always considers updating the secret, not only when the token must be refreshed. In the meantime, users can work around this by creating a new KubeadmConfig object.
  • Problem 2: CAPA (and likely all other infra providers) does not watch the bootstrap secret, so it cannot immediately react to KubeadmConfig.spec changes either.

    • @AndiDog: Should it even directly watch the secret? What should the CAPI ⇔ CAPx contract be?
    • @fabriziopandini: Watching secrets can blow up memory [of the kube client]. Think of the UX and possible solutions first.
    • @CecileRobertMichon: Maybe change MachinePool.spec.template.spec.bootstrap.dataSecretName every time because that triggers reconciliation for the MachinePool object (machinepool_controller_phases.go code).
    • @sbueringer: For MachinePool support in ClusterClass, we have to decide what the “ideal” way to roll out a BootstrapConfig is.
  • Problem 3: The bootstrap secret contains both the “how to set up this server” init data (e.g. cloud-init / Ignition) and the random bootstrap token with which nodes join the cluster. If only the token gets refreshed (DefaultTokenTTL is 15 minutes), we don’t want nodes to be recreated, since that would recreate all nodes every few minutes.
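
To make Problems 1 and 3 a bit more concrete, here is a minimal Go sketch of one possible approach, not the draft implementation mentioned above and not existing CAPI or CAPA code. It assumes a rollout decision could be based on a checksum of the KubeadmConfig spec alone, so that a token refresh never changes the checksum. The types `bootstrapSpec` and `file` and the functions `specHash`, `needsRollout` and `needsSecretUpdate` are hypothetical names introduced only for illustration.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// file and bootstrapSpec are hypothetical, heavily simplified stand-ins for
// the relevant parts of KubeadmConfig.spec. They are NOT the real CAPI types.
type file struct {
	Path    string `json:"path"`
	Content string `json:"content"`
}

type bootstrapSpec struct {
	Files              []file   `json:"files,omitempty"`
	PreKubeadmCommands []string `json:"preKubeadmCommands,omitempty"`
}

// specHash returns a hex-encoded SHA-256 over the JSON encoding of the spec.
// The bootstrap token is deliberately not an input, so rotating the token
// leaves the hash unchanged (Problem 3).
func specHash(spec bootstrapSpec) (string, error) {
	raw, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

// needsRollout reports whether the spec differs from the one that was last
// rolled out, identified by its previously recorded hash.
func needsRollout(lastRolledHash string, spec bootstrapSpec) (bool, error) {
	current, err := specHash(spec)
	if err != nil {
		return false, err
	}
	return current != lastRolledHash, nil
}

// needsSecretUpdate sketches the Problem 1 suggestion: consider updating the
// secret whenever the rendered bootstrap data differs from what is stored,
// not only when the token has to be refreshed.
func needsSecretUpdate(stored, rendered []byte, tokenNeedsRefresh bool) bool {
	return tokenNeedsRefresh || !bytes.Equal(stored, rendered)
}

func main() {
	spec := bootstrapSpec{PreKubeadmCommands: []string{"echo hello"}}
	rolled, err := specHash(spec)
	if err != nil {
		panic(err)
	}

	// Only the token was refreshed: the spec, and thus the hash, is unchanged.
	roll, err := needsRollout(rolled, spec)
	if err != nil {
		panic(err)
	}
	fmt.Println("roll after token refresh only:", roll)

	// The operator edits the spec: the hash changes.
	spec.PreKubeadmCommands = append(spec.PreKubeadmCommands, "echo world")
	roll, err = needsRollout(rolled, spec)
	if err != nil {
		panic(err)
	}
	fmt.Println("roll after spec change:", roll)
}
```

Where such a checksum would live (for example in an annotation, or encoded in the secret name as in the dataSecretName idea above) is exactly the open CAPI ⇔ CAPx contract question and is left out of the sketch.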

Anything else you would like to add?

Label(s) to be applied

/kind feature
/area bootstrap
/area machinepool

Metadata

    Labels

      • area/bootstrap: Issues or PRs related to bootstrap providers
      • area/machinepool: Issues or PRs related to machinepools
      • kind/feature: Categorizes issue or PR as related to a new feature.
      • triage/accepted: Indicates an issue or PR is ready to be actively worked on.