Description
What would you like to be added (User Story)?
As an operator, I want changes to KubeadmConfig.spec.{files,preKubeadmCommands,...} to take effect on MachinePool-created nodes, resulting in server instance recreation.
Detailed Description
Discussed in office hours 2023-06-14 (notes, main points copied into this issue below).
Situation: A MachinePool manifest references an AWSMachinePool and a KubeadmConfig (a typical machine pool configuration).
Expectation: Changing KubeadmConfig.spec.* should lead to recreating (“rolling”) nodes. With infra provider CAPA, nothing happens at the moment. Here's why.
- Problem 1: CAPI’s KubeadmConfigReconciler does not immediately update the bootstrap secret once KubeadmConfig.spec changes, but only once it rotates the bootstrap token (purpose: new machine pool-created nodes can join the cluster later on). This means several minutes of waiting for reconciliation.
  - Suggestion: Simple bug fix. @AndiDog has a draft implementation that always considers updating the secret, not only if the token must be refreshed. In the meantime, users can work around this by creating a new KubeadmConfig object.
 
- Problem 2: CAPA (and likely all other infra providers) does not watch the bootstrap secret, so it cannot immediately react to KubeadmConfig.spec changes either.
  - @AndiDog: Should it even directly watch the secret? What should the CAPI ⇔ CAPx contract be?
  - @fabriziopandini: Watching secrets can blow up memory [of the kubectl client]. Think of the UX and possible solutions first.
  - @CecileRobertMichon: Maybe change MachinePool.spec.template.spec.bootstrap.dataSecretName every time, because that triggers reconciliation for the MachinePool object (machinepool_controller_phases.go code).
  - @sbueringer: For MachinePool support in ClusterClass, we have to decide what the “ideal” way to roll out BootstrapConfig is.
 
- Problem 3: The bootstrap secret contains both the “how to set up this server” init data (e.g. cloud-init / ignition) and the random bootstrap token by which nodes join the cluster. If only the token gets refreshed (DefaultTokenTTL is 15 minutes), we don’t want nodes to be recreated, since that would recreate all nodes every few minutes.
  - @AndiDog: CAPA does not trigger a node rollout (AWS ASG instance refresh) right now (Updating MachinePool, AWSMachinePool, and KubeadmConfig resources does not trigger an ASG instanceRefresh cluster-api-provider-aws#4071) because it has no logic to detect when the bootstrap config changed. How can CAPA (or another infra provider) tell that something apart from the bootstrap token has changed? We have a CAPA PR (Trigger machine pool instance refresh also on user data change (unless it's only the bootstrap token) cluster-api-provider-aws#4245) that checks “is there a difference apart from only the bootstrap token value”, but that is very hacky and format/provider-specific. Maybe a "checksum without bootstrap token" provided by CAPI could help the infra provider?
  - @CecileRobertMichon: Split bootstrap token vs. other init data?
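To make the "checksum without bootstrap token" idea from Problem 3 more concrete, here is a minimal Go sketch. It is not existing CAPI or CAPA code: the function name `ChecksumWithoutToken`, the placeholder string, and the sample user data are all hypothetical. It assumes the component computing the checksum knows the current token value and that the token appears verbatim in the rendered user data.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// tokenPlaceholder is a hypothetical fixed marker that replaces the
// bootstrap token before hashing, so token rotation alone never
// changes the checksum.
const tokenPlaceholder = "<bootstrap-token>"

// ChecksumWithoutToken (hypothetical helper) returns the hex-encoded
// SHA-256 of userData with every verbatim occurrence of token masked.
func ChecksumWithoutToken(userData []byte, token string) string {
	masked := userData
	if token != "" { // an empty token would otherwise match everywhere
		masked = bytes.ReplaceAll(userData, []byte(token), []byte(tokenPlaceholder))
	}
	sum := sha256.Sum256(masked)
	return hex.EncodeToString(sum[:])
}

func main() {
	// Illustrative user data only; real cloud-init / ignition differs.
	before := []byte("join --token abcdef.0123456789abcdef\nfiles: v1\n")
	rotated := []byte("join --token ghijkl.fedcba9876543210\nfiles: v1\n")
	changed := []byte("join --token ghijkl.fedcba9876543210\nfiles: v2\n")

	a := ChecksumWithoutToken(before, "abcdef.0123456789abcdef")
	b := ChecksumWithoutToken(rotated, "ghijkl.fedcba9876543210")
	c := ChecksumWithoutToken(changed, "ghijkl.fedcba9876543210")

	fmt.Println(a == b) // true: only the token rotated, no rollout needed
	fmt.Println(a == c) // false: the spec changed, a rollout is warranted
}
```

An infra provider could compare such a checksum against the one recorded at the last rollout and trigger an instance refresh only when it differs. Where the checksum would be published (for example on the secret or on the MachinePool) is left open here, since that is part of the CAPI ⇔ CAPx contract question raised above.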
 
Anything else you would like to add?
- CAPA-specific issue: Updating MachinePool, AWSMachinePool, and KubeadmConfig resources does not trigger an ASG instanceRefresh cluster-api-provider-aws#4071. The problem applies to all infra providers, though.
- This issue asks for a mutable KubeadmConfig, while another proposal wants to make it immutable even for machine pools (breaking change): Make KubeadmConfigTemplate immutable #4910.
Label(s) to be applied
/kind feature
/area bootstrap
/area machinepool