diff --git a/docs/proposals/20240807-in-place-updates-implementation-notes.md b/docs/proposals/20240807-in-place-updates-implementation-notes.md
index fb9c61e43cd8..2c01054cd4f6 100644
--- a/docs/proposals/20240807-in-place-updates-implementation-notes.md
+++ b/docs/proposals/20240807-in-place-updates-implementation-notes.md
@@ -86,3 +86,173 @@ sequenceDiagram
     MS2 (NewMS)-->>MS Controller: Yes, M1!
     MS Controller->>M1: Remove annotation ".../pending-acknowledge-move": ""
 ```
+
+## Notes about in-place update implementation for KubeadmControlPlane
+
+- In-place updates respect the existing control plane update strategy:
+  - KCP controller uses `rollingUpdate` strategy with `maxSurge` (0 or 1)
+  - When `maxSurge` is 0, no new machines are created during rollout - only in-place updates or scale down
+  - When `maxSurge` is 1:
+    - Controller first scales up by creating one new machine to maximize fault tolerance
+    - Once at `maxReplicas` (desiredReplicas + 1), evaluates whether to in-place update or scale down old machines
+    - For each old machine needing rollout: if eligible for in-place update, performs in-place; otherwise scales down
+    - This pattern repeats until all machines are up-to-date, then scales back to desired replica count
+
+- The implementation respects the existing set of responsibilities:
+  - KCP controller manages control plane Machines directly
+  - KCP controller enforces `maxSurge` limits during rolling updates
+  - KCP controller decides when to scale up, scale down, or perform in-place updates
+  - KCP controller runs preflight checks to ensure control plane is stable before in-place updates
+  - KCP controller calls `CanUpdateMachine` hook to verify if extensions can handle the changes
+  - When in-place update is possible, KCP controller triggers the update by writing desired state
+
+- The in-place update decision flow:
+  - If `currentReplicas < maxReplicas` (desiredReplicas + maxSurge), scale up first to maximize fault tolerance
+  - If `currentReplicas >= maxReplicas`, select a machine needing rollout and evaluate options:
+    - Check if selected Machine is eligible for in-place update (determined by `UpToDate` function)
+    - Check if we already have enough up-to-date replicas (if `currentUpToDateReplicas >= desiredReplicas`, skip in-place and scale down)
+    - Run preflight checks to ensure control plane stability
+    - Call `CanUpdateMachine` hook on registered runtime extensions
+    - If all checks pass, trigger in-place update; otherwise, fall back to scale down/recreate
+  - This flow repeats on each reconciliation until all machines are up-to-date
+
+- Orchestration of in-place updates uses two key annotations:
+  - `in-place-updates.internal.cluster.x-k8s.io/update-in-progress` - Marks Machine as undergoing in-place update
+  - `runtime.cluster.x-k8s.io/pending-hooks` - Tracks pending `UpdateMachine` runtime hook
+
+Following schemas provide an overview of the in-place update workflow for KCP.
+
+Workflow #1: KCP controller determines that a Machine can be updated in-place and triggers the update.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant KCP Controller
+    participant RX as Runtime Extension
+    participant M1 as Machine
+    participant IM1 as InfraMachine
+    participant KC1 as KubeadmConfig
+
+    KCP Controller->>KCP Controller: Select Machine for rollout
+    KCP Controller->>KCP Controller: Run preflight checks on control plane
+    KCP Controller->>RX: CanUpdateMachine(current, desired)?
+    RX-->>KCP Controller: Yes, with patches to indicate supported changes
+
+    KCP Controller->>M1: Set annotation "update-in-progress": ""
+    KCP Controller->>IM1: Apply desired InfraMachine spec<br/>Set annotation "update-in-progress": ""
+    KCP Controller->>KC1: Apply desired KubeadmConfig spec<br/>Set annotation "update-in-progress": ""
+    KCP Controller->>M1: Apply desired Machine spec<br/>Set annotation "pending-hooks": "UpdateMachine"
+```
+
+Workflow #2: Machine controller detects the pending `UpdateMachine` hook and calls the runtime extension to perform the update.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant Machine Controller
+    participant RX as Runtime Extension
+    participant M1 as Machine
+    participant IM1 as InfraMachine
+    participant KC1 as KubeadmConfig
+
+    Machine Controller-->>M1: Has "update-in-progress" and "pending-hooks: UpdateMachine"?
+    M1-->>Machine Controller: Yes!
+
+    Machine Controller->>RX: UpdateMachine(desired state)
+    RX-->>Machine Controller: Status: InProgress, RetryAfterSeconds: 30
+
+    Note over Machine Controller: Wait and retry
+
+    Machine Controller->>RX: UpdateMachine(desired state)
+    RX-->>Machine Controller: Status: Done
+
+    Machine Controller->>IM1: Remove annotation "update-in-progress"
+    Machine Controller->>KC1: Remove annotation "update-in-progress"
+    Machine Controller->>M1: Remove annotation "update-in-progress"<br/>Remove "UpdateMachine" from "pending-hooks"
+```
+
+Workflow #3: KCP controller waits for in-place update to complete before proceeding with further operations.
+
+```mermaid
+sequenceDiagram
+    autonumber
+    participant KCP Controller
+    participant M1 as Machine
+
+    KCP Controller-->>M1: Is in-place update in progress?
+    M1-->>KCP Controller: Yes! ("update-in-progress" or "pending-hooks: UpdateMachine")
+
+    Note over KCP Controller: Wait for update to complete<br/>Requeue on Machine changes
+
+    KCP Controller-->>M1: Is in-place update in progress?
+    M1-->>KCP Controller: No! (annotations removed)
+
+    Note over KCP Controller: Continue with next Machine rollout or other operations
+```
+
+## Notes about managedFields refactoring for in-place updates (KCP/MS)
+
+To enable correct in-place updates of BootstrapConfigs and InfraMachines, CAPI v1.12 introduced a refactored managedFields structure. This change was necessary because:
+
+- **Previously** (CAPI <= v1.11): BootstrapConfigs/InfraMachines were only created, never updated
+- **Now** (CAPI >= v1.12): BootstrapConfigs/InfraMachines need to be updated during in-place updates using Server-Side Apply (SSA)
+- **Why SSA**: Required for proper handling of co-ownership of fields and to enable unsetting fields during updates
+
+### Two field managers approach
+
+The refactoring uses **two separate field managers** to enable different responsibilities:
+
+1. **Metadata manager** (`capi-kubeadmcontrolplane-metadata` / `capi-machineset-metadata`):
+   - Continuously syncs labels and annotations
+   - Updates on every reconciliation via `syncMachines`
+
+2. **Spec manager** (`capi-kubeadmcontrolplane` / `capi-machineset`):
+   - Manages the spec and in-place update specific annotations
+   - Updates only when creating objects or triggering in-place updates
+
+### ManagedFields structure comparison
+
+**CAPI <= v1.11** (legacy):
+- Machine:
+  - spec + labels + annotations => `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
+- BootstrapConfig / InfraMachine:
+  - labels + annotations => `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
+  - spec => `manager` (Update)
+
+**CAPI >= v1.12** (new):
+- Machine (unchanged):
+  - spec + labels + annotations => `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
+- BootstrapConfig / InfraMachine:
+  - labels + annotations => `capi-kubeadmcontrolplane-metadata` / `capi-machineset-metadata` (Apply)
+  - spec => `capi-kubeadmcontrolplane` / `capi-machineset` (Apply)
+
+### Object creation workflow (CAPI >= v1.12)
+
+When creating new BootstrapConfig/InfraMachine:
+
+1. **Initial creation**:
+   - Apply BootstrapConfig/InfraMachine with spec (manager: `capi-kubeadmcontrolplane` / `capi-machineset`)
+   - Remove managedFields for labels + annotations
+   - Result: labels/annotations are orphaned, spec is owned
+
+2. **First syncMachines call** (happens immediately after):
+   - Apply labels + annotations (manager: `capi-kubeadmcontrolplane-metadata` / `capi-machineset-metadata`)
+   - Result: Final desired managedField structure is established
+
+3. **Ready for operations**:
+   - Continuous `syncMachines` calls update labels/annotations without affecting spec
+   - In-place updates can now properly update spec fields and unset fields as needed
+
+### In-place update object modifications
+
+When triggering in-place updates:
+
+1. Apply BootstrapConfig/InfraMachine with:
+   - Updated spec (owned by spec manager)
+   - `update-in-progress` annotation (owned by spec manager)
+   - For InfraMachine: `cloned-from` annotations (owned by spec manager)
+
+2. Result after in-place update trigger:
+   - labels + annotations => metadata manager
+   - spec => spec manager
+   - in-progress / cloned-from annotations => spec manager
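
The in-place update decision flow described in the KCP notes of this patch can be sketched as a small pure function. This is a minimal illustration, not the KCP controller's actual code: the `rolloutState` struct, the `Action` type, and the `decide` function are hypothetical names, and the boolean fields stand in for the results of the `UpToDate` eligibility check, the preflight checks, and the `CanUpdateMachine` hook. It assumes at least one Machine still needs rollout.

```go
package main

import "fmt"

// Action is the outcome of one pass through the decision flow (hypothetical type).
type Action string

const (
	ScaleUp       Action = "ScaleUp"
	InPlaceUpdate Action = "InPlaceUpdate"
	ScaleDown     Action = "ScaleDown"
)

// rolloutState is a hypothetical snapshot of the inputs named in the notes.
// The boolean fields stand in for checks the real controller performs.
type rolloutState struct {
	CurrentReplicas         int
	DesiredReplicas         int
	MaxSurge                int  // 0 or 1 for KCP
	CurrentUpToDateReplicas int
	EligibleForInPlace      bool // stand-in for the eligibility check on the selected Machine
	PreflightChecksPass     bool // stand-in for the control plane preflight checks
	ExtensionsCanUpdate     bool // stand-in for the CanUpdateMachine hook response
}

// decide mirrors the documented flow for one reconciliation,
// assuming a Machine needing rollout has already been selected.
func decide(s rolloutState) Action {
	maxReplicas := s.DesiredReplicas + s.MaxSurge

	// Below maxReplicas: scale up first to maximize fault tolerance.
	if s.CurrentReplicas < maxReplicas {
		return ScaleUp
	}
	// The selected Machine must be eligible for in-place update.
	if !s.EligibleForInPlace {
		return ScaleDown
	}
	// Enough up-to-date replicas already exist: skip in-place, scale down.
	if s.CurrentUpToDateReplicas >= s.DesiredReplicas {
		return ScaleDown
	}
	// Any failing check falls back to scale down/recreate.
	if !s.PreflightChecksPass || !s.ExtensionsCanUpdate {
		return ScaleDown
	}
	return InPlaceUpdate
}

func main() {
	// maxSurge 1 with 3 of 3 desired replicas: still below maxReplicas (4).
	surge := rolloutState{CurrentReplicas: 3, DesiredReplicas: 3, MaxSurge: 1}
	fmt.Println(decide(surge))

	// maxSurge 0, all checks passing, no Machine up-to-date yet.
	inPlace := rolloutState{
		CurrentReplicas: 3, DesiredReplicas: 3, MaxSurge: 0,
		EligibleForInPlace: true, PreflightChecksPass: true, ExtensionsCanUpdate: true,
	}
	fmt.Println(decide(inPlace))
}
```

Keeping the decision as a function of a plain snapshot makes each branch of the documented flow easy to check in isolation; the real controller repeats this evaluation on every reconciliation until all machines are up-to-date.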