-Ongoing control plane resiliency requires a spare KCP node. When a KCP node fails remediation and is marked `Unhealthy`, the node is deprovisioned and swapped with a suitable healthy Management Plane server, which becomes the new spare KCP node. The failed KCP node is then relabeled as a Management Plane node. Once the label changes, provisioning of the newly labeled Management Plane node is attempted. If provisioning fails, the Management Plane remediation process takes over. If the node still fails to provision or run successfully, it's left in an unhealthy state for the customer to fix. The unhealthy condition surfaces in the Bare Metal Machine (BMM) `detailedStatus` fields in Azure and clears through a BMM Replace action.
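
The swap described above can be sketched as a small simulation. This is an illustration only, not the platform's implementation: the `Node` class, the `replace_failed_kcp` function, the role strings, and the `provision_ok` / `remediation_ok` flags are all hypothetical names chosen for this example and don't correspond to any Azure API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    role: str  # hypothetical labels: "kcp", "spare-kcp", "management-plane"
    healthy: bool = True


def replace_failed_kcp(failed, mgmt_candidates, provision_ok, remediation_ok):
    """Walk through the swap for a KCP node that failed remediation.

    provision_ok and remediation_ok are stand-ins (assumptions) for the real
    outcomes of provisioning and Management Plane remediation.
    Returns the new spare KCP node, or None if no healthy candidate exists.
    """
    # Pick a suitable healthy Management Plane server to swap in.
    replacement = next(
        (n for n in mgmt_candidates if n.role == "management-plane" and n.healthy),
        None,
    )
    if replacement is None:
        return None

    # The healthy server becomes the new spare KCP node.
    replacement.role = "spare-kcp"
    # The failed node is relabeled as a Management Plane node.
    failed.role = "management-plane"

    # Provisioning is attempted first; remediation takes over if it fails.
    # If both fail, the node stays unhealthy until a BMM Replace action.
    failed.healthy = provision_ok or remediation_ok
    return replacement


failed = Node("kcp-3", "kcp", healthy=False)
pool = [Node("mgmt-1", "management-plane")]
spare = replace_failed_kcp(failed, pool, provision_ok=False, remediation_ok=False)
print(spare.role, failed.role, failed.healthy)
```

In this run both provisioning and remediation fail, so the relabeled node stays unhealthy, which corresponds to the case that is left for the customer to resolve.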