Commit f2f149e

author
Andrew
committed
Updates for clarity and to add in info on swapped management node remediation
1 parent dad343f commit f2f149e

File tree

1 file changed

+1
-1
lines changed

articles/operator-nexus/concepts-rack-resiliency.md

Lines changed: 1 addition & 1 deletion
@@ -81,7 +81,7 @@ Remediation Process:
 - Remediation of a Management Plane node is to attempt one reboot and then one reprovisioning attempt. If those steps fail, the node is marked `Unhealthy`.
 - Remediation of a KCP node is to attempt one reboot. If the reboot fails, the node is marked `Unhealthy` which triggers the immediate provisioning of the spare KCP node.

-A spare KCP node is required to ensure ongoing control plane resiliency. When KCP node fails remediation and is marked `Unhealthy`, it's deprovisioned and then swapped with a suitable healthy Management Plane host. This Management Plane host becomes the new spare KCP node. The failed KCP node is updated and labeled as a Management Plane node. If it continues to fail to provision or run successfully, it's left in an unhealthy state for the customer to fix the underlying issue. The unhealthy condition surfaces to the Bare Metal Machine's (BMM) `detailedStatus` fields in Azure, and clears through a BMM Replace action.
+Ongoing control plane resiliency requires a spare KCP node. When a KCP node fails remediation and is marked `Unhealthy`, the node is deprovisioned and then swapped with a suitable healthy Management Plane server. This Management Plane server becomes the new spare KCP node. The failed KCP node is updated and labeled as a Management Plane node. Once the label changes, provisioning of the newly labeled Management Plane node is attempted. If it fails to provision, the Management Plane remediation process takes over. If it continues to fail to provision or run successfully, it's left in an unhealthy state for the customer to fix. The unhealthy condition surfaces to the Bare Metal Machine's (BMM) `detailedStatus` fields in Azure, and clears through a BMM Replace action.

 ## Related Links

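The remediation flow described in the changed paragraph can be sketched as a small state model. This is an illustrative sketch only, not Operator Nexus code: the `Node` class, the role labels (`"kcp"`, `"spare-kcp"`, `"mgmt"`), and the `reboot`, `provision`, and `reprovision` callables are hypothetical stand-ins for platform operations. What happens if the spare itself fails to provision is not covered by the text, so it is not modeled here.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    name: str
    role: str            # hypothetical labels: "kcp", "spare-kcp", "mgmt"
    healthy: bool = True


Action = Callable[[Node], bool]  # returns True when the operation succeeds


def remediate_mgmt(node: Node, reboot: Action, reprovision: Action) -> bool:
    """Management Plane remediation: one reboot, then one reprovisioning attempt."""
    # `or` short-circuits, so reprovisioning only runs if the reboot fails.
    node.healthy = reboot(node) or reprovision(node)
    return node.healthy


def remediate_kcp(
    failed: Node,
    spare: Node,
    mgmt_pool: List[Node],
    reboot: Action,
    provision: Action,
    reprovision: Action,
) -> Optional[Node]:
    """KCP remediation: one reboot, then swap in the spare on failure.

    Returns the node that serves as the spare KCP node afterwards, or None
    if no healthy Management Plane node was available to take that role.
    """
    if reboot(failed):
        failed.healthy = True
        return spare  # nothing to swap; the existing spare stays in place

    failed.healthy = False

    # The reboot failed: the spare is provisioned as an active KCP node.
    provision(spare)
    spare.role = "kcp"

    # A healthy Management Plane node becomes the new spare KCP node.
    new_spare = next((n for n in mgmt_pool if n.healthy), None)
    if new_spare is not None:
        mgmt_pool.remove(new_spare)
        new_spare.role = "spare-kcp"

    # The failed node is relabeled as a Management Plane node and provisioned.
    failed.role = "mgmt"
    mgmt_pool.append(failed)
    if provision(failed):
        failed.healthy = True
    else:
        # Provisioning failed: Management Plane remediation takes over.
        remediate_mgmt(failed, reboot, reprovision)

    return new_spare
```

If every attempt on the failed node is unsuccessful, it ends up labeled `"mgmt"` with `healthy` set to `False`, which corresponds to the state the document says is left for the customer to resolve through a BMM Replace action.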