Commit 865c093

Adjust for feedback
1 parent 7977379 commit 865c093

1 file changed (+2 -2 lines changed)

articles/operator-nexus/troubleshoot-kubernetes-cluster-node-cordoned.md

Lines changed: 2 additions & 2 deletions
@@ -19,9 +19,9 @@ The purpose of this guide is to troubleshoot a Kubernetes Cluster when 1 or more
 
 ## Typical Cause
 
-After a runtime upgrade, before a Baremetal Machine is shut down for reimaging, the machine lifecycle controller will cordon and drain Virtual Machine resources scheduled to that Baremetal Machine. Once the Baremetal Machine resolves the reimaging process, the expectation is that the machine lifecycle controller reschedules Virtual Machines to that Baremetal Machine. It would then uncordon the Virtual Machine, with the Kubernetes Cluster Node it supports reflecting the appropriate state `Ready`.
+During a Nexus Cluster runtime upgrade on a Baremetal Machine hosting Tenant workloads, the system will cordon and drain Virtual Machine resources scheduled to that Baremetal Machine, prior to reimaging and shutting down the Baremetal Machine. Once the Baremetal Machine completes the runtime upgrade, the expectation is that the system reschedules Virtual Machines to that Baremetal Machine. It would then uncordon the Virtual Machine, with the Kubernetes Cluster Node it supports reflecting the appropriate state `Ready`.
 
-However, a race condition may occur wherein the machine lifecycle controller fails to find Virtual Machines that should be scheduled to that Baremetal Machine. Each Virtual Machine is deployed using a virt-launcher pod. This race condition happens when the virt-launcher pod's image pull job isn't yet complete. Only after the image pull job is complete will the pod be schedulable to a Baremetal Machine. When the machine lifecycle controller examines these virt-launcher pods during the uncordon action execution, it can't find which Baremetal Machine the pod. Therefore the machine lifecycle controller skips uncordoning that Virtual Machine that that pod represents.
+However, a race condition may occur wherein the system fails to find Virtual Machines that should be scheduled to that Baremetal Machine. Each Virtual Machine is deployed using a virt-launcher pod. This race condition happens when the virt-launcher pod's image pull job isn't yet complete. Only after the image pull job is complete will the pod be schedulable to a Baremetal Machine. When the system examines these virt-launcher pods during the uncordon action execution, it can't find which Baremetal Machine the pod. Therefore the system skips uncordoning that Virtual Machine that that pod represents.
 
 ## Procedure
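
The race condition described in the revised "Typical Cause" paragraph can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the Operator Nexus implementation: `VirtLauncherPod`, `schedule`, `uncordon_pass`, and the names `bmm-1`, `vm-a`, and `vm-b` are all hypothetical, and the model assumes a pod carries no Baremetal Machine assignment until its image pull job finishes.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class VirtLauncherPod:
    """Hypothetical stand-in for the virt-launcher pod backing one Virtual Machine."""
    vm_name: str
    image_pulled: bool
    node_name: Optional[str] = None  # stays None until the pod is scheduled


def schedule(pod: VirtLauncherPod, bmm: str) -> VirtLauncherPod:
    """Assign the pod to a Baremetal Machine only once its image pull is complete."""
    if pod.image_pulled:
        pod.node_name = bmm
    return pod


def uncordon_pass(pods: List[VirtLauncherPod], bmm: str, cordoned: Set[str]) -> List[str]:
    """Uncordon VMs whose pod is visibly scheduled to `bmm`.

    Returns the names of VMs skipped because their pod had no Baremetal
    Machine assignment at the moment of the pass (the race described above).
    """
    skipped = []
    for pod in pods:
        if pod.node_name == bmm:
            cordoned.discard(pod.vm_name)
        elif pod.node_name is None:
            skipped.append(pod.vm_name)
    return skipped


# vm-a has finished its image pull; vm-b has not, so it is not yet schedulable.
pods = [
    schedule(VirtLauncherPod("vm-a", image_pulled=True), "bmm-1"),
    schedule(VirtLauncherPod("vm-b", image_pulled=False), "bmm-1"),
]
cordoned = {"vm-a", "vm-b"}
skipped = uncordon_pass(pods, "bmm-1", cordoned)
print(skipped, sorted(cordoned))  # prints: ['vm-b'] ['vm-b']
```

In this model `vm-b` stays cordoned even after its image pull later completes, because the uncordon pass has already run. That matches the symptom the guide sets out to troubleshoot.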
