articles/operator-nexus/howto-cluster-runtime-upgrade.md (21 additions & 2 deletions)
@@ -17,7 +17,7 @@ This how-to guide explains the steps for installing the required Azure CLI and e
## Prerequisites
1. The [Azure CLI][installation-instruction] must be installed.
-2. The `networkcloud` CLI extension is required. If the `networkcloud` extension isn't installed, it can be installed following the steps listed [here](https://github.com/MicrosoftDocs/azure-docs-pr/blob/main/articles/operator-nexus/howto-install-cli-extensions.md).
+2. The `networkcloud` CLI extension is required. If the `networkcloud` extension isn't installed, it can be installed following the steps listed [here](https://github.com/MicrosoftDocs/azure-docs-pr/blob/main/articles/operator-nexus/howto-install-cli-extensions.md).
3. Access to the Azure portal for the target cluster to be upgraded.
4. You must be logged in to the same subscription as your target cluster via `az login` (see the example after this list).
5. Target cluster must be in a running state, with all control plane nodes healthy and 80+% of compute nodes in a running and healthy state.
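
The CLI prerequisites above can be satisfied with a few commands. This is a minimal sketch, not part of the original article; `<subscription-id>` is a placeholder for your own subscription.

```bash
# Install (or update) the networkcloud CLI extension named in the prerequisites.
az extension add --upgrade --name networkcloud

# Sign in and select the subscription that contains the target cluster.
az login
az account set --subscription "<subscription-id>"
```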
@@ -142,7 +142,26 @@ If the rack's spec wasn't updated to the upgraded runtime version before the har
### After a runtime upgrade, the cluster shows "Failed" Provisioning State
-During a runtime upgrade the cluster will enter a state of `Upgrading` In the event of a failure of the runtime upgrade, for reasons related to the resources, the cluster will go into a `Failed` provisioning state. This state could be linked to the lifecycle of the components related to the cluster (e.g StorageAppliance) and might be necessary to diagnose the failure with Microsoft support.
+During a runtime upgrade, the cluster enters a state of `Upgrading`. In the event of a failure of the runtime upgrade, the cluster goes into a `Failed` provisioning state. Failures during an upgrade may be caused by infrastructure components (for example, the Storage Appliance). In some scenarios, it may be necessary to diagnose the failure with Microsoft support.
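
As an illustrative check (not part of the original diff), the cluster's provisioning state can be inspected with the `networkcloud` extension. A minimal sketch, assuming `<cluster-name>` and `<resource-group>` placeholders and that the CLI surfaces the state as `provisioningState`:

```bash
# Query the provisioning state of the target cluster.
# "Succeeded" is expected after a healthy upgrade; "Failed" indicates the scenario described above.
az networkcloud cluster show \
  --name "<cluster-name>" \
  --resource-group "<resource-group>" \
  --query provisioningState \
  --output tsv
```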
+
+### Impact on Nexus Kubernetes tenant workloads during cluster runtime upgrade
+
+During a runtime upgrade, impacted Nexus Kubernetes cluster nodes are cordoned and drained before the Bare Metal Hosts (BMH) are upgraded. Cordoning the cluster node prevents new pods from being scheduled on it, and draining the cluster node gives pods running tenant workloads a chance to shift to another available cluster node, which helps reduce the impact on services. The draining mechanism's effectiveness is contingent on the available capacity within the Nexus Kubernetes cluster. If the cluster is nearing full capacity and lacks space for the pods to relocate, they transition into a Pending state following the draining process.
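
A quick way to see whether drained workloads found room elsewhere is to look for pods stuck in `Pending` after a drain. This is a generic `kubectl` check added for illustration, not specific to Operator Nexus:

```bash
# List pods that haven't been scheduled anywhere, typically because the
# remaining nodes lack capacity for the drained workloads.
kubectl get pods --all-namespaces --field-selector=status.phase=Pending
```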
+
+Once the cordon and drain process of the tenant cluster node is completed, the upgrade of the BMH proceeds. Each tenant cluster node is allowed up to 10 minutes for the draining process to complete, after which the BMH upgrade begins. This guarantees that the BMH upgrade makes progress. BMHs are upgraded one rack at a time, and upgrades are performed in parallel within the same rack. The BMH upgrade doesn't wait for tenant resources to come online before continuing with the runtime upgrade of BMHs in the rack being upgraded. The benefit of this is that the maximum overall wait time for a rack upgrade is kept at 10 minutes regardless of how many nodes are available. This maximum wait time is specific to the cordon and drain procedure and isn't applied to the overall upgrade procedure. Upon completion of each BMH upgrade, the Nexus Kubernetes cluster node starts, rejoins the cluster, and is uncordoned, allowing pods to be scheduled on the node once again.
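
To watch the rack-by-rack progression, you can observe node state from inside the Nexus Kubernetes cluster and list the bare metal machines from the CLI. A sketch added for illustration, assuming `<managed-resource-group>` is the managed resource group that holds the bare metal machine resources:

```bash
# Watch Nexus Kubernetes cluster nodes cordon, drop off, and rejoin as each BMH is upgraded.
kubectl get nodes --watch

# List the bare metal machines and their current status (column names vary by CLI version).
az networkcloud baremetalmachine list \
  --resource-group "<managed-resource-group>" \
  --output table
```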
+
+It's important to note that the Nexus Kubernetes cluster node isn't shut down after the cordon and drain process. The BMH is rebooted with the new image as soon as all the Nexus Kubernetes cluster nodes are cordoned and drained, or after 10 minutes if the drain process isn't completed. Additionally, the cordon and drain isn't initiated for power-off or restart actions of the BMH; it's activated only during a runtime upgrade.
+
+Following the runtime upgrade, there could be instances where a Nexus Kubernetes cluster node remains cordoned. In such a scenario, you can manually uncordon the node by connecting to the cluster (see [cluster connect](./includes/kubernetes-cluster/cluster-connect.md)) and executing the following commands:
+
+```
+kubectl get nodes | grep SchedulingDisabled > /dev/null
+if [ $? -eq 0 ]; then
+  for node in $(kubectl get nodes | grep SchedulingDisabled | awk '{print $1}'); do
+    kubectl uncordon "$node"
+  done
+fi
+```