Commit dfc1f07

Merge pull request #1 from miniroy/update-naks-cordon-timeout-value
Update Nexus kubernetes cluster node cordon and drain timeout value
2 parents: 4e02a13 + b5c25fe

File tree

1 file changed: +3 -3 lines changed


articles/operator-nexus/concepts-cluster-upgrade-overview.md

Lines changed: 3 additions & 3 deletions
@@ -63,9 +63,9 @@ Details on how to run an upgrade with rack pause are located [here](./howto-clus

During a runtime upgrade, impacted Nexus Kubernetes Cluster nodes are cordoned and drained before the servers are upgraded. Cordoning the Kubernetes Cluster node prevents new pods from being scheduled on it. Draining the Kubernetes Cluster node gives pods that are running tenant workloads a chance to shift to another available Kubernetes Cluster node, which helps reduce the disruption to services. The draining mechanism's effectiveness is contingent on the available capacity within the Nexus Kubernetes Cluster. If the Kubernetes Cluster is nearing full capacity and lacks space for the pods to relocate, they transition into a Pending state following the draining process.

- Once the cordon and drain process of the tenant cluster node is completed, the upgrade of the server proceeds. Each tenant cluster node is allowed up to 10 minutes for the draining process to complete, after which the server upgrade begins. This guarantees that the server upgrade makes progress. Servers are upgraded one rack at a time, and upgrades are performed in parallel within the same rack. The server upgrade doesn't wait for tenant resources to come online before continuing with the runtime upgrade of servers in the rack being upgraded. The benefit is that the maximum overall wait time for a rack upgrade is kept at 10 minutes regardless of how many nodes are available. This maximum wait time is specific to the cordon and drain procedure and isn't applied to the overall upgrade procedure. Upon completion of each server upgrade, the Nexus Kubernetes cluster node starts, rejoins the cluster, and is uncordoned, allowing pods to be scheduled on the node once again.
+ Once the cordon and drain process of the tenant cluster node is completed, the upgrade of the server proceeds. Each tenant cluster node is allowed up to 20 minutes for the draining process to complete, after which the server upgrade begins. This guarantees that the server upgrade makes progress. Servers are upgraded one rack at a time, and upgrades are performed in parallel within the same rack. The server upgrade doesn't wait for tenant resources to come online before continuing with the runtime upgrade of servers in the rack being upgraded. The benefit is that the maximum overall wait time for a rack upgrade is kept at 20 minutes regardless of how many nodes are available. This maximum wait time is specific to the cordon and drain procedure and isn't applied to the overall upgrade procedure. Upon completion of each server upgrade, the Nexus Kubernetes cluster node starts, rejoins the cluster, and is uncordoned, allowing pods to be scheduled on the node once again.

- It's important to note that the Nexus Kubernetes cluster node won't be shut down after the cordon and drain process. The server is rebooted with the new image as soon as all the Nexus Kubernetes cluster nodes are cordoned and drained, or after 10 minutes if the drain process isn't completed. Additionally, cordon and drain isn't initiated for power-off or restart actions of the server; it activates only during a runtime upgrade.
+ It's important to note that the Nexus Kubernetes cluster node won't be shut down after the cordon and drain process. The server is rebooted with the new image as soon as all the Nexus Kubernetes cluster nodes are cordoned and drained, or after 20 minutes if the drain process isn't completed. Additionally, cordon and drain isn't initiated for power-off or restart actions of the server; it activates only during a runtime upgrade.

It's important to note that following the runtime upgrade, there could be instances where a Nexus Kubernetes Cluster node remains cordoned. In such a scenario, you can locate the cordoned nodes by executing the following command.

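For context on the behavior these changed paragraphs describe: cordon and drain map to standard kubectl verbs, and the follow-up check for nodes that stayed cordoned (referenced at the end of this hunk; the actual command in the article sits outside the diff) can be sketched the same way. A minimal sketch using stock kubectl; the node name is a placeholder and the flag values are illustrative, not taken from the article:

```bash
# Cordon marks a node unschedulable; drain then evicts tenant pods so they
# can reschedule on other nodes. The 20m timeout mirrors the per-node window.
kubectl cordon aks-nodepool-node-1
kubectl drain aks-nodepool-node-1 --ignore-daemonsets --delete-emptydir-data --timeout=20m

# After the runtime upgrade, list any nodes that remained cordoned ...
kubectl get nodes | grep SchedulingDisabled

# ... and uncordon them so pods can be scheduled on the node once again.
kubectl uncordon aks-nodepool-node-1
```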
@@ -80,4 +80,4 @@ When a server is upgraded to utilize a new OS, the BMM keysets have to be re-est

## Servers not upgraded successfully

- A server remains unavailable if it fails upgrade or provisioning, whether from a possible hardware issue during reboot or an issue with cloud-init (networking, chronyd, etc.). The underlying condition needs to be resolved, and either a baremetalmachine replace or reimage needs to be executed. Uncordoning the server manually won't resolve the issues.
+ A server remains unavailable if it fails upgrade or provisioning, whether from a possible hardware issue during reboot or an issue with cloud-init (networking, chronyd, etc.). The underlying condition needs to be resolved, and either a baremetalmachine replace or reimage needs to be executed. Uncordoning the server manually won't resolve the issues.
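The baremetalmachine replace/reimage actions mentioned in the changed paragraph correspond to commands in the Azure CLI networkcloud extension. A minimal sketch of the reimage path, assuming that extension is installed; the machine and resource group names are placeholders, not values from the article:

```bash
# Reimage the bare metal machine once the underlying hardware or cloud-init
# condition is resolved. Replace the placeholder names with real resources.
az networkcloud baremetalmachine reimage \
  --name "bareMetalMachine1" \
  --resource-group "resourceGroup1"
```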
