diff --git a/modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc b/modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc
new file mode 100644
index 000000000000..37f9300ab569
--- /dev/null
+++ b/modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc
@@ -0,0 +1,147 @@
+// Module included in the following assemblies:
+//
+// * openshift-docs/updating/preparing_for_updates/updating-cluster-prepare.adoc
+
+:_mod-docs-content-type: PROCEDURE
+[id="configuring-custom-machine-config-pools-parallel-upgrades_{context}"]
+= Configuring custom machine config pools for parallel upgrades
+
+You can accelerate upgrades in large clusters by partitioning worker nodes into custom machine config pools (MCPs) that align with Kubernetes failure domains (KFDs). This approach changes the default sequential node-by-node update process into a parallel, partition-based strategy.
+
+This strategy is particularly relevant for bare-metal and on-premises environments where you cannot add temporary surge capacity during upgrades. By updating all nodes in a single failure domain simultaneously, you can reduce upgrade time while keeping the other failure domains available to maintain high availability (HA) for workloads.
+
+Configure each custom MCP with `maxUnavailable: 100%` so that all nodes in that pool update at the same time. This setting applies only to the nodes in the selected MCP, not to the entire cluster.
+
+Plan the number and size of custom MCPs based on your cluster topology, workload distribution, and application availability requirements, including pod disruption budgets (PDBs).
+
+.Prerequisites
+
+* You have access to the cluster as a user with the `cluster-admin` role.
+* You have installed the OpenShift CLI (`oc`).
+* You have identified the failure domains (zones) in your cluster topology (for example, `worker-0`, `worker-1`, `worker-2`, `worker-3`).
+* You have ensured that the cluster has sufficient spare capacity to support the workload if one failure domain is unavailable.
+
+.Procedure
+
+. Create a YAML file for each custom MCP that corresponds to a failure domain in your cluster, as in the following example:
++
+[source,yaml]
+----
+apiVersion: machineconfiguration.openshift.io/v1
+kind: MachineConfigPool
+metadata:
+  name: worker-0
+  labels:
+    machineconfiguration.openshift.io/role: worker-0
+spec:
+  machineConfigSelector:
+    matchExpressions:
+      - key: machineconfiguration.openshift.io/role
+        operator: In
+        values: [ worker, worker-0 ]
+  paused: true
+  maxUnavailable: 100%
+  nodeSelector:
+    matchLabels:
+      node-role.kubernetes.io/worker-0: ""
+----
++
+Ensure that the following configurations are present:
++
+--
+* *maxUnavailable*: Set this value to `100%`. This setting enables the Machine Config Operator (MCO) to update all nodes in a single failure domain simultaneously, significantly reducing upgrade time for the zone.
+
+* *paused*: Set this value to `true`. Pausing the pool prevents unintended updates from starting immediately. You unpause the pool only when you are ready to upgrade that specific zone.
+
+* *nodeSelector*: Define a label that identifies the nodes that belong to this pool (for example, `node-role.kubernetes.io/worker-0`).
+--
+
+. Apply the `topology.kubernetes.io/zone` label to identify the KFD for the Kubernetes scheduler, and the custom node role label (for example, `worker-0`) to assign the node to the MCP, by running the following command:
++
+[source,terminal]
+----
+$ oc label node <node_name> node-role.kubernetes.io/worker-0="" topology.kubernetes.io/zone=<zone_name> --overwrite
+----
++
+[IMPORTANT]
+====
+Apply the `topology.kubernetes.io/zone` label before scheduling workloads on the cluster. This label guides the Kubernetes scheduler in distributing application replicas across failure domains. If you apply the label after workloads are already running, you might need to reschedule those workloads to achieve the intended distribution. Applying this label during cluster installation or node scaling is recommended, but you can also add the label after installation.
+====
+
+. Verify that the worker nodes have been partitioned from the default worker pool into the custom MCPs by running the following command. The default worker pool must display as empty, and each custom MCP must contain the nodes assigned to its corresponding failure domain.
++
+[source,terminal]
+----
+$ oc get mcp -o custom-columns=NAME:.metadata.name,MACHINECOUNT:.status.machineCount
+----
++
+.Example output
+[source,terminal]
+----
+NAME       MACHINECOUNT
+master     3
+worker     0
+worker-0   2
+worker-1   2
+worker-2   2
+worker-3   2
+----
+
+. Verify that the worker nodes are correctly assigned to the custom MCPs and labeled with the expected zones by running the following command:
++
+[source,terminal]
+----
+$ oc get nodes -L topology.kubernetes.io/zone
+----
++
+.Example output
+[source,terminal]
+----
+NAME       STATUS   ROLES                  AGE   VERSION    ZONE
+master-0   Ready    control-plane,master   27h   v1.31.13
+worker-0   Ready    worker,worker-0        27h   v1.31.13   kfd0
+worker-1   Ready    worker,worker-1        27h   v1.31.13   kfd1
+worker-2   Ready    worker,worker-2        27h   v1.31.13   kfd2
+worker-3   Ready    worker,worker-3        27h   v1.31.13   kfd3
+----
+
+. Schedule workloads on the cluster only after you verify that the nodes are labeled correctly and distributed across the Kubernetes failure domains.
+
+. Keep each custom MCP paused until you are ready to upgrade the cluster. You can confirm the paused status by running the following command:
++
+[source,terminal]
+----
+$ oc get mcp
+----
+
+. Upgrade the control plane by using the standard cluster upgrade process. For example, to update to the latest available version, run the following command:
++
+[source,terminal]
+----
+$ oc adm upgrade --to-latest=true
+----
+
+. Unpause one custom MCP at a time to begin upgrading that failure domain.
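++
+Before you unpause the next pool, you can list the pools that are still paused. This is a sketch that uses a JSONPath filter on the `spec.paused` field; it is one possible approach, not part of the required procedure:
++
+[source,terminal]
+----
+$ oc get mcp -o jsonpath='{range .items[?(@.spec.paused==true)]}{.metadata.name}{"\n"}{end}'
+----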
++
+[source,terminal]
+----
+$ oc patch mcp <mcp_name> --type=merge -p '{"spec":{"paused":false}}'
+----
+
+. Wait until the MCP update completes before unpausing the next custom MCP. Monitor the update status by running the following command:
++
+[source,terminal]
+----
+$ oc get mcp
+----
+
+. Repeat this process for each custom MCP until all worker pools are upgraded:
++
+[source,terminal]
+----
+$ oc patch mcp <mcp_name> --type=merge -p '{"spec":{"paused":false}}'
+----
+
+. Verify that all custom MCPs report a completed update before you consider the cluster upgrade complete:
++
+[source,terminal]
+----
+$ oc get mcp
+----
\ No newline at end of file
diff --git a/updating/preparing_for_updates/updating-cluster-prepare.adoc b/updating/preparing_for_updates/updating-cluster-prepare.adoc
index 0039094cfaec..7516cc9b86f7 100644
--- a/updating/preparing_for_updates/updating-cluster-prepare.adoc
+++ b/updating/preparing_for_updates/updating-cluster-prepare.adoc
@@ -51,5 +51,8 @@ include::modules/update-best-practices.adoc[leveloffset=+1]
 .Additional resources
 * xref:../../updating/understanding_updates/intro-to-updates.adoc#understanding_clusteroperator_conditiontypes_understanding-openshift-updates[Understanding cluster Operator condition types]
 
+// Configuring custom machine config pools for parallel upgrades
+include::modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc[leveloffset=+1]
+
 // Minimizing worker node deployment time
 include::modules/minimizing-worker-node-deployment-time.adoc[leveloffset=+1]