147 changes: 147 additions & 0 deletions modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc
@@ -0,0 +1,147 @@
// Module included in the following assemblies:
//
// * openshift-docs/updating/preparing_for_updates/updating-cluster-prepare.adoc

:_mod-docs-content-type: PROCEDURE
[id="configuring-custom-machine-config-pools-parallel-upgrades_{context}"]
= Configuring custom machine config pools for parallel upgrades

[role="_abstract"]
You can accelerate upgrades in large clusters by partitioning worker nodes into custom machine config pools (MCPs) that align with Kubernetes failure domains (KFDs). This approach changes the default sequential node-by-node update process into a parallel, partition-based strategy.

This strategy is particularly relevant for bare metal and on-premises environments where you cannot add temporary surge capacity during upgrades. By updating all nodes in a single failure domain simultaneously, you can reduce upgrade time while keeping other failure domains available to maintain high availability (HA) for workloads.

Configure each custom MCP with `maxUnavailable: 100%` so that all nodes in that pool update at the same time. This setting applies only to the nodes in the selected MCP, not to the entire cluster.

Plan the number and size of custom MCPs based on your cluster topology, workload distribution, and application availability requirements, including Pod Disruption Budgets (PDBs).
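
For example, a pod disruption budget that requires a minimum number of available replicas limits how many pods can be evicted while a failure domain drains. The following minimal sketch assumes an application labeled `app: my-app` that runs at least three replicas spread across zones; the names and counts are illustrative, not part of this procedure:

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2 # at most one replica can be evicted at a time
  selector:
    matchLabels:
      app: my-app
----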

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` role.
* You have the OpenShift CLI (`oc`) installed.
* You have identified the failure domains (zones) in your cluster topology and planned a custom MCP for each one (for example, `worker-0`, `worker-1`, `worker-2`, and `worker-3`).
* You have ensured that the cluster has sufficient spare capacity to support the workload if one failure domain is unavailable.

.Procedure

. Create a YAML file for each custom MCP that corresponds to a failure domain in your cluster, as in the following example:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-0
  labels:
    machineconfiguration.openshift.io/role: worker-0
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-0]
  paused: true
  maxUnavailable: 100%
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-0: ""
----
+
Ensure the following configurations are present:

* *maxUnavailable*: Set this value to `100%`. This setting enables the Machine Config Operator (MCO) to update all nodes in that failure domain simultaneously, significantly reducing the upgrade time for the zone.

* *paused*: Set this value to `true`. Pausing the pool prevents unintended updates from starting immediately. You unpause the pool only when you are ready to upgrade that specific zone.

* *nodeSelector*: Define a label to identify the nodes that belong to this pool (for example, `node-role.kubernetes.io/worker-0`).
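
. Apply each manifest to create the custom MCPs. A minimal sketch, assuming the `worker-0` manifest from the previous step is saved as `worker-0-mcp.yaml`:
+
[source,terminal]
----
$ oc apply -f worker-0-mcp.yaml
----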

. For each node, apply the `topology.kubernetes.io/zone` label with the value that identifies that node's KFD for the Kubernetes scheduler, and apply the custom node role label (for example, `worker-0`) that assigns the node to the corresponding MCP, by running the following command:
+
[source,terminal]
----
$ oc label node <node_name> node-role.kubernetes.io/worker-0="" topology.kubernetes.io/zone=<zone_name> --overwrite
----
+
[IMPORTANT]
====
Apply the `topology.kubernetes.io/zone` label before scheduling workloads on the cluster. This label guides the Kubernetes scheduler in distributing application replicas across failure domains. If you apply the label after workloads are already running, you might need to reschedule those workloads to achieve the intended distribution.

When possible, apply the custom node role label during cluster installation or node scaling so that each node joins its custom MCP before workloads run. You can also label nodes in an installed cluster, but moving a node to a different MCP applies that pool's configuration, which can reboot the node and disrupt its workloads.
====
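+
After the zone labels are in place, workloads can use them to spread replicas across the failure domains. The following illustrative pod template snippet, which assumes pods labeled `app: my-app`, uses a topology spread constraint keyed on the zone label:
+
[source,yaml]
----
spec:
  topologySpreadConstraints:
    - maxSkew: 1 # allow at most one extra replica per zone
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: my-app
----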

. Verify that the worker nodes are partitioned from the default worker pool into the custom MCPs by running the following command. The default `worker` pool must report a machine count of `0`, and each custom MCP must contain the nodes assigned to its corresponding failure domain.
+
[source,terminal]
----
$ oc get mcp -o custom-columns=NAME:.metadata.name,MACHINECOUNT:.status.machineCount
----
+
.Example output
[source,terminal]
----
NAME MACHINECOUNT
master 3
worker 0
worker-0 2
worker-1 2
worker-2 2
worker-3 2
----

. Verify that the worker nodes are correctly assigned to the custom MCPs and labeled with the expected zones by running the following command:
+
[source,terminal]
----
$ oc get nodes -L topology.kubernetes.io/zone
----
+
.Example output
[source,terminal]
----
NAME       STATUS   ROLES                  AGE   VERSION    ZONE
master-0   Ready    control-plane,master   27h   v1.31.13
worker-0   Ready    worker,worker-0        27h   v1.31.13   kfd0
worker-1   Ready    worker,worker-1        27h   v1.31.13   kfd1
worker-2   Ready    worker,worker-2        27h   v1.31.13   kfd2
worker-3   Ready    worker,worker-3        27h   v1.31.13   kfd3
----

. Schedule workloads on the cluster only after you verify that nodes are labeled correctly and distributed across Kubernetes failure domains.

. Confirm that each custom MCP remains paused until you are ready to upgrade the cluster by running the following command:
+
[source,terminal]
----
$ oc get mcp
----
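+
The default `oc get mcp` output does not include the `paused` field. To inspect it directly, you can reuse the `custom-columns` approach from the earlier verification step:
+
[source,terminal]
----
$ oc get mcp -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused
----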

. Upgrade the control plane by using the standard cluster upgrade process. Review the available update targets by running the following command:
+
[source,terminal]
----
$ oc adm upgrade
----
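+
To start the update, pass a target release; a sketch, with `<version>` as a placeholder for one of the versions reported by the previous command:
+
[source,terminal]
----
$ oc adm upgrade --to=<version>
----
+
While the custom MCPs remain paused, only the control plane and any unpaused pools update.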

. Unpause one custom MCP at a time to begin upgrading that failure domain.
+
[source,terminal]
----
$ oc patch mcp <custom_mcp_name> --type=merge -p '{"spec":{"paused":false}}'
----

. Wait until the MCP update completes before unpausing the next custom MCP. Check that the `UPDATED` column reports `True` by running the following command:
+
[source,terminal]
----
$ oc get mcp <custom_mcp_name>
----
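+
To block until the pool reports the `Updated` condition instead of polling manually, an alternative sketch (the timeout value is an assumption; adjust it for your environment):
+
[source,terminal]
----
$ oc wait mcp/<custom_mcp_name> --for=condition=Updated --timeout=60m
----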

. Repeat this process for each custom MCP until all worker pools are upgraded.
+
[source,terminal]
----
$ oc patch mcp <next_custom_mcp_name> --type=merge -p '{"spec":{"paused":false}}'
----

. Verify that all custom MCPs report a completed update, which confirms that the cluster upgrade is finished, by running the following command:
+
[source,terminal]
----
$ oc get mcp
----
3 changes: 3 additions & 0 deletions updating/preparing_for_updates/updating-cluster-prepare.adoc
@@ -51,5 +51,8 @@ include::modules/update-best-practices.adoc[leveloffset=+1]
.Additional resources
* xref:../../updating/understanding_updates/intro-to-updates.adoc#understanding_clusteroperator_conditiontypes_understanding-openshift-updates[Understanding cluster Operator condition types]

// Configuring custom machine config pools for parallel upgrades
include::modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc[leveloffset=+1]

// Minimizing worker node deployment time
include::modules/minimizing-worker-node-deployment-time.adoc[leveloffset=+1]