147 changes: 147 additions & 0 deletions modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc
@@ -0,0 +1,147 @@
// Module included in the following assemblies:
//
// * openshift-docs/updating/preparing_for_updates/updating-cluster-prepare.adoc

:_mod-docs-content-type: PROCEDURE
[id="configuring-custom-machine-config-pools-parallel-upgrades_{context}"]
= Configuring custom machine config pools for parallel upgrades

[role="_abstract"]
You can accelerate upgrades in large clusters by partitioning worker nodes into custom machine config pools (MCPs) that align with Kubernetes failure domains (KFDs). This approach changes the default sequential node-by-node update process into a parallel, partition-based strategy.

This strategy is particularly relevant for bare metal and on-premises environments where you cannot add temporary surge capacity during upgrades. By updating all nodes in a single failure domain simultaneously, you can reduce upgrade time while keeping other failure domains available to maintain high availability (HA) for workloads.

Configure each custom MCP with `maxUnavailable: 100%` so that all nodes in that pool update at the same time. This setting applies only to the nodes in the selected MCP, not to the entire cluster.

Plan the number and size of custom MCPs based on your cluster topology, workload distribution, and application availability requirements, including Pod Disruption Budgets (PDBs).
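
For example, a pod disruption budget that requires a minimum number of available replicas limits how many pods can be evicted while a failure domain drains. The following minimal sketch assumes an application labeled `app: my-app` that runs at least three replicas spread across zones; the names and counts are illustrative, not part of this procedure:

[source,yaml]
----
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2 # at most one replica can be evicted at a time
  selector:
    matchLabels:
      app: my-app
----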

.Prerequisites

* You have access to the cluster as a user with the `cluster-admin` role.
* You have the OpenShift CLI (`oc`) installed.
* You have identified the failure domains (zones) in your cluster topology and planned a custom MCP for each one (for example, `worker-0`, `worker-1`, `worker-2`, and `worker-3`).
* You have ensured that the cluster has sufficient spare capacity to support the workload if one failure domain is unavailable.

.Procedure

. Create a YAML file for each custom MCP that corresponds to a failure domain in your cluster, as in the following example:
+
[source,yaml]
----
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-0
  labels:
    machineconfiguration.openshift.io/role: worker-0
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-0]
  paused: true
  maxUnavailable: 100%
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-0: ""
----
+
Ensure the following configurations are present:

* *maxUnavailable*: Set this value to `100%`. This setting enables the Machine Config Operator (MCO) to update all nodes in that failure domain simultaneously, significantly reducing the upgrade time for the zone.

* *paused*: Set this value to `true`. Pausing the pool prevents unintended updates from starting immediately. You unpause the pool only when you are ready to upgrade that specific zone.

* *nodeSelector*: Define a label to identify the nodes that belong to this pool (for example, `node-role.kubernetes.io/worker-0`).
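
. Apply each manifest to create the custom MCPs. A minimal sketch, assuming the `worker-0` manifest from the previous step is saved as `worker-0-mcp.yaml`:
+
[source,terminal]
----
$ oc apply -f worker-0-mcp.yaml
----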

. For each node, apply the `topology.kubernetes.io/zone` label with the value that identifies that node's KFD for the Kubernetes scheduler, and apply the custom node role label (for example, `worker-0`) that assigns the node to the corresponding MCP, by running the following command:
+
[source,terminal]
----
$ oc label node <node_name> node-role.kubernetes.io/worker-0="" topology.kubernetes.io/zone=<zone_name> --overwrite
----
+
[IMPORTANT]
====
Apply the `topology.kubernetes.io/zone` label before scheduling workloads on the cluster. This label guides the Kubernetes scheduler in distributing application replicas across failure domains. If you apply the label after workloads are already running, you might need to reschedule those workloads to achieve the intended distribution.

When possible, apply the custom node role label during cluster installation or node scaling so that each node joins its custom MCP before workloads run. You can also label nodes in an installed cluster, but moving a node to a different MCP applies that pool's configuration, which can reboot the node and disrupt its workloads.
====
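+
After the zone labels are in place, workloads can use them to spread replicas across the failure domains. The following illustrative pod template snippet, which assumes pods labeled `app: my-app`, uses a topology spread constraint keyed on the zone label:
+
[source,yaml]
----
spec:
  topologySpreadConstraints:
    - maxSkew: 1 # allow at most one extra replica per zone
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          app: my-app
----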

. Verify that the worker nodes are partitioned from the default worker pool into the custom MCPs by running the following command. The default `worker` pool must report a machine count of `0`, and each custom MCP must contain the nodes assigned to its corresponding failure domain.
+
[source,terminal]
----
$ oc get mcp -o custom-columns=NAME:.metadata.name,MACHINECOUNT:.status.machineCount
----
+
.Example output
[source,terminal]
----
NAME MACHINECOUNT
master 3
worker 0
worker-0 2
worker-1 2
worker-2 2
worker-3 2
----

. Verify that the worker nodes are correctly assigned to the custom MCPs and labeled with the expected zones by running the following command:
+
[source,terminal]
----
$ oc get nodes -L topology.kubernetes.io/zone
----
+
.Example output
[source,terminal]
----
NAME       STATUS   ROLES                  AGE   VERSION    ZONE
master-0   Ready    control-plane,master   27h   v1.31.13
worker-0   Ready    worker,worker-0        27h   v1.31.13   kfd0
worker-1   Ready    worker,worker-1        27h   v1.31.13   kfd1
worker-2   Ready    worker,worker-2        27h   v1.31.13   kfd2
worker-3   Ready    worker,worker-3        27h   v1.31.13   kfd3
----

. Schedule workloads on the cluster only after you verify that nodes are labeled correctly and distributed across Kubernetes failure domains.

. Confirm that each custom MCP remains paused until you are ready to upgrade the cluster by running the following command:
+
[source,terminal]
----
$ oc get mcp
----
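+
The default `oc get mcp` output does not include the `paused` field. To inspect it directly, you can reuse the `custom-columns` approach from the earlier verification step:
+
[source,terminal]
----
$ oc get mcp -o custom-columns=NAME:.metadata.name,PAUSED:.spec.paused
----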

. Upgrade the control plane by using the standard cluster upgrade process. Review the available update targets by running the following command:
+
[source,terminal]
----
$ oc adm upgrade
----
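+
To start the update, pass a target release; a sketch, with `<version>` as a placeholder for one of the versions reported by the previous command:
+
[source,terminal]
----
$ oc adm upgrade --to=<version>
----
+
While the custom MCPs remain paused, only the control plane and any unpaused pools update.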

. Unpause one custom MCP at a time to begin upgrading that failure domain.
+
[source,terminal]
----
$ oc patch mcp <custom_mcp_name> --type=merge -p '{"spec":{"paused":false}}'
----

. Wait until the MCP update completes before unpausing the next custom MCP. Check that the `UPDATED` column reports `True` by running the following command:
+
[source,terminal]
----
$ oc get mcp <custom_mcp_name>
----
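+
To block until the pool reports the `Updated` condition instead of polling manually, an alternative sketch (the timeout value is an assumption; adjust it for your environment):
+
[source,terminal]
----
$ oc wait mcp/<custom_mcp_name> --for=condition=Updated --timeout=60m
----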

. Repeat this process for each custom MCP until all worker pools are upgraded.
+
[source,terminal]
----
$ oc patch mcp <next_custom_mcp_name> --type=merge -p '{"spec":{"paused":false}}'
----

. Verify that all custom MCPs report a completed update, which confirms that the cluster upgrade is finished, by running the following command:
+
[source,terminal]
----
$ oc get mcp
----
3 changes: 3 additions & 0 deletions updating/preparing_for_updates/updating-cluster-prepare.adoc
@@ -51,5 +51,8 @@ include::modules/update-best-practices.adoc[leveloffset=+1]
.Additional resources
* xref:../../updating/understanding_updates/intro-to-updates.adoc#understanding_clusteroperator_conditiontypes_understanding-openshift-updates[Understanding cluster Operator condition types]

// Configuring custom machine config pools for parallel upgrades
include::modules/configuring-custom-machine-config-pools-parallel-upgrades.adoc[leveloffset=+1]

// Minimizing worker node deployment time
include::modules/minimizing-worker-node-deployment-time.adoc[leveloffset=+1]