
Commit ec5cd46

Merge pull request #72184 from jldohmann/OCPBUGS-24188
OCPBUGS-24188: add warning about setting control mcp to 3
2 parents 4892cac + 2e83cc1 commit ec5cd46

9 files changed, with 47 additions and 1 deletion

modules/deployments-rolling-strategy.adoc

Lines changed: 6 additions & 0 deletions
@@ -60,3 +60,9 @@ These parameters allow the deployment to be tuned for availability and speed. Fo
 
 Generally, if you want fast rollouts, use `maxSurge`. If you have to take into account resource quota and can accept partial unavailability, use
 `maxUnavailable`.
+
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+

modules/nodes-pods-pod-disruption-about.adoc

Lines changed: 5 additions & 0 deletions
@@ -33,6 +33,11 @@ A `maxUnavailable` of `0%` or `0` or a `minAvailable` of `100%` or equal to the
 is permitted but can block nodes from being drained.
 ====
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 You can check for pod disruption budgets across all projects with the following:
 
 [source,terminal]

modules/update-best-practices.adoc

Lines changed: 6 additions & 1 deletion
@@ -48,6 +48,11 @@ Additionally, if compute nodes do not have enough spare capacity, workloads migh
 
 Make sure that you have enough available nodes in each worker pool, as well as enough spare capacity on your compute nodes, to increase the chance of successful node updates.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 [id="pod-disruption-budget_{context}"]
 == Ensure that the cluster's PodDisruptionBudget is properly configured
 
@@ -60,4 +65,4 @@ When planning a cluster update, check the configuration of the `PodDisruptionBud
 
 * For highly available workloads, make sure there are replicas that can be temporarily taken offline without being prohibited by the `PodDisruptionBudget`.
 
-* For workloads that aren't highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.
+* For workloads that aren't highly available, make sure they are either not protected by a `PodDisruptionBudget` or have some alternative mechanism for draining these workloads eventually, such as periodic restart or guaranteed eventual termination.

modules/update-duration-estimate-cluster-update-time.adoc

Lines changed: 5 additions & 0 deletions
@@ -14,6 +14,11 @@ Cluster update time = CVO target update payload deployment time + (# node update
 
 A node update iteration consists of one or more nodes updated in parallel. The control plane nodes are always updated in parallel with the compute nodes. In addition, one or more compute nodes can be updated in parallel based on the `maxUnavailable` value.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 For example, to estimate the update time, consider an {product-title} cluster with three control plane nodes and six compute nodes and each host takes about 5 minutes to reboot.
 
 [NOTE]
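
Reviewer sketch (not part of the commit): the context lines in this hunk describe how node update iterations depend on `maxUnavailable`. The following Python is one possible reading of that description, assuming each pool needs `ceil(nodes / maxUnavailable)` iterations and that the control plane and compute pools progress in parallel, so the slower pool sets the count. The function names and the integer-only `maxUnavailable` are assumptions made for illustration, and the CVO payload deployment time from the formula is left out because the hunk does not give a value for it.

```python
import math


def node_update_iterations(control_plane_nodes, compute_nodes,
                           cp_max_unavailable=1, compute_max_unavailable=1):
    """Estimate node update iterations (hypothetical helper).

    Assumes both pools update in parallel, so the pool that needs
    more iterations determines the total.
    """
    if cp_max_unavailable < 1 or compute_max_unavailable < 1:
        raise ValueError("maxUnavailable must be at least 1 in this sketch")
    cp_iterations = math.ceil(control_plane_nodes / cp_max_unavailable)
    compute_iterations = math.ceil(compute_nodes / compute_max_unavailable)
    return max(cp_iterations, compute_iterations)


def node_reboot_minutes(iterations, minutes_per_node):
    """Node-reboot portion of the update time only (no CVO time)."""
    return iterations * minutes_per_node


# Example from the hunk: 3 control plane nodes, 6 compute nodes,
# about 5 minutes per reboot, default maxUnavailable of 1 for both pools.
iterations = node_update_iterations(3, 6)
print(iterations, node_reboot_minutes(iterations, 5))
```

Under these assumptions the example cluster needs 6 iterations, or about 30 minutes of node reboots. Raising `maxUnavailable` to 2 for the compute pool only halves that to 3 iterations, which shows that the control plane pool can stay at the default of `1` without lengthening the estimate.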

modules/update-duration-factors.adoc

Lines changed: 5 additions & 0 deletions
@@ -10,6 +10,11 @@ The following factors can affect your cluster update duration:
 
 * The reboot of compute nodes to the new machine configuration by Machine Config Operator (MCO)
 ** The value of `MaxUnavailable` in the machine config pool
++
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
 ** The minimum number or percentages of replicas set in pod disruption budget (PDB)
 * The number of nodes in the cluster
 * The health of the cluster nodes

modules/update-mco-process.adoc

Lines changed: 5 additions & 0 deletions
@@ -7,6 +7,11 @@
 = Understanding how the Machine Config Operator updates nodes
 The Machine Config Operator (MCO) applies a new machine configuration to each control plane node and compute node. During the machine configuration update, control plane nodes and compute nodes are organized into their own machine config pools, where the pools of machines are updated in parallel. The `.spec.maxUnavailable` parameter, which has a default value of `1`, determines how many nodes in a machine config pool can simultaneously undergo the update process.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 When the machine configuration update process begins, the MCO checks the amount of currently unavailable nodes in a pool. If there are fewer unavailable nodes than the value of `.spec.maxUnavailable`, the MCO initiates the following sequence of actions on available nodes in the pool:
 
 . Cordon and drain the node
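
Reviewer sketch (not part of the commit): the check described in this hunk compares the number of currently unavailable nodes with `.spec.maxUnavailable`. The Python below is a simplified model of that comparison, not the MCO's implementation. It assumes `maxUnavailable` is an integer rather than a percentage, and the function name and parameters are hypothetical.

```python
def nodes_eligible_to_start(max_unavailable, currently_unavailable, pool_size):
    """Return how many more nodes could begin updating (simplified model).

    Assumption: the pool may start updates on available nodes until the
    number of unavailable nodes reaches max_unavailable.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1 in this sketch")
    if not 0 <= currently_unavailable <= pool_size:
        raise ValueError("currently_unavailable must be between 0 and pool_size")
    free_slots = max(0, max_unavailable - currently_unavailable)
    available_nodes = pool_size - currently_unavailable
    return min(free_slots, available_nodes)


# Three-node control plane pool with the default setting.
print(nodes_eligible_to_start(max_unavailable=1, currently_unavailable=0, pool_size=3))

# Same pool with the value the warning advises against.
print(nodes_eligible_to_start(max_unavailable=3, currently_unavailable=0, pool_size=3))
```

In this model the default of `1` lets a single control plane node start updating, whereas a value of `3` would let all three nodes of a three-node control plane pool be cordoned and drained together, which is the situation the added warning is meant to prevent.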

modules/update-service-overview.adoc

Lines changed: 5 additions & 0 deletions
@@ -35,6 +35,11 @@ Only updating to a newer version is supported. Reverting or rolling back your cl
 
 During the update process, the Machine Config Operator (MCO) applies the new configuration to your cluster machines. The MCO cordons the number of nodes specified by the `maxUnavailable` field on the machine configuration pool and marks them unavailable. By default, this value is set to `1`. The MCO updates the affected nodes alphabetically by zone, based on the `topology.kubernetes.io/zone` label. If a zone has more than one node, the oldest nodes are updated first. For nodes that do not use zones, such as in bare metal deployments, the nodes are updated by age, with the oldest nodes updated first. The MCO updates the number of nodes as specified by the `maxUnavailable` field on the machine configuration pool at a time. The MCO then applies the new configuration and reboots the machine.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 If you use {op-system-base-full} machines as workers, the MCO does not update the kubelet because you must update the OpenShift API on the machines first.
 
 With the specification for the new version applied to the old kubelet, the {op-system-base} machine cannot return to the `Ready` state. You cannot complete the update until the machines are available. However, the maximum number of unavailable nodes is set to ensure that normal cluster operations can continue with that number of machines out of service.
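
Reviewer sketch (not part of the commit): the node ordering described earlier in this hunk (alphabetically by zone from the `topology.kubernetes.io/zone` label, then oldest first) can be illustrated with a sort key. The `Node` class and `update_order` function are hypothetical, and treating a missing zone label as an empty string is an assumption, since the text does not say how zoned and unzoned nodes are interleaved.

```python
from dataclasses import dataclass
from datetime import datetime

ZONE_LABEL = "topology.kubernetes.io/zone"


@dataclass
class Node:
    """Minimal stand-in for a cluster node (hypothetical type)."""
    name: str
    created: datetime
    labels: dict


def update_order(nodes):
    """Sort by zone label alphabetically, then by age with the oldest first.

    Assumption: nodes without a zone label get an empty zone, so they
    are ordered by age alone.
    """
    return sorted(nodes, key=lambda n: (n.labels.get(ZONE_LABEL, ""), n.created))


nodes = [
    Node("worker-b1", datetime(2023, 1, 10), {ZONE_LABEL: "us-east-1b"}),
    Node("worker-a2", datetime(2023, 3, 5), {ZONE_LABEL: "us-east-1a"}),
    Node("worker-a1", datetime(2023, 1, 1), {ZONE_LABEL: "us-east-1a"}),
]
print([n.name for n in update_order(nodes)])
```

With the sample data, the two nodes in zone `us-east-1a` come first, older before newer, followed by the node in `us-east-1b`. The MCO would then take nodes from the front of such an ordering in groups no larger than `maxUnavailable`.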

modules/update-using-custom-machine-config-pools-about.adoc

Lines changed: 5 additions & 0 deletions
@@ -14,6 +14,11 @@ The following steps outline the high-level workflow of the canary rollout update
 ====
 You can change the `maxUnavailable` setting in an MCP to specify the percentage or the number of machines that can be updating at any given time. The default is `1`.
 ====
++
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
 
 . Add a node selector to the custom MCPs. For each node that you do not want to update simultaneously with the rest of the cluster, add a matching label to the nodes. This label associates the node to the MCP.
 +

updating/updating_a_cluster/update-using-custom-machine-config-pools.adoc

Lines changed: 5 additions & 0 deletions
@@ -102,6 +102,11 @@ Because the MCO does not update nodes within paused MCPs, you can pause the MCPs
 Using one or more custom MCPs can give you more control over the sequence in which you update your worker nodes.
 For example, after you update the nodes in the first MCP, you can verify the application compatibility and then update the rest of the nodes gradually to the new version.
 
+[WARNING]
+====
+The default setting for `maxUnavailable` is `1` for all the machine config pools in {product-title}. It is recommended to not change this value and update one control plane node at a time. Do not change this value to `3` for the control plane pool.
+====
+
 [NOTE]
 ====
 To ensure the stability of the control plane, creating a custom MCP from the control plane nodes is not supported. The Machine Config Operator (MCO) ignores any custom MCP created for the control plane nodes.
