
Commit 57a8b55

OSDOCS3300:Adds openshift upgrade duration section
1 parent 65645bd commit 57a8b55

7 files changed

+181
-0
lines changed

_topic_maps/_topic_map.yml

Lines changed: 2 additions & 0 deletions
@@ -525,6 +525,8 @@ Topics:
525525
File: understanding-openshift-updates
526526
- Name: Understanding upgrade channels
527527
File: understanding-upgrade-channels-release
528+
- Name: Understanding OpenShift update duration
529+
File: understanding-openshift-update-duration
528530
- Name: Preparing to perform an EUS-to-EUS update
529531
File: preparing-eus-eus-upgrade
530532
- Name: Updating a cluster using the web console

modules/update-duration-cvo.adoc

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding-openshift-update-duration.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="cluster-version-operator_{context}"]
7+
= Cluster Version Operator target update payload deployment
8+
9+
The Cluster Version Operator (CVO) retrieves the target update release image and applies it to the cluster. All components that run as pods are updated during this phase, whereas the host components are updated by the Machine Config Operator (MCO). This process might take 60 to 120 minutes.
10+
11+
[NOTE]
12+
====
13+
The CVO phase of the update does not restart the nodes.
14+
====

modules/update-duration-estimate-cluster-update-time.adoc

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding-openshift-update-duration.adoc
4+
5+
:_content-type: REFERENCE
6+
[id="estimating-cluster-update-time_{context}"]
7+
= Estimating cluster update time
8+
9+
The historical update duration of similar clusters provides the best estimate for future cluster updates. However, if historical data is not available, you can use the following convention to estimate your cluster update time:
10+
11+
----
12+
Cluster update time = CVO target update payload deployment time + (# node update iterations x MCO node update time)
13+
----
14+
15+
A node update iteration consists of one or more nodes updated in parallel. The control plane nodes are always updated in parallel with the compute nodes. In addition, one or more compute nodes can be updated in parallel based on the `maxUnavailable` value.
16+
17+
For example, to estimate the update time, consider an {product-title} cluster with three control plane nodes and six compute nodes, where each host takes about 5 minutes to reboot.
18+
19+
[NOTE]
20+
====
21+
The time it takes to reboot a particular node varies significantly. On cloud instances, the reboot might take about 1 to 2 minutes, whereas on physical bare metal hosts the reboot might take more than 15 minutes.
22+
====
23+
24+
.Scenario-1
25+
When you set `maxUnavailable` to `1` for both the control plane and compute node Machine Config Pools (MCPs), all six compute nodes update one after another, one node in each iteration:
26+
27+
----
28+
Cluster update time = 60 + (6 x 5) = 90 minutes
29+
----
30+
31+
.Scenario-2
32+
When you set `maxUnavailable` to `2` for the compute node MCP, two compute nodes update in parallel in each iteration. Therefore, it takes a total of three iterations to update all the nodes.
33+
34+
----
35+
Cluster update time = 60 + (3 x 5) = 75 minutes
36+
----
37+
38+
[IMPORTANT]
39+
====
40+
The default setting for `maxUnavailable` is `1` for all the MCPs in {product-title}. It is recommended that you do not change the `maxUnavailable` value in the control plane MCP.
41+
====
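
The estimation convention above can be sketched in Python. This is only an illustration, not part of the product: the function name `estimate_update_minutes` and its parameters are hypothetical, and the sketch assumes that the number of node update iterations is set by whichever pool needs more iterations, because the control plane nodes update in parallel with the compute nodes. It also assumes that `maxUnavailable` is an integer count rather than a percentage.

```python
import math


def estimate_update_minutes(cvo_minutes, control_plane_nodes, compute_nodes,
                            compute_max_unavailable, node_update_minutes,
                            control_plane_max_unavailable=1):
    """Cluster update time = CVO time + (node update iterations x MCO node update time).

    Assumption: the control plane and compute pools update in parallel, so the
    slower pool determines the number of node update iterations.
    """
    if compute_max_unavailable < 1 or control_plane_max_unavailable < 1:
        raise ValueError("maxUnavailable must be at least 1")
    control_plane_iterations = math.ceil(control_plane_nodes / control_plane_max_unavailable)
    compute_iterations = math.ceil(compute_nodes / compute_max_unavailable)
    iterations = max(control_plane_iterations, compute_iterations)
    return cvo_minutes + iterations * node_update_minutes


# Scenario-1: maxUnavailable is 1 for the compute node MCP
print(estimate_update_minutes(60, 3, 6, 1, 5))  # 90
# Scenario-2: maxUnavailable is 2 for the compute node MCP
print(estimate_update_minutes(60, 3, 6, 2, 5))  # 75
```

The two calls reproduce the Scenario-1 and Scenario-2 results by using the 60 minute CVO estimate and the 5 minute per-node estimate from the example.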

modules/update-duration-factors.adoc

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding-openshift-update-duration.adoc
4+
5+
:_content-type: REFERENCE
6+
[id="factors-affecting-update-duration_{context}"]
7+
= Factors affecting update duration
8+
9+
The following factors can affect your cluster update duration:
10+
11+
* The reboot of compute nodes to the new machine configuration by the Machine Config Operator (MCO)
12+
** The value of `maxUnavailable` in the machine config pool
13+
** The minimum number or percentage of replicas set in the pod disruption budget (PDB)
14+
* The number of nodes in the cluster
15+
* The health of the cluster nodes

modules/update-duration-mco.adoc

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding-openshift-update-duration.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="machine-config-operator-node-updates_{context}"]
7+
= Machine Config Operator node updates
8+
The Machine Config Operator (MCO) applies a new machine configuration to each control plane and compute node. During this process, the MCO performs the following sequential actions on each node of the cluster:
9+
10+
. Cordon and drain the node
11+
. Update the operating system (OS)
12+
. Reboot the node
13+
. Uncordon the node and schedule workloads on it
14+
15+
[NOTE]
16+
====
17+
When a node is cordoned, workloads cannot be scheduled to it.
18+
====
19+
20+
The time to complete this process depends on several factors including the node and infrastructure configuration. This process might take 5 or more minutes to complete per node.
21+
22+
In addition to the MCO, consider the impact of the following factors:
23+
24+
* The control plane node update duration is predictable and often shorter than that of compute nodes, because the control plane workloads are tuned for graceful updates and quick drains.
25+
26+
* You can update the compute nodes in parallel by setting the `maxUnavailable` field to a value greater than `1` in the Machine Config Pool (MCP). The MCO cordons the number of nodes specified in `maxUnavailable` and marks them as unavailable for the update.
27+
28+
* Increasing `maxUnavailable` on the MCP can help the pool to update more quickly. However, if `maxUnavailable` is set too high and several nodes are cordoned simultaneously, workloads guarded by a pod disruption budget (PDB) could fail to drain because no schedulable node can be found to run the replicas. If you increase `maxUnavailable` for the MCP, ensure that you still have sufficient schedulable nodes to allow PDB-guarded workloads to drain.
29+
30+
* Before you begin the update, you must ensure that all the nodes are available. Any unavailable nodes can significantly impact the update duration because node unavailability affects both `maxUnavailable` and pod disruption budgets.
31+
+
32+
To check the status of nodes from the terminal, run the following command:
33+
+
34+
[source,terminal]
35+
----
36+
$ oc get node
37+
----
38+
+
39+
.Example output
40+
[source,terminal]
41+
----
42+
NAME STATUS ROLES AGE VERSION
43+
ip-10-0-137-31.us-east-2.compute.internal Ready,SchedulingDisabled worker 12d v1.23.5+3afdacb
44+
ip-10-0-151-208.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb
45+
ip-10-0-176-138.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb
46+
ip-10-0-183-194.us-east-2.compute.internal Ready worker 12d v1.23.5+3afdacb
47+
ip-10-0-204-102.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb
48+
ip-10-0-207-224.us-east-2.compute.internal Ready worker 12d v1.23.5+3afdacb
49+
----
50+
+
51+
If the status of a node is `NotReady` or `SchedulingDisabled`, the node is not available, which impacts the update duration.
52+
+
53+
You can check the status of nodes from the *Administrator* perspective in the web console by expanding *Compute* -> *Node*.
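
The node availability check described above can be sketched as a small Python helper that scans the text output of `oc get node`. This is a hedged illustration only: the function name `unavailable_nodes` is hypothetical, the sketch assumes the default column layout shown in the example output, and it works on captured text rather than calling `oc` itself.

```python
def unavailable_nodes(oc_output):
    """Return (name, status) pairs for nodes that are NotReady or SchedulingDisabled.

    Assumes the default `oc get node` layout: a header row followed by one row
    per node, with NAME in the first column and STATUS in the second.
    """
    flagged = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue
        name, status = fields[0], fields[1]
        conditions = status.split(",")
        if "Ready" not in conditions or "SchedulingDisabled" in conditions:
            flagged.append((name, status))
    return flagged


# Sample rows taken from the example output above
sample = """\
NAME STATUS ROLES AGE VERSION
ip-10-0-137-31.us-east-2.compute.internal Ready,SchedulingDisabled worker 12d v1.23.5+3afdacb
ip-10-0-151-208.us-east-2.compute.internal Ready master 12d v1.23.5+3afdacb
ip-10-0-183-194.us-east-2.compute.internal Ready worker 12d v1.23.5+3afdacb
"""

for name, status in unavailable_nodes(sample):
    print(f"{name}: {status}")
```

With the sample rows, only the cordoned node `ip-10-0-137-31.us-east-2.compute.internal` is reported.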

modules/update-duration-rhel-nodes.adoc

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding-openshift-update-duration.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="redhat-enterprise-linux-nodes_{context}"]
7+
= {op-system-base-full} compute nodes
8+
9+
{op-system-base-full} compute nodes require an additional use of `openshift-ansible` to update node binary components. The actual time spent updating {op-system-base} compute nodes should not be significantly different from {op-system-first} compute nodes.

updating/understanding-openshift-update-duration.adoc

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
:_content-type: ASSEMBLY
2+
[id="understanding-openshift-update-duration"]
3+
= Understanding {product-title} update duration
4+
include::_attributes/common-attributes.adoc[]
5+
:context: openshift-update-duration
6+
7+
toc::[]
8+
9+
{product-title} update duration varies based on the deployment topology. This page helps you understand the factors that affect update duration and estimate how long the cluster update takes in your environment.
10+
11+
[id="update-duration-prerequisites"]
12+
== Prerequisites
13+
* You are familiar with xref:../architecture/architecture.adoc#architecture[OpenShift Container Platform architecture] and xref:../updating/understanding-openshift-updates.adoc#understanding-openshift-updates[OpenShift Container Platform updates].
14+
15+
include::modules/update-duration-factors.adoc[leveloffset=+1]
16+
17+
[id="cluster-update-phases"]
18+
== Cluster update phases
19+
In {product-title}, the cluster update happens in two phases:
20+
21+
* Cluster Version Operator (CVO) target update payload deployment
22+
* Machine Config Operator (MCO) node updates
23+
24+
25+
include::modules/update-duration-cvo.adoc[leveloffset=+2]
26+
27+
[role="_additional-resources"]
28+
.Additional resources
29+
30+
* xref:../architecture/architecture-installation.adoc#update-service-overview_architecture-installation[Cluster Version Operator (CVO) overview]
31+
32+
include::modules/update-duration-mco.adoc[leveloffset=+2]
33+
34+
[role="_additional-resources"]
35+
.Additional resources
36+
37+
* xref:../post_installation_configuration/machine-configuration-tasks.adoc#machine-config-overview-post-install-machine-configuration-tasks[Machine config overview]
38+
* xref:../nodes/pods/nodes-pods-configuring.adoc#nodes-pods-configuring-pod-distruption-about_nodes-pods-configuring[Pod disruption budget]
39+
40+
include::modules/update-duration-estimate-cluster-update-time.adoc[leveloffset=+1]
41+
42+
include::modules/update-duration-rhel-nodes.adoc[leveloffset=+1]
43+
44+
[role="_additional-resources"]
45+
.Additional resources
46+
47+
* xref:../updating/updating-cluster-rhel-compute.adoc#updating-cluster-rhel-compute[Updating RHEL compute machines]
