Commit 3a53aff

Merge pull request #60091 from skopacz1/OSDOCS-2429
2 parents 84923fe + 1039827 commit 3a53aff

20 files changed

+419
-39
lines changed

_topic_maps/_topic_map.yml

Lines changed: 6 additions & 1 deletion
@@ -582,8 +582,13 @@ Topics:
582582
- Name: Updating clusters overview
583583
File: index
584584
- Name: Understanding OpenShift updates
585-
File: understanding-openshift-updates
585+
Dir: understanding_updates
586586
Distros: openshift-enterprise
587+
Topics:
588+
- Name: Introduction to OpenShift updates
589+
File: intro-to-updates
590+
- Name: How cluster updates work
591+
File: how-updates-work
587592
- Name: Understanding update channels and releases
588593
File: understanding-upgrade-channels-release
589594
Distros: openshift-enterprise

images/update-runlevels.png

237 KB

modules/update-common-terms.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
// Module included in the following assemblies:
22
//
3-
// * updating/understanding-openshift-updates.adoc
3+
// * updating/understanding_updates/intro-to-updates.adoc
44

55
:_content-type: REFERENCE
66
[id="update-common-terms_{context}"]
Lines changed: 114 additions & 0 deletions
@@ -0,0 +1,114 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding_updates/how-updates-work.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="update-evaluate-availability_{context}"]
7+
= Evaluation of update availability
8+
9+
The Cluster Version Operator (CVO) periodically queries the OpenShift Update Service (OSUS) for the most recent data about update possibilities.
10+
This data is based on the cluster's subscribed channel.
11+
The CVO then saves information about update recommendations into either the `availableUpdates` or `conditionalUpdates` field of its `ClusterVersion` resource.
12+
13+
The CVO periodically checks the conditional updates for update risks.
14+
These risks are conveyed through the data served by the OSUS, which contains information for each version about known issues that might affect a cluster updated to that version.
15+
Most risks are limited to clusters with specific characteristics, such as clusters with a certain size or clusters that are deployed in a particular cloud platform.
16+
17+
The CVO continuously evaluates its cluster characteristics against the conditional risk information for each conditional update. If the CVO finds that the cluster matches the criteria, the CVO stores this information in the `conditionalUpdates` field of its `ClusterVersion` resource.
18+
If the CVO finds that the cluster does not match the risks of an update, or that there are no risks associated with the update, it stores the target version in the `availableUpdates` field of its `ClusterVersion` resource.
19+
20+
The user interface, either the web console or the OpenShift CLI (`oc`), presents this information in sectioned headings to the administrator.
21+
Each *supported but not recommended* update recommendation contains a link to further resources about the risk so that the administrator can make an informed decision about the update.
22+
23+
You can inspect all available updates with the following command:
24+
25+
[source,terminal]
26+
----
27+
$ oc adm upgrade --include-not-recommended
28+
----
29+
30+
The additional `--include-not-recommended` parameter includes updates that are available but not recommended due to a known risk that applies to the cluster.
31+
32+
.Example output
33+
[source,terminal]
34+
----
35+
Cluster version is 4.10.22
36+
37+
Upstream is unset, so the cluster will use an appropriate default.
38+
Channel: fast-4.11 (available channels: candidate-4.10, candidate-4.11, eus-4.10, fast-4.10, fast-4.11, stable-4.10)
39+
40+
Recommended updates:
41+
42+
VERSION IMAGE
43+
4.10.26 quay.io/openshift-release-dev/ocp-release@sha256:e1fa1f513068082d97d78be643c369398b0e6820afab708d26acda2262940954
44+
4.10.25 quay.io/openshift-release-dev/ocp-release@sha256:ed84fb3fbe026b3bbb4a2637ddd874452ac49c6ead1e15675f257e28664879cc
45+
4.10.24 quay.io/openshift-release-dev/ocp-release@sha256:aab51636460b5a9757b736a29bc92ada6e6e6282e46b06e6fd483063d590d62a
46+
4.10.23 quay.io/openshift-release-dev/ocp-release@sha256:e40e49d722cb36a95fa1c03002942b967ccbd7d68de10e003f0baa69abad457b
47+
48+
Supported but not recommended updates:
49+
50+
Version: 4.11.0
51+
Image: quay.io/openshift-release-dev/ocp-release@sha256:300bce8246cf880e792e106607925de0a404484637627edf5f517375517d54a4
52+
Recommended: False
53+
Reason: RPMOSTreeTimeout
54+
Message: Nodes with substantial numbers of containers and CPU contention may not reconcile machine configuration https://bugzilla.redhat.com/show_bug.cgi?id=2111817#c22
55+
----
56+
57+
One way to inspect the underlying availability data created by the CVO is by querying the `ClusterVersion` resource with the following command:
58+
59+
[source,terminal]
60+
----
61+
$ oc get clusterversion version -o json | jq '.status.availableUpdates'
62+
----
63+
64+
.Example output
65+
[source,terminal]
66+
----
67+
[
68+
{
69+
"channels": [
70+
"candidate-4.11",
71+
"candidate-4.12",
72+
"fast-4.11",
73+
"fast-4.12"
74+
],
75+
"image": "quay.io/openshift-release-dev/ocp-release@sha256:400267c7f4e61c6bfa0a59571467e8bd85c9188e442cbd820cc8263809be3775",
76+
"url": "https://access.redhat.com/errata/RHBA-2023:3213",
77+
"version": "4.11.41"
78+
},
79+
...
80+
]
81+
----
82+
83+
A similar command can be used to check conditional updates:
84+
85+
[source,terminal]
86+
----
87+
$ oc get clusterversion version -o json | jq '.status.conditionalUpdates'
88+
----
89+
90+
.Example output
91+
[source,terminal]
92+
----
93+
[
94+
{
95+
"conditions": [
96+
{
97+
"lastTransitionTime": "2023-05-30T16:28:59Z",
98+
"message": "The 4.11.36 release only resolves an installation issue https://issues.redhat.com//browse/OCPBUGS-11663 , which does not affect already running clusters. 4.11.36 does not include fixes delivered in recent 4.11.z releases and therefore upgrading from these versions would cause fixed bugs to reappear. Red Hat does not recommend upgrading clusters to 4.11.36 version for this reason. https://access.redhat.com/solutions/7007136",
99+
"reason": "PatchesOlderRelease",
100+
"status": "False",
101+
"type": "Recommended"
102+
}
103+
],
104+
"release": {
105+
"channels": [...],
106+
"image": "quay.io/openshift-release-dev/ocp-release@sha256:8c04176b771a62abd801fcda3e952633566c8b5ff177b93592e8e8d2d1f8471d",
107+
"url": "https://access.redhat.com/errata/RHBA-2023:1733",
108+
"version": "4.11.36"
109+
},
110+
"risks": [...]
111+
},
112+
...
113+
]
114+
----
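As a minimal sketch of how this JSON might be consumed in a script, the following Python snippet lists the conditional updates whose `Recommended` condition is `False`, together with the reported reason. It assumes only the structure shown in the example output above. The `summarize_not_recommended` helper and the trimmed sample data are hypothetical and are not part of any OpenShift tooling.

```python
import json


def summarize_not_recommended(conditional_updates):
    """Return (version, reason) pairs for updates whose Recommended condition is False."""
    summary = []
    for update in conditional_updates:
        version = update["release"]["version"]
        for condition in update.get("conditions", []):
            if condition.get("type") == "Recommended" and condition.get("status") == "False":
                summary.append((version, condition.get("reason")))
    return summary


# Trimmed sample based on the example output above (hypothetical test data).
sample = json.loads("""
[
  {
    "conditions": [
      {"reason": "PatchesOlderRelease", "status": "False", "type": "Recommended"}
    ],
    "release": {"version": "4.11.36"}
  }
]
""")

print(summarize_not_recommended(sample))
```

In practice, the input could come from the output of the `oc get clusterversion version -o json | jq '.status.conditionalUpdates'` command shown earlier, saved to a file or read from standard input.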
Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding_updates/how-updates-work.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="update-manifest-application_{context}"]
7+
= Understanding how manifests are applied during an update
8+
9+
Some manifests supplied in a release image must be applied in a certain order because of the dependencies between them.
10+
For example, the `CustomResourceDefinition` resource must be created before the matching custom resources.
11+
Additionally, there is a logical order in which the individual cluster Operators must be updated to minimize disruption in the cluster.
12+
The Cluster Version Operator (CVO) implements this logical order through the concept of Runlevels.
13+
14+
These dependencies are encoded in the filenames of the manifests in the release image:
15+
16+
[source,terminal]
17+
----
18+
0000_<runlevel>_<component>_<manifest-name>.yaml
19+
----
20+
21+
For example:
22+
23+
[source,terminal]
24+
----
25+
0000_03_config-operator_01_proxy.crd.yaml
26+
----
27+
28+
The CVO internally builds a dependency graph for the manifests, where the CVO obeys the following rules:
29+
30+
* During an update, manifests at a lower Runlevel are applied before those at a higher Runlevel.
31+
32+
* Within one Runlevel, manifests for different components can be applied in parallel.
33+
34+
* Within one Runlevel, manifests for a single component are applied in lexicographic order.
35+
36+
The CVO then applies manifests following the generated dependency graph.
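The ordering rules above can be illustrated with a short Python sketch. This is not the CVO implementation; it only groups manifest file names by Runlevel and component and sorts them the way the rules describe. The `parse_manifest_name` and `plan_application` helpers are hypothetical, and the sketch assumes that component names contain no underscores, as in the file names shown in this section.

```python
from collections import defaultdict


def parse_manifest_name(filename):
    """Split '0000_<runlevel>_<component>_<manifest-name>.yaml' into its parts."""
    _prefix, runlevel, component, manifest = filename.split("_", 3)
    return int(runlevel), component, manifest


def plan_application(filenames):
    """Return [(runlevel, {component: [manifests in lexicographic order]})],
    with Runlevels in ascending order."""
    grouped = defaultdict(lambda: defaultdict(list))
    for name in filenames:
        runlevel, component, _ = parse_manifest_name(name)
        grouped[runlevel][component].append(name)
    ordered = []
    for runlevel in sorted(grouped):
        # Components within one Runlevel are independent of each other,
        # so only the per-component manifest order matters here.
        components = {c: sorted(names) for c, names in sorted(grouped[runlevel].items())}
        ordered.append((runlevel, components))
    return ordered


manifests = [
    "0000_90_service-ca-operator_03_servicemonitor.yaml",
    "0000_03_marketplace-operator_02_operatorhub.cr.yaml",
    "0000_03_config-operator_01_proxy.crd.yaml",
    "0000_03_marketplace-operator_01_operatorhub.crd.yaml",
    "0000_90_service-ca-operator_02_prometheusrolebinding.yaml",
]

plan = plan_application(manifests)
for runlevel, components in plan:
    print(runlevel, components)
```

In this sketch, the two `marketplace-operator` manifests at Runlevel 03 are ordered so that the CRD manifest (`01`) precedes the custom resource manifest (`02`), which mirrors the `CustomResourceDefinition` dependency described earlier.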
37+
38+
[NOTE]
39+
====
40+
For some resource types, the CVO monitors the resource after its manifest is applied, and considers it to be successfully updated only after the resource reaches a stable state.
41+
Achieving this stable state can take some time.
42+
This is especially true for cluster Operators, which might perform their own update actions in the cluster after the CVO deploys their new versions.
43+
While the additional update actions take place, these cluster Operators temporarily set their `Progressing` condition to `True`.
44+
====
45+
46+
The CVO waits until all cluster Operators in the Runlevel meet the following conditions before it proceeds to the next Runlevel:
47+
48+
* The cluster Operators have an `Available=True` condition.
49+
50+
* The cluster Operators have a `Degraded=False` condition.
51+
52+
* The cluster Operators declare that they have achieved the desired version in their `ClusterOperator` resource.
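The three conditions above can be expressed as a simple check over `ClusterOperator` objects. The following Python sketch is an illustration only, not the CVO's actual logic. It assumes that each object is a dictionary with `status.conditions` and `status.versions` lists, and that the entry named `operator` in `status.versions` carries the version to compare; both the helper names and that choice of entry are assumptions made for this example.

```python
def condition_status(cluster_operator, condition_type):
    """Return the status string of the named condition, or None if it is absent."""
    for condition in cluster_operator.get("status", {}).get("conditions", []):
        if condition.get("type") == condition_type:
            return condition.get("status")
    return None


def reported_operator_version(cluster_operator):
    """Return the version from the 'operator' entry in status.versions (assumed layout)."""
    for entry in cluster_operator.get("status", {}).get("versions", []):
        if entry.get("name") == "operator":
            return entry.get("version")
    return None


def runlevel_gate_passed(cluster_operators, desired_version):
    """True only if every Operator is Available, not Degraded, and at the desired version."""
    return all(
        condition_status(co, "Available") == "True"
        and condition_status(co, "Degraded") == "False"
        and reported_operator_version(co) == desired_version
        for co in cluster_operators
    )


# Hypothetical sample objects for illustration.
ready = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "True"},
            {"type": "Degraded", "status": "False"},
        ],
        "versions": [{"name": "operator", "version": "4.12.6"}],
    }
}
still_updating = {
    "status": {
        "conditions": [
            {"type": "Available", "status": "True"},
            {"type": "Degraded", "status": "False"},
            {"type": "Progressing", "status": "True"},
        ],
        "versions": [{"name": "operator", "version": "4.12.5"}],
    }
}

print(runlevel_gate_passed([ready], "4.12.6"))
print(runlevel_gate_passed([ready, still_updating], "4.12.6"))
```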
53+
54+
Some actions can take significant time to finish. The CVO waits for the actions to complete in order to ensure the subsequent Runlevels can proceed safely.
55+
The process of applying all manifests is expected to take 60 to 120 minutes in total; see *Understanding {product-title} update duration* for more information about factors that influence update duration.
56+
57+
image::update-runlevels.png[A diagram displaying the sequence of Runlevels and the manifests of components within each level]
58+
59+
In the previous example diagram, the CVO is waiting until all work is completed at Runlevel 20.
60+
The CVO has applied all manifests to the Operators in the Runlevel, but the `kube-apiserver-operator` ClusterOperator is still performing some actions after its new version was deployed. The `kube-apiserver-operator` ClusterOperator declares this progress through the `Progressing=True` condition and by not declaring the new version as reconciled in its `status.versions`.
61+
The CVO waits until the ClusterOperator reports an acceptable status, and then starts applying manifests at Runlevel 25.

modules/update-mco-process.adoc

Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding_updates/how-updates-work.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="mco-update-process_{context}"]
7+
= Understanding how the Machine Config Operator updates nodes
8+
The Machine Config Operator (MCO) applies a new machine configuration to each control plane node and compute node. During the machine configuration update, control plane nodes and compute nodes are organized into their own machine config pools, where the pools of machines are updated in parallel. The `.spec.maxUnavailable` parameter, which has a default value of `1`, determines how many nodes in a machine config pool can simultaneously undergo the update process.
9+
10+
When the machine configuration update process begins, the MCO checks the number of currently unavailable nodes in a pool. If there are fewer unavailable nodes than the value of `.spec.maxUnavailable`, the MCO initiates the following sequence of actions on available nodes in the pool:
11+
12+
. Cordon and drain the node
13+
+
14+
[NOTE]
15+
====
16+
When a node is cordoned, workloads cannot be scheduled to it.
17+
====
18+
19+
. Update the system configuration and operating system (OS) of the node
20+
21+
. Reboot the node
22+
23+
. Uncordon the node
24+
25+
A node undergoing this process is unavailable until it is uncordoned and workloads can be scheduled to it again. The MCO continues to start updates on nodes until the number of unavailable nodes is equal to the value of `.spec.maxUnavailable`.
26+
27+
As a node completes its update and becomes available, the number of unavailable nodes in the machine config pool is once again less than the value of `.spec.maxUnavailable`. If there are remaining nodes that need to be updated, the MCO initiates the update process on a node until the `.spec.maxUnavailable` limit is once again reached. This process repeats until each control plane node and compute node has been updated.
28+
29+
The following example workflow describes how this process might occur in a machine config pool with 5 nodes, where `.spec.maxUnavailable` is 3 and all nodes are initially available:
30+
31+
. The MCO cordons nodes 1, 2, and 3, and begins to drain them.
32+
33+
. Node 2 finishes draining, reboots, and becomes available again. The MCO cordons node 4 and begins draining it.
34+
35+
. Node 1 finishes draining, reboots, and becomes available again. The MCO cordons node 5 and begins draining it.
36+
37+
. Node 3 finishes draining, reboots, and becomes available again.
38+
39+
. Node 5 finishes draining, reboots, and becomes available again.
40+
41+
. Node 4 finishes draining, reboots, and becomes available again.
42+
43+
Because the update process for each node is independent of other nodes, some nodes in the example above finish their update out of the order in which they were cordoned by the MCO.
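The example workflow above can be reproduced with a small simulation. The following Python sketch does not reflect how the MCO is implemented; it only models the `.spec.maxUnavailable` budget. The node names and per-node durations are hypothetical values chosen so that the completion order matches the example.

```python
import heapq


def simulate_rollout(durations, max_unavailable):
    """Simulate updating nodes under a max-unavailable budget.

    durations maps node name to a hypothetical update duration.
    Returns (start_order, finish_order, peak_unavailable).
    """
    pending = list(durations)   # nodes still waiting, in pool order
    in_progress = []            # min-heap of (finish_time, node)
    now = 0
    started, finished = [], []
    peak = 0
    while pending or in_progress:
        # Start updates while the unavailability budget allows it.
        while pending and len(in_progress) < max_unavailable:
            node = pending.pop(0)
            heapq.heappush(in_progress, (now + durations[node], node))
            started.append(node)
        peak = max(peak, len(in_progress))
        # Advance time to the next node that completes its update.
        now, node = heapq.heappop(in_progress)
        finished.append(node)
    return started, finished, peak


# Hypothetical durations; real update times depend on the workloads on each node.
durations = {"node1": 3, "node2": 2, "node3": 5, "node4": 6, "node5": 3}
started, finished, peak = simulate_rollout(durations, max_unavailable=3)
print("cordoned in order: ", started)
print("completed in order:", finished)
print("peak unavailable:  ", peak)
```

With these durations, the nodes are cordoned in the order 1 to 5 but complete in the order 2, 1, 3, 5, 4, and no more than three nodes are unavailable at any time.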
44+
45+
You can check the status of the machine configuration update by running the following command:
46+
47+
[source,terminal]
48+
----
49+
$ oc get mcp
50+
----
51+
52+
.Example output
53+
54+
[source,terminal]
55+
----
56+
NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
57+
master rendered-master-acd1358917e9f98cbdb599aea622d78b True False False 3 3 3 0 22h
58+
worker rendered-worker-1d871ac76e1951d32b2fe92369879826 False True False 2 1 1 0 22h
59+
----

modules/update-process-workflow.adoc

Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding_updates/how-updates-work.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="update-process-workflow_{context}"]
7+
= Update process workflow
8+
9+
The following steps represent a detailed workflow of the {product-title} (OCP) update process:
10+
11+
. The target version is stored in the `spec.desiredUpdate.version` field of the `ClusterVersion` resource, which can be managed through the web console or the CLI.
12+
13+
. The Cluster Version Operator (CVO) detects that the `desiredUpdate` in the `ClusterVersion` resource differs from the current cluster version.
14+
Using graph data from the OpenShift Update Service, the CVO resolves the desired cluster version to a pull spec for the release image.
15+
16+
. The CVO validates the integrity and authenticity of the release image.
17+
Red Hat publishes cryptographically-signed statements about published release images at predefined locations by using image SHA digests as unique and immutable release image identifiers.
18+
The CVO uses a list of built-in public keys to validate the presence and signatures of the statement matching the checked release image.
19+
20+
. The CVO creates a job named `version-$version-$hash` in the `openshift-cluster-version` namespace.
21+
This job uses containers that are executing the release image, so the cluster downloads the image through the container runtime.
22+
The job then extracts the manifests and metadata from the release image to a shared volume that is accessible to the CVO.
23+
24+
. The CVO validates the extracted manifests and metadata.
25+
26+
. The CVO checks some preconditions to ensure that no problematic condition is detected in the cluster.
27+
Certain conditions can prevent updates from proceeding.
28+
These conditions are either determined by the CVO itself, or reported by individual cluster Operators that detect some details about the cluster that the Operator considers problematic for the update.
29+
30+
. The CVO records the accepted release in `status.desired` and creates a `status.history` entry about the new update.
31+
32+
. The CVO begins applying the manifests from the release image.
33+
Cluster Operators are updated in separate stages called Runlevels, and the CVO ensures that all Operators in a Runlevel finish updating before it proceeds to the next level.
34+
35+
. Manifests for the CVO itself are applied early in the process.
36+
When the CVO deployment is applied, the current CVO pod terminates, and a CVO pod using the new version starts.
37+
The new CVO proceeds to apply the remaining manifests.
38+
39+
. The update proceeds until the entire control plane is updated to the new version.
40+
Individual cluster Operators might perform update tasks on their domain of the cluster, and while they do so, they report their state through the `Progressing=True` condition.
41+
42+
. The Machine Config Operator (MCO) manifests are applied towards the end of the process.
43+
The updated MCO then begins updating the system configuration and operating system of every node.
44+
Each node might be drained, updated, and rebooted before it starts to accept workloads again.
45+
46+
The cluster reports as updated after the control plane update is finished, usually before all nodes are updated.
47+
After the update, the CVO maintains all cluster resources to match the state delivered in the release image.
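The relationship between the requested version in the first step and the accepted release in `status.desired` can be sketched in Python. The snippet below is an illustration under stated assumptions, not part of the CVO. It treats the `ClusterVersion` resource as a dictionary, reads only the `spec.desiredUpdate.version` and `status.desired.version` fields named in the workflow, and uses a hypothetical `describe_update_request` helper.

```python
def describe_update_request(cluster_version):
    """Compare the requested version with the release the CVO has accepted."""
    desired_update = cluster_version.get("spec", {}).get("desiredUpdate") or {}
    requested = desired_update.get("version")
    accepted = cluster_version.get("status", {}).get("desired", {}).get("version")
    if requested is None:
        return "no update requested"
    if requested == accepted:
        return f"release {requested} accepted"
    return f"release {requested} requested, {accepted} currently accepted"


# Hypothetical sample resources for illustration.
idle = {"spec": {}, "status": {"desired": {"version": "4.10.22"}}}
requested = {
    "spec": {"desiredUpdate": {"version": "4.10.26"}},
    "status": {"desired": {"version": "4.10.22"}},
}
accepted = {
    "spec": {"desiredUpdate": {"version": "4.10.26"}},
    "status": {"desired": {"version": "4.10.26"}},
}

for cv in (idle, requested, accepted):
    print(describe_update_request(cv))
```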

modules/update-release-images.adoc

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
1+
// Module included in the following assemblies:
2+
//
3+
// * updating/understanding_updates/how-updates-work.adoc
4+
5+
:_content-type: CONCEPT
6+
[id="update-release-images_{context}"]
7+
= Release images
8+
9+
A release image is the delivery mechanism for a specific {product-title} (OCP) version.
10+
It contains the release metadata, a Cluster Version Operator (CVO) binary matching the release version, every manifest needed to deploy individual OpenShift cluster Operators, and a list of SHA digest-versioned references to all container images that make up this OpenShift version.
11+
12+
You can inspect the content of a specific release image by running the following command:
13+
14+
[source,terminal]
15+
----
16+
$ oc adm release extract <release image>
17+
----
18+
19+
.Example output
20+
[source,terminal]
21+
----
22+
$ oc adm release extract quay.io/openshift-release-dev/ocp-release:4.12.6-x86_64
23+
Extracted release payload from digest sha256:800d1e39d145664975a3bb7cbc6e674fbf78e3c45b5dde9ff2c5a11a8690c87b created at 2023-03-01T12:46:29Z
24+
25+
$ ls
26+
0000_03_authorization-openshift_01_rolebindingrestriction.crd.yaml
27+
0000_03_config-operator_01_proxy.crd.yaml
28+
0000_03_marketplace-operator_01_operatorhub.crd.yaml
29+
0000_03_marketplace-operator_02_operatorhub.cr.yaml
30+
0000_03_quota-openshift_01_clusterresourcequota.crd.yaml <1>
31+
...
32+
0000_90_service-ca-operator_02_prometheusrolebinding.yaml <2>
33+
0000_90_service-ca-operator_03_servicemonitor.yaml
34+
0000_99_machine-api-operator_00_tombstones.yaml
35+
image-references <3>
36+
release-metadata
37+
----
38+
<1> Manifest for `ClusterResourceQuota` CRD, to be applied on Runlevel 03
39+
<2> Manifest for `PrometheusRoleBinding` resource for the `service-ca-operator`, to be applied on Runlevel 90
40+
<3> List of SHA digest-versioned references to all required images

modules/update-service-overview.adoc

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
11
// Module included in the following assemblies:
22
//
33
// * architecture/architecture-installation.adoc
4-
// * updating/understanding-openshift-updates.adoc
4+
// * updating/understanding_updates/intro-to-updates.adoc
55

66
:_content-type: CONCEPT
77
[id="update-service-about_{context}"]

security/container_security/security-hosts-vms.adoc

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ include::modules/security-hosts-vms-rhcos.adoc[leveloffset=+1]
2828
* xref:../../installing/install_config/installing-customizing.adoc#installation-special-config-kmod_installing-customizing[Kernel modules]
2929
* xref:../../installing/install_config/installing-customizing.adoc#installation-special-config-encrypt-disk_installing-customizing[Disk encryption]
3030
* xref:../../installing/install_config/installing-customizing.adoc#installation-special-config-chrony_installing-customizing[Chrony time service]
31-
* xref:../../updating/understanding-openshift-updates.adoc#update-service-about_understanding-openshift-updates[About the OpenShift Update Service]
31+
* xref:../../updating/understanding_updates/intro-to-updates.adoc#update-service-about_understanding-openshift-updates[About the OpenShift Update Service]
3232
////
3333
ifndef::openshift-origin[]
3434
* xref:../../installing/installing-fips.adoc#installing-fips[FIPS cryptography]
