
Commit 333507e

Merge pull request #81685 from amolnar-rh/TELCODOCS-1861-concept
TELCODOCS-1861: Add concept module for introducing IBGU
2 parents c449556 + 0595545 commit 333507e


5 files changed (+202, -30 lines)

edge_computing/image_based_upgrade/cnf-understanding-image-based-upgrade.adoc

Lines changed: 0 additions & 8 deletions

@@ -76,12 +76,6 @@ include::modules/cnf-image-based-upgrade.adoc[leveloffset=+1]
 
 * xref:../../edge_computing/image_based_upgrade/cnf-image-based-upgrade-base.adoc#cnf-image-based-upgrade[Performing an image-based upgrade for {sno} clusters with {lcao}]
 
-* xref:../../edge_computing/image_based_upgrade/ztp-image-based-upgrade.adoc#ztp-image-based-upgrade[Performing an image-based upgrade for {sno} clusters using {ztp}]
-
-* xref:../../edge_computing/image_based_upgrade/cnf-image-based-upgrade-base.adoc#cnf-image-based-upgrade-rollback_cnf-non-gitops[Moving to the Rollback stage of the image-based upgrade with {lcao}]
-
-* xref:../../edge_computing/image_based_upgrade/ztp-image-based-upgrade.adoc#ztp-image-based-upgrade-rollback_ztp-gitops[Moving to the Rollback stage of the image-based upgrade with {lcao} and {ztp}]
-
 include::modules/cnf-image-based-upgrade-guidelines.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
@@ -113,8 +107,6 @@ include::modules/cnf-image-based-upgrade-extra-manifests-guide.adoc[leveloffset=
 
 * xref:../../edge_computing/image_based_upgrade/cnf-image-based-upgrade-base.adoc#cnf-image-based-upgrade[Performing an image-based upgrade for {sno} clusters with {lcao}]
 
-* xref:../../edge_computing/image_based_upgrade/ztp-image-based-upgrade.adoc#ztp-image-based-upgrade[Performing an image-based upgrade for {sno} clusters using {ztp}]
-
 * xref:../../edge_computing/ztp-preparing-the-hub-cluster.adoc#ztp-preparing-the-hub-cluster[Preparing the hub cluster for ZTP]
 
 * xref:../../edge_computing/image_based_upgrade/preparing_for_image_based_upgrade/cnf-image-based-upgrade-prep-resources.adoc#cnf-image-based-upgrade-prep-oadp_cnf-non-gitops[Creating ConfigMap objects for the image-based upgrade with {lcao}]

edge_computing/image_based_upgrade/ztp-image-based-upgrade.adoc

Lines changed: 7 additions & 21 deletions

@@ -6,18 +6,19 @@ include::_attributes/common-attributes.adoc[]
 
 toc::[]
 
-You can upgrade your managed {sno} cluster with the image-based upgrade through {ztp-first}.
-
-When you deploy the {lcao} on a cluster, an `ImageBasedUpgrade` CR is automatically created.
-You update this CR to specify the image repository of the seed image and to move through the different stages.
+You can use a single resource on the hub cluster, the `ImageBasedGroupUpgrade` custom resource (CR), to manage an image-based upgrade on a selected group of managed clusters through all stages.
+{cgu-operator-first} reconciles the `ImageBasedGroupUpgrade` CR and creates the underlying resources to complete the defined stage transitions, either in a manually controlled or a fully automated upgrade flow.
 
 // Lifecycle Agent (LCA)
 
-include::modules/ztp-image-based-upgrade-prep.adoc[leveloffset=+1]
+include::modules/ztp-image-based-upgrade-concept.adoc[leveloffset=+1]
 
 [role="_additional-resources"]
 .Additional resources
 
+* xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-scenario-3-recovering-expired-certs_dr-recovering-expired-certs[Recovering from expired control plane certificates]
+
+////
 * xref:../../edge_computing/ztp-preparing-the-hub-cluster.adoc#ztp-preparing-the-ztp-git-repository-ver-ind_ztp-preparing-the-hub-cluster[Preparing the {ztp} site configuration repository for version independence]
 
 * xref:../../edge_computing/image_based_upgrade/preparing_for_image_based_upgrade/ztp-image-based-upgrade-prep-resources.adoc#ztp-image-based-upgrade-prep-resources[Creating ConfigMap objects for the image-based upgrade with {lcao} using {ztp}]
@@ -29,21 +30,6 @@ include::modules/ztp-image-based-upgrade-prep.adoc[leveloffset=+1]
 * xref:../../backup_and_restore/application_backup_and_restore/backing_up_and_restoring/oadp-creating-backup-cr.adoc#oadp-creating-backup-cr-doc[Creating a Backup CR]
 
 * xref:../../backup_and_restore/application_backup_and_restore/backing_up_and_restoring/restoring-applications.adoc#oadp-creating-restore-cr_restoring-applications[Creating a Restore CR]
-
-include::modules/ztp-image-based-upgrade-upgrade.adoc[leveloffset=+1]
-
-[role="_additional-resources"]
-.Additional resources
-
-* xref:../../edge_computing/image_based_upgrade/ztp-image-based-upgrade.adoc#ztp-image-based-upgrade-rollback_ztp-gitops[Moving to the Rollback stage of the image-based upgrade with {lcao} and {ztp}]
-
-* xref:../../edge_computing/cnf-talm-for-cluster-upgrades.adoc#talo-policies-concept_cnf-topology-aware-lifecycle-manager[Update policies on managed clusters]
-
-include::modules/ztp-image-based-upgrade-rollback.adoc[leveloffset=+1]
-
-[role="_additional-resources"]
-.Additional resources
-
-* xref:../../backup_and_restore/control_plane_backup_and_restore/disaster_recovery/scenario-3-expired-certs.adoc#dr-scenario-3-recovering-expired-certs_dr-recovering-expired-certs[Recovering from expired control plane certificates]
+////
 
 include::modules/cnf-image-based-upgrade-troubleshooting.adoc[leveloffset=+1]
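
With the `ImageBasedGroupUpgrade` CR on the hub, a rollback is expressed as plan actions instead of by updating the `ImageBasedUpgrade` CR on each cluster. The following is a minimal sketch of the `spec.plan` section of a new `ImageBasedGroupUpgrade` CR that uses the supported `["Rollback", "FinalizeRollback"]` combination. The rollout values are illustrative assumptions, and the rest of the CR (`clusterLabelSelectors`, `ibuSpec`) is assumed to match the CR that performed the upgrade.

```yaml
# Sketch of spec.plan only; rollout values are illustrative assumptions.
plan:
- actions: ["Rollback", "FinalizeRollback"]  # one step, one shared rollout strategy
  rolloutStrategy:
    maxConcurrency: 200
    timeout: 60   # minutes
```

Because both actions share one step, a cluster that fails `Rollback` skips `FinalizeRollback` in that step, and `Rollback` itself only starts on clusters that were successfully upgraded.
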

modules/cnf-image-based-upgrade-troubleshooting.adoc

Lines changed: 7 additions & 1 deletion

@@ -6,7 +6,13 @@
 [id="cnf-image-based-upgrade-troubleshooting_{context}"]
 = Troubleshooting image-based upgrades with {lcao}
 
-You can encounter issues during the image-based upgrade.
+Perform troubleshooting steps on the managed clusters that are affected by an issue.
+
+[IMPORTANT]
+====
+If you are using the `ImageBasedGroupUpgrade` CR to upgrade your clusters, ensure that the `lcm.openshift.io/ibgu-<stage>-completed` or `lcm.openshift.io/ibgu-<stage>-failed` cluster labels are updated properly after performing troubleshooting or recovery steps on the managed clusters.
+This ensures that the {cgu-operator} continues to manage the image-based upgrade for the cluster.
+====
 
 [id="cnf-image-based-upgrade-troubleshooting-must-gather_{context}"]
 == Collecting logs
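
The cluster labels that {cgu-operator} sets can drive a follow-up `ImageBasedGroupUpgrade` CR after troubleshooting. The following is a minimal sketch, not a verified manifest, of a completely new CR that cancels the upgrade on clusters that failed the `Prep` stage by using the `["AbortOnFailure"]` combination. The CR name `example-abort-on-failure`, the label key `lcm.openshift.io/ibgu-prep-failed` (assuming that `<stage>` resolves to a lowercase stage name, as in `lcm.openshift.io/ibgu-upgrade-completed`), and the rollout values are illustrative assumptions. The `ibuSpec` section repeats the values of the original upgrade CR.

```yaml
# Hypothetical follow-up CR; name, label key, and rollout values are assumptions.
apiVersion: lcm.openshift.io/v1alpha1
kind: ImageBasedGroupUpgrade
metadata:
  name: example-abort-on-failure
  namespace: default
spec:
  clusterLabelSelectors:
  - matchExpressions:
    # Select clusters that carry the failure label, whatever its value
    - key: lcm.openshift.io/ibgu-prep-failed
      operator: Exists
  ibuSpec:
    # Same seed image, extra manifests, and OADP content as the original CR
    seedImageRef:
      image: quay.io/seed/image:4.16.0-rc.1
      version: 4.16.0-rc.1
      pullSecretRef:
        name: "<seed_pull_secret>"
    extraManifests:
    - name: example-extra-manifests
      namespace: openshift-lifecycle-agent
    oadpContent:
    - name: oadp-cm
      namespace: openshift-adp
  plan:
  # Moves clusters that failed Prep or Upgrade back to the Idle stage
  - actions: ["AbortOnFailure"]
    rolloutStrategy:
      maxConcurrency: 200
      timeout: 60   # minutes
```

If the labels were not updated correctly after manual recovery, a selector like this one would match the wrong set of clusters, which is why the labels must be kept accurate.
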

modules/ztp-image-based-upgrade-concept.adoc

Lines changed: 147 additions & 0 deletions

@@ -0,0 +1,147 @@
+// Module included in the following assemblies:
+// * edge_computing/image_based_upgrade/ztp-image-based-upgrade.adoc
+
+:_mod-docs-content-type: CONCEPT
+[id="ztp-image-based-upgrade-concept_{context}"]
+= Managing the image-based upgrade at scale using the `ImageBasedGroupUpgrade` CR on the hub
+
+The `ImageBasedGroupUpgrade` CR combines the `ImageBasedUpgrade` and `ClusterGroupUpgrade` APIs.
+For example, you can define the cluster selection and rollout strategy with the `ImageBasedGroupUpgrade` API in the same way as the `ClusterGroupUpgrade` API.
+However, the stage transitions differ from those in the `ImageBasedUpgrade` API.
+The `ImageBasedGroupUpgrade` API allows you to combine several stage transitions, also called actions, into one step so that they share one rollout strategy.
+
+.Example ImageBasedGroupUpgrade.yaml
+include::snippets/ibu-ImageBasedGroupUpgrade.adoc[]
+
+[id="ztp-image-based-upgrade-supported-combinations_{context}"]
+== Supported action combinations
+
+Actions are the list of stage transitions that {cgu-operator} completes in the steps of an upgrade plan for the selected group of clusters.
+Each `actions` entry in the `plan` section of the `ImageBasedGroupUpgrade` CR is a separate step, and a step contains one or more actions that share the same rollout strategy.
+You can achieve more control over the rollout strategy for each action by separating actions into steps.
+
+These actions can be combined differently in your upgrade plan, and you can add subsequent steps later.
+Wait until the previous steps either complete or fail before adding a step to your plan.
+For clusters that failed a previous step, the first action of an added step must be either `Abort` or `Rollback`.
+
+[IMPORTANT]
+====
+You cannot remove actions or steps from an ongoing plan.
+====
+
+The following table shows example plans for different levels of control over the rollout strategy:
+
+.Example upgrade plans
+[cols=2*, width="100%", options="header"]
+|====
+|Example plan
+|Description
+
+a|[source,yaml]
+----
+plan:
+- actions: ["Prep", "Upgrade", "FinalizeUpgrade"]
+  rolloutStrategy:
+    maxConcurrency: 200
+    timeout: 60
+----
+|All actions share the same strategy
+
+a|[source,yaml]
+----
+plan:
+- actions: ["Prep", "Upgrade"]
+  rolloutStrategy:
+    maxConcurrency: 200
+    timeout: 60
+- actions: ["FinalizeUpgrade"]
+  rolloutStrategy:
+    maxConcurrency: 500
+    timeout: 10
+----
+|Some actions share the same strategy
+
+a|[source,yaml]
+----
+plan:
+- actions: ["Prep"]
+  rolloutStrategy:
+    maxConcurrency: 200
+    timeout: 60
+- actions: ["Upgrade"]
+  rolloutStrategy:
+    maxConcurrency: 200
+    timeout: 20
+- actions: ["FinalizeUpgrade"]
+  rolloutStrategy:
+    maxConcurrency: 500
+    timeout: 10
+----
+|All actions have different strategies
+
+|====
+
+[IMPORTANT]
+====
+Clusters that fail one of the actions skip the remaining actions in the same step.
+====
+
+The `ImageBasedGroupUpgrade` API accepts the following actions:
+
+`Prep`:: Start preparing the upgrade resources by moving to the `Prep` stage.
+`Upgrade`:: Start the upgrade by moving to the `Upgrade` stage.
+`FinalizeUpgrade`:: Finalize the upgrade on selected clusters that completed the `Upgrade` action by moving to the `Idle` stage.
+`Rollback`:: Start a rollback only on successfully upgraded clusters by moving to the `Rollback` stage.
+`FinalizeRollback`:: Finalize the rollback by moving to the `Idle` stage.
+`AbortOnFailure`:: Cancel the upgrade on selected clusters that failed the `Prep` or `Upgrade` actions by moving to the `Idle` stage.
+`Abort`:: Cancel an ongoing upgrade only on clusters that are not yet upgraded by moving to the `Idle` stage.
+
+The following action combinations are supported. A pair of brackets signifies one step in the `plan` section:
+
+* `["Prep"]`, `["Abort"]`
+* `["Prep", "Upgrade", "FinalizeUpgrade"]`
+* `["Prep"]`, `["AbortOnFailure"]`, `["Upgrade"]`, `["AbortOnFailure"]`, `["FinalizeUpgrade"]`
+* `["Rollback", "FinalizeRollback"]`
+
+Use one of the following combinations when you need to resume or cancel an ongoing upgrade from a completely new `ImageBasedGroupUpgrade` CR:
+
+* `["Upgrade", "FinalizeUpgrade"]`
+* `["FinalizeUpgrade"]`
+* `["FinalizeRollback"]`
+* `["Abort"]`
+* `["AbortOnFailure"]`
+
+[id="ztp-image-based-upgrade-cluster-labeling_{context}"]
+== Labeling for cluster selection
+
+Use the `spec.clusterLabelSelectors` field for initial cluster selection.
+In addition, {cgu-operator} labels the managed clusters according to the results of their last stage transition.
+
+When a stage completes or fails, {cgu-operator} marks the relevant clusters with the following labels:
+
+* `lcm.openshift.io/ibgu-<stage>-completed`
+* `lcm.openshift.io/ibgu-<stage>-failed`
+
+Use these cluster labels to cancel or roll back an upgrade on a group of clusters after troubleshooting issues that you might encounter.
+
+[IMPORTANT]
+====
+If you are using the `ImageBasedGroupUpgrade` CR to upgrade your clusters, ensure that the `lcm.openshift.io/ibgu-<stage>-completed` or `lcm.openshift.io/ibgu-<stage>-failed` cluster labels are updated properly after performing troubleshooting or recovery steps on the managed clusters.
+This ensures that the {cgu-operator} continues to manage the image-based upgrade for the cluster.
+====
+
+For example, if you want to cancel the upgrade for all managed clusters except for clusters that successfully completed the upgrade, you can add an `Abort` action to your plan.
+The `Abort` action moves the `ImageBasedUpgrade` CR back to the `Idle` stage, which cancels the upgrade on clusters that are not yet upgraded.
+Adding a separate `Abort` action ensures that the {cgu-operator} does not perform the `Abort` action on clusters that have the `lcm.openshift.io/ibgu-upgrade-completed` label.
+
+The cluster labels are removed after successfully canceling or finalizing the upgrade.
+
+[id="ztp-image-based-upgrade-status-monitoring_{context}"]
+== Status monitoring
+
+The `ImageBasedGroupUpgrade` CR provides a better monitoring experience by aggregating comprehensive status reporting for all clusters in one place.
+You can monitor actions by using the following status fields:
+
+`status.clusters.completedActions`:: Shows all completed actions defined in the `plan` section.
+`status.clusters.currentAction`:: Shows all actions that are currently in progress.
+`status.clusters.failedActions`:: Shows all failed actions along with a detailed error message.
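
Because steps can be added to an ongoing plan but never removed, canceling after preparation means appending a step rather than editing the existing one. The following is a minimal sketch of the `spec.plan` section of a CR that originally contained only a `Prep` step, after an `Abort` step was appended once that step completed or failed. The result is the supported `["Prep"]`, `["Abort"]` combination. The rollout values are illustrative assumptions.

```yaml
# Sketch of spec.plan only; rollout values are illustrative assumptions.
plan:
# Original step: must stay in the plan, because steps cannot be removed
- actions: ["Prep"]
  rolloutStrategy:
    maxConcurrency: 200
    timeout: 60   # minutes
# Step appended after the Prep step completed or failed
- actions: ["Abort"]
  rolloutStrategy:
    maxConcurrency: 500
    timeout: 10   # minutes
```

The separate step lets the `Abort` action use its own rollout strategy, for example a higher concurrency, because moving back to the `Idle` stage is typically quicker than preparing the upgrade.
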

snippets/ibu-ImageBasedGroupUpgrade.adoc

Lines changed: 41 additions & 0 deletions

@@ -0,0 +1,41 @@
+[source,yaml]
+----
+apiVersion: lcm.openshift.io/v1alpha1
+kind: ImageBasedGroupUpgrade
+metadata:
+  name: example-group-upgrade
+  namespace: default
+spec:
+  clusterLabelSelectors: <1>
+  - matchExpressions:
+    - key: name
+      operator: In
+      values:
+      - spoke1
+      - spoke4
+      - spoke6
+  ibuSpec:
+    seedImageRef: <2>
+      image: quay.io/seed/image:4.16.0-rc.1
+      version: 4.16.0-rc.1
+      pullSecretRef:
+        name: "<seed_pull_secret>"
+    extraManifests: <3>
+    - name: example-extra-manifests
+      namespace: openshift-lifecycle-agent
+    oadpContent: <4>
+    - name: oadp-cm
+      namespace: openshift-adp
+  plan: <5>
+  - actions: ["Prep", "Upgrade", "FinalizeUpgrade"]
+    rolloutStrategy:
+      maxConcurrency: 200 <6>
+      timeout: 2400 <7>
+----
+<1> Specify the set of clusters that you want to upgrade.
+<2> Defines the target platform version, the seed image to be used, and the secret required to access the image.
+<3> (Optional) Specify the list of ConfigMap resources that contain your custom catalog sources to retain after the upgrade, and your extra manifests to apply to the target cluster that are not part of the seed image.
+<4> Specify the list of ConfigMap resources that contain the OADP Backup and Restore CRs.
+<5> Defines the upgrade plan.
+<6> Specify the number of clusters to update in a batch.
+<7> Specify the timeout limit to complete the action in minutes.
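
The snippet example runs `Prep`, `Upgrade`, and `FinalizeUpgrade` in one step. For tighter control, the same `plan` section could instead use the supported five-step combination, which places an `AbortOnFailure` step after `Prep` and after `Upgrade` so that the upgrade is canceled on clusters that failed the preceding action. The following is a minimal sketch of that `spec.plan` section; the rollout values are illustrative assumptions.

```yaml
# Sketch of spec.plan only; rollout values are illustrative assumptions.
plan:
- actions: ["Prep"]
  rolloutStrategy:
    maxConcurrency: 200
    timeout: 60   # minutes
# Moves clusters that failed Prep back to the Idle stage
- actions: ["AbortOnFailure"]
  rolloutStrategy:
    maxConcurrency: 500
    timeout: 10
- actions: ["Upgrade"]
  rolloutStrategy:
    maxConcurrency: 200
    timeout: 20
# Moves clusters that failed Upgrade back to the Idle stage
- actions: ["AbortOnFailure"]
  rolloutStrategy:
    maxConcurrency: 500
    timeout: 10
- actions: ["FinalizeUpgrade"]
  rolloutStrategy:
    maxConcurrency: 500
    timeout: 10
```

Splitting the plan this way trades a longer CR for a separate rollout strategy per action and an explicit cleanup point after each stage that can fail.
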
