| 1 | +// Module included in the following assemblies: |
| 2 | +// * edge_computing/image-based-upgrade/ztp-image-based-upgrade.adoc |
| 3 | + |
| 4 | +:_mod-docs-content-type: CONCEPT |
| 5 | +[id="ztp-image-based-upgrade-concept_{context}"] |
| 6 | += Managing the image-based upgrade at scale using the `ImageBasedGroupUpgrade` CR on the hub |
| 7 | + |
| 8 | +The `ImageBasedGroupUpgrade` CR combines the `ImageBasedUpgrade` and `ClusterGroupUpgrade` APIs. |
| 9 | +For example, you can define the cluster selection and rollout strategy with the `ImageBasedGroupUpgrade` API in the same way as the `ClusterGroupUpgrade` API. |
| 10 | +The stage transitions are different from the `ImageBasedUpgrade` API. |
| 11 | +The `ImageBasedGroupUpgrade` API allows you to combine several stage transitions, also called actions, into one step that shares one rollout strategy. |
| 12 | + |
| 13 | +.Example ImageBasedGroupUpgrade.yaml |
| 14 | +include::snippets/ibu-ImageBasedGroupUpgrade.adoc[] |
| 15 | + |
| 16 | +[id="ztp-image-based-upgrade-supported-combinations_{context}"] |
| 17 | +== Supported action combinations |
| 18 | + |
| 19 | +Actions are the list of stage transitions that {cgu-operator} completes in the steps of an upgrade plan for the selected group of clusters. |
| 20 | +Each `actions` entry in the `plan` section of the `ImageBasedGroupUpgrade` CR is a separate step, and a step contains one or more actions that share the same rollout strategy. |
| 21 | +You can achieve more control over the rollout strategy for each action by separating actions into steps. |
| 22 | + |
| 23 | +You can combine these actions differently in your upgrade plan, and you can add subsequent steps later. |
| 24 | +Wait until the previous steps either complete or fail before adding a step to your plan. |
| 25 | +The first action of an added step for clusters that failed a previous step must be either `Abort` or `Rollback`. |
| 26 | + |
| 27 | +[IMPORTANT] |
| 28 | +==== |
| 29 | +You cannot remove actions or steps from an ongoing plan. |
| 30 | +==== |
| 31 | + |
| 32 | +The following table shows example plans for different levels of control over the rollout strategy: |
| 33 | + |
| 34 | +.Example upgrade plans |
| 35 | +[cols=2*, width="100%", options="header"] |
| 36 | +|==== |
| 37 | +|Example plan |
| 38 | +|Description |
| 39 | + |
| 40 | +a|[source,yaml] |
| 41 | +---- |
| 42 | +plan: |
| 43 | +- actions: ["Prep", "Upgrade", "FinalizeUpgrade"] |
| 44 | +  rolloutStrategy: |
| 45 | +    maxConcurrency: 200 |
| 46 | +    timeout: 60 |
| 47 | +---- |
| 48 | +|All actions share the same strategy |
| 49 | + |
| 50 | +a|[source,yaml] |
| 51 | +---- |
| 52 | +plan: |
| 53 | +- actions: ["Prep", "Upgrade"] |
| 54 | +  rolloutStrategy: |
| 55 | +    maxConcurrency: 200 |
| 56 | +    timeout: 60 |
| 57 | +- actions: ["FinalizeUpgrade"] |
| 58 | +  rolloutStrategy: |
| 59 | +    maxConcurrency: 500 |
| 60 | +    timeout: 10 |
| 61 | +---- |
| 62 | +|Some actions share the same strategy |
| 63 | + |
| 64 | +a|[source,yaml] |
| 65 | +---- |
| 66 | +plan: |
| 67 | +- actions: ["Prep"] |
| 68 | +  rolloutStrategy: |
| 69 | +    maxConcurrency: 200 |
| 70 | +    timeout: 60 |
| 71 | +- actions: ["Upgrade"] |
| 72 | +  rolloutStrategy: |
| 73 | +    maxConcurrency: 200 |
| 74 | +    timeout: 20 |
| 75 | +- actions: ["FinalizeUpgrade"] |
| 76 | +  rolloutStrategy: |
| 77 | +    maxConcurrency: 500 |
| 78 | +    timeout: 10 |
| 79 | +---- |
| 80 | +|All actions have different strategies |
| 81 | + |
| 82 | +|==== |
| 83 | + |
| 84 | +[IMPORTANT] |
| 85 | +==== |
| 86 | +Clusters that fail one of the actions skip the remaining actions in the same step. |
| 87 | +==== |
| 88 | + |
| 89 | +The `ImageBasedGroupUpgrade` API accepts the following actions: |
| 90 | + |
| 91 | +`Prep`:: Start preparing the upgrade resources by moving to the `Prep` stage. |
| 92 | +`Upgrade`:: Start the upgrade by moving to the `Upgrade` stage. |
| 93 | +`FinalizeUpgrade`:: Finalize the upgrade on selected clusters that completed the `Upgrade` action by moving to the `Idle` stage. |
| 94 | +`Rollback`:: Start a rollback only on successfully upgraded clusters by moving to the `Rollback` stage. |
| 95 | +`FinalizeRollback`:: Finalize the rollback by moving to the `Idle` stage. |
| 96 | +`AbortOnFailure`:: Cancel the upgrade on selected clusters that failed the `Prep` or `Upgrade` actions by moving to the `Idle` stage. |
| 97 | +`Abort`:: Cancel an ongoing upgrade only on clusters that are not yet upgraded by moving to the `Idle` stage. |
| 98 | + |
| 99 | +The following action combinations are supported. A pair of brackets signifies one step in the `plan` section: |
| 100 | + |
| 101 | +* `["Prep"]`, `["Abort"]` |
| 102 | +* `["Prep", "Upgrade", "FinalizeUpgrade"]` |
| 103 | +* `["Prep"]`, `["AbortOnFailure"]`, `["Upgrade"]`, `["AbortOnFailure"]`, `["FinalizeUpgrade"]` |
| 104 | +* `["Rollback", "FinalizeRollback"]` |
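|  | + |
|  | +As a sketch of the third combination in the list above, the following plan places each action in its own step, so that clusters that failed `Prep` or `Upgrade` are moved back to the `Idle` stage before the next stage starts. The `maxConcurrency` and `timeout` values are illustrative assumptions, not recommendations: |
|  | + |
|  | +[source,yaml] |
|  | +---- |
|  | +plan: |
|  | +- actions: ["Prep"] |
|  | +  rolloutStrategy: |
|  | +    maxConcurrency: 200 # illustrative value |
|  | +    timeout: 60 # illustrative value |
|  | +- actions: ["AbortOnFailure"] # cancels only on clusters that failed Prep |
|  | +  rolloutStrategy: |
|  | +    maxConcurrency: 200 |
|  | +    timeout: 10 |
|  | +- actions: ["Upgrade"] |
|  | +  rolloutStrategy: |
|  | +    maxConcurrency: 200 |
|  | +    timeout: 20 |
|  | +- actions: ["AbortOnFailure"] # cancels only on clusters that failed Upgrade |
|  | +  rolloutStrategy: |
|  | +    maxConcurrency: 200 |
|  | +    timeout: 10 |
|  | +- actions: ["FinalizeUpgrade"] |
|  | +  rolloutStrategy: |
|  | +    maxConcurrency: 500 |
|  | +    timeout: 10 |
|  | +---- |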
| 105 | + |
| 106 | +Use one of the following combinations when you need to resume or cancel an ongoing upgrade from a completely new `ImageBasedGroupUpgrade` CR: |
| 107 | + |
| 108 | +* `["Upgrade","FinalizeUpgrade"]` |
| 109 | +* `["FinalizeUpgrade"]` |
| 110 | +* `["FinalizeRollback"]` |
| 111 | +* `["Abort"]` |
| 112 | +* `["AbortOnFailure"]` |
| 113 | + |
| 114 | +[id="ztp-image-based-upgrade-cluster-labeling_{context}"] |
| 115 | +== Labeling for cluster selection |
| 116 | + |
| 117 | +Use the `spec.clusterLabelSelectors` field for initial cluster selection. |
| 118 | +In addition, {cgu-operator} labels the managed clusters according to the results of their last stage transition. |
| 119 | + |
| 120 | +When a stage completes or fails, {cgu-operator} marks the relevant clusters with the following labels: |
| 121 | + |
| 122 | +* `lcm.openshift.io/ibgu-<stage>-completed` |
| 123 | +* `lcm.openshift.io/ibgu-<stage>-failed` |
| 124 | + |
| 125 | +Use these cluster labels to cancel or roll back an upgrade on a group of clusters after troubleshooting issues that you might encounter. |
| 126 | + |
| 127 | +[IMPORTANT] |
| 128 | +==== |
| 129 | +If you are using the `ImageBasedGroupUpgrade` CR to upgrade your clusters, ensure that the `lcm.openshift.io/ibgu-<stage>-completed` or `lcm.openshift.io/ibgu-<stage>-failed` cluster labels are updated properly after performing troubleshooting or recovery steps on the managed clusters. |
| 130 | +This ensures that the {cgu-operator} continues to manage the image-based upgrade for the cluster. |
| 131 | +==== |
| 132 | + |
| 133 | +For example, if you want to cancel the upgrade for all managed clusters except for clusters that successfully completed the upgrade, you can add an `Abort` action to your plan. |
| 134 | +The `Abort` action moves the `ImageBasedUpgrade` CR back to the `Idle` stage, which cancels the upgrade on clusters that are not yet upgraded. |
| 135 | +Adding a separate `Abort` action ensures that the {cgu-operator} does not perform the `Abort` action on clusters that have the `lcm.openshift.io/ibgu-upgrade-completed` label. |
| 136 | + |
| 137 | +The cluster labels are removed after successfully canceling or finalizing the upgrade. |
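|  | + |
|  | +To illustrate how these labels might be used, a new `ImageBasedGroupUpgrade` CR could select only the clusters that carry a failure label and cancel the upgrade on them. The following fragment is a minimal sketch, not a verified configuration. It assumes that `spec.clusterLabelSelectors` accepts standard Kubernetes label selector expressions, that `<stage>` resolves to `prep` for the `Prep` stage, and that the `plan` section sits under `spec`. The rollout values are illustrative: |
|  | + |
|  | +[source,yaml] |
|  | +---- |
|  | +spec: |
|  | +  clusterLabelSelectors: |
|  | +  - matchExpressions: |
|  | +    # Assumed label instance: <stage> replaced with "prep" |
|  | +    - key: lcm.openshift.io/ibgu-prep-failed |
|  | +      operator: Exists |
|  | +  plan: |
|  | +  - actions: ["Abort"] |
|  | +    rolloutStrategy: |
|  | +      maxConcurrency: 200 # illustrative value |
|  | +      timeout: 60 # illustrative value |
|  | +---- |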
| 138 | + |
| 139 | +[id="ztp-image-based-upgrade-status-monitoring_{context}"] |
| 140 | +== Status monitoring |
| 141 | + |
| 142 | +The `ImageBasedGroupUpgrade` CR provides a better monitoring experience with comprehensive status reporting for all clusters, aggregated in one place. |
| 143 | +You can monitor the following status fields: |
| 144 | + |
| 145 | +`status.clusters.completedActions`:: Shows all completed actions defined in the `plan` section. |
| 146 | +`status.clusters.currentAction`:: Shows all actions that are currently in progress. |
| 147 | +`status.clusters.failedActions`:: Shows all failed actions along with a detailed error message. |
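|  | + |
|  | +The following fragment sketches how these fields might appear together in the `status` section of the CR. The exact schema is an assumption based on the field names above. The cluster names and the `message` placeholder are hypothetical: |
|  | + |
|  | +[source,yaml] |
|  | +---- |
|  | +status: |
|  | +  clusters: |
|  | +  - name: spoke1 # hypothetical cluster that completed Prep |
|  | +    completedActions: |
|  | +    - action: Prep |
|  | +  - name: spoke2 # hypothetical cluster with Prep in progress |
|  | +    currentAction: |
|  | +      action: Prep |
|  | +  - name: spoke3 # hypothetical cluster that failed Prep |
|  | +    failedActions: |
|  | +    - action: Prep |
|  | +      message: <error_message> |
|  | +---- |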