
Commit 6c8e422

TELCODOCS-949 TALM 4.13 updates

1 parent ef450b1 commit 6c8e422

6 files changed: +130 -32 lines changed

modules/cnf-topology-aware-lifecycle-manager-about-cgu-crs.adoc

Lines changed: 36 additions & 26 deletions
@@ -55,6 +55,7 @@ After {cgu-operator} completes a cluster update, the cluster does not update aga
 * The `clusters` field specifies a list of clusters to update.
 * The `canaries` field specifies the clusters for canary updates.
 * The `maxConcurrency` field specifies the number of clusters to update in a batch.
+* The `actions` field specifies `beforeEnable` actions that {cgu-operator} takes as it begins the update process, and `afterCompletion` actions that {cgu-operator} takes as it completes policy remediation for each cluster.
 
 You can use the `clusters`, `clusterLabelSelector`, and `clusterSelector` fields together to create a combined list of clusters.

@@ -76,43 +77,49 @@ metadata:
   resourceVersion: '40451823'
   uid: cca245a5-4bca-45fa-89c0-aa6af81a596c
 Spec:
-  actions:
-    afterCompletion:
+  actions:
+    afterCompletion: <1>
+      addClusterLabels:
+        upgrade-done: ""
+      deleteClusterLabels:
+        upgrade-running: ""
       deleteObjects: true
-    beforeEnable: {}
+    beforeEnable: <2>
+      addClusterLabels:
+        upgrade-running: ""
   backup: false
-  clusters: <1>
+  clusters: <3>
   - spoke1
-  enable: false <2>
-  managedPolicies: <3>
+  enable: false <4>
+  managedPolicies: <5>
   - talm-policy
   preCaching: false
-  remediationStrategy: <4>
-    canaries: <5>
+  remediationStrategy: <6>
+    canaries: <7>
     - spoke1
-    maxConcurrency: 2 <6>
+    maxConcurrency: 2 <8>
     timeout: 240
-  clusterLabelSelectors: <7>
+  clusterLabelSelectors: <9>
   - matchExpressions:
     - key: label1
       operator: In
       values:
      - value1a
      - value1b
-  batchTimeoutAction: <8>
-status: <9>
+  batchTimeoutAction: <10>
+status: <11>
   computedMaxConcurrency: 2
   conditions:
   - lastTransitionTime: '2022-11-18T16:27:15Z'
     message: All selected clusters are valid
     reason: ClusterSelectionCompleted
     status: 'True'
-    type: ClustersSelected <10>
+    type: ClustersSelected <12>
   - lastTransitionTime: '2022-11-18T16:27:15Z'
     message: Completed validation
     reason: ValidationCompleted
     status: 'True'
-    type: Validated <11>
+    type: Validated <13>
   - lastTransitionTime: '2022-11-18T16:37:16Z'
     message: Not enabled
     reason: NotEnabled
@@ -129,17 +136,19 @@ status: <9>
   - spoke3
   status:
 ----
-<1> Defines the list of clusters to update.
-<2> The `enable` field is set to `false`.
-<3> Lists the user-defined set of policies to remediate.
-<4> Defines the specifics of the cluster updates.
-<5> Defines the clusters for canary updates.
-<6> Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the maxConcurrency value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
-<7> Displays the parameters for selecting clusters.
-<8> Controls what happens if a batch times out. Possible values are `abort` or `continue`. If unspecified, the default is `continue`.
-<9> Displays information about the status of the updates.
-<10> The `ClustersSelected` condition shows that all selected clusters are valid.
-<11> The `Validated` condition shows that all selected clusters have been validated.
+<1> Specifies the action that {cgu-operator} takes when it completes policy remediation for each cluster.
+<2> Specifies the action that {cgu-operator} takes as it begins the update process.
+<3> Defines the list of clusters to update.
+<4> The `enable` field is set to `false`.
+<5> Lists the user-defined set of policies to remediate.
+<6> Defines the specifics of the cluster updates.
+<7> Defines the clusters for canary updates.
+<8> Defines the maximum number of concurrent updates in a batch. The number of remediation batches is the number of canary clusters, plus the number of clusters, except the canary clusters, divided by the `maxConcurrency` value. The clusters that are already compliant with all the managed policies are excluded from the remediation plan.
+<9> Displays the parameters for selecting clusters.
+<10> Controls what happens if a batch times out. Possible values are `abort` or `continue`. If unspecified, the default is `continue`.
+<11> Displays information about the status of the updates.
+<12> The `ClustersSelected` condition shows that all selected clusters are valid.
+<13> The `Validated` condition shows that all selected clusters have been validated.
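The batch arithmetic described for `maxConcurrency` can be sketched as follows. This is an editor's illustration, not part of the commit: the cluster counts are hypothetical, chosen to match the example's single canary (`spoke1`) and `maxConcurrency: 2`.

```shell
# Sketch of the remediation batch count described in the callout:
# batches = canaries + ceil((clusters - canaries) / maxConcurrency).
# Only clusters that are non-compliant with a managed policy count here.
canaries=1
clusters=4          # hypothetical non-compliant clusters, including the canary
max_concurrency=2
remaining=$((clusters - canaries))
batches=$((canaries + (remaining + max_concurrency - 1) / max_concurrency))
echo "$batches"     # 1 canary batch plus ceil(3/2)=2 batches: prints 3
```

With these numbers, {cgu-operator} would run one canary batch followed by two batches of up to two clusters each.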
 
 [NOTE]
 ====
@@ -168,7 +177,8 @@ Policies are missing or invalid, or an invalid platform image has been specified
 [id="precaching_{context}"]
 == Pre-caching
 
-Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed. On {sno} clusters, you can use pre-caching to avoid this. The container image pre-caching starts when you create a `ClusterGroupUpgrade` CR with the `preCaching` field set to `true`.
+Clusters might have limited bandwidth to access the container image registry, which can cause a timeout before the updates are completed. On {sno} clusters, you can use pre-caching to avoid this. The container image pre-caching starts when you create a `ClusterGroupUpgrade` CR with the `preCaching` field set to `true`.
+{cgu-operator} compares the available disk space with the estimated {product-title} image size to ensure that there is enough space. If a cluster has insufficient space, {cgu-operator} cancels pre-caching for that cluster and does not remediate policies on it.
 
 {cgu-operator} uses the `PrecacheSpecValid` condition to report status information as follows:
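The disk-space comparison described in the added paragraph can be sketched like this. It is an illustration only, not {cgu-operator}'s implementation; the mount point and the estimated image size are assumptions.

```shell
# Compare free space at an assumed location with an assumed image size.
# TALM cancels pre-caching on a cluster when space is insufficient.
estimated_kb=$((16 * 1024 * 1024))            # assumption: ~16 GiB of images
free_kb=$(df -Pk / | awk 'NR==2 {print $4}')  # '/' stands in for the image store
if [ "$free_kb" -lt "$estimated_kb" ]; then
  echo "insufficient space: pre-caching canceled for this cluster"
else
  echo "enough space: pre-caching can proceed"
fi
```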

modules/cnf-topology-aware-lifecycle-manager-autocreate-cgu-cr-ztp.adoc

Lines changed: 2 additions & 5 deletions
@@ -8,12 +8,9 @@
 
 {cgu-operator} has a controller called `ManagedClusterForCGU` that monitors the `Ready` state of the `ManagedCluster` CRs on the hub cluster and creates the `ClusterGroupUpgrade` CRs for ZTP (zero touch provisioning).
 
-For any managed cluster in the `Ready` state without a "ztp-done" label applied, the `ManagedClusterForCGU` controller automatically creates a `ClusterGroupUpgrade` CR in the `ztp-install` namespace with its associated {rh-rhacm} policies that are created during the ZTP process. {cgu-operator} then remediates the set of configuration policies that are listed in the auto-created `ClusterGroupUpgrade` CR to push the configuration CRs to the managed cluster.
+For any managed cluster in the `Ready` state without a `ztp-done` label applied, the `ManagedClusterForCGU` controller automatically creates a `ClusterGroupUpgrade` CR in the `ztp-install` namespace with its associated {rh-rhacm} policies that are created during the ZTP process. {cgu-operator} then remediates the set of configuration policies that are listed in the auto-created `ClusterGroupUpgrade` CR to push the configuration CRs to the managed cluster.
 
-[NOTE]
-====
-If the managed cluster has no bound policies when the cluster becomes `Ready`, no `ClusterGroupUpgrade` CR is created.
-====
+If there are no policies for the managed cluster at the time when the cluster becomes `Ready`, a `ClusterGroupUpgrade` CR with no policies is created. Upon completion of the `ClusterGroupUpgrade`, the managed cluster is labeled as `ztp-done`. If there are policies that you want to apply for that managed cluster, manually create a `ClusterGroupUpgrade` as a day-2 operation.
 
 .Example of an auto-created `ClusterGroupUpgrade` CR for ZTP
 

modules/cnf-topology-aware-lifecycle-manager-precache-feature.adoc

Lines changed: 5 additions & 0 deletions
@@ -8,6 +8,11 @@
 
 For {sno}, the pre-cache feature allows the required container images to be present on the spoke cluster before the update starts.
 
+[NOTE]
+====
+For pre-caching, {cgu-operator} uses the `spec.remediationStrategy.timeout` value from the `ClusterGroupUpgrade` CR. You must set a `timeout` value that allows sufficient time for the pre-caching job to complete. When you enable the `ClusterGroupUpgrade` CR after pre-caching has completed, you can change the `timeout` value to a duration that is appropriate for the update.
+====
+
 .Prerequisites
 
 * Install the {cgu-operator-first}.
modules/cnf-topology-aware-lifecycle-manager-precache-image-filter.adoc

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
+// Module included in the following assemblies:
+// Epic CNF-6848 (4.13), Story TELCODOCS-949
+// * scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc
+
+:_content-type: CONCEPT
+[id="talo-precache-feature-image-filter_{context}"]
+= Using the container image pre-cache filter
+
+The pre-cache feature typically downloads more images than a cluster needs for an update. You can control which pre-cache images are downloaded to a cluster. This decreases download time, and saves bandwidth and storage.
+
+You can see a list of all images to be downloaded by running the following command:
+
+[source,terminal]
+----
+$ oc adm release info <ocp-version>
+----
+
+The following `ConfigMap` example shows how you can exclude images by using the `excludePrecachePatterns` field.
+
+[source,yaml]
+----
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: cluster-group-upgrade-overrides
+data:
+  excludePrecachePatterns: |
+    azure <1>
+    aws
+    vsphere
+    alibaba
+----
+<1> {cgu-operator} excludes all images with names that include any of the patterns listed here.
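A minimal sketch of how such substring patterns filter an image list. This mimics, rather than reproduces, {cgu-operator}'s filter, and the image names are made up for illustration.

```shell
# Print a hypothetical image list, then drop every name that contains one
# of the excludePrecachePatterns substrings from the ConfigMap above.
printf '%s\n' \
  "ocp-release-azure-machine-controllers" \
  "ocp-release-aws-ebs-csi-driver" \
  "ocp-release-cluster-version-operator" \
| grep -Ev 'azure|aws|vsphere|alibaba'
```

Only `ocp-release-cluster-version-operator` survives the filter, so only that image would be pre-cached.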

modules/cnf-topology-aware-lifecycle-manager-troubleshooting.adoc

Lines changed: 52 additions & 1 deletion
@@ -442,4 +442,55 @@ This may be because:
 * The CGU was run too soon after a policy was created or updated.
 * The remediation of a policy affects the compliance of subsequent policies in the `ClusterGroupUpgrade` CR.
 
-Resolution:: Create a new and apply `ClusterGroupUpdate` CR with the same specification .
+Resolution:: Create and apply a new `ClusterGroupUpgrade` CR with the same specification.
+
+[discrete]
+[id="talo-troubleshooting-auto-create-policies_{context}"]
+=== Auto-created `ClusterGroupUpgrade` CR in the ZTP workflow has no managed policies
+
+Issue:: If there are no policies for the managed cluster when the cluster becomes `Ready`, a `ClusterGroupUpgrade` CR with no policies is auto-created.
+Upon completion of the `ClusterGroupUpgrade` CR, the managed cluster is labeled as `ztp-done`.
+If the `PolicyGenTemplate` CRs were not pushed to the Git repository within the required time after `SiteConfig` resources were pushed, this might result in no policies being available for the target cluster when the cluster became `Ready`.
+
+Resolution:: Verify that the policies you want to apply are available on the hub cluster, then create a `ClusterGroupUpgrade` CR with the required policies.
+
+You can either manually create the `ClusterGroupUpgrade` CR or trigger auto-creation again. To trigger auto-creation of the `ClusterGroupUpgrade` CR, remove the `ztp-done` label from the cluster and delete the empty `ClusterGroupUpgrade` CR that was previously created in the `ztp-install` namespace.
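The re-trigger steps above can be sketched with `oc`. The cluster name `spoke1` is a placeholder, and the assumption that the auto-created CR shares the cluster's name is illustrative; the commands require a connected hub cluster.

```shell
# Remove the ztp-done label from the managed cluster; a trailing '-' after
# a label key deletes that label.
oc label managedcluster spoke1 ztp-done-

# Delete the empty auto-created ClusterGroupUpgrade CR from ztp-install so
# the ManagedClusterForCGU controller creates a new one.
oc delete clustergroupupgrade -n ztp-install spoke1
```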
+
+[discrete]
+[id="talo-troubleshooting-pre-cache-failed_{context}"]
+=== Pre-caching has failed
+
+Issue:: Pre-caching might fail for one of the following reasons:
+* There is not enough free space on the node.
+* For a disconnected environment, the pre-cache image has not been properly mirrored.
+* There was an issue when creating the pod.
+
+Resolution::
+. To check if pre-caching has failed due to insufficient space, check the log of the pre-caching pod in the node.
+.. Find the name of the pod using the following command:
++
+[source,terminal]
+----
+$ oc get pods -n openshift-talo-pre-cache
+----
++
+.. Check the logs to see if the error is related to insufficient space using the following command:
++
+[source,terminal]
+----
+$ oc logs -n openshift-talo-pre-cache <pod name>
+----
++
+. If there is no log, check the pod status using the following command:
++
+[source,terminal]
+----
+$ oc describe pod -n openshift-talo-pre-cache <pod name>
+----
++
+. If the pod does not exist, check the job status to see why it could not create a pod using the following command:
++
+[source,terminal]
+----
+$ oc describe job -n openshift-talo-pre-cache pre-cache
+----

scalability_and_performance/cnf-talm-for-cluster-upgrades.adoc

Lines changed: 2 additions & 0 deletions
@@ -39,6 +39,8 @@ include::modules/cnf-topology-aware-lifecycle-manager-backup-recovery.adoc[level
 
 include::modules/cnf-topology-aware-lifecycle-manager-precache-concept.adoc[leveloffset=+1]
 
+include::modules/cnf-topology-aware-lifecycle-manager-precache-image-filter.adoc[leveloffset=+2]
+
 include::modules/cnf-topology-aware-lifecycle-manager-precache-feature.adoc[leveloffset=+2]
 
 include::modules/cnf-topology-aware-lifecycle-manager-troubleshooting.adoc[leveloffset=+1]
