Commit db841e1

Update howto-cluster-runtime-upgrade-template.md
1 parent f0f1c1c commit db841e1

1 file changed: +20 -19 lines changed

articles/operator-nexus/howto-cluster-runtime-upgrade-template.md

Lines changed: 20 additions & 19 deletions
@@ -43,16 +43,16 @@ Runtime changes are categorized as follows:
 - <START_TIME>: Planned start time of upgrade
 - <DURATION>: Estimated Duration of upgrade
 - <DEPLOYMENT_THRESHOLD>: Compute deployment threshold
-- <DEPLOYMENT_PAUSE_MINS>: Time to wait before moving to the next rack once the current rack percent of Compute servers complete upgrade
-- <NFC_NAME>: Associated NFC
-- <CM_NAME>: Associated CM
+- <DEPLOYMENT_PAUSE_MINS>: Time to wait before moving to the next Rack once the current Rack meets the deployment threshold
+- <NFC_NAME>: Associated Network Fabric Controller (NFC)
+- <CM_NAME>: Associated Cluster Manager (CM)
 - <ETCD_LAST_ROTATION_DATE>: Control plane etcd credential last rotation date
 - <ETCD_ROTATION_DAYS>: Control plane etcd credential next rotation period
-- <BMM_ISSUE_LIST>: List of BMM with provisioining issues afer Cluster upgrade is complete
+- <BMM_ISSUE_LIST>: List of BMM with provisioning issues after Cluster upgrade is complete

 ## Pre-Checks

-1. Very last/next rotation date on etcd credential will not occur during upgrade on each control plane Bare Metal Machine (BMM):
+1. On each control-plane Bare Metal Machine (BMM), verify the next rotation date on `etcd credential` doesn't occur during the upgrade:
 - Check in Azure portal from the following path: `Clusters` -> <CLUSTER_NAME> -> `Resources` -> `Bare Metal Machines`
 - Select each BMM with `control-plane` under the `Role`: <CLUSTER_CONTROL_BMM> -> `JSON View`
 - Validate the `lastRotationTime` and `rotationPeriodDays` under the `etcd credential` section:
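
As a rough illustration of this pre-check, the sketch below adds `rotationPeriodDays` to `lastRotationTime` and flags an upgrade start that falls within three days of the resulting date. The function name, the ISO 8601 timestamp format, and the sample dates are assumptions made for illustration; they aren't taken from the Operator Nexus API.

```python
from datetime import datetime, timedelta


def rotation_conflicts(last_rotation_time, rotation_period_days, upgrade_start, buffer_days=3):
    """Return True if the upgrade starts within buffer_days of the next etcd credential rotation."""
    # Assumed input format: ISO 8601 timestamps with an explicit UTC offset.
    last_rotation = datetime.fromisoformat(last_rotation_time)
    start = datetime.fromisoformat(upgrade_start)
    next_rotation = last_rotation + timedelta(days=rotation_period_days)
    return abs(next_rotation - start) <= timedelta(days=buffer_days)


# Hypothetical values: last rotation on 1 January 2025 with a 90-day rotation period.
if rotation_conflicts("2025-01-01T00:00:00+00:00", 90, "2025-03-30T08:00:00+00:00"):
    print("Request a manual etcd credential rotation before starting the upgrade.")
```

A `True` result corresponds to the case where a manual rotation should be requested before the upgrade starts.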
@@ -67,7 +67,7 @@ Runtime changes are categorized as follows:
 >[!Important]
 > If the upgrade will occur within three days of the next `etcd credential` rotation (<ETCD_LAST_ROTATION_DATE> + <ETCD_ROTATION_DAYS>), contact Microsoft Support to complete a manual rotation before starting the upgrade.

-2. Validate the provisioning and detailed status for the Cluster Manager (CM) and Cluster.
+2. Validate the provisioning and detailed status for the CM and Cluster.

 Set up the subscription, CM, and Cluster parameters:
 ```
@@ -93,26 +93,27 @@ Runtime changes are categorized as follows:
 ```

 >[!Note]
-> If CM `Provisioning state` is not `Succeeded` and Cluster `Detailed status` is not `Running` stop the upgrade until issues are resolved.
+> If CM `Provisioning state` isn't `Succeeded` and Cluster `Detailed status` isn't `Running`, stop the upgrade until issues are resolved.

 3. Check the Bare Metal Machine status `Detailed status` is `Running`:
 ```
 az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
 ```

-Check the following for each BMM:
+Validate the following resource states for each BMM:
 - ReadyState: True
 - ProvisioningState: Succeeded
 - DetailedStatus: Provisioned
 - CordonStatus: Uncordoned
 - PowerState: On

-4. Collect a profile of the tenant workloads pre-upgrade:
+4. Collect a profile of the tenant workloads:
 ```
 az networkcloud clustermanager show -g $CM_RG --resource-name $CM_NAME --subscription $SUBSCRIPTION_ID -o table
 az networkcloud virtualmachine list --sub $SUBSCRIPTION_ID --query "reverse(sort_by([?clusterId=='$CLUSTER_RID'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineId,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
 az networkcloud kubernetescluster list --sub $SUBSCRIPTION_ID --query "[?clusterId=='$CLUSTER_RID'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
 ```
+
 5. Review Operator Nexus Release notes for required checks and configuration updates not included in this document.

 ## Send notification to Operations of upgrade schedule for the Cluster.
@@ -147,11 +148,11 @@ To help track upgrades, add a tag to the Cluster resource in Azure portal (optio
 ## Set deployment strategy and Compute threshold on Cluster if different from default
 The default threshold for the percent of Compute BMM to pass hardware validation and provisioning is 80% with a default pause between Racks of one minute.

-`update-strategy` can be the following:
-* `Rack` - Upgrade each Rack one at a time and move to the next Rack once the Compute threshold is met for the curren Rack. Pause for <DEPLOYMENT_PAUSE_MINS> before starting next Rack.
+The following settings are available for `update-strategy`:
+* `Rack` - Upgrade each Rack one at a time and move to the next Rack once the Compute threshold is met for the current Rack. Pause for <DEPLOYMENT_PAUSE_MINS> before starting next Rack.
 * `PauseAfterRack` - Wait for user API response to continue to the next Rack once the Compute threshold is met for the current Rack.

-If `updateStrategy` is not set, the default are as follows:
+If `updateStrategy` isn't set, the default values are as follows:
 ```
 "updateStrategy": {
 "maxUnavailable": 32767,
@@ -202,7 +203,7 @@ az networkcloud cluster show -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <C

 ## Run upgrade from either portal or cli:
 * To start upgrade from Azure portal, go to Cluster resource, click `Update`, select <CLUSTER_VERSION>, then click `Update`
-* To run upgrade from Azure CLI, run the following:
+* To run upgrade from Azure CLI, run the following command:
 ```
 az networkcloud cluster update-version --subscription $SUBSCRIPTION_ID --cluster-name $CLUSTER_NAME --target-cluster-version $CLUSTER_VERSION --resource-group $CLUSTER_RG --no-wait --debug
 ```
@@ -219,7 +220,7 @@ az networkcloud cluster show -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <C
 ```
 az networkcloud cluster list -g $CLUSTER_RG --subscription $SUBSCRIPTION_ID -o table
 ```
-When the upgrade is complete, the Cluster `Detailed status` will move to `Running` state and the `Detailed status message` will show 'Cluster is up and running.`
+The Cluster `Detailed status` shows `Running` and the `Detailed status message` shows `Cluster is up and running.` when the upgrade is complete.

 ## Monitor status of Bare Metal Machines:
 ```
@@ -236,17 +237,17 @@ Validate the following for each BMM:
 - KubernetesVersion: <NEW_VERSION>
 - MachineClusterVersion: <NEXUS_VERSION>

-For any BMM that does not complete provisioning, and Cluster upgrade is complete, add a Tag to the BMM resource (optional):
+Add a Tag to the BMM resource to track any BMM that fails to complete provisioning (optional):
 ```
 |Name | Value |
 |--------------------|-----------------|
 |BF provision issue |<DE_ID> |
 ```

 ## Continuing upgrade during `PauseAfterRack` strategy:
-Once a compute rack has met the success threshold, the upgrade will move into a pause until the user signals to the operator to continue the upgrade.
+Once a compute Rack meets the success threshold, the upgrade pauses until the user signals to the operator to continue the upgrade.

-Use the following to continue upgrade once a Compute Rack has met the Compute deployment threshold for the rack:
+Use the following command to continue upgrade once a Compute Rack is paused after meeting the deployment threshold for the Rack:
 ```
 az networkcloud cluster continue-update-version -g $CLUSTER_RG -n $CLUSTER_NAME --subscription $SUBSCRIPTION_ID
 ```
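
The pause-and-continue flow depends on whether a Rack has met the Compute deployment threshold. A minimal sketch of that comparison follows, assuming the threshold is a plain percentage of the Compute BMMs in a Rack that completed upgrade (80% by default, per this article). The function name and inputs are hypothetical, and the platform's own calculation may differ.

```python
def rack_meets_threshold(succeeded, total, threshold_percent=80):
    """Return True if the share of upgraded Compute BMMs in a Rack meets the threshold."""
    if total <= 0:
        raise ValueError("total must be a positive number of Compute BMMs")
    # Integer comparison avoids floating-point rounding at the boundary.
    return succeeded * 100 >= threshold_percent * total


# Hypothetical Rack with 16 Compute BMMs, 13 of which completed upgrade.
if rack_meets_threshold(13, 16):
    print("Threshold met; the Rack is eligible to continue.")
```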
@@ -257,7 +258,7 @@ The following troubleshooting documents can help recover BMM upgrade issues:
 - [BMM Degraded Status](troubleshoot-bare-metal-machine-degraded.md)
 - [BMM Warning Status](troubleshoot-bare-metal-machine-warning.md)

-If troubleshooting does not resolve the issue, open a Microsoft support ticket:
+If troubleshooting doesn't resolve the issue, open a Microsoft support ticket:
 1. Collect any errors in the Azure CLI output.
 2. Collect Cluster and BMM operation state from Azure portal or Azure CLI.
 3. Create Azure Support Request for any Cluster or BMM upgrade failures and attach any errors along with ASYNC URL, correlation ID, and operation state of the Cluster and BMMs.
@@ -280,7 +281,7 @@ Run the following commands to check the status of the CM, Cluster, and BMM:
 az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
 ```

-Check the following for each BMM:
+Validate the following resource states for each BMM:
 - ReadyState: True
 - ProvisioningState: Succeeded
 - DetailedStatus: Provisioned
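
The per-BMM checks could be automated against the JSON form of the list command (`-o json` instead of `-o table`). The sketch below is a hypothetical helper, not part of the Azure CLI. It assumes each record carries the keys projected by the `--query` expression and uses the expected values from the BMM status pre-check (ReadyState, ProvisioningState, DetailedStatus, CordonStatus, PowerState).

```python
EXPECTED_BMM_STATE = {
    "readyState": "True",
    "provisioningState": "Succeeded",
    "detailedStatus": "Provisioned",
    "cordonStatus": "Uncordoned",
    "powerState": "On",
}


def find_bmm_issues(machines):
    """Return {bmm_name: {field: actual_value}} for fields that differ from the expected state."""
    issues = {}
    for machine in machines:
        mismatched = {
            field: machine.get(field)
            for field, expected in EXPECTED_BMM_STATE.items()
            # Compare as case-insensitive strings so a JSON boolean true matches "True".
            if str(machine.get(field)).lower() != expected.lower()
        }
        if mismatched:
            issues[machine.get("name", "<unnamed>")] = mismatched
    return issues
```

Any BMM returned by the helper is a candidate for <BMM_ISSUE_LIST>; an empty result means every listed machine matched the expected states.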
