- <ETCD_LAST_ROTATION_DATE>: Control plane etcd credential last rotation date
50
50
- <ETCD_ROTATION_DAYS>: Control plane etcd credential next rotation period
51
-
- <BMM_ISSUE_LIST>: List of BMM with provisioining issues afer Cluster upgrade is complete
51
+
- <BMM_ISSUE_LIST>: List of BMM with provisioning issues after Cluster upgrade is complete
52
52
53
53
## Pre-Checks
54
54
55
-
1.Very last/next rotation date on etcd credential will not occur during upgrade on each control plane Bare Metal Machine (BMM):
55
+
1. On each control-plane Bare Metal Machine (BMM), verify the next rotation date on `etcd credential` doesn't occur during the upgrade:
56
56
- Check in Azure portal from the following path: `Clusters` -> <CLUSTER_NAME> -> `Resources` -> `Bare Metal Machines`
57
57
- Select each BMM with `control-plane` under the `Role`: <CLUSTER_CONTROL_BMM> -> `JSON View`
58
58
- Validate the `lastRotationTime` and `rotationPeriodDays` under the `etcd credential` section:
@@ -67,7 +67,7 @@ Runtime changes are categorized as follows:
67
67
>[!Important]
68
68
> If the upgrade will occur within three days of the next `etcd credential` rotation (<ETCD_LAST_ROTATION_DATE> + <ETCD_ROTATION_DAYS>), contact Microsoft Support to complete a manual rotation before starting the upgrade.
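
The date arithmetic in the note above can be sketched in shell. This is a minimal illustration rather than part of the documented procedure: the function names `days_until_rotation` and `needs_manual_rotation` are hypothetical, GNU `date` is assumed, and "within three days" is interpreted here as three days on either side of the rotation date.

```shell
#!/usr/bin/env bash
# Sketch only: requires GNU date (for -d). Function names are hypothetical.

# Print the whole days (truncated toward zero) from the planned upgrade date
# to the next etcd credential rotation (last rotation + rotation period).
days_until_rotation() {
  local last_rotation="$1" rotation_days="$2" upgrade_date="$3"
  local last_epoch upgrade_epoch next_epoch
  last_epoch=$(date -u -d "$last_rotation" +%s) || return 1
  upgrade_epoch=$(date -u -d "$upgrade_date" +%s) || return 1
  next_epoch=$(( last_epoch + rotation_days * 86400 ))
  echo $(( (next_epoch - upgrade_epoch) / 86400 ))
}

# Exit 0 when the upgrade falls within three days of the next rotation
# (assumption: either side of the rotation date counts), exit 1 otherwise.
needs_manual_rotation() {
  local days
  days=$(days_until_rotation "$@") || return 2
  if (( days < 0 )); then
    days=$(( -days ))
  fi
  (( days <= 3 ))
}
```

For example, `needs_manual_rotation "<ETCD_LAST_ROTATION_DATE>" "<ETCD_ROTATION_DAYS>" "2024-03-29"` succeeds when the planned date (an illustrative value) is close enough to the rotation that Microsoft Support should be contacted first.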
69
69
70
-
2. Validate the provisioning and detailed status for the Cluster Manager (CM) and Cluster.
70
+
2. Validate the provisioning and detailed status for the CM and Cluster.
71
71
72
72
Set up the subscription, CM, and Cluster parameters:
73
73
```
@@ -93,26 +93,27 @@ Runtime changes are categorized as follows:
93
93
```
94
94
95
95
>[!Note]
96
-
> If CM `Provisioning state` is not `Succeeded` and Cluster `Detailed status` is not `Running` stop the upgrade until issues are resolved.
96
+
> If CM `Provisioning state` isn't `Succeeded` or Cluster `Detailed status` isn't `Running`, stop the upgrade until issues are resolved.
97
97
98
98
3. Check the Bare Metal Machine status `Detailed status` is `Running`:
99
99
```
100
100
az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
101
101
```
102
102
103
-
Check the following for each BMM:
103
+
Validate the following resource states for each BMM:
104
104
- ReadyState: True
105
105
- ProvisioningState: Succeeded
106
106
- DetailedStatus: Provisioned
107
107
- CordonStatus: Uncordoned
108
108
- PowerState: On
109
109
110
-
4. Collect a profile of the tenant workloads pre-upgrade:
110
+
4. Collect a profile of the tenant workloads:
111
111
```
112
112
az networkcloud clustermanager show -g $CM_RG --resource-name $CM_NAME --subscription $SUBSCRIPTION_ID -o table
113
113
az networkcloud virtualmachine list --sub $SUBSCRIPTION_ID --query "reverse(sort_by([?clusterId=='$CLUSTER_RID'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineId,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
114
114
az networkcloud kubernetescluster list --sub $SUBSCRIPTION_ID --query "[?clusterId=='$CLUSTER_RID'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
115
115
```
116
+
116
117
5. Review Operator Nexus Release notes for required checks and configuration updates not included in this document.
117
118
118
119
## Send notification to Operations of upgrade schedule for the Cluster.
@@ -147,11 +148,11 @@ To help track upgrades, add a tag to the Cluster resource in Azure portal (optio
147
148
## Set deployment strategy and Compute threshold on Cluster if different from default
148
149
The default threshold for the percent of Compute BMM to pass hardware validation and provisioning is 80% with a default pause between Racks of one minute.
149
150
150
-
`update-strategy` can be the following:
151
-
* `Rack` - Upgrade each Rack one at a time and move to the next Rack once the Compute threshold is met for the curren Rack. Pause for <DEPLOYMENT_PAUSE_MINS> before starting next Rack.
151
+
The following settings are available for `update-strategy`:
152
+
* `Rack` - Upgrade each Rack one at a time and move to the next Rack once the Compute threshold is met for the current Rack. Pause for <DEPLOYMENT_PAUSE_MINS> before starting next Rack.
152
153
* `PauseAfterRack` - Wait for user API response to continue to the next Rack once the Compute threshold is met for the current Rack.
153
154
154
-
If `updateStrategy` is not set, the default are as follows:
155
+
If `updateStrategy` isn't set, the default values are as follows:
155
156
```
156
157
"updateStrategy": {
157
158
"maxUnavailable": 32767,
@@ -202,7 +203,7 @@ az networkcloud cluster show -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <C
202
203
203
204
## Run upgrade from either portal or CLI:
204
205
* To start upgrade from Azure portal, go to Cluster resource, click `Update`, select <CLUSTER_VERSION>, then click `Update`
205
-
* To run upgrade from Azure CLI, run the following:
206
+
* To run upgrade from Azure CLI, run the following command:
@@ -219,7 +220,7 @@ az networkcloud cluster show -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <C
219
220
```
220
221
az networkcloud cluster list -g $CLUSTER_RG --subscription $SUBSCRIPTION_ID -o table
221
222
```
222
-
When the upgrade is complete, the Cluster `Detailed status` will move to `Running` state and the `Detailed status message` will show 'Cluster is up and running.`
223
+
The Cluster `Detailed status` shows `Running` and the `Detailed status message` shows `Cluster is up and running.` when the upgrade is complete.
223
224
224
225
## Monitor status of Bare Metal Machines:
225
226
```
@@ -236,17 +237,17 @@ Validate the following for each BMM:
236
237
- KubernetesVersion: <NEW_VERSION>
237
238
- MachineClusterVersion: <NEXUS_VERSION>
238
239
239
-
For any BMM that does not complete provisioning, and Cluster upgrade is complete, add a Tag to the BMM resource (optional):
240
+
Add a Tag to the BMM resource to track any BMM that fails to complete provisioning (optional):
240
241
```
241
242
|Name | Value |
242
243
|--------------------|-----------------
243
244
|BF provision issue |<DE_ID> |
244
245
```
245
246
246
247
## Continuing upgrade during `PauseAfterRack` strategy:
247
-
Once a compute rack has met the success threshold, the upgrade will move into a pause until the user signals to the operator to continue the upgrade.
248
+
Once a compute Rack meets the success threshold, the upgrade pauses until the user signals to the operator to continue the upgrade.
248
249
249
-
Use the following to continue upgrade once a Compute Rack has met the Compute deployment threshold for the rack:
250
+
Use the following command to continue the upgrade once a Compute Rack is paused after meeting the deployment threshold for the Rack:
250
251
```
251
252
az networkcloud cluster continue-update-version -g $CLUSTER_RG -n $CLUSTER_NAME --subscription $SUBSCRIPTION_ID
252
253
```
@@ -257,7 +258,7 @@ The following troubleshooting documents can help recover BMM upgrade issues:
If troubleshooting does not resolve the issue, open a Microsoft support ticket:
261
+
If troubleshooting doesn't resolve the issue, open a Microsoft support ticket:
261
262
1. Collect any errors in the Azure CLI output.
262
263
2. Collect Cluster and BMM operation state from Azure portal or Azure CLI.
263
264
3. Create Azure Support Request for any Cluster or BMM upgrade failures and attach any errors along with ASYNC URL, correlation ID, and operation state of the Cluster and BMMs.
@@ -280,7 +281,7 @@ Run the following commands to check the status of the CM, Cluster, and BMM:
280
281
az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
281
282
```
282
283
283
-
Check the following for each BMM:
284
+
Validate the following resource states for each BMM: