Commit 4905f9e

Update howto-cluster-runtime-upgrade-template.md
Resolve comments.
1 parent 2b3fcf6 commit 4905f9e

1 file changed

+23
-30
lines changed

articles/operator-nexus/howto-cluster-runtime-upgrade-template.md

Lines changed: 23 additions & 30 deletions
@@ -9,7 +9,7 @@ ms.topic: how-to
99
ms.custom: azure-operator-nexus, template-include
1010
---
1111

12-
# Cluster Runtime Upgrade Template
12+
# Cluster runtime upgrade template
1313

1414
This how-to guide provides a step-by-step template for upgrading a Nexus Cluster. It is designed to help users manage a reproducible end-to-end upgrade through Azure APIs and standard operating procedures. Regular updates are crucial for maintaining system integrity and accessing the latest product improvements.
1515

@@ -111,35 +111,28 @@ If any failures occur, report the <MISE_CID>, <CORRELATION_ID>, status code, and
111111

112112
1. Validate the provisioning and detailed status for the CM and Cluster.
113113

114-
Set up the subscription, CM, and Cluster parameters:
114+
Log in to Azure CLI and select or set the `<CUSTOMER_SUB_ID>`:
115115
```
116-
export SUBSCRIPTION_ID=<CUSTOMER_SUB_ID>
117-
export CM_RG=<CM_RG>
118-
export CM_NAME=<CM_NAME>
119-
export CLUSTER_RG=<CLUSTER_RG>
120-
export CLUSTER_NAME=<CLUSTER_NAME>
121-
export CLUSTER_RID=<CLUSTER_RID>
122-
export CLUSTER_MRG=<CLUSTER_MRG>
123-
export THRESHOLD=<DEPLOYMENT_THRESHOLD>
124-
export PAUSE_MINS=<DEPLOYMENT_PAUSE_MINS>
116+
az login
117+
az account set --subscription <CUSTOMER_SUB_ID>
125118
```
126119

127120
Check that the CM is in `Succeeded` for `Provisioning state`:
128121
```
129-
az networkcloud clustermanager show -g $CM_RG --resource-name $CM_NAME --subscription $SUBSCRIPTION_ID -o table
122+
az networkcloud clustermanager show -g <CM_RG> --resource-name <CM_NAME> --subscription <CUSTOMER_SUB_ID> -o table
130123
```
131124

132125
Check the Cluster status `Detailed status` is `Running`:
133126
```
134-
az networkcloud cluster show -g $CLUSTER_RG --resource-name $CLUSTER_NAME --subscription $SUBSCRIPTION_ID -o table
127+
az networkcloud cluster show -g <CLUSTER_RG> --resource-name <CLUSTER_NAME> --subscription <CUSTOMER_SUB_ID> -o table
135128
```
136129

137130
>[!Note]
138131
> If CM `Provisioning state` isn't `Succeeded` or Cluster `Detailed status` isn't `Running`, stop the upgrade until issues are resolved.
139132
140133
2. Check the Bare Metal Machine (BMM) status `Detailed status` is `Running`:
141134
```
142-
az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
135+
az networkcloud baremetalmachine list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
143136
```
144137

145138
Validate the following resource states for each BMM (except spare):
@@ -158,8 +151,8 @@ If any failures occur, report the <MISE_CID>, <CORRELATION_ID>, status code, and
158151

159152
3. Collect a profile of the tenant workloads:
160153
```
161-
az networkcloud virtualmachine list --sub $SUBSCRIPTION_ID --query "reverse(sort_by([?clusterId=='$CLUSTER_RID'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineIdi,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
162-
az networkcloud kubernetescluster list --sub $SUBSCRIPTION_ID --query "[?clusterId=='$CLUSTER_RID'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
154+
az networkcloud virtualmachine list --sub <CUSTOMER_SUB_ID> --query "reverse(sort_by([?clusterId=='<CLUSTER_RID>'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineId,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
155+
az networkcloud kubernetescluster list --sub <CUSTOMER_SUB_ID> --query "[?clusterId=='<CLUSTER_RID>'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
163156
```
164157

165158
4. Review Operator Nexus Release notes for required checks and configuration updates not included in this document.
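
The pre-checks in this hunk can be turned into a small gate that halts a runbook when the CM or Cluster is unhealthy. What follows is a minimal sketch, not part of the template: the function name `precheck_gate` is hypothetical, and the usage comments assume the status values are exposed as the JSON fields `provisioningState` and `detailedStatus`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the template): decide whether the upgrade
# may proceed, given the two status values collected during the pre-checks.
#   $1 = CM provisioning state, $2 = Cluster detailed status
precheck_gate() {
  cm_state="$1"
  cluster_status="$2"
  if [ "$cm_state" = "Succeeded" ] && [ "$cluster_status" = "Running" ]; then
    echo "proceed"
  else
    echo "stop: CM=${cm_state:-unknown} Cluster=${cluster_status:-unknown}"
    return 1
  fi
}

# Usage sketch (assumed field names; fill in every <...> placeholder first):
#   cm_state=$(az networkcloud clustermanager show -g "<CM_RG>" \
#     --resource-name "<CM_NAME>" --query provisioningState -o tsv)
#   cluster_status=$(az networkcloud cluster show -g "<CLUSTER_RG>" \
#     --resource-name "<CLUSTER_NAME>" --query detailedStatus -o tsv)
#   precheck_gate "$cm_state" "$cluster_status" || exit 1
```

Keeping the decision in a function that takes plain strings makes it possible to exercise the logic without a live subscription.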
@@ -190,20 +183,20 @@ If `updateStrategy` isn't set, the default values are as follows:
190183

191184
### Set a deployment threshold and wait time different than default
192185
```
193-
az networkcloud cluster update -n $CLUSTER_NAME -g $CLUSTER_RG --update-strategy strategy-type="Rack" threshold-type="PercentSuccess" threshold-value=$THRESHOLD wait-time-minutes=$PAUSE_MINS --subscription $SUBSCRIPTION_ID
186+
az networkcloud cluster update -n <CLUSTER_NAME> -g <CLUSTER_RG> --update-strategy strategy-type="Rack" threshold-type="PercentSuccess" threshold-value=<DEPLOYMENT_THRESHOLD> wait-time-minutes=<DEPLOYMENT_PAUSE_MINS> --subscription <CUSTOMER_SUB_ID>
194187
```
195188
>[!Important]
196189
> If 100% threshold is required, review the BMM status reported during pre-checks and make sure all BMM are healthy before proceeding with the upgrade.
197190
198191
Verify update:
199192
```
200-
az networkcloud cluster show -n $CLUSTER_NAME -g $CLUSTER_RG --subscription $SUBSCRIPTION_ID| grep -A5 updateStrategy
193+
az networkcloud cluster show -n <CLUSTER_NAME> -g <CLUSTER_RG> --subscription <CUSTOMER_SUB_ID>| grep -A5 updateStrategy
201194
"updateStrategy": {
202195
"maxUnavailable": 32767,
203196
"strategyType": "Rack",
204197
"thresholdType": "PercentSuccess",
205-
"thresholdValue": $THRESHOLD,
206-
"waitTimeMinutes": $PAUSE_MINS
198+
"thresholdValue": <DEPLOYMENT_THRESHOLD>,
199+
"waitTimeMinutes": <DEPLOYMENT_PAUSE_MINS>
207200
}
208201
```
209202
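
Because `threshold-value` and `wait-time-minutes` are now typed in by hand rather than read from exported variables, it may help to validate them before calling `az networkcloud cluster update`. A minimal sketch under stated assumptions: `strategy_args` is a hypothetical helper, and the 0 to 100 range is an assumption based on the `PercentSuccess` threshold type rather than something this diff states.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the --update-strategy key=value arguments after
# checking that both inputs are whole numbers.
#   $1 = deployment threshold (percent), $2 = pause in minutes
strategy_args() {
  threshold="$1"
  pause_mins="$2"
  case "$threshold" in
    ''|*[!0-9]*) echo "threshold must be a whole number" >&2; return 1 ;;
  esac
  case "$pause_mins" in
    ''|*[!0-9]*) echo "pause minutes must be a whole number" >&2; return 1 ;;
  esac
  # Assumption: a PercentSuccess threshold cannot exceed 100.
  if [ "$threshold" -gt 100 ]; then
    echo "threshold must be between 0 and 100" >&2
    return 1
  fi
  echo "strategy-type=Rack threshold-type=PercentSuccess threshold-value=$threshold wait-time-minutes=$pause_mins"
}

# Usage sketch (left unquoted on purpose so the output splits into four arguments):
#   az networkcloud cluster update -n "<CLUSTER_NAME>" -g "<CLUSTER_RG>" \
#     --update-strategy $(strategy_args 80 10) --subscription "<CUSTOMER_SUB_ID>"
```

An unfilled `<DEPLOYMENT_THRESHOLD>` token fails the numeric check, so a forgotten placeholder is caught before any request is sent.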

@@ -212,7 +205,7 @@ az networkcloud cluster show -n $CLUSTER_NAME -g $CLUSTER_RG --subscription $SUB
212205

213206
To configure strategy to use `PauseAfterRack`:
214207
```
215-
az networkcloud cluster update -n $CLUSTER_NAME -g $CLUSTER_RG --update-strategy strategy-type="PauseAfterRack" wait-time-minutes=0 threshold-type="PercentSuccess" threshold-value=$THRESHOLD --subscription $SUBSCRIPTION_ID
208+
az networkcloud cluster update -n <CLUSTER_NAME> -g <CLUSTER_RG> --update-strategy strategy-type="PauseAfterRack" wait-time-minutes=0 threshold-type="PercentSuccess" threshold-value=<DEPLOYMENT_THRESHOLD> --subscription <CUSTOMER_SUB_ID>
216209
```
217210

218211
Verify update:
@@ -222,15 +215,15 @@ az networkcloud cluster show -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <C
222215
"maxUnavailable": 32767,
223216
"strategyType": "PauseAfterRack",
224217
"thresholdType": "PercentSuccess",
225-
"thresholdValue": $THRESHOLD,
218+
"thresholdValue": <DEPLOYMENT_THRESHOLD>,
226219
"waitTimeMinutes": 0
227220
```
228221

229222
### Run upgrade from either portal or cli
230223
* To start upgrade from Azure portal, go to Cluster resource, click `Update`, select <CLUSTER_VERSION>, then click `Update`
231224
* To run upgrade from Azure CLI, run the following command:
232225
```
233-
az networkcloud cluster update-version --subscription $SUBSCRIPTION_ID --cluster-name $CLUSTER_NAME --target-cluster-version $CLUSTER_VERSION --resource-group $CLUSTER_RG --no-wait --debug
226+
az networkcloud cluster update-version --subscription <CUSTOMER_SUB_ID> --cluster-name <CLUSTER_NAME> --target-cluster-version <CLUSTER_VERSION> --resource-group <CLUSTER_RG> --no-wait --debug
234227
```
235228

236229
Gather ASYNC URL and Correlation ID info for further troubleshooting if needed.
@@ -246,19 +239,19 @@ Once a compute Rack meets the success threshold, the upgrade pauses until the us
246239

247240
Use the following command to continue upgrade once a Compute Rack is paused after meeting the deployment threshold for the Rack:
248241
```
249-
az networkcloud cluster continue-update-version -g $CLUSTER_RG -n $CLUSTER_NAME$ --subscription $SUBSCRIPTION_ID
242+
az networkcloud cluster continue-update-version -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <CUSTOMER_SUB_ID>
250243
```
251244

252245
### Monitor status of Cluster
253246
```
254-
az networkcloud cluster list -g $CLUSTER_RG --subscription $SUBSCRIPTION_ID -o table
247+
az networkcloud cluster list -g <CLUSTER_RG> --subscription <CUSTOMER_SUB_ID> -o table
255248
```
256249
The Cluster `Detailed status` shows `Running` and the `Detailed status message` shows `Cluster is up and running.` when the upgrade is complete.
257250

258251
### Monitor status of Bare Metal Machines
259252
```
260-
az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID -o table
261-
az networkcloud baremetalmachine list -g $CLUSTER_MRG --subscription $SUBSCRIPTION_ID --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
253+
az networkcloud baremetalmachine list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> -o table
254+
az networkcloud baremetalmachine list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,kubernetesVersion:kubernetesVersion,machineClusterVersion:machineClusterVersion,machineRoles:machineRoles| join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
262255
```
263256

264257
Validate the following states for each BMM (except spare):
@@ -319,12 +312,12 @@ az networkcloud baremetalmachine list -g <CLUSTER_MRG> --subscription <CUSTOMER_
319312
az networkcloud storageappliance list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> -o table
320313
321314
# Tenant Workloads
322-
az networkcloud virtualmachine list --sub $SUBSCRIPTION_ID --query "reverse(sort_by([?clusterId=='$CLUSTER_RID'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineIdi,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
323-
az networkcloud kubernetescluster list --sub $SUBSCRIPTION_ID --query "[?clusterId=='$CLUSTER_RID'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
315+
az networkcloud virtualmachine list --sub <CUSTOMER_SUB_ID> --query "reverse(sort_by([?clusterId=='<CLUSTER_RID>'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus,bareMetalMachineId:bareMetalMachineId,CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
316+
az networkcloud kubernetescluster list --sub <CUSTOMER_SUB_ID> --query "[?clusterId=='<CLUSTER_RID>'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
324317
```
325318

326319
> [!Note]
327-
> IRT validation provides a complete functional test of networking and workloads across all components of the Nexus Instance. Simple validation does not provide functional tesing.
320+
> IRT validation provides a complete functional test of networking and workloads across all components of the Nexus Instance. Simple validation does not provide functional testing.
328321
329322
</details>
330323
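
This commit replaces exported shell variables with literal `<PLACEHOLDER>` tokens, which a shell would interpret as redirections if a command were pasted unedited. One way to guard against that is to substitute the tokens programmatically and refuse to emit a command that still contains one. The sketch below is illustrative only: `fill_placeholders` is a hypothetical helper and not part of the template.

```shell
#!/usr/bin/env bash
# Hypothetical helper: replace <NAME> tokens in a command template with values
# passed as NAME=value pairs, and fail if any <TOKEN> is left unfilled.
fill_placeholders() {
  text="$1"
  shift
  for pair in "$@"; do
    name="${pair%%=*}"
    value="${pair#*=}"
    # Escape characters that are special in a sed replacement (\, & and the | delimiter).
    esc=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
    text=$(printf '%s' "$text" | sed "s|<${name}>|${esc}|g")
  done
  # Refuse to return a command that still contains an upper-case <TOKEN>.
  if printf '%s' "$text" | grep -Eq '<[A-Z_]+>'; then
    echo "unfilled placeholder in: $text" >&2
    return 1
  fi
  printf '%s\n' "$text"
}

# Usage sketch:
#   cmd=$(fill_placeholders 'az networkcloud cluster list -g <CLUSTER_RG> --subscription <CUSTOMER_SUB_ID> -o table' \
#     CLUSTER_RG=my-rg CUSTOMER_SUB_ID=00000000-0000-0000-0000-000000000000) || exit 1
#   echo "$cmd"   # review the command before running it
```

Using `|` as the sed delimiter keeps resource IDs, which contain `/`, from needing extra escaping.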
