Commit 3b7dad2

Update howto-cluster-runtime-upgrade.md
API changes using PauseAfterRack from PauseRack. Changes to pause before each rack starts. Renaming PauseRack doc to PauseAfterRack
1 parent 7b82b6b commit 3b7dad2

1 file changed: 44 additions, 43 deletions

articles/operator-nexus/howto-cluster-runtime-upgrade.md

@@ -1,12 +1,12 @@
 ---
 title: "Azure Operator Nexus: Runtime upgrade"
 description: Learn to execute a cluster runtime upgrade for Operator Nexus
-author: gedrivera
-ms.author: eduardori
+author: bartpinto
+ms.author: bpinto
 ms.service: azure-operator-nexus
 ms.custom: azure-operator-nexus, devx-track-azurecli
 ms.topic: how-to
-ms.date: 06/06/2023
+ms.date: 02/25/2025
 # ms.custom: template-include
 ---
 
@@ -16,11 +16,14 @@ This how-to guide explains the steps for installing the required Azure CLI and e
 
 ## Prerequisites
 
-- The [Install Azure CLI](/cli/azure/install-azure-cli) must be installed.
-- The `networkcloud` CLI extension is required. If the `networkcloud` extension isn't installed, it can be installed following the steps listed [here](./howto-install-cli-extensions.md).
-- Access to the Azure portal for the target cluster to be upgraded.
-- You must be logged in to the same subscription as your target cluster via `az login`
-- Target cluster must be in a running state, with all control plane nodes healthy and 80+% of compute nodes in a running and healthy state.
+1. Install the latest version of the [appropriate CLI extensions](howto-install-cli-extensions.md).
+1. The latest `networkcloud` CLI extension is required. It can be installed following the steps listed [here](./howto-install-cli-extensions.md).
+1. Subscription access to run the Azure Operator Nexus network fabric (NF) and network cloud (NC) CLI extension commands.
+1. Collect the following information:
+- Subscription ID (`SUBSCRIPTION`)
+- Cluster name (`CLUSTER`)
+- Resource group (`CLUSTER_RG`)
+1. Target cluster must be healthy in a running state, with all control plane nodes healthy.
 
 ## Checking current runtime version
 Verify current cluster runtime version before upgrade:
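The new prerequisites collect three values (`SUBSCRIPTION`, `CLUSTER`, `CLUSTER_RG`) that the updated commands use as placeholders. A minimal bash sketch, assuming those values are set as shell variables of the same names; both helper names are hypothetical, and reading the runtime version from a `clusterVersion` property is an assumption, since that output isn't part of this diff.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fail early if any prerequisite value is missing.
# Variable names follow the placeholders introduced in this commit.
require_cluster_vars() {
  local missing=0 name
  for name in SUBSCRIPTION CLUSTER CLUSTER_RG; do
    # ${!name:-} is bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical helper: print the current runtime version.
# Assumption: the version is exposed as the clusterVersion property.
show_current_runtime_version() {
  require_cluster_vars || return 1
  az networkcloud cluster show \
    --name "$CLUSTER" \
    --resource-group "$CLUSTER_RG" \
    --subscription "$SUBSCRIPTION" \
    --query clusterVersion --output tsv
}
```

After the three variables are set, `show_current_runtime_version` prints only the version string instead of the full cluster JSON.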
@@ -43,9 +46,9 @@ From the **available upgrade versions** tab, we're able to see the different clu
 Available upgrades are retrievable via the Azure CLI:
 
 ```azurecli
-az networkcloud cluster show --name "<clusterName>" /
---resource-group "<resourceGroup>" /
---subscription <subscriptionID>
+az networkcloud cluster show --name "<CLUSTER>" \
+--resource-group "<CLUSTER_RG>" \
+--subscription <SUBSCRIPTION> | grep -A8 availableUpgradeVersions
 ```
 
 In the output, you can find the `availableUpgradeVersions` property and look at the `targetClusterVersion` field:
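`grep -A8` prints a fixed number of lines after the match, so a longer list can be cut off. As an alternative, the Azure CLI's global `--query` (JMESPath) and `--output tsv` options can return only the `targetClusterVersion` values. A sketch with hypothetical helper names, assuming the prerequisite variables are set:

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one targetClusterVersion per line instead of
# grepping a fixed number of lines from the full JSON.
list_target_cluster_versions() {
  az networkcloud cluster show \
    --name "$CLUSTER" \
    --resource-group "$CLUSTER_RG" \
    --subscription "$SUBSCRIPTION" \
    --query "availableUpgradeVersions[].targetClusterVersion" \
    --output tsv
}

# Hypothetical helper: succeed only if the given version is listed exactly
# (-F fixed string, -x whole line, -q no output).
is_available_target_version() {
  local wanted="$1"
  list_target_cluster_versions | grep -Fxq -- "$wanted"
}
```

An empty result corresponds to the empty `availableUpgradeVersions` list described in this how-to.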
@@ -70,17 +73,16 @@ If there are no available cluster upgrades, the list is empty.
 The following Azure CLI command is used to configure the compute threshold parameters for a runtime upgrade:
 
 ```azurecli
-az networkcloud cluster update /
---name "<clusterName>" /
---resource-group "<resourceGroup>" /
---update-strategy strategy-type="<strategyType>" threshold-type="<thresholdType" /
-threshold-value="<thresholdValue>" max-unavailable=<maxNodesOffline> /
-wait-time-minutes=<waitTimeBetweenRacks> /
---subscription <subscriptionID>
+az networkcloud cluster update --name "<CLUSTER>" /
+--resource-group "<CLUSTER_RG>" \
+--update-strategy strategy-type="<strategyType>" threshold-type="<thresholdType" \
+threshold-value="<thresholdValue>" max-unavailable=<maxNodesOffline> \
+wait-time-minutes=<waitTimeBetweenRacks> \
+--subscription <SUBSCRIPTION>
 ```
 
 Required parameters:
-- strategy-type: Defines the update strategy. This can be `Rack` (Rack by Rack) OR `PauseAfterRack` (Pause for user before each Rack starts). The default value is `Rack`. To carry out a Cluster runtime upgrade using the `PauseAfterRack` strategy follow the steps outlined in [Upgrading cluster runtime with a pause rack strategy](howto-cluster-runtime-upgrade-with-pauserack-strategy.md)
+- strategy-type: Defines the update strategy. This can be `Rack` (Rack by Rack) OR `PauseAfterRack` (Pause for user before each Rack starts). The default value is `Rack`. To carry out a Cluster runtime upgrade using the `PauseAfterRack` strategy follow the steps outlined in [Upgrading cluster runtime with a pause rack strategy](howto-cluster-runtime-upgrade-with-pauseafterrack-strategy.md)
 - threshold-type: Determines how the threshold should be evaluated, applied in the units defined by the strategy. This can be `PercentSuccess` OR `CountSuccess`. The default value is `PercentSuccess`.
 - threshold-value: The numeric threshold value used to evaluate an update. The default value is `80`.
 
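The required parameters accept a small set of values, so they can be checked before calling `cluster update`. A sketch under stated assumptions: the allowed values are the ones listed in this diff, capping `PercentSuccess` at 100 is an assumption, and both function names are hypothetical. Each continued line ends in `\`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the --update-strategy values listed above.
# Allowed values come from this document; the 100 cap for PercentSuccess is assumed.
validate_update_strategy() {
  local strategy_type="$1" threshold_type="$2" threshold_value="$3"
  case "$strategy_type" in
    Rack|PauseAfterRack) ;;
    *) echo "invalid strategy-type: $strategy_type" >&2; return 1 ;;
  esac
  case "$threshold_type" in
    PercentSuccess|CountSuccess) ;;
    *) echo "invalid threshold-type: $threshold_type" >&2; return 1 ;;
  esac
  if ! [[ "$threshold_value" =~ ^[0-9]+$ ]]; then
    echo "threshold-value must be a non-negative integer: $threshold_value" >&2
    return 1
  fi
  if [ "$threshold_type" = "PercentSuccess" ] && [ "$threshold_value" -gt 100 ]; then
    echo "PercentSuccess threshold-value must not exceed 100" >&2
    return 1
  fi
}

# Apply the strategy only if it validates. Continued lines end in `\`; bash
# passes a trailing `/` as an ordinary argument and ends the command at the newline.
apply_update_strategy() {
  local wait_minutes="$4"
  validate_update_strategy "$1" "$2" "$3" || return 1
  if ! [[ "$wait_minutes" =~ ^[0-9]+$ ]]; then
    echo "wait-time-minutes must be a non-negative integer" >&2
    return 1
  fi
  az networkcloud cluster update \
    --name "$CLUSTER" \
    --resource-group "$CLUSTER_RG" \
    --update-strategy strategy-type="$1" threshold-type="$2" \
      threshold-value="$3" wait-time-minutes="$wait_minutes" \
    --subscription "$SUBSCRIPTION"
}
```

`apply_update_strategy Rack PercentSuccess 60 1` mirrors the 60% example in this how-to, while `validate_update_strategy PauseRack PercentSuccess 60` rejects the old `PauseRack` name, which the commit message says was replaced by `PauseAfterRack`.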
@@ -91,19 +93,19 @@ Optional parameters:
 The following example is for a customer using Rack by Rack strategy with a Percent Success of 60% and a 1-minute pause.
 
 ```azurecli
-az networkcloud cluster update --name "<clusterName>" /
---resource-group "<resourceGroup>" /
---update-strategy strategy-type="Rack" threshold-type="PercentSuccess" /
-threshold-value=60 wait-time-minutes=1 /
---subscription <subscriptionID>
+az networkcloud cluster update --name "<CLUSTER>" \
+--resource-group "<CLUSTER_RG>" \
+--update-strategy strategy-type="Rack" threshold-type="PercentSuccess" \
+threshold-value=60 wait-time-minutes=1 \
+--subscription <SUBSCRIPTION>
 ```
 
 Verify update:
 
 ```
-az networkcloud cluster show --resource-group "<resourceGroup>" /
---name "<clusterName>" /
---subscription <subscriptionID>| grep -A5 updateStrategy
+az networkcloud cluster show --name "<CLUSTER>" \
+--resource-group "<CLUSTER_RG>" \
+--subscription <SUBSCRIPTION> | grep -A5 updateStrategy
 
 "updateStrategy": {
 "maxUnavailable": 32767,
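`grep -A5 updateStrategy` only prints the first lines of the object. A sketch that compares the applied values directly, assuming `updateStrategy` also carries `strategyType`, `thresholdType`, and `thresholdValue` fields next to the `maxUnavailable` field shown here (only `maxUnavailable` is visible in this diff). The function name is hypothetical, and because the exact layout of `--output tsv` for a list of values is not shown here, the sketch normalizes both tabs and newlines to spaces.

```bash
#!/usr/bin/env bash
# Hypothetical check that the cluster reports the strategy just applied.
# Assumption: updateStrategy has strategyType, thresholdType and thresholdValue fields.
verify_update_strategy() {
  local expected="$1 $2 $3" actual
  actual=$(az networkcloud cluster show \
    --name "$CLUSTER" \
    --resource-group "$CLUSTER_RG" \
    --subscription "$SUBSCRIPTION" \
    --query "updateStrategy.[strategyType, thresholdType, thresholdValue]" \
    --output tsv) || return 1
  # Values may arrive tab- or newline-separated; normalize both to single spaces.
  actual=$(printf '%s' "$actual" | tr '\t\n' '  ')
  if [ "$actual" != "$expected" ]; then
    echo "updateStrategy mismatch: expected '$expected', got '$actual'" >&2
    return 1
  fi
}
```

For the 60% example, `verify_update_strategy Rack PercentSuccess 60` succeeds only if all three values match.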
@@ -118,19 +120,19 @@ In this example, if less than 60% of the compute nodes being provisioned in a ra
 The following example is for a customer using Rack by Rack strategy with a threshold type CountSuccess of 10 nodes per rack and a 1-minute pause.
 
 ```azurecli
-az networkcloud cluster update --name "<clusterName>" /
---resource-group "<resourceGroup>" /
---update-strategy strategy-type="Rack" threshold-type="CountSuccess" /
-threshold-value=10 wait-time-minutes=1 /
---subscription <subscriptionID>
+az networkcloud cluster update --name "<CLUSTER>" \
+--resource-group "<CLUSTER_RG>" \
+--update-strategy strategy-type="Rack" threshold-type="CountSuccess" \
+threshold-value=10 wait-time-minutes=1 \
+--subscription <SUBSCRIPTION>
 ```
 
 Verify update:
 
 ```
-az networkcloud cluster show --resource-group "<resourceGroup>" /
---name "<clusterName>" /
---subscription <subscriptionID>| grep -A5 updateStrategy
+az networkcloud cluster show --name "<CLUSTER>" \
+--resource-group "<CLUSTER_RG>" \
+--subscription <SUBSCRIPTION> | grep -A5 updateStrategy
 
 "updateStrategy": {
 "maxUnavailable": 32767,
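The two examples differ only in `threshold-type`. A sketch of how the two types could be evaluated for one rack, under the assumption (not stated in this diff) that a rack passes when the share of successfully provisioned compute nodes reaches `threshold-value` percent (`PercentSuccess`) or when their count reaches `threshold-value` (`CountSuccess`). The function name is hypothetical; integer arithmetic replaces the percentage division.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the two threshold types for a single rack.
# Assumed rule: PercentSuccess passes when succeeded/total >= value/100,
# i.e. succeeded*100 >= value*total; CountSuccess passes when succeeded >= value.
rack_meets_threshold() {
  local threshold_type="$1" threshold_value="$2" succeeded="$3" total="$4"
  case "$threshold_type" in
    PercentSuccess) (( succeeded * 100 >= threshold_value * total )) ;;
    CountSuccess)   (( succeeded >= threshold_value )) ;;
    *) echo "unknown threshold-type: $threshold_type" >&2; return 2 ;;
  esac
}
```

For a hypothetical 16-node rack with `PercentSuccess` 60, 10 successes pass (1000 >= 960) and 9 fail (900 < 960); with `CountSuccess` 10 the cut-off is also 10 nodes.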
@@ -151,10 +153,10 @@ In this example, if less than 10 compute nodes being provisioned in a rack fail
 To perform an upgrade of the runtime, use the following Azure CLI command:
 
 ```azurecli
-az networkcloud cluster update-version --cluster-name "<clusterName>" /
---target-cluster-version "<versionNumber>" /
---resource-group "<resourceGroupName>" /
---subscription <subscriptionID>
+az networkcloud cluster update-version --cluster-name "<CLUSTER>" \
+--target-cluster-version "<versionNumber>" \
+--resource-group "<CLUSTER_RG>" \
+--subscription <SUBSCRIPTION>
 ```
 
 The runtime upgrade is a long process. The upgrade first upgrades the management nodes and then sequentially Rack by Rack for the worker nodes.
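Because the upgrade is long, it can help to confirm the requested version is one of the `availableUpgradeVersions` entries before starting it. A sketch reusing the flags shown in this diff; the function name is hypothetical and the prerequisite variables are assumed to be set.

```bash
#!/usr/bin/env bash
# Hypothetical guard: run update-version only for a listed targetClusterVersion.
start_runtime_upgrade() {
  local target="$1" available
  if [ -z "$target" ]; then
    echo "usage: start_runtime_upgrade <targetClusterVersion>" >&2
    return 2
  fi
  available=$(az networkcloud cluster show \
    --name "$CLUSTER" \
    --resource-group "$CLUSTER_RG" \
    --subscription "$SUBSCRIPTION" \
    --query "availableUpgradeVersions[].targetClusterVersion" \
    --output tsv) || return 1
  # Exact, whole-line match against the listed versions.
  if ! grep -Fxq -- "$target" <<<"$available"; then
    echo "version $target is not in availableUpgradeVersions" >&2
    return 1
  fi
  az networkcloud cluster update-version \
    --cluster-name "$CLUSTER" \
    --target-cluster-version "$target" \
    --resource-group "$CLUSTER_RG" \
    --subscription "$SUBSCRIPTION"
}
```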
@@ -176,9 +178,9 @@ The Cluster upgrade is complete when detailedStatus is set to `Running` and deta
 To view the upgrade status through the Azure CLI, use `az networkcloud cluster show`.
 
 ```azurecli
-az networkcloud cluster show --cluster-name "<clusterName>" /
---resource-group "<resourceGroupName>" /
---subscription <subscriptionID>
+az networkcloud cluster show --cluster-name "<CLUSTER>" \
+--resource-group "<CLUSTER_RG>" \
+--subscription <SUBSCRIPTION>
 ```
 
 The output should be the target cluster's information and the cluster's detailed status and detail status message should be present.
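Instead of rerunning `cluster show` by hand, the `detailedStatus` field can be polled. A sketch with a hypothetical function name; the interval and check limit are arbitrary defaults, not service guidance. It checks only `detailedStatus`, so the detailed status message in the full output still needs review.

```bash
#!/usr/bin/env bash
# Hypothetical polling helper. Assumptions: detailedStatus is readable through
# --query; the defaults (300 s interval, 72 checks, about 6 hours) are arbitrary.
wait_for_cluster_running() {
  local interval="${1:-300}" max_checks="${2:-72}" check status
  for ((check = 1; check <= max_checks; check++)); do
    # Sleep first so a just-started upgrade has time to change the status.
    sleep "$interval"
    status=$(az networkcloud cluster show \
      --name "$CLUSTER" \
      --resource-group "$CLUSTER_RG" \
      --subscription "$SUBSCRIPTION" \
      --query detailedStatus --output tsv) || return 1
    echo "check $check: detailedStatus=$status"
    if [ "$status" = "Running" ]; then
      return 0
    fi
  done
  echo "detailedStatus was not Running after $max_checks checks" >&2
  return 1
}
```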
@@ -201,7 +203,6 @@ A guide for identifying issues with provisioning worker nodes is provided at [Tr
 ### Hardware Failure doesn't require Upgrade re-execution
 
 If a hardware failure during an upgrade occurs, the runtime upgrade continues as long as the set thresholds are met for the compute and management/control nodes. Once the machine is fixed or replaced, it gets provisioned with the current platform runtime's OS, which contains the targeted version of the runtime. If a rack was updated before a failure, then the upgraded runtime version would be used when the nodes are reprovisioned. If the rack's spec wasn't updated to the upgraded runtime version before the hardware failure, the machine would be provisioned with the previous runtime version when it is repaired. It will be upgraded along with the rack when the rack starts its upgrade.
-
 ### After a runtime upgrade, the cluster shows "Failed" Provisioning State
 
 During a runtime upgrade, the cluster enters a state of `Upgrading`. If the runtime upgrade fails, the cluster goes into a `Failed` provisioning state. Infrastructure components (e.g the Storage Appliance) may cause failures during the upgrade. In some scenarios, it may be necessary to diagnose the failure with Microsoft support.
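A failed upgrade leaves the cluster in a `Failed` provisioning state, so a post-upgrade check can read that state directly. A sketch assuming the state is the resource's top-level `provisioningState` property; the function name is hypothetical. It returns success when the state is `Failed`, which can trigger the diagnosis step described above.

```bash
#!/usr/bin/env bash
# Hypothetical post-upgrade check. Assumption: the state described above is the
# top-level provisioningState property. Returns 0 when it is Failed, 1 otherwise,
# and 2 if the az call itself fails.
cluster_upgrade_failed() {
  local state
  state=$(az networkcloud cluster show \
    --name "$CLUSTER" \
    --resource-group "$CLUSTER_RG" \
    --subscription "$SUBSCRIPTION" \
    --query provisioningState --output tsv) || return 2
  echo "provisioningState=$state"
  [ "$state" = "Failed" ]
}
```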
