articles/operator-nexus/howto-cluster-runtime-upgrade.md
Lines changed: 38 additions & 71 deletions
@@ -17,7 +17,7 @@ This how-to guide explains the steps for installing the required Azure CLI and e
 ## Prerequisites

 1. The [Azure CLI](/cli/azure/install-azure-cli) must be installed.
-2. The `networkcloud` CLI extension is required. If the `networkcloud` extension isn't installed, it can be installed following the steps listed [here](./howto-install-cli-extensions.md)
+2. The `networkcloud` CLI extension is required. If the `networkcloud` extension isn't installed, install it by following [these steps](./howto-install-cli-extensions.md).
 3. Access to the Azure portal for the target cluster to be upgraded.
 4. You must be logged in to the same subscription as your target cluster via `az login`.
 5. The target cluster must be in a running state, with all control plane nodes healthy and at least 80% of compute nodes running and healthy.
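Prerequisite 5 can be checked before you start the upgrade. The following Python sketch is a hypothetical pre-check, not part of the `networkcloud` extension: it assumes you have already gathered each bare metal machine's role and health into simple records (the `NodeStatus` type, its fields, and the role strings are illustrative assumptions, not the shape of any CLI output) and applies the stated rule: every control plane node healthy and at least 80% of compute nodes healthy.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    # Hypothetical record; fill it from whatever node-status source you use.
    name: str
    role: str  # assumed values: "control-plane" or "compute"
    healthy: bool


def ready_for_upgrade(nodes: list[NodeStatus], min_compute_percent: float = 80.0) -> bool:
    """Return True if every control plane node is healthy and at least
    min_compute_percent of compute nodes are healthy."""
    control = [n for n in nodes if n.role == "control-plane"]
    compute = [n for n in nodes if n.role == "compute"]
    # Conservative choice in this sketch: no control plane nodes means "not ready".
    if not control or not all(n.healthy for n in control):
        return False
    if not compute:
        return False
    healthy_percent = 100.0 * sum(n.healthy for n in compute) / len(compute)
    return healthy_percent >= min_compute_percent


if __name__ == "__main__":
    # One control plane node and five compute nodes, one of them unhealthy (80%).
    nodes = [NodeStatus("cp-1", "control-plane", True)] + [
        NodeStatus(f"compute-{i}", "compute", i != 0) for i in range(5)
    ]
    print(ready_for_upgrade(nodes))
```

Exactly 80% passes because the prerequisite reads as "80% or more"; raise `min_compute_percent` if you want a safety margin.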
@@ -59,42 +59,46 @@ In the output, you can find the `availableUpgradeVersions` property and look at

 If there are no available cluster upgrades, the list is empty.

-### Set Deployment Threshold
-
-***--update-strategy - The strategy for updating the cluster indicating the allowable compute node failures during bootstrap provisioning.***
+## Configure compute threshold parameters for runtime upgrade using cluster updateStrategy

-If the customer requests an `update-strategy` threshold that is different from the default of 80%, you can run the following cluster update command.
+The following Azure CLI command is used to configure the compute threshold parameters for a runtime upgrade:
+- strategy-type: Defines the update strategy. This can be `"Rack"` (rack by rack) or `"PauseAfterRack"` (upgrade one rack at a time, then wait for confirmation before proceeding to the next rack). The default value is `Rack`. To carry out a cluster runtime upgrade using the pause-rack (`PauseAfterRack`) strategy, follow the steps outlined in [Upgrading cluster runtime with a pause rack strategy](howto-cluster-runtime-upgrade-with-pauserack-strategy.md).
+- threshold-type: Determines how the threshold is evaluated, applied in the units defined by the strategy. This can be `"PercentSuccess"` or `"CountSuccess"`. The default value is `PercentSuccess`.
+- threshold-value: The numeric threshold value used to evaluate an update. The default value is `80`.

+Optional parameters:
+- max-unavailable: The maximum number of worker nodes that can be offline (that is, being upgraded) in a rack at a time. The default value is `32767`.
+- wait-time-minutes: The delay or waiting period before updating a rack. The default value is `15`.

 The following example is for a customer using the rack-by-rack strategy with a percent-success threshold of 60% and a 1-minute pause.
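The threshold parameters are supplied together as the value of `--update-strategy`. The Python sketch below assembles that value for the configuration just described (rack by rack, a 60% `PercentSuccess` threshold, and a 1-minute wait), filling in the documented defaults and rejecting values outside the documented choices. It assumes the space-separated `key=value` shorthand accepted by the Azure CLI and the `az networkcloud cluster update` command; the helper name `build_update_strategy` is hypothetical, and the script only prints the command rather than running it. Confirm the exact syntax with `az networkcloud cluster update --help` for your extension version.

```python
import shlex

# Allowed values and defaults as listed in this article.
STRATEGY_TYPES = {"Rack", "PauseAfterRack"}
THRESHOLD_TYPES = {"PercentSuccess", "CountSuccess"}
DEFAULTS = {
    "strategy-type": "Rack",
    "threshold-type": "PercentSuccess",
    "threshold-value": 80,
    "max-unavailable": 32767,
    "wait-time-minutes": 15,
}


def build_update_strategy(**overrides) -> list[str]:
    """Hypothetical helper: merge overrides into the documented defaults and
    return the key=value tokens for --update-strategy.

    Keyword names use underscores (threshold_value) and are mapped to the
    hyphenated CLI keys (threshold-value)."""
    values = dict(DEFAULTS)
    for name, value in overrides.items():
        key = name.replace("_", "-")
        if key not in DEFAULTS:
            raise ValueError(f"unknown update-strategy key: {key}")
        values[key] = value
    if values["strategy-type"] not in STRATEGY_TYPES:
        raise ValueError(f"unsupported strategy-type: {values['strategy-type']}")
    if values["threshold-type"] not in THRESHOLD_TYPES:
        raise ValueError(f"unsupported threshold-type: {values['threshold-type']}")
    # Sanity check added by this sketch: a percentage must lie in 0..100.
    if values["threshold-type"] == "PercentSuccess" and not 0 <= values["threshold-value"] <= 100:
        raise ValueError("a PercentSuccess threshold-value must be between 0 and 100")
    return [f"{key}={value}" for key, value in values.items()]


if __name__ == "__main__":
    # Rack by rack, 60% PercentSuccess, 1-minute wait between racks.
    tokens = build_update_strategy(threshold_value=60, wait_time_minutes=1)
    command = [
        "az", "networkcloud", "cluster", "update",
        "-n", "<CLUSTER_NAME>", "-g", "<CLUSTER_RG>",
        "--update-strategy", *tokens,
    ]
    print(shlex.join(command))  # print only; nothing is executed
```

The `-n`/`-g` flags and placeholders mirror the `cluster show` example in this diff. Because `update-strategy` can't be changed once the runtime upgrade has started, review the printed values before running the real command.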
@@ -122,7 +132,9 @@ az networkcloud cluster show -g <CLUSTER_RG> -n <CLUSTER_NAME> --subscription <S

 In this example, if fewer than 10 compute nodes in a rack are successfully provisioned (evaluated on a rack-by-rack basis), the cluster deployment fails. If 10 or more of the compute nodes are successfully provisioned, cluster deployment moves on to the next rack of compute nodes.

-***NOTE: `update-strategy` cannot be changed after the cluster runtime upgrade has started.***
+> [!NOTE]
+> ***`update-strategy` cannot be changed after the cluster runtime upgrade has started.***
+> When a threshold value below 100% is set, it’s possible that unhealthy nodes might not be upgraded, yet the “Cluster” status could still indicate that the upgrade was successful. For troubleshooting issues with bare metal machines, see [Troubleshoot Azure Operator Nexus server problems](troubleshoot-reboot-reimage-replace.md).

 ## Upgrading cluster runtime using CLI
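The threshold rules in this part of the diff (the rack-by-rack example and the note about values below 100%) can be modeled as a per-rack check. The Python sketch below is a conceptual model under stated assumptions, not the service's implementation: it assumes a rack passes when the count (`CountSuccess`) or percentage (`PercentSuccess`) of successfully upgraded compute nodes in that rack meets `threshold-value`, and that the upgrade stops at the first rack that falls short. The function names are hypothetical.

```python
from collections.abc import Iterable


def rack_meets_threshold(succeeded: int, total: int, threshold_type: str, threshold_value: int) -> bool:
    """Return True if one rack's result satisfies the threshold (conceptual model)."""
    if total <= 0:
        raise ValueError("a rack must contain at least one compute node")
    if threshold_type == "CountSuccess":
        return succeeded >= threshold_value
    if threshold_type == "PercentSuccess":
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return 100 * succeeded >= threshold_value * total
    raise ValueError(f"unknown threshold-type: {threshold_type}")


def evaluate_racks(results: Iterable[tuple[int, int]], threshold_type: str, threshold_value: int) -> str:
    """Walk (succeeded, total) pairs rack by rack; stop at the first rack that misses."""
    for index, (succeeded, total) in enumerate(results, start=1):
        if not rack_meets_threshold(succeeded, total, threshold_type, threshold_value):
            return f"failed at rack {index}"
    return "succeeded"


if __name__ == "__main__":
    # CountSuccess of 10: 10 of 12 nodes passes, 9 of 12 does not.
    print(evaluate_racks([(10, 12), (9, 12)], "CountSuccess", 10))  # failed at rack 2
    # PercentSuccess of 80: two unhealthy nodes out of 10 still pass, as the note warns.
    print(evaluate_racks([(8, 10), (10, 10)], "PercentSuccess", 80))  # succeeded
```

The second case shows why a successful cluster status doesn't guarantee that every node was upgraded when the threshold is below 100%.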
@@ -158,52 +170,7 @@ az networkcloud cluster show --cluster-name "clusterName" --resource-group "reso
 The output should be the target cluster's information, including the cluster's detailed status and detailed status message.
 For more detailed insights on the upgrade progress, you can check the status of the individual nodes in each rack. An example of checking the status is provided in the reference section under [BareMetal Machine roles](./reference-near-edge-baremetal-machine-roles.md).

-## Configure compute threshold parameters for runtime upgrade using cluster updateStrategy
-
-The following Azure CLI command is used to configure the compute threshold parameters for a runtime upgrade:
-- strategy-type: Defines the update strategy. In this case, "Rack" means updates occur rack-by-rack. The default value is `Rack`.
-- threshold-type: Determines how the threshold should be evaluated, applied in the units defined by the strategy. The default value is `PercentSuccess`.
-- threshold-value: The numeric threshold value used to evaluate an update. The default value is 80.
-
-Optional parameters:
-- max-unavailable: The maximum number of worker nodes that can be offline, that is, upgraded rack at a time. The default value is 32767.
-- wait-time-minutes: The delay or waiting period before updating a rack. The default value is 15.
-Upon successful execution of the command, the updateStrategy values specified are applied to the cluster:
-
-```
-"updateStrategy": {
-    "maxUnavailable": 16,
-    "strategyType": "Rack",
-    "thresholdType": "PercentSuccess",
-    "thresholdValue": 70,
-    "waitTimeMinutes": 15
-}
-```
-
-> [!NOTE]
-> When a threshold value below 100% is set, it’s possible that any unhealthy nodes might not be upgraded, yet the “Cluster” status could still indicate that upgrade was successful. For troubleshooting issues with bare metal machines, please refer to [Troubleshoot Azure Operator Nexus server problems](troubleshoot-reboot-reimage-replace.md)
-
-## Upgrade with PauseRack strategy

-Starting with API version 2024-06-01-preview, you can trigger runtime upgrades using a "PauseRack" strategy. When you execute a Cluster runtime upgrade with the "PauseRack" strategy, it will update one rack at a time in the Cluster and then stop, awaiting confirmation before proceeding to the next rack. All existing thresholds continue to be respected with the "PauseRack" strategy. To carry out a Cluster runtime upgrade using the "PauseRack" strategy, follow the steps outlined in [Upgrading cluster runtime with a pause rack strategy](howto-cluster-runtime-upgrade-with-pauserack-strategy.md).
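To confirm which `updateStrategy` values were applied and to follow the detailed status during an upgrade, you can post-process the JSON printed by `az networkcloud cluster show`. The Python sketch below assumes those fields appear at the top level of the CLI output as `updateStrategy`, `detailedStatus`, and `detailedStatusMessage` (camelCase, matching the `updateStrategy` fragment in this diff); verify the actual shape against your CLI version. The function name `summarize_cluster` is hypothetical, and the sample document is made up except for the `updateStrategy` values, which are copied from the removed fragment above.

```python
import json

STRATEGY_KEYS = ("strategyType", "thresholdType", "thresholdValue", "maxUnavailable", "waitTimeMinutes")


def summarize_cluster(show_output: str) -> dict:
    """Extract the update strategy and detailed status from `cluster show` JSON.

    Assumes top-level camelCase fields; any missing field is returned as None."""
    cluster = json.loads(show_output)
    strategy = cluster.get("updateStrategy") or {}
    return {
        "detailedStatus": cluster.get("detailedStatus"),
        "detailedStatusMessage": cluster.get("detailedStatusMessage"),
        **{key: strategy.get(key) for key in STRATEGY_KEYS},
    }


if __name__ == "__main__":
    # Hypothetical sample output; status values are placeholders.
    sample = """
    {
      "name": "clusterName",
      "detailedStatus": "<status>",
      "detailedStatusMessage": "<message>",
      "updateStrategy": {
        "maxUnavailable": 16,
        "strategyType": "Rack",
        "thresholdType": "PercentSuccess",
        "thresholdValue": 70,
        "waitTimeMinutes": 15
      }
    }
    """
    for key, value in summarize_cluster(sample).items():
        print(f"{key}: {value}")
```

In practice, you would pass the text captured from `az networkcloud cluster show ... --output json` instead of the sample string.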