
Commit bc94216: Update howto-cluster-runtime-upgrade.md (acrolinx fixes). Parent: 71ed9fa.

File tree: 1 file changed (+7, -7 lines)


articles/operator-nexus/howto-cluster-runtime-upgrade.md

Lines changed: 7 additions & 7 deletions
@@ -82,8 +82,8 @@ wait-time-minutes="<waitTimeBetweenRacks>" \
 ```
 
 Required parameters:
-- strategy-type: Defines the update strategy. This can be `Rack` (Rack by Rack) OR `PauseAfterRack` (Pause for user before each Rack starts). The default value is `Rack`. To carry out a Cluster runtime upgrade using the `PauseAfterRack` strategy follow the steps outlined in [Upgrading cluster runtime with a pause rack strategy](howto-cluster-runtime-upgrade-with-pauseafterrack-strategy.md)
-- threshold-type: Determines how the threshold should be evaluated, applied in the units defined by the strategy. This can be `PercentSuccess` OR `CountSuccess`. The default value is `PercentSuccess`.
+- strategy-type: Defines the update strategy. Supported settings are `Rack` (Rack by Rack) or `PauseAfterRack` (pause for user confirmation before each rack starts). The default value is `Rack`. To perform a cluster runtime upgrade using the `PauseAfterRack` strategy, follow the steps outlined in [Upgrade Cluster Runtime with PauseAfterRack Strategy](howto-cluster-runtime-upgrade-with-pauseafterrack-strategy.md).
+- threshold-type: Determines how the threshold is evaluated, applied in the units defined by the strategy. Supported settings are `PercentSuccess` or `CountSuccess`. The default value is `PercentSuccess`.
 - threshold-value: The numeric threshold value used to evaluate an update. The default value is `80`.
 
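For reference, these parameters are supplied as key=value pairs to the `--update-strategy` argument of `az networkcloud cluster update`, consistent with the `wait-time-minutes="<waitTimeBetweenRacks>"` fragment shown in the hunk header above. A sketch with placeholder values; confirm the exact parameter shape against `az networkcloud cluster update --help` for your CLI version:

```shell
# Sketch only: configure a Rack-by-Rack update strategy with an 80% success
# threshold. Replace the <...> placeholders; parameter names other than those
# shown in this article should be verified against the az CLI help output.
az networkcloud cluster update --name "<CLUSTER>" \
  --resource-group "<CLUSTER_RG>" \
  --update-strategy strategy-type="Rack" \
  threshold-type="PercentSuccess" \
  threshold-value="80" \
  wait-time-minutes="<waitTimeBetweenRacks>"
```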
 Optional parameters:
@@ -115,7 +115,7 @@ az networkcloud cluster show --name "<CLUSTER>" \
   "waitTimeMinutes": 1
 ```
 
-In this example, if less than 60% of the compute nodes being provisioned in a rack fail to provision (on a Rack by Rack basis), the cluster upgrade will wait indefintely until the condition is met. If 60% or more of the compute nodes are successfully provisioned, cluster deployment moves on to the next rack of compute nodes. If there are too many failures in the rack, the hadware must be repaired before the upgrade can continue.
+In this example, if fewer than 60% of the compute nodes in a rack provision successfully (on a Rack by Rack basis), the cluster upgrade waits indefinitely until the threshold is met. If 60% or more of the compute nodes provision successfully, the upgrade moves on to the next rack of compute nodes. If there are too many failures in the rack, the hardware must be repaired before the upgrade can continue.
 
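The per-rack threshold semantics described above can be sketched as a small shell check. This is an illustration of the documented behavior only, not the platform's actual evaluation code, and `rack_meets_threshold` is a hypothetical helper:

```shell
# Illustration only: per-rack success-threshold evaluation as described in
# this article. rack_meets_threshold is a hypothetical helper, not a platform
# command.
rack_meets_threshold() {
  local succeeded=$1 total=$2 threshold_type=$3 threshold_value=$4
  if [ "$threshold_type" = "PercentSuccess" ]; then
    # Pass when the percentage of successfully provisioned nodes reaches the threshold.
    [ $(( succeeded * 100 / total )) -ge "$threshold_value" ]
  else
    # CountSuccess: pass when the absolute count of provisioned nodes reaches the threshold.
    [ "$succeeded" -ge "$threshold_value" ]
  fi
}

# 10 of 16 nodes provisioned is 62%, which meets a 60% threshold.
rack_meets_threshold 10 16 PercentSuccess 60 && echo "move to next rack" || echo "upgrade waits"
```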
 The following example is for a customer using the Rack by Rack strategy with a threshold type of CountSuccess (10 nodes per rack) and a 1-minute pause.
 
@@ -142,11 +142,11 @@ az networkcloud cluster show --name "<CLUSTER>" \
   "waitTimeMinutes": 1
 ```
 
-In this example, if less than 10 compute nodes being provisioned in a rack fail to provision (on a Rack by Rack basis), the cluster upgrade will wait indefintely until the condition is met. If 10 or more of the compute nodes are successfully provisioned, cluster deployment moves on to the next rack of compute nodes. If there are too many failures in the rack, the hadware must be repaired before the upgrade can continue.
+In this example, if fewer than 10 compute nodes in a rack provision successfully (on a Rack by Rack basis), the cluster upgrade waits indefinitely until the threshold is met. If 10 or more of the compute nodes provision successfully, the upgrade moves on to the next rack of compute nodes. If there are too many failures in the rack, the hardware must be repaired before the upgrade can continue.
 
 > [!NOTE]
 > ***`update-strategy` cannot be changed after the cluster runtime upgrade has started.***
-> When a threshold value below 100% is set, it’s possible that any unhealthy nodes might not be upgraded, yet the Cluster status could still indicate that upgrade was successful. For troubleshooting issues with bare metal machines, please refer to [Troubleshoot Azure Operator Nexus server problems](troubleshoot-reboot-reimage-replace.md)
+> When a threshold value below 100% is set, unhealthy nodes might not be upgraded, yet the Cluster status could still indicate that the upgrade was successful. For troubleshooting issues with bare metal machines, see [Troubleshoot Azure Operator Nexus server problems](troubleshoot-reboot-reimage-replace.md).
 
 ## Upgrade cluster runtime using CLI
 
@@ -161,7 +161,7 @@ az networkcloud cluster update-version --cluster-name "<CLUSTER>" \
 
 The runtime upgrade is a long process. The upgrade first updates the management nodes, and then the worker nodes sequentially, Rack by Rack.
 The upgrade is considered finished when 80% of worker nodes per rack and 100% of management nodes are successfully upgraded.
-Workloads might be impacted while the worker nodes in a rack are in the process of being upgraded, however workloads in all other racks are not impacted. Consideration of workload placement in light of this implementation design is encouraged.
+Workloads might be impacted while the worker nodes in a rack are being upgraded; however, workloads in all other racks aren't impacted. Consider workload placement in light of this implementation design.
 
 Upgrading all the nodes takes multiple hours, depending on how many racks exist for the Cluster.
 Due to the length of the upgrade process, check the Cluster's detail status periodically for the current state of the upgrade.
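A sketch of such a periodic check follows. The `detailedStatus` and `detailedStatusMessage` property names are assumptions to verify against the actual `az networkcloud cluster show` output for your environment:

```shell
# Sketch: query the cluster's status during an upgrade. Confirm the property
# names (detailedStatus, detailedStatusMessage) against your cluster's
# `az networkcloud cluster show` JSON output before relying on this query.
az networkcloud cluster show --name "<CLUSTER>" \
  --resource-group "<CLUSTER_RG>" \
  --query "{status: detailedStatus, message: detailedStatusMessage}" \
  --output table
```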
@@ -202,7 +202,7 @@ A guide for identifying issues with provisioning worker nodes is provided at [Tr
 
 ### Hardware Failure doesn't require Upgrade re-execution
 
-If a hardware failure during an upgrade occurs, the runtime upgrade continues as long as the set thresholds are met for the compute and management/control nodes. Once the machine is fixed or replaced, it gets provisioned with the current platform runtime's OS, which contains the targeted version of the runtime. If a rack was updated before a failure, then the upgraded runtime version would be used when the nodes are reprovisioned. If the rack's spec wasn't updated to the upgraded runtime version before the hardware failure, the machine would be provisioned with the previous runtime version when it is repaired. It will be upgraded along with the rack when the rack starts its upgrade.
+If a hardware failure occurs during an upgrade, the runtime upgrade continues as long as the set thresholds are met for the compute and management/control nodes. Once the machine is fixed or replaced, it's provisioned with the current platform runtime's OS, which contains the targeted runtime version. If a rack was updated before the failure, the upgraded runtime version is used when the nodes are reprovisioned. If the rack's spec wasn't updated to the upgraded runtime version before the hardware failure, the machine is provisioned with the previous runtime version when it's repaired, and is then upgraded along with the rack when the rack starts its upgrade.
 
 ### After a runtime upgrade, the cluster shows "Failed" Provisioning State
 
 During a runtime upgrade, the cluster enters an `Upgrading` state. If the runtime upgrade fails, the cluster goes into a `Failed` provisioning state. Infrastructure components (for example, the Storage Appliance) might cause failures during the upgrade. In some scenarios, it might be necessary to diagnose the failure with Microsoft support.
