address PauseRack PR review comments

vivekjMSFT · vivekjMSFT · commit 80b44dad59e4 · 2024-09-19T13:59:46.000-07:00
diff --git a/articles/operator-nexus/howto-cluster-runtime-upgrade-with-pauserack-strategy.md b/articles/operator-nexus/howto-cluster-runtime-upgrade-with-pauserack-strategy.md
@@ -1,54 +1,65 @@
 ---
-title: "Azure Operator Nexus: Runtime upgrade with rack pause strategy"
-description: Learn to execute a cluster runtime upgrade for Operator Nexus with a pause rack strategy
+title: "Azure Operator Nexus: Runtime upgrade with PauseRack strategy"
+description: Learn to execute a cluster runtime upgrade for Operator Nexus with a PauseRack strategy
 author: vivekjMSFT
 ms.author: vija
 ms.service: azure-operator-nexus
 ms.topic: how-to
 ms.date: 08/16/2024
 # ms.custom: template-include
 ---
-## Upgrading cluster runtime with a pause rack strategy
+# Upgrading cluster runtime with a PauseRack strategy
 
-This how-to guide explains the steps to execute a cluster runtime upgrade with pasue rack strategy. Executing cluster runtime upgrade with "PauseRack" strategy will update a single rack in a cluster and then pause to wait for confirmation before moving to the next rack. All existing thresholds will still be honoried with pause rack strategy.
+This how-to guide explains the steps to execute a cluster runtime upgrade with the PauseRack strategy. A runtime upgrade with the PauseRack strategy updates a single rack in the cluster and then pauses, waiting for confirmation before moving to the next rack. All existing thresholds are still honored.
 
 ## Prerequisites
 
 > [!NOTE]
 > Upgrades with the PauseRack strategy are available starting with API version 2024-06-01-preview.
 
-Please follow the steps mentioned in prerequistie section of [Upgrading cluster runtime from Azure CLI](./howto-cluster-runtime-upgrade.md)
+1. The [Azure CLI][installation-instruction] must be installed.
+2. The `networkcloud` CLI extension is required. If the `networkcloud` extension isn't installed, it can be installed by following the steps listed [here](https://github.com/MicrosoftDocs/azure-docs-pr/blob/main/articles/operator-nexus/howto-install-cli-extensions.md).
+3. Access to the Azure portal for the target cluster to be upgraded.
+4. You must be logged in to the same subscription as your target cluster via `az login`.
+5. The target cluster must be in a running state, with all control plane nodes healthy and at least 80% of compute nodes in a running and healthy state.
 
 ## Procedure
 
-1. Enable Rack Pause upgrade strategy on a Nexus cluster
-
-    Example:
+1. Enable the PauseRack upgrade strategy on a Nexus cluster.
 
     ```azurecli
-    az networkcloud cluster update --name "clusterName" --resource-group "resourceGroupName" --update-strategy \
-        strategy-type="PauseRack" \
-        wait-time-minutes=0
+    az networkcloud cluster update \
+    --name $CLUSTER_NAME \
+    --resource-group $RESOURCE_GROUP \
+    --update-strategy strategy-type="PauseRack" wait-time-minutes=0
     ```
 
-2. Confirm that the cluster resource JSON in the JSON View reflects the rack pause upgrade strategy.
+2. Confirm that the cluster resource JSON in the JSON View reflects the PauseRack upgrade strategy.
 
     ```azurecli
     az networkcloud cluster show --cluster-name "clusterName" --resource-group "resourceGroupName"
     ```
 
-:::image type="content" source="media/runtime-upgrade-cluster-pause-rack-strategy.png" alt-text="Runtime upgrade strategy property details":::
+    ```  
+    "updateStrategy": {
+      "maxUnavailable": 2,
+      "strategyType": "PauseAfterRack",
+      "thresholdType": "PercentSuccess",
+      "thresholdValue": 70,
+      "waitTimeMinutes": 15
+    }
+    ```
 
-3.Trigger runtime bundle upgrade as usual from Azure portal / CLI. for reference [Upgrading cluster runtime from Azure CLI](./howto-cluster-runtime-upgrade.md)
+3. Trigger the runtime bundle upgrade as usual from the Azure portal or CLI. For reference, see [Upgrading cluster runtime from Azure CLI](./howto-cluster-runtime-upgrade.md).
 
-4.Once Rack 1 has completed, the runtime upgrade will pause, awaiting user action to resume the runtime upgrade for Rack 2.
+4. Once Rack 1 completes, the runtime upgrade pauses and waits for user action before resuming with Rack 2.
 
 :::image type="content" source="media/runtime-upgrade-cluster-paused.png" alt-text="Paused Runtime Upgrade":::
 
 > [!NOTE]
 > This message is available in the logs for programmatic access. For more details, see [List of logs available for streaming in Azure Operator Nexus](list-logs-available.md).
 
-5.To resume the runtime upgrade, execute the following `az networkcloud` cli command to trigger the continue upgrade version action.
+5. To resume the runtime upgrade, execute the following `az networkcloud` CLI command.
 
 ```shell
 az networkcloud cluster continue-update-version \
@@ -57,7 +68,7 @@ az networkcloud cluster continue-update-version \
     --cluster-name=$CLUSTER_NAME
 ```
 
-6.Continue repeating step 5 for each rack until all racks have been upgraded to the latest runtime bundle.
+6. Repeat step 5 for each rack until all racks have been upgraded to the latest runtime bundle.
 
 ## Related content
 
diff --git a/articles/operator-nexus/howto-cluster-runtime-upgrade.md b/articles/operator-nexus/howto-cluster-runtime-upgrade.md
@@ -98,15 +98,20 @@ For more detailed insights on the upgrade progress, the individual BMM in each R
 The following Azure CLI command is used to configure the compute threshold parameters for a runtime upgrade:
 
 ```azurecli
-az networkcloud cluster update --name "<clusterName>" --resource-group "<resourceGroup>" --update-strategy strategy-type="Rack" threshold-type="PercentSuccess" threshold-value="<thresholdValue>" max-unavailable=<maxNodesOffline> wait-time-minutes=<waitTimeBetweenRacks>
+az networkcloud cluster update \
+--name "<clusterName>" \
+--resource-group "<resourceGroup>" \
+--update-strategy strategy-type="Rack" threshold-type="PercentSuccess" \
+threshold-value="<thresholdValue>" max-unavailable=<maxNodesOffline> \
+wait-time-minutes=<waitTimeBetweenRacks>
 ```
 
-Required arguments:
-- strategy-type: Defines the update strategy. In this case, "Rack" means updates occur rack-by-rack. The default value is "Rack"
+Required parameters:
+- strategy-type: Defines the update strategy. In this case, "Rack" means updates occur rack-by-rack. The default value is "Rack".
 - threshold-type: Determines how the threshold should be evaluated, applied in the units defined by the strategy. The default value is "PercentSuccess".
 - threshold-value: The numeric threshold value used to evaluate an update. The default value is 80.
 
-Optional arguments:
+Optional parameters:
 - max-unavailable: The maximum number of worker nodes that can be offline, that is, being upgraded, in a rack at one time. The default value is 32767.
 - wait-time-minutes: The delay or waiting period before updating a rack. The default value is 15.
 
@@ -118,21 +123,22 @@ az networkcloud cluster update --name "cluster01" --resource-group "cluster01-rg
 
 Upon successful execution of the command, the updateStrategy values specified will be applied to the cluster:
 
-```  "updateStrategy": {
+```  
+    "updateStrategy": {
       "maxUnavailable": 16,
       "strategyType": "Rack",
       "thresholdType": "PercentSuccess",
       "thresholdValue": 70,
       "waitTimeMinutes": 15
-    },
+    }
 ```
 
 > [!NOTE]
-> When a threshold value below 100% is set, it’s possible that any unhealthy nodes might not be upgraded, yet the “Cluster” status could still indicate that upgrade was sucessfull. For troubleshooting issues with bare metal machines, please refer to [Troubleshoot Azure Operator Nexus server problems](troubleshoot-reboot-reimage-replace.md)
+> When a threshold value below 100% is set, it’s possible that some unhealthy nodes might not be upgraded, yet the “Cluster” status could still indicate that the upgrade was successful. For troubleshooting issues with bare metal machines, refer to [Troubleshoot Azure Operator Nexus server problems](troubleshoot-reboot-reimage-replace.md).
 
-## Upgrade with PauseRack Strategy
+## Upgrade with Pause Rack Strategy
 
-Starting with API version 2024-06-01-preview, runtime upgrades can be triggered using a "PauseRack" strategy. When you execute a cluster runtime upgrade with the PauseRack" strategy, it will update one rack at a time in the cluster and then pause, awaiting confirmation before proceeding to the next rack. All existing thresholds will continue to be respected with the "PauseRack" strategy. To carry out a cluster runtime upgrade using the "PauseRack" strategy follow the steps outlined in [Upgrading cluster runtime with a pause rack strategy](howto-cluster-runtime-upgrade-with-pauserack-strategy.md)
+Starting with API version 2024-06-01-preview, runtime upgrades can be triggered using a "PauseRack" strategy. When you execute a cluster runtime upgrade with the "PauseRack" strategy, it updates one rack at a time in the cluster and then pauses, awaiting confirmation before proceeding to the next rack. All existing thresholds continue to be respected with the "PauseRack" strategy. To carry out a cluster runtime upgrade using the "PauseRack" strategy, follow the steps outlined in [Upgrading cluster runtime with a PauseRack strategy](howto-cluster-runtime-upgrade-with-pauserack-strategy.md).
 
 ## Frequently Asked Questions
 
@@ -162,15 +168,12 @@ Once the cordon and drain process of the tenant cluster node is completed, the u
 
 It's important to note that the Nexus Kubernetes cluster node won't be shut down after the cordon and drain process. The BMH is rebooted with the new image as soon as all the Nexus Kubernetes cluster nodes are cordoned and drained, or after 10 minutes if the drain process isn't completed. Additionally, cordon and drain isn't initiated for power-off or restart actions of the BMH; it's activated only during a runtime upgrade.
 
-It is important to note that following the runtime upgrade, there could be instance where a Nexus Kubernetes Cluster node remains cordoned. For such scenario, you can manually uncordon the node by executing the following commands via(./includes/kubernetes-cluster/cluster-connect.md)
+Following the runtime upgrade, there could be instances where a Nexus Kubernetes Cluster node remains cordoned. In such a scenario, the following command lists the bare metal machines with their `cordonStatus`, which helps identify what needs to be manually uncordoned:
 
-```kubectl get nodes  | grep SchedulingDisabled > /dev/null
-if [ $? -eq 0 ]; then
-for node in $(kubectl get nodes | grep SchedulingDisabled | awk '{print $1}'); do
-    kubectl uncordon $node
-done
-fi
-```
+```azurecli
+az networkcloud baremetalmachine list -g $mrg --subscription $sub --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,powerState:powerState,tags:tags.Status,machineRoles:join(', ', machineRoles),cordonStatus:cordonStatus,createdAt:systemData.createdAt}, &name)" \
+--output table
 
+```
 <!-- LINKS - External -->
 [installation-instruction]: https://aka.ms/azcli
diff --git a/articles/operator-nexus/media/runtime-upgrade-cluster-pause-rack-strategy.png b/articles/operator-nexus/media/runtime-upgrade-cluster-pause-rack-strategy.png