|
| 1 | +--- |
| 2 | +title: "Azure Operator Nexus: Fabric runtime upgrade template" |
| 3 | +description: Learn the process for upgrading Fabric for Operator Nexus with a step-by-step parameterized template. |
| 4 | +author: bartpinto |
| 5 | +ms.author: bpinto |
| 6 | +ms.service: azure-operator-nexus |
| 7 | +ms.date: 04/23/2025 |
| 8 | +ms.topic: how-to |
| 9 | +ms.custom: azure-operator-nexus, template-include |
| 10 | +--- |
| 11 | + |
| 12 | +# Fabric runtime upgrade template |
| 13 | + |
| 14 | +This how-to guide provides a step-by-step template for upgrading a Nexus Fabric. It's designed to help users manage a reproducible end-to-end upgrade through Azure APIs and standard operating procedures. Regular updates are crucial for maintaining system integrity and accessing the latest product improvements. |
| 15 | + |
| 16 | +## Overview |
| 17 | + |
| 18 | +**Runtime bundle components**: These components require operator consent for upgrades that may affect traffic behavior or necessitate device reboots. The network fabric's design allows for updates to be applied while maintaining continuous data traffic flow. |
| 19 | + |
| 20 | +Runtime changes are categorized as follows: |
| 21 | +- **Operating system updates**: Necessary to support new features or resolve issues. |
| 22 | +- **Base configuration updates**: Initial settings applied during device bootstrapping. |
| 23 | +- **Configuration structure updates**: Generated based on user input for conf |
| 24 | + |
| 25 | +## Prerequisites |
| 26 | + |
| 27 | +1. Install the latest version of [Azure CLI](https://aka.ms/azcli). |
| 28 | +2. The latest `managednetworkfabric` CLI extension is required. It can be installed following the steps listed in [Install CLI Extension](howto-install-cli-extensions.md). |
| 29 | +3. Subscription access to run the Azure Operator Nexus Network Fabric (NF) and Network Cloud (NC) CLI extension commands. |
| 30 | +4. Target Fabric must be healthy in a running state, with all Devices healthy. |
| 31 | + |
| 32 | +## Required Parameters: |
| 33 | +- <START_DATE>: Planned start date/time of upgrade |
| 34 | +- \<ENVIRONMENT\>: Instance name |
| 35 | +- <AZURE_REGION>: Azure region of instance |
| 36 | +- <CUSTOMER_SUB_NAME>: Subscription name |
| 37 | +- <CUSTOMER_SUB_ID>: Subscription ID |
| 38 | +- <NEXUS_VERSION>: Operator Nexus release version (for example, 2504.1) |
| 39 | +- <NNF_VERSION>: Operator Nexus Fabric release version (for example, 8.1) |
| 40 | +- <NF_VERSION>: NF runtime version for upgrade (for example, 5.0.0) |
| 41 | +- <NF_DEVICE_NAME>: Network Fabric Device Name |
| 42 | +- <NF_DEVICE_RID>: Network Fabric Device Resource ID |
| 43 | +- <NF_NAME>: Network Fabric Name |
| 44 | +- <NF_RG>: Network Fabric Resource Group |
| 45 | +- <NF_RID>: Network Fabric ARM ID |
| 46 | +- <NFC_NAME>: Associated Network Fabric Controller (NFC) |
| 47 | +- <NFC_RG>: NFC Resource Group |
| 48 | +- <NFC_RID>: NFC ARM ID |
| 49 | +- <NFC_MRG>: NFC Managed Resource Group |
| 50 | +- \<DURATION\>: Estimated Duration of upgrade |
| 51 | +- <DE_ID>: Deployment Engineer performing upgrade |
| 52 | +- <CLUSTER_NAME>: Associated Cluster name |
| 53 | +- <MISE_CID>: Microsoft.Identity.ServiceEssentials (MISE) Correlation ID in debug output for Device updates |
| 54 | +- <CORRELATION_ID>: Operation Correlation ID in debug output for Device updates |
| 55 | +- <ASYNC_URL>: Asynchronous (ASYNC) URL in debug output for Device updates |
| 56 | + |
| 57 | + |
| 58 | +## Links |
| 59 | +- [Azure portal](https://aka.ms/nexus-portal) |
| 60 | +- [Network Fabric Upgrade](howto-upgrade-nexus-fabric.md) |
| 61 | +- [Azure CLI](https://aka.ms/azcli) |
| 62 | +- [Install CLI Extension](howto-install-cli-extensions.md) |
| 63 | + |
| 64 | +## Pre-Checks |
| 65 | + |
| 66 | +1. The following role permissions should be assigned to end users responsible for Fabric create, upgrade, and delete operations. |
| 67 | + |
| 68 | + These permissions can be granted temporarily, limited to the duration required to perform the upgrade. |
| 69 | + * Microsoft.NexusIdentity/identitySets/read |
| 70 | + * Microsoft.NexusIdentity/identitySets/write |
| 71 | + * Microsoft.NexusIdentity/identitySets/delete |
| 72 | + * Ensure that `Role Based Access Control Administrator` is successfully activated. |
| 73 | + * Check in Azure portal from the following path: `Network Fabrics` -> <NF_NAME> -> `Access control (IAM)` -> `View my access`. |
| 74 | + * In current 'Role assignments', you should see the following two roles: |
| 75 | + - Nexus Contributor |
| 76 | + - Role Based Access Control Administrator |
| 77 | + |
| 78 | +2. Validate the provisioning status for the Network Fabric Controller (NFC), Fabric, and Fabric Devices. |
| 79 | + |
| 80 | + Set up the subscription, NFC, and NF parameters: |
| 81 | + ``` |
| 82 | + export SUBSCRIPTION_ID=<CUSTOMER_SUB_ID> |
| 83 | + export NFC_RG=<NFC_RG> |
| 84 | + export NFC_NAME=<NFC_NAME> |
| 85 | + export NF_RG=<NF_RG> |
| 86 | + export NF_NAME=<NF_NAME> |
|  | + export NF_VERSION=<NF_VERSION> |
| 87 | + ``` |
| 88 | + |
| 89 | + Check that the NFC is in Provisioned state: |
| 90 | + ``` |
| 91 | + az networkfabric controller show -g $NFC_RG --resource-name $NFC_NAME --subscription $SUBSCRIPTION_ID -o table |
| 92 | + ``` |
| 93 | + |
| 94 | + Check the NF status: |
| 95 | + ``` |
| 96 | + az networkfabric fabric show -g $NF_RG --resource-name $NF_NAME --subscription $SUBSCRIPTION_ID -o table |
| 97 | + ``` |
| 98 | + Record the `fabricVersion` and `provisioningState`. |
| 99 | + |
| 100 | + Check the Devices status. |
| 101 | + ``` |
| 102 | + az networkfabric device list -g $NF_RG -o table --subscription $SUBSCRIPTION_ID |
| 103 | + ``` |
| 104 | + |
| 105 | + >[!Note] |
| 106 | + > If `provisioningState` is not `Succeeded`, stop the upgrade until issues are resolved. |
| 107 | +
|
| 108 | +3. Check that the `Microsoft.NexusIdentity` user Resource Provider (RP) is registered on the customer subscription: |
| 109 | + ``` |
| 110 | + az provider show --namespace Microsoft.NexusIdentity -o table --subscription $SUBSCRIPTION_ID |
| 111 | + Namespace RegistrationPolicy RegistrationState |
| 112 | + ----------------------- -------------------- ------------------- |
| 113 | + Microsoft.NexusIdentity RegistrationRequired Registered |
| 114 | + ``` |
| 115 | + |
| 116 | + If not registered, run the following to register: |
| 117 | + ``` |
| 118 | + az provider register --namespace Microsoft.NexusIdentity --wait --subscription $SUBSCRIPTION_ID |
| 119 | +
|
| 120 | + az provider show --namespace Microsoft.NexusIdentity -o table --subscription $SUBSCRIPTION_ID |
| 121 | + Namespace RegistrationPolicy RegistrationState |
| 122 | + ----------------------- -------------------- ------------------- |
| 123 | + Microsoft.NexusIdentity RegistrationRequired Registered |
| 124 | + ``` |
| 125 | + |
| 126 | +4. Minimum available disk space on each device must be more than 3.5 GB for a successful device upgrade. |
| 127 | + |
| 128 | + Verify the available space on each Fabric Device using the following Azure CLI command. |
| 129 | + ``` |
| 130 | + az networkfabric device run-ro --resource-name <NF_DEVICE_NAME> --resource-group <NF_RG> --ro-command "dir flash" --subscription <CUSTOMER_SUB_ID> --debug |
| 131 | + ``` |
| 132 | + |
| 133 | + Contact Microsoft support if there isn't enough space to perform the upgrade. Archived Extensible Operating System (EOS) images and support bundle files can be removed at the direction of support. |
| 134 | + |
| 135 | +5. Check the Fabric's Network Packet Broker (NPB) for any orphaned `Network Taps` in Azure portal. |
| 136 | + * Select `Network Fabrics` under `Azure Services` and then select the <NF_NAME>. |
| 137 | + * Click on the `Resource group` for the Fabric. |
| 138 | + * In the Resources list, filter on `Network Packet Broker`. |
| 139 | + * Click on the `Network Packet Broker` name in the list. |
| 140 | + * Click on `Network Taps` tab on the `Overview` screen. |
| 141 | + * All `Network Taps` should be `Succeeded` for `Configuration State` and `Provisioning State`. |
| 142 | + * Look for any Taps with a red `X`, and a status of `Not Found`, `Failed`, or `Error`. |
| 143 | + |
| 144 | + >[!Note] |
| 145 | + > If any Taps show `Not Found`, `Failed`, or `Error` status, stop the upgrade until issues are cleared. Provide this information to Microsoft Support when opening a support ticket for Tap issues. |
| 146 | + |
| 147 | +6. Run and validate the Fabric cable validation report. |
| 148 | + Follow [Validate Cables for Nexus Network Fabric](how-to-validate-cables.md) to set up and run the report. |
| 149 | + |
| 150 | + >[!Note] |
| 151 | + > Resolve any connection and cable issues before continuing the upgrade. |
| 152 | +
|
| 153 | +7. Review Operator Nexus Release notes for required checks and configuration updates not included in this document. |
| 154 | + |
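The disk-space pre-check in step 4 can be scripted once the `dir flash` output has been captured. The following is a minimal sketch, not part of the official procedure. The helper names `flash_free_bytes` and `has_enough_flash` are hypothetical. The sketch assumes the output contains an EOS-style summary line ending in `(<N> bytes free)`, and it treats 3.5 GB as 3,500,000,000 bytes.

```shell
# Minimum free flash space from the pre-check: "more than 3.5 GB".
# ASSUMPTION: 1 GB is treated as 10^9 bytes.
MIN_FREE_BYTES=3500000000

# Reads captured `dir flash` output on stdin and prints the free byte count.
# ASSUMPTION: the output has a summary line such as
#   "7451738112 bytes total (4012345678 bytes free)".
flash_free_bytes() {
    sed -n 's/.*(\([0-9][0-9]*\) bytes free).*/\1/p' | tail -n 1
}

# Returns 0 only when the free space is strictly greater than the minimum.
has_enough_flash() {
    free=$(flash_free_bytes)
    [ -n "$free" ] && [ "$free" -gt "$MIN_FREE_BYTES" ]
}
```

If the summary line is missing or formatted differently, `has_enough_flash` returns a nonzero status, so an unparseable result is treated the same as insufficient space.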
| 155 | +## Send notification to Operations of upgrade schedule for the Fabric. |
| 156 | + |
| 157 | +The following template can be used through email or support ticket: |
| 158 | +``` |
| 159 | +Title: <ENVIRONMENT> <AZURE_REGION> <NF_NAME> Runtime upgrade to <NF_VERSION> <START_DATE> - Completion ETA <DURATION> |
| 160 | +
|
| 161 | +Operations Support: |
| 162 | +
|
| 163 | +Deployment Team notification for <ENVIRONMENT> <AZURE_REGION> <NF_NAME> runtime upgrade to <NF_VERSION> <START_DATE> - Completion ETA <DURATION> |
| 164 | +
|
| 165 | +Subscription: <CUSTOMER_SUB_ID> |
| 166 | +NFC: <NFC_NAME> |
| 167 | +CM: <CM_NAME> |
| 168 | +Fabric: <NF_NAME> |
| 169 | +Cluster: <CLUSTER_NAME> |
| 170 | +Region: <AZURE_REGION> |
| 171 | +Version: <NEXUS_VERSION> |
| 172 | +
|
| 173 | +CC: stakeholder-list |
| 174 | +``` |
| 175 | + |
| 176 | +## Add resource tag on Fabric resource in Azure portal |
| 177 | +To help track upgrades, add a tag to the Fabric resource in Azure portal (optional): |
| 178 | +``` |
| 179 | +|Name | Value | |
| 180 | +|----------------|----------------- |
| 181 | +|BF in progress |<DE_ID> | |
| 182 | +``` |
| 183 | + |
| 184 | +## Upgrade Procedure |
| 185 | + |
| 186 | +### Verify current Fabric runtime version. |
| 187 | +[How to check current Fabric runtime version.](./howto-check-runtime-version.md#check-current-fabric-runtime-version) |
| 188 | + |
| 189 | +``` |
| 190 | +az networkfabric fabric list -g $NF_RG --query "[].{name:name,fabricVersion:fabricVersion,configurationState:configurationState,provisioningState:provisioningState}" -o table --subscription $SUBSCRIPTION_ID |
| 191 | +az networkfabric fabric show -g $NF_RG --resource-name $NF_NAME --subscription $SUBSCRIPTION_ID |
| 192 | +``` |
| 193 | + |
| 194 | +### Initiate Fabric upgrade. |
| 195 | +Start the upgrade with the following command: |
| 196 | +``` |
| 197 | +az networkfabric fabric upgrade -g $NF_RG --resource-name $NF_NAME --action start --version $NF_VERSION --subscription $SUBSCRIPTION_ID |
| 198 | +{} |
| 199 | +``` |
| 200 | + |
| 201 | +>[!Note] |
| 202 | +> Output showing `{}` indicates successful execution of upgrade command. |
| 203 | +
|
| 204 | +The Fabric Resource Provider validates whether the upgrade from the existing Fabric version to the target version is allowed. Only N+1 major release upgrades are allowed (for example, 4.0.0->5.0.0). |
| 205 | + |
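The N+1 rule can be checked locally before starting the upgrade. The sketch below is only a client-side sanity check and is not the Resource Provider's own validation logic. The function names `major_of` and `is_next_major` are hypothetical, and the sketch assumes versions are dotted strings whose first component is the major release.

```shell
# Prints the major component of a dotted version, e.g. "5.0.0" -> "5".
major_of() {
    printf '%s\n' "${1%%.*}"
}

# is_next_major CURRENT TARGET
# Returns 0 when TARGET is exactly one major release above CURRENT.
is_next_major() {
    current_major=$(major_of "$1")
    target_major=$(major_of "$2")
    [ "$target_major" -eq $((current_major + 1)) ]
}

# Example (values are illustrative):
#   is_next_major "4.0.0" "5.0.0" && echo "upgrade path looks valid"
```

The check only compares major components. Any additional rules the service applies to minor or patch versions are not modeled here.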
| 206 | +On successful completion, the command puts the Fabric status into `Under Maintenance` and prevents any other operation on the Fabric. |
| 207 | + |
| 208 | +### Device-specific workflow: |
| 209 | + |
| 210 | +Nexus Network Fabric Racks are composed of the following Device types: |
| 211 | +- Customer Edge (CE) Switches |
| 212 | +- Management (MGMT) Switches |
| 213 | +- Top Of Rack (TOR) Switches |
| 214 | +- Network Packet Brokers (NPB) |
| 215 | + |
| 216 | +Eight Rack environments have 30 Devices: |
| 217 | +- Aggregate Rack - two CEs, two NPBs, two MGMT Switches (six Devices) |
| 218 | +- Eight Compute Racks - Each Compute Rack has two TORs and one MGMT Switch (24 Devices) |
| 219 | + |
| 220 | +Four Rack environments have 17 Devices: |
| 221 | +- Aggregate Rack - two CEs, one NPB, two MGMT Switches (five Devices) |
| 222 | +- Four Compute Racks - Each Compute Rack has two TORs and one MGMT Switch (12 Devices) |
| 223 | + |
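The Device counts above follow directly from the rack composition, which makes it easy to sanity-check the number of Devices returned for a Fabric. The helper below is an illustrative sketch. The function name `expected_devices` is hypothetical, and the layout (two CEs and two MGMT Switches in the Aggregate Rack, two TORs and one MGMT Switch per Compute Rack) is taken from the lists above.

```shell
# expected_devices COMPUTE_RACKS NPB_COUNT
# Prints the total Device count for the rack layout described above.
expected_devices() {
    compute_racks=$1
    npb_count=$2
    aggregate=$((2 + npb_count + 2))   # 2 CEs + NPBs + 2 MGMT Switches
    compute=$((compute_racks * 3))     # 2 TORs + 1 MGMT Switch per Compute Rack
    echo $((aggregate + compute))
}

expected_devices 8 2   # eight-rack environment
expected_devices 4 1   # four-rack environment
```

The result can be compared with the number of rows returned by `az networkfabric device list` for the Fabric's resource group.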
| 224 | +>[!Important] |
| 225 | +>The Devices must be upgraded in the following specific order to maintain networking service during the upgrade. |
| 226 | +
|
| 227 | +1. Compute Rack odd-numbered TORs upgrade together in parallel. |
| 228 | +2. Compute Rack even-numbered TORs upgrade together in parallel. |
| 229 | +3. Compute Rack MGMT switches upgrade together in parallel. |
| 230 | +4. Aggregate Rack CEs upgrade one after the other in serial. |
| 231 | + >[!Important] |
| 232 | + > After each CE upgrade, wait five minutes to ensure that the recovery process is complete before proceeding to the next CE. |
| 233 | +5. Aggregate Rack NPBs upgrade one after the other in serial. |
| 234 | +6. Aggregate Rack MGMT switches upgrade one after the other in serial. |
| 235 | + |
| 236 | +>[!NOTE] |
| 237 | +> Wait for successful upgrade on all Devices in a group before moving to the next group. |
| 238 | +
|
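Splitting the TORs into the odd-numbered and even-numbered groups from the order above can be done with a small filter. This is a sketch under a naming assumption: it expects each TOR Device name to end with its TOR number (for example `...-TOR1`), which may not match the names in a given Fabric. The function names `tor_parity` and `select_tors` and the sample Device names are hypothetical.

```shell
# Prints "odd" or "even" for a Device name, based on its final digit.
# ASSUMPTION: the name ends with the TOR number (for example "...-TOR1").
tor_parity() {
    case $1 in
        *[13579]) echo odd ;;
        *[02468]) echo even ;;
        *) return 1 ;;
    esac
}

# select_tors odd|even
# Reads Device names on stdin (feed it TOR names only) and prints the
# names whose final digit has the wanted parity.
select_tors() {
    wanted=$1
    while IFS= read -r name; do
        if [ "$(tor_parity "$name")" = "$wanted" ]; then
            printf '%s\n' "$name"
        fi
    done
}

# Example with made-up names:
#   printf '%s\n' comp1-TOR1 comp1-TOR2 | select_tors odd
```

Names that don't end in a digit are skipped by `select_tors`, so they should be reviewed by hand rather than assumed to belong to either group.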
| 239 | +### Device-specific upgrade: |
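| 240 | +Run the following command to upgrade the version on each Device, with `NF_VERSION` and `NF_DEVICE_NAME` set to the target runtime version and the name of the Device being upgraded: |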
| 241 | +``` |
| 242 | +az networkfabric device upgrade --version $NF_VERSION -g $NF_RG --resource-name $NF_DEVICE_NAME --subscription $SUBSCRIPTION_ID --debug |
| 243 | +``` |
| 244 | + |
| 245 | +As part of the upgrade, each Device is put into maintenance mode. The Device drains all traffic and stops advertising routes so that traffic flow to the Device stops. At completion, the Nexus Network Fabric (NNF) service updates the Device resource version property to the new version. |
| 246 | + |
| 247 | +Gather the ASYNC URL and correlation ID values from the `--debug` output for further troubleshooting, if needed. |
| 248 | +``` |
| 249 | +cli.azure.cli.core.sdk.policies: 'mise-correlation-id': '<MISE_CID>' |
| 250 | +cli.azure.cli.core.sdk.policies: 'x-ms-correlation-request-id': '<CORRELATION_ID>' |
| 251 | +cli.azure.cli.core.sdk.policies: 'Azure-AsyncOperation': '<ASYNC_URL>' |
| 252 | +``` |
| 253 | +Provide this information to Microsoft Support when opening a support ticket for upgrade issues. |
| 254 | + |
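If the `--debug` output of a Device upgrade is saved to a file, the three values above can be pulled out with a short filter. This is a sketch that assumes the debug lines have the `'<header>': '<value>'` shape shown above and that the header names match exactly, including case. The function name `debug_header` and the log file name in the comments are hypothetical.

```shell
# debug_header HEADER_NAME
# Reads saved Azure CLI --debug output on stdin and prints the value of the
# first line shaped like:  'HEADER_NAME': 'value'
debug_header() {
    sed -n "s/.*'$1': '\([^']*\)'.*/\1/p" | head -n 1
}

# Example, assuming the debug output was redirected to a file
# (Azure CLI typically writes debug logging to stderr):
#   az networkfabric device upgrade ... --debug 2> upgrade-debug.log
#   debug_header mise-correlation-id         < upgrade-debug.log
#   debug_header x-ms-correlation-request-id < upgrade-debug.log
#   debug_header Azure-AsyncOperation        < upgrade-debug.log
```

Only the first match is printed. A long-running operation can log the same header several times while polling, so review the full log if the first value is not the one support asks for.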
| 255 | +After Device upgrades are complete, confirm that all the Devices report version <NF_VERSION> by running the following command: |
| 256 | +``` |
| 257 | +az networkfabric device list -g $NF_RG --query "[].{name:name,version:version}" -o table --subscription $SUBSCRIPTION_ID |
| 258 | +``` |
| 259 | + |
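Scanning the table by eye is error-prone on a 30-Device Fabric, so the version check can be reduced to a list of Devices that still need attention. The filter below is a sketch. The function name `devices_not_at` is hypothetical, and it assumes its input is tab-separated `name` and `version` pairs, for example from the same list command run with `--query "[].[name,version]" -o tsv`.

```shell
# devices_not_at TARGET_VERSION
# Reads "name<TAB>version" lines on stdin and prints the names of Devices
# whose version string differs from TARGET_VERSION.
devices_not_at() {
    # Appending "" forces a string comparison rather than a numeric one.
    awk -F '\t' -v want="$1" '($2 "") != (want "") { print $1 }'
}

# Example (assumes the variables exported earlier in this guide):
#   az networkfabric device list -g $NF_RG --query "[].[name,version]" \
#     -o tsv --subscription $SUBSCRIPTION_ID | devices_not_at "$NF_VERSION"
```

Empty output means every listed Device reports the target version. Any name printed should be investigated before completing the Fabric upgrade.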
| 260 | +### Complete Network Fabric Upgrade |
| 261 | +Once all the Devices are upgraded, run the following command to take the Network Fabric out of maintenance state. |
| 262 | +``` |
| 263 | +az networkfabric fabric upgrade --action Complete --version $NF_VERSION -g $NF_RG --resource-name $NF_NAME --debug --subscription $SUBSCRIPTION_ID |
| 264 | +``` |
| 265 | + |
| 266 | +## Troubleshooting Device update failures. |
| 267 | +1. Collect any errors in the Azure CLI output. |
| 268 | +2. Collect device operation state from Azure portal or Azure CLI. |
| 269 | +3. Create an Azure support request for any Device upgrade failures, and attach any errors along with the ASYNC URL, correlation ID, and operation state of the Fabric and Devices. |
| 270 | + |
| 271 | +## Post-upgrade Validation |
| 272 | +Once complete, run the following commands to check the status of the Fabric and Devices: |
| 273 | +``` |
| 274 | +az networkfabric fabric list -g $NF_RG --query "[].{name:name,fabricVersion:fabricVersion,configurationState:configurationState,provisioningState:provisioningState}" -o table --subscription $SUBSCRIPTION_ID |
| 275 | +az networkfabric fabric show -g $NF_RG --resource-name $NF_NAME --subscription $SUBSCRIPTION_ID |
| 276 | +az networkfabric device list -g $NF_RG --query "[].{name:name,version:version}" -o table --subscription $SUBSCRIPTION_ID |
| 277 | +``` |
| 278 | + |
| 279 | +## Send notification to Operations of Fabric upgrade completion |
| 280 | + |
| 281 | +The following template can be used through email or ticketing system: |
| 282 | +``` |
| 283 | +Title: <ENVIRONMENT> <AZURE_REGION> <NF_NAME> Runtime <NF_VERSION> Upgrade Complete |
| 284 | +
|
| 285 | +Operations: |
| 286 | +Deployment Team notification for <ENVIRONMENT> <AZURE_REGION> <NF_NAME> runtime <NF_VERSION> Upgrade Complete |
| 287 | +
|
| 288 | +Subscription: <CUSTOMER_SUB_ID> |
| 289 | +NFC: <NFC_NAME> |
| 290 | +CM: <CM_NAME> |
| 291 | +Fabric: <NF_NAME> |
| 292 | +Cluster: <CLUSTER_NAME> |
| 293 | +Region: <AZURE_REGION> |
| 294 | +Version: <NEXUS_VERSION> |
| 295 | + |
| 296 | +CC: stakeholder-list |
| 297 | +``` |
| 298 | + |
| 299 | +## Remove resource tag on Fabric resource in Azure portal |
| 300 | +Remove the resource tag on the Fabric resource tracking the upgrade in Azure portal (if added previously): |
| 301 | +``` |
| 302 | +|Name | Value | |
| 303 | +|----------------|----------------- |
| 304 | +|BF in progress |<DE_ID> | |
| 305 | +``` |
| 306 | + |
| 307 | +## Close out any Work Items in your ticketing system |
| 308 | +* Update Task hours for upgrade duration. |
| 309 | +* Set Fabric upgrade work item to `Complete`. |
| 310 | +* Add any notes on support tickets and issues encountered during upgrade. |