Commit 46f3fd4

Update howto-upgrade-nexus-fabric-template.md
1 parent 16bfbc7 commit 46f3fd4

1 file changed (+135 additions, -92 deletions)

articles/operator-nexus/howto-upgrade-nexus-fabric-template.md

Lines changed: 135 additions & 92 deletions
@@ -9,11 +9,13 @@ ms.topic: how-to
99
ms.custom: azure-operator-nexus, template-include
1010
---
1111

12-
# Fabric runtime upgrade template
12+
# Fabric Runtime Upgrade Template
1313

1414
This how-to guide provides a step-by-step template for upgrading a Nexus Fabric. The template is designed to help users manage a reproducible, end-to-end upgrade through Azure APIs and standard operating procedures. Regular updates are crucial for maintaining system integrity and accessing the latest product improvements.
1515

1616
## Overview
17+
<details>
18+
<summary> Overview of Fabric runtime upgrade template </summary>
1719

1820
**Runtime bundle components**: These components require operator consent for upgrades that may affect traffic behavior or necessitate device reboots. The network fabric's design allows for updates to be applied while maintaining continuous data traffic flow.
1921

@@ -22,47 +24,94 @@ Runtime changes are categorized as follows:
2224
- **Base configuration updates**: Initial settings applied during device bootstrapping.
2325
- **Configuration structure updates**: Generated based on user input for conf
2426

27+
</details>
28+
2529
## Prerequisites
30+
<details>
31+
<summary> Prerequisites for using this template to upgrade a Fabric </summary>
32+
33+
- Latest version of [Azure CLI](https://aka.ms/azcli).
34+
- Latest `managednetworkfabric` [CLI extension](howto-install-cli-extensions.md).
35+
- Latest `networkcloud` [CLI extension](howto-install-cli-extensions.md).
36+
- Subscription access to run the Azure Operator Nexus Network Fabric (NF) and Network Cloud (NC) CLI extension commands.
37+
- Target Fabric must be healthy in a running state, with all Devices healthy.
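
A quick pre-flight check of these prerequisites can be scripted. The sketch below assumes Bash and that `NF_RG`, `NF_NAME`, and `SUBSCRIPTION_ID` (the value of `<CUSTOMER_SUB_ID>`) are already set; the function name `check_prereqs` is hypothetical. It only confirms that `az` and both extensions are installed (not that they're the latest versions) and that the Fabric `provisioningState` is `Succeeded`; it doesn't check Device health.

```shell
# Hypothetical pre-flight helper (Bash). Assumes NF_RG, NF_NAME, and
# SUBSCRIPTION_ID are already set; checks presence, not "latest" versions.
check_prereqs() {
  local ext state

  # Azure CLI must be available.
  if ! command -v az >/dev/null 2>&1; then
    echo "Azure CLI (az) not found" >&2
    return 1
  fi

  # Both CLI extensions named in the prerequisites must be installed.
  for ext in managednetworkfabric networkcloud; do
    if ! az extension show --name "$ext" --query version -o tsv >/dev/null 2>&1; then
      echo "CLI extension not installed: ${ext}" >&2
      return 1
    fi
  done

  # The target Fabric should report provisioningState == Succeeded.
  state=$(az networkfabric fabric show -g "$NF_RG" --resource-name "$NF_NAME" \
    --subscription "$SUBSCRIPTION_ID" --query provisioningState -o tsv) || return 1
  if [ "$state" != "Succeeded" ]; then
    echo "Fabric provisioningState is '${state}', expected 'Succeeded'" >&2
    return 1
  fi

  echo "Pre-flight checks passed"
}
```

A non-zero return from `check_prereqs` means a prerequisite needs attention before continuing; Device health still has to be reviewed separately.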
2638

27-
1. Install the latest version of [Azure CLI](https://aka.ms/azcli).
28-
2. The latest `managednetworkfabric` CLI extension is required. It can be installed following the steps listed in [Install CLI Extension](howto-install-cli-extensions.md).
29-
3. Subscription access to run the Azure Operator Nexus Network Fabric (NF) and network cloud (NC) CLI extension commands.
30-
4. Target Fabric must be healthy in a running state, with all Devices healthy.
39+
</details>
3140

32-
## Required Parameters:
33-
- <START_DATE>: Planned start date/time of upgrade
34-
- \<ENVIRONMENT\>: Instance name
41+
## Required Parameters
42+
<details>
43+
<summary> Parameters used in this document </summary>
44+
45+
- \<ENVIRONMENT\>: Instance name
3546
- <AZURE_REGION>: Azure region of the instance
3647
- <CUSTOMER_SUB_NAME>: Subscription name
3748
- <CUSTOMER_SUB_ID>: Subscription ID
38-
- <NEXUS_VERSION>: Operator Nexus release version (for example, 2504.1)
49+
- \<NEXUS_VERSION\>: Nexus release version (for example, 2504.1)
3950
- <NNF_VERSION>: Operator Nexus Fabric release version (for example, 8.1)
4051
- <NF_VERSION>: NF runtime version for upgrade (for example, 5.0.0)
41-
- <NF_DEVICE_NAME>: Network Fabric Device Name
42-
- <NF_DEVICE_RID>: Network Fabric Device Resource ID
43-
- <NF_NAME>: Network Fabric Name
44-
- <NF_RG>: Network Fabric Resource Group
45-
- <NF_RID>: Network Fabric ARM ID
4652
- <NFC_NAME>: Associated Network Fabric Controller (NFC)
4753
- <NFC_RG>: NFC Resource Group
4854
- <NFC_RID>: NFC ARM ID
4955
- <NFC_MRG>: NFC Managed Resource Group
50-
- \<DURATION\>: Estimated Duration of upgrade
51-
- <DE_ID>: Deployment Engineer performing upgrade
56+
- <NF_NAME>: Network Fabric Name
57+
- <NF_RG>: Network Fabric Resource Group
58+
- <NF_RID>: Network Fabric ARM ID
59+
- <NF_DEVICE_NAME>: Network Fabric Device Name
60+
- <NF_DEVICE_RID>: Network Fabric Device Resource ID
61+
- <CM_NAME>: Associated Cluster Manager (CM)
5262
- <CLUSTER_NAME>: Associated Cluster name
5363
- <MISE_CID>: Microsoft.Identity.ServiceEssentials (MISE) Correlation ID in debug output for Device updates
5464
- <CORRELATION_ID>: Operation Correlation ID in debug output for Device updates
5565
- <ASYNC_URL>: Asynchronous (ASYNC) URL in debug output for Device updates
66+
- <LINK_TO_TELCO_INPUT>: Link to the Instance Telco Input file
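
Several commands in this template reference shell variables such as `$NF_RG`, `$NF_NAME`, `$NF_VERSION`, and `$SUBSCRIPTION_ID`. One way to map the placeholders above onto those variables is sketched below. Mapping `<CUSTOMER_SUB_ID>` to `SUBSCRIPTION_ID` is an assumption based on how the commands use it, and every value shown is a placeholder to replace.

```shell
# Placeholder values only: replace each <...> with the real value for the instance.
# Variable names match those used by the az commands in this template;
# mapping <CUSTOMER_SUB_ID> to SUBSCRIPTION_ID is an assumption.
export SUBSCRIPTION_ID="<CUSTOMER_SUB_ID>"
export NF_RG="<NF_RG>"
export NF_NAME="<NF_NAME>"
export NF_VERSION="<NF_VERSION>"          # for example, 5.0.0
export NF_DEVICE_NAME="<NF_DEVICE_NAME>"
```

After filling in the values, `az account set --subscription "$SUBSCRIPTION_ID"` makes that subscription the default for the session.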
5667

68+
</details>
5769

58-
## Links
59-
- [Azure portal](https://aka.ms/nexus-portal)
60-
- [Network Fabric Upgrade](howto-upgrade-nexus-fabric.md)
61-
- [Azure CLI](https://aka.ms/azcli)
62-
- [Install CLI Extension](howto-install-cli-extensions.md)
70+
## Deployment Data
71+
<details>
72+
<summary> Deployment data details </summary>
6373

64-
## Pre-Checks
74+
```
75+
- Nexus: <NEXUS_VERSION>
76+
- NC: <NC_VERSION>
77+
- NF: <NF_VERSION>
78+
- Subscription Name: <CUSTOMER_SUB_NAME>
79+
- Subscription ID: <CUSTOMER_SUB_ID>
80+
- Tenant ID: <CUSTOMER_SUB_TENANT_ID>
81+
- Telco Input: <LINK_TO_TELCO_INPUT>
82+
```
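
The subscription fields in this block can be read from Azure CLI instead of being copied by hand. A minimal sketch, assuming `SUBSCRIPTION_ID` is set and that `NEXUS_VERSION`, `NC_VERSION`, `NF_VERSION`, and `TELCO_INPUT_LINK` (hypothetical variable names) hold the remaining values; the function name is also hypothetical.

```shell
# Hypothetical helper: print the Deployment Data block, reading subscription
# details from `az account show` and the remaining fields from shell variables.
print_deployment_data() {
  local sub_name sub_id tenant_id
  sub_name=$(az account show --subscription "$SUBSCRIPTION_ID" --query name -o tsv) || return 1
  sub_id=$(az account show --subscription "$SUBSCRIPTION_ID" --query id -o tsv) || return 1
  tenant_id=$(az account show --subscription "$SUBSCRIPTION_ID" --query tenantId -o tsv) || return 1

  echo "- Nexus: ${NEXUS_VERSION}"
  echo "- NC: ${NC_VERSION}"
  echo "- NF: ${NF_VERSION}"
  echo "- Subscription Name: ${sub_name}"
  echo "- Subscription ID: ${sub_id}"
  echo "- Tenant ID: ${tenant_id}"
  echo "- Telco Input: ${TELCO_INPUT_LINK}"
}
```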
83+
84+
</details>
85+
86+
## Debug information for Azure CLI commands
87+
<details>
88+
<summary> How to collect debug information for Azure CLI commands </summary>
89+
90+
Azure CLI deployment commands issued with `--debug` contain the following information in the command output:
91+
```
92+
cli.azure.cli.core.sdk.policies: 'mise-correlation-id': '<MISE_CID>'
93+
cli.azure.cli.core.sdk.policies: 'x-ms-correlation-request-id': '<CORRELATION_ID>'
94+
cli.azure.cli.core.sdk.policies: 'Azure-AsyncOperation': '<ASYNC_URL>'
95+
```
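
Because `--debug` output is verbose, saving it to a file and filtering for these three headers makes the values easier to find. A sketch, assuming the debug output (which Azure CLI writes to stderr) is redirected to a file; the function name `extract_debug_ids` is hypothetical, and the pattern matches the line format shown above.

```shell
# Hypothetical helper: print the unique MISE correlation ID, correlation
# request ID, and Azure-AsyncOperation URL entries found in a saved --debug log.
extract_debug_ids() {
  local log_file="$1"
  grep -Eo "'(mise-correlation-id|x-ms-correlation-request-id|Azure-AsyncOperation)': '[^']+'" \
    "$log_file" | sort -u
}
```

For example, run `az networkfabric device upgrade --version $NF_VERSION -g $NF_RG --resource-name $NF_DEVICE_NAME --subscription $SUBSCRIPTION_ID --debug 2> device-upgrade-debug.log`, then `extract_debug_ids device-upgrade-debug.log`.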
96+
97+
To view the status of a long-running asynchronous operation, run the following `az rest` command:
98+
```
99+
az rest -m get -u '<ASYNC_URL>'
100+
```
101+
102+
The command returns the operation status along with detailed informational or error messages:
103+
- `"status": "Accepted"`
104+
- `"status": "Succeeded"`
105+
- `"status": "Failed"`
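
For long upgrades, polling `<ASYNC_URL>` in a loop avoids rerunning the command by hand. A minimal sketch, assuming the response body carries the `status` field shown above. The function name, the 60-second default interval, and treating `Canceled` as a terminal state (a standard Azure Resource Manager async status not listed above) are assumptions.

```shell
# Hypothetical helper: poll an Azure-AsyncOperation URL until it reaches a
# terminal status. Returns 0 on Succeeded, 1 on Failed/Canceled, 2 if az fails.
poll_async_operation() {
  local url="$1" interval="${2:-60}" status
  while true; do
    status=$(az rest -m get -u "$url" --query status -o tsv) || return 2
    echo "$(date -u +%Y-%m-%dT%H:%M:%SZ) status=${status}"
    case "$status" in
      Succeeded) return 0 ;;
      Failed|Canceled) return 1 ;;
    esac
    sleep "$interval"   # still Accepted or in progress: wait, then poll again
  done
}
```

For example, `poll_async_operation '<ASYNC_URL>' 120` checks every two minutes; on a non-zero return, gather the IDs listed below for the support request.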
106+
107+
If any failures occur, report the <MISE_CID>, <CORRELATION_ID>, status code, and detailed messages when opening a support request.
108+
109+
</details>
65110

111+
## Pre-Checks
112+
<details>
113+
<summary> Pre-checks before starting Fabric upgrade </summary>
114+
66115
1. The following role permissions should be assigned to end users responsible for Fabric create, upgrade, and delete operations.
67116

68117
These permissions can be granted temporarily, limited to the duration required to perform the upgrade.
@@ -103,7 +152,7 @@ Runtime changes are categorized as follows:
103152
```
104153

105154
>[!Note]
106-
> If `provisioningState` is not `Succeeded`, stop the upgrade until issues are resolved.**
155+
> If `provisioningState` is not `Succeeded`, stop the upgrade until issues are resolved.
107156
108157
3. Check that the `Microsoft.NexusIdentity` user Resource Provider (RP) is registered on the customer subscription:
109158
```
@@ -151,47 +200,22 @@ Runtime changes are categorized as follows:
151200
> Resolve any connection and cable issues before continuing the upgrade.
152201
153202
7. Review the Operator Nexus release notes for required checks and configuration updates not included in this document.
154-
155-
## Send notification to Operations of upgrade schedule for the Fabric.
156-
157-
The following template can be used through email or support ticket:
158-
```
159-
Title: <ENVIRONMENT> <AZURE_REGION> <NF_NAME> Runtime upgrade to <NF_VERSION> <START_TIME> - Completion ETA <DURATION>
160-
161-
Operations Support:
162-
163-
Deployment Team notification for <ENVIRONMENT> <AZURE_REGION> <NF_NAME> runtime upgrade to <NF_VERSION> <START_TIME> - Completion ETA <DURATION>
164-
165-
Subscription: <CUSTOMER_SUB_ID>
166-
NFC: <NFC_NAME>
167-
CM: <CM_NAME>
168-
Fabric: <NF_NAME>
169-
Cluster: <CLUSTER_NAME>
170-
Region: <AZURE_REGION>
171-
Version: <NEXUS_VERSION>
172-
173-
CC: stakeholder-list
174-
```
175-
176-
## Add resource tag on Fabric resource in Azure portal
177-
To help track upgrades, add a tag to the Fabric resource in Azure portal (optional):
178-
```
179-
|Name | Value |
180-
|----------------|-----------------
181-
|BF in progress |<DE_ID> |
182-
```
183203

204+
</details>
205+
184206
## Upgrade Procedure
185-
186-
### Verify current Fabric runtime version.
207+
<details>
208+
<summary> Fabric runtime upgrade procedure details </summary>
209+
210+
### Verify current Fabric runtime version
187211
[How to check the current Fabric runtime version](./howto-check-runtime-version.md#check-current-fabric-runtime-version)
188212

189213
```
190214
az networkfabric fabric list -g $NF_RG --query "[].{name:name,fabricVersion:fabricVersion,configurationState:configurationState,provisioningState:provisioningState}" -o table --subscription $SUBSCRIPTION_ID
191215
az networkfabric fabric show -g $NF_RG --resource-name $NF_NAME --subscription $SUBSCRIPTION_ID
192216
```
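
Comparing the reported `fabricVersion` with the target before starting confirms that an upgrade is actually needed. A sketch, assuming `NF_VERSION` holds the target runtime version; the function name is hypothetical. It doesn't check whether the upgrade path is allowed, because the Fabric Resource Provider performs that validation when the upgrade starts.

```shell
# Hypothetical helper: report the current fabricVersion and return 0 only
# when it differs from the target NF_VERSION (that is, an upgrade is needed).
fabric_needs_upgrade() {
  local current
  current=$(az networkfabric fabric show -g "$NF_RG" --resource-name "$NF_NAME" \
    --subscription "$SUBSCRIPTION_ID" --query fabricVersion -o tsv) || return 2
  echo "Current fabricVersion: ${current}; target: ${NF_VERSION}"
  [ "$current" != "$NF_VERSION" ]
}
```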
193217

194-
### Initiate Fabric upgrade.
218+
### Initiate Fabric upgrade
195219
Start the upgrade with the following command:
196220
```azurecli
197221
az networkfabric fabric upgrade --action Start --version $NF_VERSION -g $NF_RG --resource-name $NF_NAME --subscription $SUBSCRIPTION_ID
@@ -205,7 +229,7 @@ The Fabric Resource Provider validates if the version upgrade is allowed from th
205229

206230
On successful completion, the command puts the Fabric status into `Under Maintenance` and prevents any other operation on the Fabric.
207231

208-
### Device-specific workflow:
232+
### Follow the device-specific workflow
209233

210234
Nexus Network Fabric Racks are composed of the following Device types:
211235
- Customer Edge (CE) Switches
@@ -236,7 +260,7 @@ Four Rack environments have 17 Devices:
236260
>[!NOTE]
237261
> Wait for successful upgrade on all Devices in a group before moving to the next group.
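
Applying the per-Device upgrade command to one group at a time, and stopping at the first failure, can be sketched as follows. This assumes each `az networkfabric device upgrade` call returns only after its operation finishes, upgrades the Devices in a group one after another rather than in parallel, and uses a hypothetical function name. Take the actual group membership from the Device grouping for your Rack count.

```shell
# Hypothetical helper: upgrade every Device passed as an argument (one group),
# sequentially, and stop at the first failure so the next group isn't started.
upgrade_device_group() {
  local device
  for device in "$@"; do
    echo "Upgrading ${device} to ${NF_VERSION}"
    if ! az networkfabric device upgrade --version "$NF_VERSION" -g "$NF_RG" \
        --resource-name "$device" --subscription "$SUBSCRIPTION_ID"; then
      echo "Upgrade failed for ${device}; stopping before the next Device" >&2
      return 1
    fi
  done
  echo "Group complete: $*"
}
```

For example, `upgrade_device_group <DEVICE_A> <DEVICE_B>` (hypothetical names); start the next group only after the call returns 0.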
238262
239-
### Device-specific upgrade:
263+
### Run the device-specific upgrade
240264
Run the following command to upgrade the version on each Device:
241265
```
242266
az networkfabric device upgrade --version $NF_VERSION -g $NF_RG --resource-name $NF_DEVICE_NAME --subscription $SUBSCRIPTION_ID --debug
@@ -263,48 +287,67 @@ Once all the Devices are upgraded, run the following command to take the Network
263287
az networkfabric fabric upgrade --action Complete --version $NF_VERSION -g $NF_RG --resource-name $NF_NAME --debug --subscription $SUBSCRIPTION_ID
264288
```
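
Before running the `Complete` action, it can help to confirm that no Device still reports an older version. A sketch, assuming the Device `version` property (the field read by `az networkfabric device list --query "[].{name:name,version:version}"`) holds the value passed to `--version`; the function name is hypothetical.

```shell
# Hypothetical helper: list Devices whose version differs from NF_VERSION.
# Returns 0 when every Device matches, 1 when some don't, 2 if az fails.
devices_ready_for_complete() {
  local pending
  pending=$(az networkfabric device list -g "$NF_RG" --subscription "$SUBSCRIPTION_ID" \
    --query "[?version!='${NF_VERSION}'].name" -o tsv) || return 2
  if [ -n "$pending" ]; then
    echo "Devices not yet on ${NF_VERSION}:"
    echo "$pending"
    return 1
  fi
  echo "All Devices report ${NF_VERSION}"
}
```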
265289

266-
## Troubleshooting Device update failures.
290+
### Troubleshoot Device update failures
267291
1. Collect any errors in the Azure CLI output.
268292
2. Collect the Device operation state from the Azure portal or Azure CLI.
269293
3. Create an Azure support request for any Device upgrade failure, and attach the errors along with the ASYNC URL, correlation ID, and operation state of the Fabric and Devices.
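
Capturing the Fabric and Device state to files at the time of a failure makes it easier to attach to the support request. A sketch, assuming JSON output is acceptable for the request; the function name and default output directory are hypothetical.

```shell
# Hypothetical helper: save Fabric and Device state as JSON for a support request.
collect_upgrade_state() {
  local out_dir="${1:-fabric-upgrade-evidence}"
  mkdir -p "$out_dir" || return 1
  az networkfabric fabric show -g "$NF_RG" --resource-name "$NF_NAME" \
    --subscription "$SUBSCRIPTION_ID" -o json > "${out_dir}/fabric.json" || return 1
  az networkfabric device list -g "$NF_RG" \
    --subscription "$SUBSCRIPTION_ID" -o json > "${out_dir}/devices.json" || return 1
  echo "Saved Fabric and Device state to ${out_dir}"
}
```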
270294

271-
## Post-upgrade Validation
272-
Once complete, run the following commands to check the status of the Fabric and Devices:
273-
```
274-
az networkfabric fabric list -g $NF_RG --query "[].{name:name,fabricVersion:fabricVersion,configurationState:configurationState,provisioningState:provisioningState}" -o table --subscription $SUBSCRIPTION_ID
275-
az networkfabric fabric show -g $NF_RG --resource-name $NF_NAME --subscription $SUBSCRIPTION_ID
276-
az networkfabric device list -g $NF_RG --query "[].{name:name,version:version}" -o table --subscription $SUBSCRIPTION_ID
277-
```
295+
</details>
278296

279-
## Send notification to Operations of Fabric upgrade completion
297+
## Post-upgrade tasks
298+
<details>
299+
<summary> Detailed steps for post-upgrade tasks </summary>
280300

281-
The following template can be used through email or ticketing system:
282-
```
283-
Title: <ENVIRONMENT> <AZURE_REGION> <NF_NAME> Runtime <NF_VERSION> Upgrade Complete
284-
285-
Operations:
286-
Deployment Team notification for <ENVIRONMENT> <AZURE_REGION> <NF_NAME> runtime <NF_VERSION> Upgrade Complete
287-
288-
Subscription: <CUSTOMER_SUB_ID>
289-
NFC: <NFC_NAME>
290-
CM: <CM_NAME>
291-
Fabric: <NF_NAME>
292-
Cluster: <CLUSTER_NAME>
293-
Region: <AZURE_REGION>
294-
Version: <NEXUS_VERSION>
295-
296-
CC: stakeholder_list
297-
```
301+
### Review Operator Nexus release notes
302+
Review the Operator Nexus release notes for any version-specific actions required after the upgrade.
303+
304+
### Validate Nexus Instance
298305

299-
## Remove resource tag on Fabric resource in Azure portal
300-
Remove the resource tag on the Fabric resource tracking the upgrade in Azure portal (if added previously):
306+
Validate the health and status of all the Nexus Instance resources with the [Nexus Instance Readiness Test (IRT)](howto-run-instance-readiness-testing.md).
307+
308+
To validate the Nexus Instance resources through Azure CLI after the upgrade, run the following commands:
301309
```
302-
|Name | Value |
303-
|----------------|-----------------
304-
|BF in progress |<DE_ID> |
310+
# NFC
311+
az networkfabric controller list --subscription <CUSTOMER_SUB_ID> -o table
312+
az vm list -o table --query "[?location=='<AZURE_REGION>']" --subscription <CUSTOMER_SUB_ID>
313+
az customlocation list -o table --query "[?location=='<AZURE_REGION>']" --subscription <CUSTOMER_SUB_ID> | grep <NFC_NAME>
314+
315+
# Fabric
316+
az networkfabric fabric list --resource-group <NF_RG> --subscription <CUSTOMER_SUB_ID> -o table
317+
az networkfabric rack list --resource-group <NF_RG> --subscription <CUSTOMER_SUB_ID> -o table
318+
az networkfabric device list --resource-group <NF_RG> --subscription <CUSTOMER_SUB_ID> -o table
319+
az networkfabric nni list -g <NF_RG> --fabric <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
320+
az networkfabric acl list -g <NF_RG> --fabric <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
321+
az networkfabric l2domain list -g <NF_RG> --fabric <NF_NAME> --subscription <CUSTOMER_SUB_ID> -o table
322+
323+
# CM
324+
az networkcloud clustermanager list --subscription <CUSTOMER_SUB_ID> -o table
325+
326+
# Cluster
327+
az networkcloud cluster list --subscription <CUSTOMER_SUB_ID> -o table
328+
az networkcloud baremetalmachine list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> --query "sort_by([].{name:name,kubernetesNodeName:kubernetesNodeName,location:location,readyState:readyState,provisioningState:provisioningState,detailedStatus:detailedStatus,detailedStatusMessage:detailedStatusMessage,cordonStatus:cordonStatus,powerState:powerState,machineRoles:machineRoles | join(', ', @),createdAt:systemData.createdAt}, &name)" -o table
329+
az networkcloud storageappliance list -g <CLUSTER_MRG> --subscription <CUSTOMER_SUB_ID> -o table
330+
331+
# Tenant Workloads
332+
az networkcloud virtualmachine list --subscription $SUBSCRIPTION_ID --query "reverse(sort_by([?clusterId=='$CLUSTER_RID'].{name:name, createdAt:systemData.createdAt, resourceGroup:resourceGroup, powerState:powerState, provisioningState:provisioningState, detailedStatus:detailedStatus, bareMetalMachineId:bareMetalMachineId, CPUCount:cpuCores, EmulatorStatus:isolateEmulatorThread}, &createdAt))" -o table
333+
az networkcloud kubernetescluster list --subscription $SUBSCRIPTION_ID --query "[?clusterId=='$CLUSTER_RID'].{name:name, resourceGroup:resourceGroup, provisioningState:provisioningState, detailedStatus:detailedStatus, detailedStatusMessage:detailedStatusMessage, createdAt:systemData.createdAt, kubernetesVersion:kubernetesVersion}" -o table
305334
```
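
With many resources to review, filtering each list for anything whose `provisioningState` isn't `Succeeded` shortens the output. A sketch with a hypothetical function name; it assumes the listed resources expose `provisioningState` at the top level (as the queries above read it), and it replaces any `--query` you would otherwise pass.

```shell
# Hypothetical helper: run an `az ... list` command (passed as arguments) with
# a query that returns only names whose provisioningState isn't Succeeded.
report_not_succeeded() {
  local label="$1" failed
  shift
  failed=$("$@" --query "[?provisioningState!='Succeeded'].name" -o tsv) || return 2
  if [ -n "$failed" ]; then
    echo "${label}: not Succeeded -> ${failed//$'\n'/, }"
    return 1
  fi
  echo "${label}: all Succeeded"
}
```

For example, `report_not_succeeded "Fabric Devices" az networkfabric device list -g "$NF_RG" --subscription "$SUBSCRIPTION_ID"`.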
306335

307-
## Close out any Work Items in your ticketing system
308-
* Update Task hours for upgrade duration.
309-
* Set Fabric upgrade work item to `Complete`.
310-
* Add any notes on support tickets and issues encountered during upgrade
336+
> [!Note]
337+
> IRT validation provides a complete functional test of networking and workloads across all components of the Nexus Instance. The Azure CLI resource validation checks resource status only and doesn't provide functional testing.
338+
339+
</details>
340+
341+
## Links
342+
<details>
343+
<summary> Reference Links for Fabric upgrade </summary>
344+
345+
Reference links for Fabric upgrade:
346+
- Access the [Azure portal](https://aka.ms/nexus-portal)
347+
- [Install Azure CLI](https://aka.ms/azcli)
348+
- [Install CLI Extension](howto-install-cli-extensions.md)
349+
- Reference the [Network Fabric Upgrade](howto-upgrade-nexus-fabric.md)
350+
- Reference the [Nexus Telco Input Template](concepts-telco-input-template.md)
351+
- Reference the [Nexus Instance Readiness Test (IRT)](howto-run-instance-readiness-testing.md)
352+
353+
</details>
