@@ -4,7 +4,7 @@ description: Troubleshoot BMM provisioning for Azure Operator Nexus.
ms.service: azure-operator-nexus
ms.custom: troubleshooting
ms.topic: troubleshooting
- ms.date: 07/08/2024
+ ms.date: 07/19/2024
author: bpinto
ms.author: bpinto
---
@@ -15,13 +15,13 @@ As part of cluster deploy action, bare metal machines (BMM) are provisioned with
## Prerequisites
1. Install the latest version of the [appropriate CLI extensions](howto-install-cli-extensions.md)
- 2. Gather the following information:
+ 2. Collect the following information:
   - Subscription ID (SUBSCRIPTION)
   - Cluster name (CLUSTER)
   - Resource group (CLUSTER_RG)
   - Managed resource group (CLUSTER_MRG)
- 3. The user needs access to the subscription to run Azure Operator Nexus network fabric (NF) and network cloud (NC) CLI extension commands
- 4. Log in to Azure CLI and select the subscription where the cluster is deployed
+ 3. Request subscription access to run Azure Operator Nexus network fabric (NF) and network cloud (NC) CLI extension commands.
+ 4. Log in to Azure CLI and select the subscription where the cluster is deployed.
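
The login and subscription selection in the last prerequisite can be sketched as a small shell helper. This is a minimal sketch, assuming the `SUBSCRIPTION` variable from the list above is already set in the shell; the function name `select_cluster_subscription` is hypothetical, while `az login` and `az account set --subscription` are standard Azure CLI commands.

```shell
# Hypothetical helper: log in and select the subscription where the cluster is deployed.
# Assumes SUBSCRIPTION holds the subscription ID collected in the prerequisites.
select_cluster_subscription() {
  if [ -z "${SUBSCRIPTION:-}" ]; then
    echo "SUBSCRIPTION is not set" >&2
    return 1
  fi
  az login || return 1
  az account set --subscription "$SUBSCRIPTION"
}
```

Calling `select_cluster_subscription` once per shell session, before any `az networkcloud` command, keeps later commands pointed at the intended subscription.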
## BMM roles
For a given SKU, there are required roles to manage and operate the underlying Kubernetes cluster.
@@ -49,16 +49,16 @@ Where `STATUS` goes through the following phases through the BMM provisioning pr
These phases are defined as follows:

- | Phase | Definition |
+ | Phase | Actions |
| --- | --- |
- | `Registering` | Verify BMC connectivity and BMC credentials, add BMM to provisioning service |
- | `Preparing` | Reboot BMM, reset BMC, verify power state |
- | `Inspecting` | Update firmware, apply BIOS settings, and configure storage |
- | `Available` | BMM ready to install OS |
- | `Provisioning` | OS image installing on the BMM and attempts to join cluster |
- | `Provisioned` | BMM successfully provisioned and joined to cluster |
- | `Deprovisioning` | BMM provisioning failed and retrying |
- | `Failed` | BMM provisioning failed and requires recovery action, all retries exhausted |
+ | `Registering` | Verifying BMC connectivity and BMC credentials, and adding BMM to provisioning service. |
+ | `Preparing` | Rebooting BMM, resetting BMC, and verifying power state. |
+ | `Inspecting` | Updating firmware, applying BIOS settings, and configuring storage. |
+ | `Available` | BMM is ready to install OS. |
+ | `Provisioning` | OS image is installing on the BMM. BMM will attempt to join cluster. |
+ | `Provisioned` | BMM successfully provisioned and joined to cluster. |
+ | `Deprovisioning` | BMM provisioning failed. Provisioning service is cleaning up resources for retry. |
+ | `Failed` | BMM provisioning failed and requires recovery action. All retries exhausted. |
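
As a rough illustration of how the phases in the table map to operator action, the following sketch classifies a `STATUS` value. The function name `bmm_status_action` and its labels (`wait`, `done`, `recover`, `unknown`) are hypothetical and not part of any Azure tooling; only the phase names come from the table.

```shell
# Hypothetical helper: map a BMM detailed status to a suggested operator action.
bmm_status_action() {
  case "$1" in
    Registering|Preparing|Inspecting|Available|Provisioning|Deprovisioning)
      echo "wait" ;;      # provisioning, or cleanup before a retry, is still in progress
    Provisioned)
      echo "done" ;;      # BMM provisioned and joined to the cluster
    Failed)
      echo "recover" ;;   # all retries exhausted; recovery action required
    *)
      echo "unknown" ;;   # not one of the documented phases
  esac
}
```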
During any phase, the BMM detailed status is set to failed and the phase is blocked if any of the following occurs:
- BMC is unavailable
@@ -79,13 +79,13 @@ Where the output is defined as follows:
| Output | Definition |
| --- | --- |
| BMM_NAME | BMM name |
- | RSTATE | Cluster participation status (`True`, `False`) |
- | PROV_STATE | Provisioning state (`Succeeded`, `Failed`) |
- | STATUS | Provisioning detailed status (`Registering`, `Preparing`, `Inspecting`, `Available`, `Provisioning`, `Provisioned`, `Deprovisioning`, `Failed`) |
- | STATUS_MSG | Detailed provisioning status message |
- | POWER_STATE | Power state of BMM (`On`, `Off`) |
- | BMM_ROLE | BMM cluster role contains (`control-plane`, `management-plane`, `compute-plane`) |
- | CREATE_DATE | BMM creation date |
+ | RSTATE | Cluster participation status (`True`, `False`). |
+ | PROV_STATE | Provisioning state (`Succeeded`, `Failed`). |
+ | STATUS | Provisioning detailed status (`Registering`, `Preparing`, `Inspecting`, `Available`, `Provisioning`, `Provisioned`, `Deprovisioning`, `Failed`). |
+ | STATUS_MSG | Detailed provisioning status message. |
+ | POWER_STATE | Power state of BMM (`On`, `Off`). |
+ | BMM_ROLE | BMM cluster role (`control-plane`, `management-plane`, `compute-plane`). |
+ | CREATE_DATE | BMM creation date. |

For example:
``` azurecli
@@ -98,7 +98,7 @@ To show details and status of a single BMM:
```azurecli
az networkcloud baremetalmachine show -g $CLUSTER_MRG -n $BMM_NAME
```
- For additional BMM details used in troubleshooting:
+ For BMM details specific to troubleshooting:
```azurecli
az networkcloud baremetalmachine show -g $CLUSTER_MRG -n $BMM_NAME --query "{name:name,BootMAC:bootMacAddress,BMCMAC:bmcMacAddress,Connect:bmcConnectionString,SN:serialNumber,rackId:rackId,RackSlot:rackSlot}" -o table
```
@@ -109,17 +109,17 @@ The following conditions can cause provisioning failures:
| Error Type | Resolution |
| ---------- | ---------- |
- | BMC shows `Backplane Comm` critical error | Remote flea drain, physical flea drain, BMM replace action |
- | Boot network data response empty from BMC | Bounce port on fabric device, remote flea drain, physical flea drain, BMM replace action |
- | Disk data response empty from BMC | Reseat disk, re-seat storage controller, remote flea drain, physical flea drain, BMM replace action |
- | BMC unreachable | Bounce port on fabric device, reseat cable, remote flea drain, physical flea drain, BMM replace action |
- | BMC fails log in | Update credentials on BMC, BMM replace action |
- | DIMM, CPU, OEM critical errors | Resolve hardware issue, BMM replace action |
- | Console stuck at grub menu | Reset NVRAM, BMM replace action |
+ | BMC shows `Backplane Comm` critical error | Remote flea drain; Physical flea drain; BMM replace action |
+ | Boot network data response empty from BMC | Bounce port on fabric device; Remote flea drain; Physical flea drain; BMM replace action |
+ | Disk data response empty from BMC | Reseat disk; Reseat storage controller; Remote flea drain; Physical flea drain; BMM replace action |
+ | BMC unreachable | Bounce port on fabric device; Reseat cable; Remote flea drain; Physical flea drain; BMM replace action |
+ | BMC fails log in | Update credentials on BMC; BMM replace action |
+ | Memory, CPU, OEM critical errors | Resolve hardware issue; BMM replace action |
+ | Console stuck at grub menu | Reset NVRAM; BMM replace action |

### Azure BMM activity log

- 1. Log in to [Azure Portal](https://portal.azure.com/).
+ 1. Log in to [Azure portal](https://portal.azure.com/).
2. Search for the BMM name in the top `Search` box.
3. Select the `Bare Metal Machine (Operator Nexus)` from the search results.
4. Select `Activity log` on the left side menu.
@@ -168,11 +168,11 @@ Attempt to run ping against the BMC IPv4 address:
```

### Reset port on fabric device
- If the BMC_IP is not responsive, a reset of the fabric device port retriggers autonegotiation on the port and may bring it back online.
+ If the BMC_IP isn't responsive, a reset of the fabric device port retriggers autonegotiation on the port and may bring it back online.

To find the `Network Fabric` port from Azure:
1. Obtain the `RackID` and `RackSlot` from the previous `BMM Details` section.
- 2. In `Azure Portal`, drill down to the `Network Rack` RackID for the BMM.
+ 2. In the Azure portal, drill down to the `Network Rack` RackID for the BMM.
3. Select the `Network Devices` tab and the management (Mgmt) switch for the rack.
4. Under `Resources`, select `Network Interfaces` and then the BMC (iDRAC) or boot (PXE) interface for the port that requires reset.