Commit ce737bb

Update troubleshoot-bmm-provisioning.md
Acrolinx corrections.
1 parent 10665cf commit ce737bb

1 file changed, +31 -31 lines changed

articles/operator-nexus/troubleshoot-bmm-provisioning.md

Lines changed: 31 additions & 31 deletions
@@ -4,7 +4,7 @@ description: Troubleshoot BMM provisioning for Azure Operator Nexus.
 ms.service: azure-operator-nexus
 ms.custom: troubleshooting
 ms.topic: troubleshooting
-ms.date: 07/08/2024
+ms.date: 07/19/2024
 author: bpinto
 ms.author: bpinto
 ---
@@ -15,13 +15,13 @@ As part of cluster deploy action, bare metal machines (BMM) are provisioned with
 
 ## Prerequisites
 1. Install the latest version of the [appropriate CLI extensions](howto-install-cli-extensions.md)
-2. Gather the following information:
+2. Collect the following information:
 - Subscription ID (SUBSCRIPTION)
 - Cluster name (CLUSTER)
 - Resource group (CLUSTER_RG)
 - Managed resource group (CLUSTER_MRG)
-3. The user needs access to the subscription to run Azure Operator Nexus network fabric (NF) and network cloud (NC) CLI extension commands
-4. Log in to Azure CLI and select the subscription where the cluster is deployed
+3. Request subscription access to run Azure Operator Nexus network fabric (NF) and network cloud (NC) CLI extension commands.
+4. Log in to Azure CLI and select the subscription where the cluster is deployed.
 
 ## BMM roles
 For a given SKU, there are required roles to manage and operate the underlying kubernetes cluster.
@@ -49,16 +49,16 @@ Where `STATUS` goes through the following phases through the BMM provisioning pr
 
 These phases are defined as follows:
 
-| Phase | Definition |
+| Phase | Actions |
 | --- | --- |
-| `Registering` | Verify BMC connectivity and BMC credentials, add BMM to provisioning service |
-| `Preparing` | Reboot BMM, reset BMC, verify power state |
-| `Inspecting` | Update firmware, apply BIOS settings, and configure storage |
-| `Available` | BMM ready to install OS |
-| `Provisioning` | OS image installing on the BMM and attempts to join cluster |
-| `Provisioned` | BMM successfully provisioned and joined to cluster |
-| `Deprovisioning` | BMM provisioning failed and retrying |
-| `Failed` | BMM provisioning failed and requires recovery action, all retries exhausted |
+| `Registering` | Verifying BMC connectivity/BMC credentials and adding BMM to provisioning service. |
+| `Preparing` | Rebooting BMM, resetting BMC, and verifying power state. |
+| `Inspecting` | Updating firmware, applying BIOS settings, and configuring storage. |
+| `Available` | BMM is ready to install OS. |
+| `Provisioning` | OS image installing on the BMM. BMM will attempt to join cluster. |
+| `Provisioned` | BMM successfully provisioned and joined to cluster. |
+| `Deprovisioning` | BMM provisioning failed. Provisioning service is cleaning up resource for retry. |
+| `Failed` | BMM provisioning failed and requires recovery action. All retries exhausted. |
 
 During any phase, the BMM detailed status is set to failed and the phase is blocked if any of the following occurs:
 - BMC is unavailable
@@ -79,13 +79,13 @@ Where the output is defined as follows:
 | Output | Definition |
 | --- | --- |
 | BMM_NAME | BMM name |
-| RSTATE | Cluster participation status (`True`,`False`) |
-| PROV_STATE | Provisioning state (`Succeeded`,`Failed`) |
-| STATUS | Provisioning detailed status (`Registering`,`Preparing`,`Inspecting`,`Available`,`Provisioning`,`Provisioned`,`Deprovisioning`,`Failed`) |
-| STATUS_MSG | Detailed provisioning status message |
-| POWER_STATE | Power state of BMM (`On`,`Off`) |
-| BMM_ROLE | BMM cluster role contains (`control-plane`,`management-plane`,`compute-plane`) |
-| CREATE_DATE | BMM creation date |
+| RSTATE | Cluster participation status (`True`,`False`). |
+| PROV_STATE | Provisioning state (`Succeeded`,`Failed`). |
+| STATUS | Provisioning detailed status (`Registering`,`Preparing`,`Inspecting`,`Available`,`Provisioning`,`Provisioned`,`Deprovisioning`,`Failed`). |
+| STATUS_MSG | Detailed provisioning status message. |
+| POWER_STATE | Power state of BMM (`On`,`Off`). |
+| BMM_ROLE | BMM cluster role (`control-plane`,`management-plane`,`compute-plane`). |
+| CREATE_DATE | BMM creation date. |
 
 For example:
 ```azurecli
@@ -98,7 +98,7 @@ To show details and status of a single BMM:
 ```azurecli
 az networkcloud baremetalmachine show -g $CLUSTER_MRG -n $BMM_NAME
 ```
-For additional BMM details used in troubleshooting:
+For BMM details specific to troubleshooting:
 ```azurecli
 az networkcloud baremetalmachine show -g $CLUSTER_MRG -n $BMM_NAME --query "{name:name,BootMAC:bootMacAddress,BMCMAC:bmcMacAddress,Connect:bmcConnectionString,SN:serialNumber,rackId:rackId,RackSlot:rackSlot}" -o table
 ```
@@ -109,17 +109,17 @@ The following conditions can cause provisioning failures:
 
 | Error Type | Resolution |
 | ---------- | ---------- |
-| BMC shows `Backplane Comm` critical error | Remote flea drain, physical flea drain, BMM replace action |
-| Boot network data response empty from BMC | Bounce port on fabric device, remote flea drain, physical flea drain, BMM replace action |
-| Disk data response empty from BMC | Reseat disk, re-seat storage controller, remote flea drain, physical flea drain, BMM replace action |
-| BMC unreachable | Bounce port on fabric device, reseat cable, remote flea drain, physical flea drain, BMM replace action |
-| BMC fails log in | Update credentials on BMC, BMM replace action |
-| DIMM, CPU, OEM critical errors | Resolve hardware issue, BMM replace action |
-| Console stuck at grub menu | Reset NVRAM, BMM replace action |
+| BMC shows `Backplane Comm` critical error | Remote flea drain; Physical flea drain; BMM replace action. |
+| Boot network data response empty from BMC | Bounce port on fabric device; Remote flea drain; Physical flea drain; BMM replace action |
+| Disk data response empty from BMC | Reseat disk; Reseat storage controller; Remote flea drain; Physical flea drain; BMM replace action |
+| BMC unreachable | Bounce port on fabric device; Reseat cable; Remote flea drain, Physical flea drain; BMM replace action |
+| BMC fails log in | Update credentials on BMC; BMM replace action |
+| Memory, CPU, OEM critical errors | Resolve hardware issue; BMM replace action |
+| Console stuck at grub menu | Reset NVRAM; BMM replace action |
 
 ### Azure BMM activity log
 
-1. Log in to [Azure Portal](https://portal.azure.com/).
+1. Log in to [Azure portal](https://portal.azure.com/).
 2. Search on the BMM name in the top `Search` box.
 3. Select the `Bare Metal Machine (Operator Nexus)` from the search results.
 4. Select `Activity log` on the left side menu.
@@ -168,11 +168,11 @@ Attempt to run ping against the BMC IPv4 address:
 ```
 
 ### Reset port on fabric device
-If the BMC_IP is not responsive, a reset of the fabric device port retriggers autonegotiation on the port and may bring it back online.
+If the BMC_IP isn't responsive, a reset of the fabric device port retriggers autonegotiation on the port and may bring it back online.
 
 To find the `Network Fabric` port from Azure:
 1. Obtain the `RackID` and `RackSlot` from the previous `BMM Details` section.
-2. In `Azure Portal`, drill down to the `Network Rack` RackID for the BMM.
+2. In Azure portal, drill down to the `Network Rack` RackID for the BMM.
 3. Select `Network Devices` tab and the management (Mgmt) switch for the rack.
 4. Under `Resources`, select `Network Interfaces` and then the BMC (iDRAC) or boot (PXE) interface for the port that requires reset.

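The provisioning phases defined in this commit's phase table can be sketched as a small POSIX shell lookup that says whether a BMM is still progressing, retrying, finished, or needs intervention. The function name `bmm_status_action` and its output labels (`in-progress`, `retrying`, `done`, `needs-recovery`, `unknown`) are hypothetical and not part of Azure CLI; the mapping only restates the phase definitions from the table.

```shell
#!/bin/sh
# Hypothetical helper (not part of Azure CLI): map a BMM detailed status,
# i.e. the STATUS column, to the kind of attention it needs, following
# the phase definitions in the troubleshooting article.
bmm_status_action() {
  case "$1" in
    Registering|Preparing|Inspecting|Available|Provisioning)
      echo "in-progress" ;;     # still moving through the provisioning phases
    Deprovisioning)
      echo "retrying" ;;        # provisioning failed; cleanup for retry is underway
    Provisioned)
      echo "done" ;;            # provisioned and joined to the cluster
    Failed)
      echo "needs-recovery" ;;  # all retries exhausted; recovery action required
    *)
      echo "unknown" ;;         # value not listed in the phase table
  esac
}
```

To feed it a live value, one option is `az networkcloud baremetalmachine show -g $CLUSTER_MRG -n $BMM_NAME --query detailedStatus -o tsv`, assuming the STATUS column is exposed as the `detailedStatus` property; check the property name against your own CLI output before relying on it.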
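The article pings the BMC IPv4 address, and its troubleshooting query returns the BMC connection string in the `Connect` column. A minimal sketch for extracting the host from that string with POSIX parameter expansion follows, assuming the string is URL-shaped, for example `redfish+https://192.0.2.10/redfish/v1/Systems/System.Embedded.1` (an illustrative value, not taken from a real cluster). The function name is hypothetical and the format is an assumption to verify against your own `bmcConnectionString` output.

```shell
#!/bin/sh
# Hypothetical helper: extract the host (BMC_IP) from a BMC connection string.
# ASSUMPTION: the string is URL-shaped, e.g.
#   redfish+https://192.0.2.10/redfish/v1/Systems/System.Embedded.1
bmc_host_from_connection_string() {
  rest=${1#*://}      # drop the scheme and the "://" separator
  rest=${rest%%/*}    # drop the path after the first "/"
  rest=${rest##*@}    # drop any user-info prefix such as "user@"
  echo "${rest%%:*}"  # drop any ":port" suffix (IPv4 or hostname only)
}
```

Usage: `BMC_IP=$(bmc_host_from_connection_string "$CONNECT")` followed by `ping -c 4 "$BMC_IP"`, where `CONNECT` is a hypothetical variable holding the connection string. Stripping at the first `:` keeps the sketch simple but would mangle an IPv6 literal, which would need different handling.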