articles/operator-nexus/howto-bare-metal-best-practices.md
Lines changed: 9 additions & 9 deletions
@@ -67,7 +67,7 @@ See related articles:
 -[How to monitor interface In and Out packet rate for network fabric devices]
 -[How to configure diagnostic settings and monitor configuration differences in Nexus Network Fabric].

-Evaluate for any Bare Metal Machine warnings or degraded conditions which could indicate the need to resolve hardware, network, or server configuration problems.
+Evaluate for any Bare Metal Machine warnings or degraded conditions that could indicate the need to resolve hardware, network, or server configuration problems.
 For more information, see [Troubleshoot Degraded Status Errors on Bare Metal Machines] and [Troubleshoot Bare Metal Machine Warning Status].

 #### Determine if firmware update jobs are running
@@ -88,7 +88,7 @@ az networkcloud baremetalmachine run-read-command \
 --output-directory .
 ```

-Here's an example output from the `racadm jobqueue view` command which shows `Firmware Update`.
+Here's an example output from the `racadm jobqueue view` command that shows `Firmware Update`.

 ```
 [Job ID=JID_833540920066]
@@ -127,15 +127,15 @@ Percent Complete=[100]

 #### Monitor status in Bare Metal Machine JSON properties

-In version 2509.1 and above, you can view the status of any recent or in progress actions in the `JSON View` of the corresponding Bare Metal Machine (Operator Nexus) resource. This shows the following information in the Bare Metal Machine JSON properties, when using API Version `2025-07-01-preview` or higher.
+In version 2509.1 and above, you can view the status of any recent or in-progress actions in the `JSON View` of the corresponding Bare Metal Machine (Operator Nexus) resource. This view shows the following information in the Bare Metal Machine JSON properties, when using API Version `2025-07-01-preview` or higher.

 - Start and end time of the action.
-- Status of the action (e.g., `Succeeded`, `Failed`, `InProgress`).
-- Any additional context or error message associated with the status.
+- Status of the action (`Succeeded`, `Failed`, or `InProgress`).
+- Any extra context or error message associated with the status.
 - The Correlation ID for the original action request, as shown in the Azure Activity Log for the resource.
-- The detailed steps included in the action, such as `Hardware Validation`, `Deprovisioning`, `Provisioning` and `Cloud Init` for a BMM Replace action.
+- The detailed steps included in the action, such as `Hardware Validation`, `Deprovisioning`, `Provisioning`, and `Cloud Init` for a BMM Replace action.

-The most recent (or currently in progress) occurrence of each action type is shown (Replace, Reimage, Restart etc).
+The most recent occurrence of each action type is shown, including any currently in-progress action.

 Example output for a Replace action:

@@ -202,7 +202,7 @@ Before initiating any `reimage` operation, ensure the following preconditions ar

 - Make sure the Bare Metal Machine's workloads are drained using the [`cordon`](./howto-baremetal-functions.md#make-a-bare-metal-machine-unschedulable-cordon) command with the parameter `evacuate` set to `True`.
 - Perform high level checks covered in the article [Troubleshoot Bare Metal Machine Provisioning].
-- Evaluate any Bare Metal Machine warnings or degraded conditions which could indicate the need to resolve hardware, network, or server configuration problems before a `reimage` operation.
+- Evaluate any Bare Metal Machine warnings or degraded conditions that could indicate the need to resolve hardware, network, or server configuration problems before a `reimage` operation.
 For more information, read [Troubleshoot Degraded Status Errors on Bare Metal Machines] and [Troubleshoot Bare Metal Machine Warning Status].
 - If the Bare Metal Machine reports a failed state with the reason of hardware validation (seen in the Bare Metal Machine `Detailed Status` and `Detailed Status Message` fields), then the Bare Metal Machine needs a `replace` instead.
 See the [Best Practices for a Bare Metal Machine Replace](#best-practices-for-a-bare-metal-machine-replace).
@@ -230,7 +230,7 @@ Before initiating any `replace` operation, ensure the following preconditions ar

 - Make sure the Bare Metal Machine's workloads are drained using the [`cordon`](./howto-baremetal-functions.md#make-a-bare-metal-machine-unschedulable-cordon) command with the parameter `evacuate` set to `True`.
 - Perform high level checks covered in the article [Troubleshoot Bare Metal Machine Provisioning].
-- Evaluate any Bare Metal Machine warnings or degraded conditions which could indicate the need to resolve hardware, network, or server configuration problems before a `replace` operation.
+- Evaluate any Bare Metal Machine warnings or degraded conditions that could indicate the need to resolve hardware, network, or server configuration problems before a `replace` operation.
 For more information, see [Troubleshoot Degraded Status Errors on Bare Metal Machines] and [Troubleshoot Bare Metal Machine Warning Status].
 - Validate Bare Metal Machine is powered on.
 - Validate that there are no running firmware upgrade jobs.
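The "no running firmware upgrade jobs" precondition, together with the `racadm jobqueue view` output excerpted earlier in this file's diff, could be checked with a small script. This is a minimal, hypothetical sketch, not part of the Operator Nexus tooling. It assumes each job is listed as a `[Job ID=...]` line followed by `Key=Value` or `Key=[Value]` lines, that firmware jobs have a `Job Name` containing `Firmware Update`, and that `Completed`, `Failed`, and `Completed with Errors` are the only finished states. The sample text is illustrative, modeled on the fields visible in the example output (the second job ID is made up).

```python
import re

# Matches "Key=Value" and "Key=[Value]", plus the bracketed "[Job ID=...]" header line.
FIELD = re.compile(r"^\s*\[?([^=\[\]]+?)=\[?(.*?)\]?\s*$")

# Assumption: job states that mean the job is no longer running.
FINISHED_STATES = {"Completed", "Failed", "Completed with Errors"}


def parse_jobs(text: str) -> list[dict[str, str]]:
    """Split `racadm jobqueue view` output into one dict per job."""
    jobs: list[dict[str, str]] = []
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # separator or blank line
        key, value = match.group(1).strip(), match.group(2).strip()
        if key == "Job ID":
            jobs.append({})  # each "[Job ID=...]" line starts a new job
        if jobs:
            jobs[-1][key] = value
    return jobs


def running_firmware_jobs(text: str) -> list[str]:
    """Return the IDs of firmware update jobs that have not finished."""
    return [
        job.get("Job ID", "")
        for job in parse_jobs(text)
        if "Firmware Update" in job.get("Job Name", "")
        and job.get("Status") not in FINISHED_STATES
    ]


# Illustrative sample, not captured from a real BMC.
sample = """\
---------------------------- JOB -------------------------
[Job ID=JID_833540920066]
Job Name=Firmware Update: iDRAC
Status=Completed
Percent Complete=[100]
---------------------------- JOB -------------------------
[Job ID=JID_000000000001]
Job Name=Firmware Update: BIOS
Status=Scheduled
Percent Complete=[0]
"""

print(running_firmware_jobs(sample))
```

An empty list would suggest the precondition holds; any returned ID points at a job to wait for before starting `replace` or `reimage`.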
articles/operator-nexus/howto-baremetal-functions.md
Lines changed: 2 additions & 2 deletions
@@ -85,7 +85,7 @@ Existing workloads continue to run on the Bare Metal Machine unless the workload

 ### Drain Bare Metal Machine workloads

-The cordon command supports the `evacuate` parameter which its default value `False` means that the `cordon` command prevents scheduling new workloads.
+The cordon command supports the `evacuate` parameter, whose default value, `False`, means that the `cordon` command only prevents scheduling new workloads.
 To drain workloads with the `cordon` command, the `evacuate` parameter must be set to `True`.
 The workloads running on the Bare Metal Machine are `stopped` and the Bare Metal Machine is set to `pending` state.

@@ -176,7 +176,7 @@ az networkcloud baremetalmachine replace \

 If the `replace` action fails due to a hardware validation failure, the specific error or test failure is shown in the `replace` response, as shown in the following examples.
 This information can also be found in the Activity Log for the Bare Metal Machine (Operator Nexus).
-The error code and error message are included the JSON properties of the corresponding `BareMetalMachines_Replace` operation.
+The error code and error message are also included in the JSON properties of the corresponding `BareMetalMachines_Replace` operation.

 **Example 1: Hardware validation fails due to invalid Key Vault URI for Baseboard Management Controller (BMC) credentials**
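To make the `evacuate` behavior described in this file's first hunk concrete, the sketch below builds the `az networkcloud baremetalmachine cordon` argument list. `cordon_command`, `bmm-01`, and `my-rg` are hypothetical names. The sketch assumes the CLI accepts `--name`, `--resource-group`, and `--evacuate True|False`; confirm with `az networkcloud baremetalmachine cordon --help` before running anything.

```python
import shlex


def cordon_command(name: str, resource_group: str, evacuate: bool = False) -> list[str]:
    """Build the argument list for `az networkcloud baremetalmachine cordon`.

    evacuate=False (the default) only stops new workloads from being
    scheduled; evacuate=True also drains the workloads already running.
    """
    return [
        "az", "networkcloud", "baremetalmachine", "cordon",
        "--name", name,
        "--resource-group", resource_group,
        "--evacuate", "True" if evacuate else "False",
    ]


# Hypothetical machine and resource group names.
drain = cordon_command("bmm-01", "my-rg", evacuate=True)
print(shlex.join(drain))
```

Passing the list to `subprocess.run(drain, check=True)` would execute it; keeping the arguments as a list avoids shell quoting issues.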
articles/operator-nexus/troubleshoot-bare-metal-machine-warning.md
Lines changed: 6 additions & 6 deletions
@@ -12,7 +12,7 @@ ms.reviewer: ekarandjeff

 # Troubleshoot _'Warning'_ detailed status messages on an Azure Operator Nexus Cluster Bare Metal Machine

-This document provides basic troubleshooting information for Bare Metal Machine (BMM) resources which are reporting a _Warning_ message in the BMM detailed status message.
+This document provides basic troubleshooting information for Bare Metal Machine (BMM) resources that are reporting a _Warning_ message in the BMM detailed status message.

 ## Symptoms

@@ -92,7 +92,7 @@ Review the `lastTransitionTime` and `message` fields for more information about
 }
 ```

-You can also check for any potentially related recent lifecycle actions (such as Restart or Power off actions) in the Azure portal. See [Monitor status in Bare Metal Machine JSON properties](./howto-bare-metal-best-practices.md#monitor-status-in-bare-metal-machine-json-properties). If available, this information is also visible in the output of the above`run-read-command` in the `actionStates` status field.
+You can also check for any potentially related recent lifecycle actions (such as Restart or Power off actions) in the Azure portal. See [Monitor status in Bare Metal Machine JSON properties](./howto-bare-metal-best-practices.md#monitor-status-in-bare-metal-machine-json-properties). If available, this information is also visible in the output of the previous `run-read-command` in the `actionStates` status field.

 ## `Warning: PXE port is unhealthy`

@@ -116,8 +116,8 @@ To troubleshoot this issue:
 - review the `conditions` status of the kubernetes `bmm` object, as described in the [Troubleshooting](#troubleshooting) section
 - this information should identify the specific root cause (port down or port flapping) and approximate time of the issue
 - check the Ethernet cabling and Top Of Rack (TOR) switch for the affected PXE port
-- check for any other BMMs which are also reporting unhealthy PXE status or other network-related problems
-- check for any recent deployment or infrastructure changes which coincide with the time of failure.
+- check for any other BMMs that are also reporting unhealthy PXE status or other network-related problems
+- check for any recent deployment or infrastructure changes that coincide with the time of failure.

 **Example `conditions` output for PXE warning**

@@ -145,11 +145,11 @@ This message can indicate an issue with the underlying compute host or baseboard
 To troubleshoot this issue:

 - review the `conditions` status of the kubernetes `bmm` object, as described in the [Troubleshooting](#troubleshooting) section
-- review the `actionStates` status field of the kubernetes `bmm` object for any recently initiated lifecycle actions (e.g. Restart or Power off) as described in the [Troubleshooting](#troubleshooting) section
+- review the `actionStates` status field of the kubernetes `bmm` object for any recently initiated lifecycle actions (such as a Restart or Power off) as described in the [Troubleshooting](#troubleshooting) section
 - this information should identify the approximate time of the issue and any other available details
 - check the power feed, power cables, and physical hardware for the specified BMM
 - check whether any other BMMs are also reporting an unexpected power state Warning, which might indicate a broader issue with the underlying infrastructure
-- check for any recent deployment or infrastructure changes which coincide with the time of failure
+- check for any recent deployment or infrastructure changes that coincide with the time of failure
 - review the power state and logs on the BMC for the affected host.

 For more information about logging into the BMC, see [Troubleshoot Hardware Validation Failure](./troubleshoot-hardware-validation-failure.md).
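For the `conditions` review recommended throughout this file, a short script can order the conditions by `lastTransitionTime` so the most recent change, and its `message`, comes first, which helps pin down the approximate time of the issue. This is a minimal sketch assuming the conditions are available as a JSON array of Kubernetes-style objects with `type`, `status`, `lastTransitionTime` (UTC, second precision, `Z` suffix), and `message` fields; `conditions_by_recency` and the condition types in the sample are placeholders.

```python
import json
from datetime import datetime


def conditions_by_recency(conditions_json: str) -> list[dict]:
    """Sort Kubernetes-style conditions so the most recent transition comes first."""
    conditions = json.loads(conditions_json)

    def transition_time(condition: dict) -> datetime:
        # Assumes timestamps like "2025-01-02T09:30:00Z" (no fractional seconds).
        return datetime.strptime(condition["lastTransitionTime"], "%Y-%m-%dT%H:%M:%SZ")

    return sorted(conditions, key=transition_time, reverse=True)


# Placeholder conditions; real types and messages come from the `bmm` object.
sample = json.dumps([
    {"type": "ExampleConditionA", "status": "True",
     "lastTransitionTime": "2025-01-01T08:00:00Z", "message": "healthy"},
    {"type": "ExampleConditionB", "status": "False",
     "lastTransitionTime": "2025-01-02T09:30:00Z", "message": "port down"},
])

for condition in conditions_by_recency(sample):
    print(condition["lastTransitionTime"], condition["type"], condition["message"])
```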
articles/operator-nexus/troubleshoot-reboot-reimage-replace.md
Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ Servers contain many physical components that can fail over time. It's important
 A hardware validation process is invoked to ensure the integrity of the physical host in advance of deploying the OS image. Like the reimage action, the Tenant data isn't modified during replacement.

 > [!IMPORTANT]
-> When run with default options, the RAID controller is reset during BMM replace, wiping all data from the server's virtual disks. Baseboard Management Controller (BMC) virtual disk alerts triggered during BMM replace can be ignored unless there are other physical disk and/or RAID controllers alerts. Starting with the 2025-07-01preview version of the NetworkCloud API, and generally available with the 2025-09-01 GA version, use `replace` with `storage-policy="Preserve"` to retain virtual disk data.
+> When run with default options, the RAID controller is reset during BMM replace, wiping all data from the server's virtual disks. Baseboard Management Controller (BMC) virtual disk alerts triggered during BMM replace can be ignored unless there are other physical disk and/or RAID controller alerts. Starting with the `2025-07-01-preview` version of the NetworkCloud API, and generally available with the `2025-09-01` GA version, use `replace` with `storage-policy="Preserve"` to retain virtual disk data.