 |`Degraded: NIC failed`|[`Degraded: NIC failed`](#degraded-nic-failed)|
 |`Degraded: port down`|[`Degraded: port down`](#degraded-port-down)|
-|`Degraded: port flapping`|[`Degraded: port flapping`](#degraded-port-flapping)|
 |`Degraded: LACP status is down`|[`Degraded: LACP status is down`](#degraded-lacp-status-is-down)|
+|`Degraded: port flapping`|[`Degraded: port flapping`](#degraded-port-flapping)|
 
 _Degraded_ status messages and associated automatic cordoning behavior are present in Azure Operator Nexus version 2502.1 and higher.
 
@@ -131,7 +132,7 @@ This example shows an automatically cordoned BMM with two active _Degraded_ cond
     "cordonStatus": "Cordoned",
     "degradedStartTime": "2025-03-04T03:27:00Z",
     "detailedStatus": "Provisioned",
-    "detailedStatusMessage": "The OS is provisioned to the machine. Degraded: port flapping Degraded: port down",
+    "detailedStatusMessage": "The OS is provisioned to the machine. Degraded: port flapping Degraded: port down"
   }
 }
 ```
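The example above packs several _Degraded_ messages into one `detailedStatusMessage` string. A minimal Python sketch of how a script might pull them apart follows. It assumes each message appears as the literal text `Degraded: <name>`. The `KNOWN_DEGRADED` tuple and the `degraded_messages` helper are hypothetical names for illustration, and the tuple holds only the messages visible in the table above, so it may be incomplete.

```python
import re

# Assumption: only the Degraded messages visible in the table above.
KNOWN_DEGRADED = (
    "NIC failed",
    "port down",
    "port flapping",
    "LACP status is down",
)

# Match "Degraded: " followed by one of the known message names.
_PATTERN = re.compile(
    "Degraded: (" + "|".join(re.escape(name) for name in KNOWN_DEGRADED) + ")"
)


def degraded_messages(detailed_status_message):
    """Return the known Degraded message names, in order of appearance."""
    return _PATTERN.findall(detailed_status_message)


message = (
    "The OS is provisioned to the machine. "
    "Degraded: port flapping Degraded: port down"
)
print(degraded_messages(message))
```

Matching against a fixed list avoids guessing where one free-text message ends and the next begins, at the cost of missing any message that isn't in the list.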
@@ -150,6 +151,34 @@ Note: only BMMs used for _Compute_ are automatically cordoned. Control and Manag
 
 For more information about investigating the root cause of an automatic cordon, see [Troubleshooting](#troubleshooting).
 
+## `Degraded: NIC failed`
+
+This message indicates that one of the expected Mellanox Network Interface Cards (NICs) on the underlying compute host has failed or is missing.
+It typically points to a hardware failure on the NIC, or to a card that isn't correctly seated in the host.
+
+To troubleshoot this issue:
+
+- to identify the nonoperational NIC, check the Ethernet link status indicators on the underlying compute host
+- check that the NIC is correctly installed and seated
+- sign in to the Baseboard Management Controller (BMC) to check the hardware status of the NIC
+- review detailed hardware logs by generating a Dell Technical Support Report (TSR), as described in the Dell Knowledge Base article [Export a SupportAssist Collection Using an iDRAC](https://www.dell.com/support/kbdoc/en-us/000126308/export-a-supportassist-collection-via-idrac9)
+- review the most recent time of failure reported by the Bare Metal Machine `conditions`, as described in the [Troubleshooting](#troubleshooting) section
+- power cycle the host by running a "Restart" action on the Bare Metal Machine resource, and check whether the condition clears
+
+**Example `conditions` output for NIC failed**
+
+```json
+"conditions": [
+  {
+    "lastTransitionTime": "2025-05-21T16:49:29Z",
+    "message": "Expected 2 devices in oam-bond, found 1: 98_pf0vf0_vf",
+    "reason": "OamDevicesUnhealthy",
+    "status": "False",
+    "type": "BmmNicsHealthy"
+  }
+],
+```
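To find the most recent failure in a longer `conditions` list, a short Python sketch along these lines may help. It assumes the list has already been parsed from the Bare Metal Machine JSON, and that a `status` of `"False"` marks an unhealthy condition, as in the example above. The `failed_conditions` helper is a hypothetical name, not part of any Azure tooling.

```python
import json


def failed_conditions(conditions):
    """Return the conditions whose status is "False", newest first."""
    failed = [c for c in conditions if c.get("status") == "False"]
    # UTC timestamps in this fixed ISO 8601 format sort correctly as strings.
    return sorted(
        failed,
        key=lambda c: c.get("lastTransitionTime", ""),
        reverse=True,
    )


# Sample data copied from the example output above.
sample = json.loads("""
[
  {
    "lastTransitionTime": "2025-05-21T16:49:29Z",
    "message": "Expected 2 devices in oam-bond, found 1: 98_pf0vf0_vf",
    "reason": "OamDevicesUnhealthy",
    "status": "False",
    "type": "BmmNicsHealthy"
  }
]
""")

for c in failed_conditions(sample):
    print(f'{c["lastTransitionTime"]} {c["type"]} ({c["reason"]}): {c["message"]}')
```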
+
 ## `Degraded: port down`
 
 This message in the BMM _Detailed status message_ field indicates that the physical link is down on one or more of the Mellanox interfaces on the underlying compute host.