You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/operator-nexus/troubleshoot-hardware-validation-failure.md
+36-36Lines changed: 36 additions & 36 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -50,7 +50,7 @@ Expanding `result_detail` for a given category shows detailed results.
50
50
* Memory/RAM Related Failure (memory_capacity_GB) (measured in GiB)
51
51
* Memory specs are defined in the SKU. Memory below threshold value indicates missing or failed Dual In-Line Memory Module (DIMM). A failed DIMM would also be reflected in the `health_info` category. The following example shows a failed memory check.
52
52
53
-
```json
53
+
```yaml
54
54
{
55
55
"field_name": "memory_capacity_GB",
56
56
"comparison_result": "Fail",
@@ -74,7 +74,7 @@ Expanding `result_detail` for a given category shows detailed results.
74
74
* CPU Related Failure (cpu_sockets)
75
75
* CPU specs are defined in the SKU. Failed `cpu_sockets` check indicates a failed CPU or CPU count mismatch. The following example shows a failed CPU check.
76
76
77
-
```json
77
+
```yaml
78
78
{
79
79
"field_name": "cpu_sockets",
80
80
"comparison_result": "Fail",
@@ -98,7 +98,7 @@ Expanding `result_detail` for a given category shows detailed results.
98
98
* Model Check Failure (Model)
99
99
* Failed `Model` check indicates that wrong server is racked in the slot or there's a cabling mismatch. The following example shows a failed model check.
100
100
101
-
```json
101
+
```yaml
102
102
{
103
103
"field_name": "Model",
104
104
"comparison_result": "Fail",
@@ -122,7 +122,7 @@ Expanding `result_detail` for a given category shows detailed results.
122
122
* Serial Number Check Failure (Serial_Number)
123
123
* The server's serial number, also referred as the service tag, is defined in the cluster. Failed `Serial_Number` check indicates a mismatch between the serial number in the cluster and the actual serial number of the machine. The following example shows a failed serial number check.
124
124
125
-
```json
125
+
```yaml
126
126
{
127
127
"field_name": "Serial_Number",
128
128
"comparison_result": "Fail",
@@ -146,7 +146,7 @@ Expanding `result_detail` for a given category shows detailed results.
146
146
* iDRAC License Check Failure
147
147
* All iDRACs require a perpetual/production iDRAC datacenter or enterprise license. Trial licenses are valid for only 30 days. A failed `iDRAC License Check` indicates that the required iDRAC license is missing. The following examples show a failed iDRAC license check for a trial license and missing license respectively.
148
148
149
-
```json
149
+
```yaml
150
150
{
151
151
"field_name": "iDRAC License Check",
152
152
"comparison_result": "Fail",
@@ -155,7 +155,7 @@ Expanding `result_detail` for a given category shows detailed results.
155
155
}
156
156
```
157
157
158
-
```json
158
+
```yaml
159
159
{
160
160
"field_name": "iDRAC License Check",
161
161
"comparison_result": "Fail",
@@ -171,7 +171,7 @@ Expanding `result_detail` for a given category shows detailed results.
171
171
* Firmware Version Checks
172
172
* Firmware version checks were introduced in release 3.9. The following example shows the expected log for release versions before 3.9.
173
173
174
-
```json
174
+
```yaml
175
175
{
176
176
"system_info": {
177
177
"system_info_result": "Pass",
@@ -184,7 +184,7 @@ Expanding `result_detail` for a given category shows detailed results.
184
184
185
185
* Firmware versions are determined based on the `cluster version` value in the cluster object. The following example shows a failed check due to indeterminate cluster version. If this problem is encountered, verify the version in the cluster object.
186
186
187
-
```json
187
+
```yaml
188
188
{
189
189
"system_info": {
190
190
"system_info_result": "Fail",
@@ -200,7 +200,7 @@ Expanding `result_detail` for a given category shows detailed results.
200
200
* Disk Checks Failure
201
201
* Drive specs are defined in the SKU. Mismatched capacity values indicate incorrect drives or drives inserted in to incorrect slots. Missing capacity and type fetched values indicate drives that are failed, missing, or inserted in to incorrect slots.
202
202
203
-
```json
203
+
```yaml
204
204
{
205
205
"field_name": "Disk_0_Capacity_GB",
206
206
"comparison_result": "Fail",
@@ -209,7 +209,7 @@ Expanding `result_detail` for a given category shows detailed results.
209
209
}
210
210
```
211
211
212
-
```json
212
+
```yaml
213
213
{
214
214
"field_name": "Disk_0_Capacity_GB",
215
215
"comparison_result": "Fail",
@@ -218,7 +218,7 @@ Expanding `result_detail` for a given category shows detailed results.
218
218
}
219
219
```
220
220
221
-
```json
221
+
```yaml
222
222
{
223
223
"field_name": "Disk_0_Type",
224
224
"comparison_result": "Fail",
@@ -244,7 +244,7 @@ Expanding `result_detail` for a given category shows detailed results.
244
244
* Network Interface Cards (NIC) Check Failure
245
245
* Dell server NIC specs are defined in the SKU. A mismatched link status indicates loose or faulty cabling or crossed cables. A mismatched model indicates incorrect NIC card is inserted in to slot. Missing link/model fetched values indicate NICs that are failed, missing, or inserted in to incorrect slots.
246
246
247
-
```json
247
+
```yaml
248
248
{
249
249
"field_name": "NIC.Slot.3-1-1_LinkStatus",
250
250
"comparison_result": "Fail",
@@ -253,7 +253,7 @@ Expanding `result_detail` for a given category shows detailed results.
253
253
}
254
254
```
255
255
256
-
```json
256
+
```yaml
257
257
{
258
258
"field_name": "NIC.Embedded.2-1-1_LinkStatus",
259
259
"comparison_result": "Fail",
@@ -262,7 +262,7 @@ Expanding `result_detail` for a given category shows detailed results.
262
262
}
263
263
```
264
264
265
-
```json
265
+
```yaml
266
266
{
267
267
"field_name": "NIC.Slot.3-1-1_Model",
268
268
"comparison_result": "Fail",
@@ -271,7 +271,7 @@ Expanding `result_detail` for a given category shows detailed results.
271
271
}
272
272
```
273
273
274
-
```json
274
+
```yaml
275
275
{
276
276
"field_name": "NIC.Slot.3-1-1_LinkStatus",
277
277
"comparison_result": "Fail",
@@ -280,7 +280,7 @@ Expanding `result_detail` for a given category shows detailed results.
280
280
}
281
281
```
282
282
283
-
```json
283
+
```yaml
284
284
{
285
285
"field_name": "NIC.Slot.3-1-1_Model",
286
286
"comparison_result": "Fail",
@@ -310,7 +310,7 @@ Expanding `result_detail` for a given category shows detailed results.
310
310
* NIC Check L2 Switch Information
311
311
* HWV reports L2 switch information for each of the server interfaces. The switch connection ID (switch interface MAC) and switch port connection ID (switch interface label) are informational.
@@ -331,7 +331,7 @@ Expanding `result_detail` for a given category shows detailed results.
331
331
* Cabling Checks for Bonded Interfaces
332
332
* Mismatched cabling is reported in the result_log. Cable check validates that that bonded NICs connect to switch ports with same Port ID. In the following example Peripheral Component Interconnect (PCI) 3/1 and 3/2 connect to "Ethernet1/1" and "Ethernet1/3" respectively on TOR, triggering a failure for HWV.
333
333
334
-
```json
334
+
```yaml
335
335
{
336
336
"network_info": {
337
337
"network_info_result": "Fail",
@@ -357,7 +357,7 @@ Expanding `result_detail` for a given category shows detailed results.
357
357
* iDRAC (BMC) MAC Address Check Failure
358
358
* The iDRAC MAC address is defined in the cluster for each BMM. A failed `iDRAC_MAC` check indicates a mismatch between the iDRAC/BMC MAC in the cluster and the actual MAC address retrieved from the machine.
359
359
360
-
```json
360
+
```yaml
361
361
{
362
362
"field_name": "iDRAC_MAC",
363
363
"comparison_result": "Fail",
@@ -371,7 +371,7 @@ Expanding `result_detail` for a given category shows detailed results.
371
371
* Preboot execution environment (PXE) MAC Address Check Failure
372
372
* The PXE MAC address is defined in the cluster for each BMM. A failed `PXE_MAC` check indicates a mismatch between the PXE MAC in the cluster and the actual MAC address retrieved from the machine.
373
373
374
-
```json
374
+
```yaml
375
375
{
376
376
"field_name": "NIC.Embedded.1-1_PXE_MAC",
377
377
"comparison_result": "Fail",
@@ -387,7 +387,7 @@ Expanding `result_detail` for a given category shows detailed results.
387
387
* Health Check Sensor Failure
388
388
* Server health checks cover various hardware component sensors. A failed health sensor indicates a problem with the corresponding hardware component. The following examples indicate fan, drive, and CPU failures respectively.
389
389
390
-
```json
390
+
```yaml
391
391
{
392
392
"field_name": "System Board Fan1A",
393
393
"comparison_result": "Fail",
@@ -396,7 +396,7 @@ Expanding `result_detail` for a given category shows detailed results.
396
396
}
397
397
```
398
398
399
-
```json
399
+
```yaml
400
400
{
401
401
"field_name": "Solid State Disk 0:1:1",
402
402
"comparison_result": "Fail",
@@ -405,7 +405,7 @@ Expanding `result_detail` for a given category shows detailed results.
405
405
}
406
406
```
407
407
408
-
```json
408
+
```yaml
409
409
{
410
410
"field_name": "CPU.Socket.1",
411
411
"comparison_result": "Fail",
@@ -429,7 +429,7 @@ Expanding `result_detail` for a given category shows detailed results.
429
429
* Health Check LifeCycle (LC) Log Failures
430
430
* Dell server health checks fail for recent Critical LC Log Alarms. The hardware validation plugin logs the alarm ID, name, and timestamp. Recent LC Log's critical alarms indicate need for further investigation. The following example shows a failure for a critical backplane voltage alarm.
431
431
432
-
```json
432
+
```yaml
433
433
{
434
434
"field_name": "LCLog_Critical_Alarms",
435
435
"comparison_result": "Fail",
@@ -441,7 +441,7 @@ Expanding `result_detail` for a given category shows detailed results.
441
441
* Virtual disk errors typically indicate a RAID cleanup false positive condition and are logged due to the timing of raid cleanup and system power off pre HWV. The following example shows an LC log critical error on virtual disk 238. If multiple errors are encountered blocking deployment, delete cluster, wait two hours, then reattempt cluster deployment. If the failures aren't deployment blocking, wait two hours then run BMM replace.
442
442
* Virtual disk errors are allowlisted starting with release 3.13 and don't trigger a health check failure.
443
443
444
-
```json
444
+
```yaml
445
445
{
446
446
"field_name": "LCLog_Critical_Alarms",
447
447
"comparison_result": "Fail",
@@ -465,7 +465,7 @@ Expanding `result_detail` for a given category shows detailed results.
465
465
* Health Check Server Power Control Action Failures
466
466
* Dell server health checks fail for failed server power-up or failed iDRAC reset. A failed server control action indicates an underlying hardware issue. The following example shows failed power on attempt.
467
467
468
-
```json
468
+
```yaml
469
469
{
470
470
"field_name": "Server Control Actions",
471
471
"comparison_result": "Fail",
@@ -474,7 +474,7 @@ Expanding `result_detail` for a given category shows detailed results.
474
474
}
475
475
```
476
476
477
-
```json
477
+
```yaml
478
478
"result_log": [
479
479
"Server power up failed with: server OS is powered off after successful power on attempt",
480
480
]
@@ -495,7 +495,7 @@ Expanding `result_detail` for a given category shows detailed results.
495
495
* RAID Cleanup Failures
496
496
* RAID cleanup was added to HWV in release 3.13. As part of RAID cleanup the RAID controller configuration is reset. Dell server health check fails for RAID controller reset failure. A failed RAID cleanup action indicates an underlying hardware issue. The following example shows a failed RAID controller reset.
497
497
498
-
```json
498
+
```yaml
499
499
{
500
500
"field_name": "Server Control Actions",
501
501
"comparison_result": "Fail",
@@ -504,7 +504,7 @@ Expanding `result_detail` for a given category shows detailed results.
504
504
}
505
505
```
506
506
507
-
```json
507
+
```yaml
508
508
"result_log": [
509
509
"RAID cleanup failed with: raid deletion failed after 2 attempts",
510
510
]
@@ -527,7 +527,7 @@ Expanding `result_detail` for a given category shows detailed results.
527
527
* Health Check Power Supply Failure and Redundancy Considerations
528
528
* Dell server health checks warn when one power supply is missing or failed. Power supply "field_name" might be displayed as 0/PS0/Power Supply 0 and 1/PS1/Power Supply 1 for the first and second power supplies respectively. A failure of one power supply doesn't trigger an HWV device failure.
529
529
530
-
```json
530
+
```yaml
531
531
{
532
532
"field_name": "Power Supply 1",
533
533
"comparison_result": "Warning",
@@ -536,7 +536,7 @@ Expanding `result_detail` for a given category shows detailed results.
536
536
}
537
537
```
538
538
539
-
```json
539
+
```yaml
540
540
{
541
541
"field_name": "System Board PS Redundancy",
542
542
"comparison_result": "Warning",
@@ -563,7 +563,7 @@ Expanding `result_detail` for a given category shows detailed results.
563
563
* The `boot_device_name` check is currently informational.
564
564
* Mismatched boot device name shouldn't trigger a device failure.
565
565
566
-
```json
566
+
```yaml
567
567
{
568
568
"field_name": "boot_device_name",
569
569
"comparison_result": "Info",
@@ -578,7 +578,7 @@ Expanding `result_detail` for a given category shows detailed results.
578
578
* Failed `pxe_device_1_name` or `pxe_device_1_state` checks indicate a problem with the PXE configuration.
579
579
* Failed settings need to be fixed to enable system boot during deployment.
580
580
581
-
```json
581
+
```yaml
582
582
{
583
583
"field_name": "pxe_device_1_name",
584
584
"comparison_result": "Fail",
@@ -587,7 +587,7 @@ Expanding `result_detail` for a given category shows detailed results.
587
587
}
588
588
```
589
589
590
-
```json
590
+
```yaml
591
591
{
592
592
"field_name": "pxe_device_1_state",
593
593
"comparison_result": "Fail",
@@ -615,7 +615,7 @@ Expanding `result_detail` for a given category shows detailed results.
615
615
* Device Login Check Considerations
616
616
* The `device_login` check fails if the iDRAC isn't accessible or if the hardware validation plugin isn't able to sign-in.
0 commit comments