You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/operator-nexus/troubleshoot-hardware-validation-failure.md
+55Lines changed: 55 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -168,6 +168,32 @@ Expanding `result_detail` for a given category shows detailed results.
168
168
169
169
`BMC` -> `Configuration` -> `Licenses`
170
170
171
+
* Firmware Version Checks
172
+
* Firmware version checks were introduced in release 3.9. The following example shows the expected log for release versions before 3.9.
173
+
174
+
```json
175
+
{
176
+
"system_info": {
177
+
"system_info_result": "Pass",
178
+
"result_log": [
179
+
"Firmware validation not supported in release 3.8"
180
+
]
181
+
},
182
+
}
183
+
```
184
+
185
+
* Firmware versions are determined based on the `cluster version` value in the cluster object. The following example shows a failed check due to indeterminate cluster version. If this problem is encountered, verify the version in the cluster object.
186
+
187
+
```json
188
+
{
189
+
"system_info": {
190
+
"system_info_result": "Fail",
191
+
"result_log": [
192
+
"Unable to determine firmware release"
193
+
]
194
+
},
195
+
}
196
+
```
171
197
172
198
### Drive info category
173
199
@@ -412,6 +438,17 @@ Expanding `result_detail` for a given category shows detailed results.
412
438
}
413
439
```
414
440
441
+
* Virtual disk errors typically indicate a RAID cleanup false positive condition and are logged due to the timing of raid cleanup and system power off pre HWV. The following example shows an LC log critical error on virtual disk 238. If multiple errors are encountered blocking deployment, delete cluster, wait two hours, then reattempt cluster deployment. If the failures aren't deployment blocking, wait two hours then run BMM replace.
442
+
443
+
```json
444
+
{
445
+
"field_name": "LCLog_Critical_Alarms",
446
+
"comparison_result": "Fail",
447
+
"expected": "No Critical Errors",
448
+
"fetched": "104473 2024-07-26T16:05:19-05:00 Virtual Disk 238 on RAID Controller in SL 3 has failed."
449
+
}
450
+
```
451
+
415
452
* To check LC logs in BMC webui:
416
453
417
454
`BMC` -> `Maintenance` -> `Lifecycle Log`
@@ -562,6 +599,24 @@ Expanding `result_detail` for a given category shows detailed results.
562
599
563
600
* To troubleshoot, ping the iDRAC from a jumpbox with access to the BMC network. If iDRAC pings check that passwords match.
564
601
602
+
### Special considerations
603
+
604
+
* Servers Failing Multiple Health and Network Checks
605
+
* Raid deletion is performed during cluster deploy and cluster delete actions for all releases inclusive of 3.12.
606
+
* If we observe servers getting powered off during hardware validation with multiple failed health and network checks, we need to reattempt cluster deployment.
607
+
* If issues persist, raid deletion needs to be performed manually on `control` nodes in the cluster.
0 commit comments