I'm not sure what's happening, or if this is even the right place to file the issue, but I don't have a better idea.
On Tuesday, I found that the sidecar on dublin's sled 16 had failed, apparently with a PCI error. I filed this as oxidecomputer/dendrite#173. I captured the scrimlet and dendrite state, but didn't look at the sidecar itself.
On Wednesday, I found both sidecars on madrid powered off. Will looked at one of the sidecar SPs and found a possible thermal issue:
humility: ring buffer task_thermal::__RINGBUF in thermal:
    TOTAL VARIANT
    45223 ControlPwm
       16 AutoState(Boot)
        4 AutoState(Running)
        1 AutoState(Overheated)
        1 AutoState(Uncontrollable)
       13 AddedDynamicInput
        8 FanAdded
        6 RemovedDynamicInput
        2 PowerModeChanged
        2 FanControllerInitialized
        1 Start
        1 ThermalMode(Auto)
        1 CriticalDueTo
        1 PowerDownAt
        1 SetFanWatchdogOk
Today (Friday) I tried to use london and again found both sidecars powered off:
03:09:55 castle:/data/local/env/dublin/nils$ echo $PILOT_RACK
london
03:10:05 castle:/data/local/env/dublin/nils$ pilot sp st BRM44220013
BRM44220013 off (A2)
03:16:25 castle:/data/local/env/dublin/nils$ pilot sp st BRM44220004
BRM44220004 off (A2)
I haven't found the humility archive yet, so I haven't looked any deeper.