FM: design for collecting ereports from the host through sled-agent #10163

@hawkw

We'll eventually need a design for how sled-agent collects ereports from the host.

Some questions on my mind:

  • Do we have sled-agent shell out to fmdump -ej (or whichever invocation lists ereports with their NVLists formatted as JSON)? Or do we want proper Rust bindings for the FMA C libraries?
  • I suspect that the host is a lot better at spewing out a giant pile of identical ereports than Hubris, so we may actually want to worry about deduplication/debouncing a bit more here...
  • Can we reuse the sled UUID as the restart ID? That would make stuff easier. Figure out whether this is practical.
    • Note that unlike in Hubris, all this stuff is getting written to disk, so we don't inherently lose data on a sled-agent restart; the restart ID may not need to be tied to the lifetime of a single sled-agent process. Figure out where the restart boundary actually is and where data loss occurs on the host side.
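
To make the deduplication/debouncing question a bit more concrete, here is a minimal sketch of one possible approach: forward the first ereport seen for a given key, suppress identical ones inside a fixed window, and report how many were dropped when the next one is forwarded. Everything here is an assumption for illustration. `EreportKey`, `Debouncer`, `Decision`, and the choice of (class, detector) as the identity of "the same" ereport are hypothetical names, not anything that exists in sled-agent or the FMA libraries, and a real design would need to decide which fields actually make two ereports identical.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical identity for "the same" ereport. Which fields belong here
/// is an open design question; class + detector is only an assumption.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct EreportKey {
    pub class: String,
    pub detector: String,
}

/// What the caller should do with an observed ereport.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    /// Forward this ereport. `suppressed` is the number of duplicates
    /// dropped during the previous window for the same key.
    Forward { suppressed: u64 },
    /// Drop this ereport; an identical one was forwarded recently.
    Suppress,
}

struct Entry {
    window_start: Duration,
    suppressed: u64,
}

/// Fixed-window debouncer keyed by `EreportKey`.
pub struct Debouncer {
    window: Duration,
    seen: HashMap<EreportKey, Entry>,
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Self { window, seen: HashMap::new() }
    }

    /// Record an ereport observed at `now` (time since an arbitrary epoch,
    /// passed in explicitly so the logic is deterministic and testable).
    pub fn observe(&mut self, key: EreportKey, now: Duration) -> Decision {
        let window = self.window;
        match self.seen.get_mut(&key) {
            // Still inside the window opened by the last forwarded report.
            Some(entry) if now.saturating_sub(entry.window_start) < window => {
                entry.suppressed += 1;
                Decision::Suppress
            }
            // Window expired: forward, and report how many were dropped.
            Some(entry) => {
                let suppressed = entry.suppressed;
                entry.window_start = now;
                entry.suppressed = 0;
                Decision::Forward { suppressed }
            }
            // First time this key has been seen.
            None => {
                self.seen.insert(key, Entry { window_start: now, suppressed: 0 });
                Decision::Forward { suppressed: 0 }
            }
        }
    }
}
```

Carrying the suppressed count along with the next forwarded ereport means the collector still learns that a storm happened without having to ingest every duplicate. This sketch never evicts old keys, so a real version would also need some bound on the size of the map.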

Labels:

  • Sled Agent: Related to the Per-Sled Configuration and Management
  • fault-management: Everything related to the fault-management initiative (RFD480 and others)
  • nexus: Related to nexus