[nexus] Add part to service bundle ereport paths (#8767)
In #8739, I added code for collecting ereports into support bundles
which stores the ereport JSON in directories for each sled/switch/PSC
serial number from which an ereport was received. Unfortunately, I
failed to consider that the version 1 Oxide serial numbers are
only unique within the namespace of a particular part, and not globally
--- so (for example) a switch and a compute sled may have colliding
serials. This means that the current code could incorrectly group
ereports reported by two totally different devices. While the ereport
JSON files _do_ contain additional information that disambiguates this
(they include the part number, as well as MGS metadata with the
SP type for SP ereports), and restart IDs are additionally capable of
distinguishing between reporters, putting ereports from two different
systems within the same directory still has the potential to be quite
misleading.
Thus, this branch changes the paths for ereports to include the part
number as well as the serial number, in the format:
```
{part_number}-{serial_number}/{restart_id}/{ENA}.json
```
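As a rough illustration of that layout (not the actual Nexus code), the path could be assembled as in the sketch below. The function name `ereport_path` and the string-typed parameters are hypothetical, chosen only for this example; the real code presumably uses its own types for restart IDs and ENAs.

```rust
use std::path::PathBuf;

/// Hypothetical helper: builds the bundle-relative path for one ereport,
/// following the `{part_number}-{serial_number}/{restart_id}/{ENA}.json`
/// format described above. All parameters are plain strings here purely
/// for illustration.
fn ereport_path(
    part_number: &str,
    serial_number: &str,
    restart_id: &str,
    ena: &str,
) -> PathBuf {
    let mut path = PathBuf::new();
    // The top-level directory combines part number and serial number, so
    // two devices that share a serial but differ in part number no longer
    // land in the same directory.
    path.push(format!("{part_number}-{serial_number}"));
    // One subdirectory per restart ID.
    path.push(restart_id);
    // One JSON file per ereport, named by its ENA.
    path.push(format!("{ena}.json"));
    path
}
```

With made-up values, `ereport_path("PART-A", "SN0001", "restart-1", "ena-1")` and `ereport_path("PART-B", "SN0001", "restart-1", "ena-1")` produce paths under different top-level directories even though the serial is the same.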
In order to include part numbers for host OS ereports, I decided to add
a part number column to the `host_ereport` table as well. Initially, I
had opted not to do this, as I was thinking that, since `host_ereport`
includes a sled UUID, we could just join with the `sled` table to get
the part number. However, it occurred to me that ereports may be
received from a sled that's later expunged from the rack, and the `sled`
record for the sled may eventually be deleted, so such a join would
fail. We might retain such ereports past the lifetime of the sled in the
rack. So, I thought it was better to always include the part number in
the ereport record.
I've added a migration that attempts to backfill the
`host_ereport.part_number` column from the `sled` table for existing
host OS ereport records. In practice, this won't do anything, since
we're not collecting them yet, but it seemed nice to have. Sadly, the
column had to be left nullable, since we may theoretically encounter an
ereport with a sled UUID that points to an already-deleted sled record,
but...whatever. Since there aren't currently any host OS ereport records
anyway, this shouldn't happen, and we'll just handle the nullability;
this isn't terrible as we must already do so for SP ereport records.
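To make the backfill and nullability behavior concrete, here is a small in-memory sketch in Rust. It is not the real migration (which runs against the database); the `HostEreport` struct, the integer sled IDs standing in for UUIDs, the function names, and the `unknown_part` fallback are all assumptions made up for this example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `host_ereport` row. The real table keys
/// sleds by UUID; a `u32` is used here only to keep the sketch short.
struct HostEreport {
    sled_id: u32,
    /// Nullable column: stays `None` if the sled record is already gone.
    part_number: Option<String>,
}

/// Models the backfill: copy the part number from the matching sled
/// record when one exists, and leave the column null otherwise.
fn backfill_part_numbers(
    ereports: &mut [HostEreport],
    sleds: &HashMap<u32, String>,
) {
    for ereport in ereports.iter_mut() {
        if ereport.part_number.is_none() {
            ereport.part_number = sleds.get(&ereport.sled_id).cloned();
        }
    }
}

/// Assumed handling of a missing part number when naming the directory:
/// fall back to a placeholder rather than failing. The placeholder text
/// is invented for this sketch.
fn device_dir(part_number: Option<&str>, serial_number: &str) -> String {
    let part = part_number.unwrap_or("unknown_part");
    format!("{part}-{serial_number}")
}
```

In this model, an ereport whose sled ID has no entry in `sleds` keeps `part_number: None` after the backfill, which is the case that forces the column to remain nullable.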
Fixes #8765