jgallagher
Contributor

When trying Reconfigurator-based updates on a racklette today with a real TUF repo, the planner skipped all the RoT bootloader and RoT updates because none of the artifact names matched the boards reported in inventory. Our tests were assuming these were the same (and in prod they are the same on the SP), but in real TUF repos these don't match: all the RoT and bootloaders report a board of oxide-rot-1, but the artifact names are pretty varied:

                    name                   |          version           |          kind          |                               sign
-------------------------------------------+----------------------------+------------------------+-------------------------------------------------------------------
  gimlet_rot_bootloader-production-release | 1.4.0                      | gimlet_rot_bootloader  | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  gimlet_rot_bootloader-bart               | 1.4.0                      | gimlet_rot_bootloader  | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  gimlet_rot_bootloader-staging-devel      | 1.4.0                      | gimlet_rot_bootloader  | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-staging-devel                | 1.0.35                     | gimlet_rot_image_a     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | gimlet_rot_image_a     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-production-release           | 1.0.35                     | gimlet_rot_image_a     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | gimlet_rot_image_a     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | gimlet_rot_image_b     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-production-release           | 1.0.35                     | gimlet_rot_image_b     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | gimlet_rot_image_b     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-staging-devel                | 1.0.35                     | gimlet_rot_image_b     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  psc_rot_bootloader-staging-devel         | 1.4.0                      | psc_rot_bootloader     | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  psc_rot_bootloader-bart                  | 1.4.0                      | psc_rot_bootloader     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  psc_rot_bootloader-production-release    | 1.4.0                      | psc_rot_bootloader     | 31942f8d53dc908c5cb338bdcecb204785fa87834e8b18f706fc972a42886c8b
  oxide-rot-1-production-release           | 1.0.35                     | psc_rot_image_a        | 31942f8d53dc908c5cb338bdcecb204785fa87834e8b18f706fc972a42886c8b
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | psc_rot_image_a        | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | psc_rot_image_a        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  oxide-rot-1-staging-devel                | 1.0.35                     | psc_rot_image_a        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  oxide-rot-1-production-release           | 1.0.35                     | psc_rot_image_b        | 31942f8d53dc908c5cb338bdcecb204785fa87834e8b18f706fc972a42886c8b
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | psc_rot_image_b        | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | psc_rot_image_b        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  oxide-rot-1-staging-devel                | 1.0.35                     | psc_rot_image_b        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  switch_rot_bootloader-production-release | 1.4.0                      | switch_rot_bootloader  | 5c69a42ee1f1e6cd5f356d14f81d46f8dbee783bb28777334226c689f169c0eb
  switch_rot_bootloader-bart               | 1.4.0                      | switch_rot_bootloader  | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  switch_rot_bootloader-staging-devel      | 1.4.0                      | switch_rot_bootloader  | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-staging-devel                | 1.0.35                     | switch_rot_image_a     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | switch_rot_image_a     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | switch_rot_image_a     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-production-release           | 1.0.35                     | switch_rot_image_a     | 5c69a42ee1f1e6cd5f356d14f81d46f8dbee783bb28777334226c689f169c0eb
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | switch_rot_image_b     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-production-release           | 1.0.35                     | switch_rot_image_b     | 5c69a42ee1f1e6cd5f356d14f81d46f8dbee783bb28777334226c689f169c0eb
  oxide-rot-1-staging-devel                | 1.0.35                     | switch_rot_image_b     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | switch_rot_image_b     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d

A couple things of note here:

  1. The bootloader artifact names do not contain the string oxide-rot-1 at all.
  2. The oxide-rot-1-staging-devel and oxide-rot-1-selfsigned-staging-devel artifacts of each kind have the same SIGN value.

The first point means we can't (easily) determine which artifact matches the reported BORD, because we don't always have any artifact metadata that contains the board name. The second point means we can't ignore the BORDs entirely and work off just the SIGN.

Instead, this PR adds a new board value to the tuf_artifact table and the TufArtifactMeta struct. Outside of tests, the changes are almost identical to those from #8729 that added a sign field in this same spot. This allows us to stop assuming that artifact.name == inventory_caboose.board, and instead check artifact.board == inventory_caboose.board. It depends on oxidecomputer/tufaceous#40, which changes the way our fake data is generated to be more consistent with the state of production devices and artifacts. This perturbs some of the reconfigurator-cli tests a bit, which significantly inflates the diff; the actual code changes here are pretty small.
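
To make the difference between the three matching strategies concrete, here is a minimal sketch. The `ArtifactMeta` and `ReportedCaboose` types and the helper functions are hypothetical, simplified stand-ins (they are not the real `TufArtifactMeta` or inventory types), and the real planner also filters on artifact kind. The names and SIGN value come from the staging-devel rows in the table above; the board values are the ones reported from dublin later in this thread.

```rust
/// SIGN value shared by the gimlet staging-devel artifacts listed above.
pub const STAGING_SIGN: &str =
    "11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf";

/// Hypothetical, simplified stand-in for `TufArtifactMeta`.
#[derive(Debug, Clone, PartialEq)]
pub struct ArtifactMeta {
    pub name: String,
    pub kind: String,
    pub sign: String,
    /// The new field: the board this artifact was built for, if known.
    pub board: Option<String>,
}

/// Hypothetical, simplified view of what inventory reports for one device.
#[derive(Debug, Clone, PartialEq)]
pub struct ReportedCaboose {
    pub board: String,
    pub sign: String,
}

/// Old assumption: the artifact name is the board name.
pub fn matches_by_name(a: &ArtifactMeta, c: &ReportedCaboose) -> bool {
    a.name == c.board && a.sign == c.sign
}

/// Ignoring the board entirely; ambiguous because of the shared SIGN.
pub fn matches_by_sign_only(a: &ArtifactMeta, c: &ReportedCaboose) -> bool {
    a.sign == c.sign
}

/// The check this PR describes: compare the explicit board field.
pub fn matches_by_board(a: &ArtifactMeta, c: &ReportedCaboose) -> bool {
    a.board.as_deref() == Some(c.board.as_str()) && a.sign == c.sign
}

fn artifact(name: &str, kind: &str, board: &str) -> ArtifactMeta {
    ArtifactMeta {
        name: name.to_string(),
        kind: kind.to_string(),
        sign: STAGING_SIGN.to_string(),
        board: Some(board.to_string()),
    }
}

/// Three of the gimlet staging-devel artifacts from the real TUF repo.
pub fn sample_artifacts() -> Vec<ArtifactMeta> {
    vec![
        artifact(
            "gimlet_rot_bootloader-staging-devel",
            "gimlet_rot_bootloader",
            "oxide-rot-1",
        ),
        artifact("oxide-rot-1-staging-devel", "gimlet_rot_image_a", "oxide-rot-1"),
        artifact(
            "oxide-rot-1-selfsigned-staging-devel",
            "gimlet_rot_image_a",
            "oxide-rot-1-selfsigned",
        ),
    ]
}
```

For a caboose reporting board oxide-rot-1 with this SIGN, matching by name selects nothing, matching by SIGN alone also selects the selfsigned image, and matching by board selects only the bootloader and the non-selfsigned RoT image.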

Cargo.toml Outdated
tufaceous-artifact = { git = "https://github.com/oxidecomputer/tufaceous", branch = "main", features = ["proptest", "schemars"] }
tufaceous-brand-metadata = { git = "https://github.com/oxidecomputer/tufaceous", branch = "main" }
tufaceous-lib = { git = "https://github.com/oxidecomputer/tufaceous", branch = "main" }
tufaceous = { git = "https://github.com/oxidecomputer/tufaceous", branch = "john/fake-rot-distinct-sign" }
Contributor Author

  • TODO: change this back to main

@@ -124,30 +124,42 @@ blueprint-diff latest
sled-set serial1 mupdate-override unset
inventory-generate

-# This will attempt to update the first sled's host OS. Walk through that update
-# and the host OS of the two other sleds.
+# This will attempt to update the RoT bootloader on the first sled.
Contributor Author

This is kind of a followup to #8832 (comment). On that PR, I had to change this test to update the host OS so that we could move on to the planner doing things with zones, but in review, the question came up of "why don't we also have to do the same thing for the bootloader, RoT, and SP?". The superficial answer was "the fake TUF repo doesn't have any matching artifacts for those so the planner is skipping them", but I didn't dig into why that is. With the changes on this PR, the fake TUF repo now does have matching artifacts. So instead of just updating the host OS here, we have to do all the MGS-based updates for all three fake sleds.

// - "kind" matching one of the known RoT kinds
// - "sign" matching the rkth (found above from caboose)

-if a.id.name != *board {
+if a.board.as_ref() != Some(board) {
Contributor Author

This (and the corresponding change in rot_bootloader.rs) is the actual bugfix.

// - "kind" matching one of the known SP kinds

-if a.id.name != *board {
+if a.board.as_ref() != Some(board) {
Contributor Author

Changing this for the SP is not strictly required, but seems cleaner.

None,
),
make_artifact(
"oxide-rot-1",
"oxide-rot-1-fake-key",
Collaborator

I'm guessing you changed this to verify that we're not accidentally depending on the target's name?

Contributor Author

Yep, exactly.

@jgallagher
Contributor Author

Testing on dublin looks good; we see the board column matches what we expect (e.g., for gimlet; sidecar and psc are similar):

root@[fd00:1122:3344:104::4]:32221/omicron> select id,name,board,kind,version,sign from tuf_artifact order by kind,name asc;
                   id                  |                   name                   |         board          |          kind          |          version           |                               sign
---------------------------------------+------------------------------------------+------------------------+------------------------+----------------------------+-------------------------------------------------------------------
  8b445290-da59-45c7-ab9a-546e3a770277 | gimlet_rot_bootloader-bart               | oxide-rot-1            | gimlet_rot_bootloader  | 1.4.0                      | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  ecff03d3-bf37-493f-aa70-1af43c7f81f3 | gimlet_rot_bootloader-production-release | oxide-rot-1            | gimlet_rot_bootloader  | 1.4.0                      | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  ca011026-7ac1-412b-a7a9-38b55e73b2df | gimlet_rot_bootloader-staging-devel      | oxide-rot-1            | gimlet_rot_bootloader  | 1.4.0                      | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  eb2ae451-c6cb-43ff-9a1e-83d1714e60a3 | oxide-rot-1-production-release           | oxide-rot-1            | gimlet_rot_image_a     | 1.0.35                     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  ea91648a-a43b-46fa-8ff2-c55f178be6c1 | oxide-rot-1-selfsigned-bart              | oxide-rot-1-selfsigned | gimlet_rot_image_a     | 1.0.35                     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  e3de3af7-7dd0-4675-8c91-5b70c8bba0dc | oxide-rot-1-selfsigned-staging-devel     | oxide-rot-1-selfsigned | gimlet_rot_image_a     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  417ba792-71d8-4436-b91c-de4e2518d5cc | oxide-rot-1-staging-devel                | oxide-rot-1            | gimlet_rot_image_a     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  e49bd3ec-c533-4470-9bec-f264954f9edd | oxide-rot-1-production-release           | oxide-rot-1            | gimlet_rot_image_b     | 1.0.35                     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  701b9e4b-861c-47de-bd96-574f3e4e487c | oxide-rot-1-selfsigned-bart              | oxide-rot-1-selfsigned | gimlet_rot_image_b     | 1.0.35                     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  581efa75-0144-4372-b370-54ebe7728cb7 | oxide-rot-1-selfsigned-staging-devel     | oxide-rot-1-selfsigned | gimlet_rot_image_b     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  7f2a5f60-3c59-4584-a809-65102693866f | oxide-rot-1-staging-devel                | oxide-rot-1            | gimlet_rot_image_b     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf

and the planner performed both bootloader and RoT updates.

Contributor

@karencfv karencfv left a comment

Thanks a bunch for catching and fixing this bug!!!!

Comment on lines -16 to -19
board "SimRotStage0" name "SimGimletRot" version "0.0.200" git_commit "dadadadad" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
board "SimRotStage0" name "SimSidecarRot" version "0.0.200" git_commit "dadadadad" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
board "SimRotStage0" name "SimGimletRot" version "0.0.200" git_commit "ddddddddd" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
board "SimRotStage0" name "SimSidecarRot" version "0.0.200" git_commit "ddddddddd" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
Contributor

It appears that the cabooses for the RoT bootloader are being removed from these tests; I'm not sure this is something we want.

Contributor Author

Hmm, I'm confused; I'll look at this tomorrow. We do still see the RoT bootloader cabooses down below.

Contributor Author

Ok, I think I get this, and I believe the change is correct albeit surprising. Inventory collections contain two fields that are relevant here (one set and one map):

    /// unique caboose contents that were found in this collection
    ///
    /// In practice, these will be inserted into the `sw_caboose` table.
    pub cabooses: BTreeSet<Arc<Caboose>>,

    /// all caboose contents found, keyed first by the kind of caboose
    /// (`CabooseWhich`), then the baseboard id of the sled where they were
    /// found
    ///
    /// In practice, these will be inserted into the `inv_caboose` table.
    #[serde_as(as = "BTreeMap<_, Vec<(_, _)>>")]
    pub cabooses_found:
        BTreeMap<CabooseWhich, BTreeMap<Arc<BaseboardId>, CabooseFound>>,

The table lower in this file is dumping cabooses_found, and it confirms we did still collect all of the RoTs and RoT bootloaders. The table here appears to have lost the bootloader cabooses because the RoT and RoT bootloader cabooses are now identical. Caboose only contains these fields:

pub struct Caboose {
    pub board: String,
    pub git_commit: String,
    pub name: String,
    pub version: String,
    pub sign: Option<String>,
}

In dublin, name, board, sign always match for a given sled's RoT and RoT bootloader; they only differ by git_commit and version (and that's mainly because the bootloader is updated much less frequently than the RoT). In this test data, we use a fixed git commit and version 1.0.0 for both, so all five fields match.
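
The effect described above can be reproduced with a small sketch. `MiniCollection`, `record`, `found_count`, and `sim_rot_caboose` are hypothetical names introduced only for illustration; `CabooseWhich` is reduced to two variants, a plain `String` stands in for `BaseboardId`, an `Arc<Caboose>` stands in for `CabooseFound`, and the caboose field values are made up. Only the `Caboose` fields and the shapes of `cabooses` and `cabooses_found` follow the code quoted above.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Arc;

/// Same five fields as the `Caboose` struct quoted above.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct Caboose {
    pub board: String,
    pub git_commit: String,
    pub name: String,
    pub version: String,
    pub sign: Option<String>,
}

/// Reduced, hypothetical version of `CabooseWhich` (two variants only).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum CabooseWhich {
    RotSlotA,
    Stage0,
}

/// Hypothetical miniature of an inventory collection.
#[derive(Debug, Default)]
pub struct MiniCollection {
    /// Unique caboose contents; identical cabooses collapse to one entry.
    pub cabooses: BTreeSet<Arc<Caboose>>,
    /// Every caboose found, keyed by kind and then by baseboard
    /// (a plain string here instead of `Arc<BaseboardId>`).
    pub cabooses_found: BTreeMap<CabooseWhich, BTreeMap<String, Arc<Caboose>>>,
}

impl MiniCollection {
    /// Record one caboose read from one device slot.
    pub fn record(&mut self, which: CabooseWhich, baseboard: &str, caboose: Caboose) {
        let caboose = Arc::new(caboose);
        // `Arc<T>` compares by value, so inserting a caboose that is
        // field-for-field identical to an existing one is a no-op.
        self.cabooses.insert(Arc::clone(&caboose));
        self.cabooses_found
            .entry(which)
            .or_default()
            .insert(baseboard.to_string(), caboose);
    }

    /// Total number of (kind, baseboard) entries that were found.
    pub fn found_count(&self) -> usize {
        self.cabooses_found.values().map(|by_board| by_board.len()).sum()
    }
}

/// Made-up caboose contents shared by a simulated RoT and its bootloader.
pub fn sim_rot_caboose() -> Caboose {
    Caboose {
        board: "SimRot".to_string(),
        git_commit: "ffffffff".to_string(),
        name: "SimRot".to_string(),
        version: "1.0.0".to_string(),
        sign: None,
    }
}
```

Recording `sim_rot_caboose()` once under `RotSlotA` and once under `Stage0` for the same sled leaves one entry in `cabooses` but two entries in `cabooses_found`, which is the same shape as the test output discussed here.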

Contributor

Ahhh, gotcha! That makes sense.

> In dublin, name, board, sign always match for a given sled's RoT and RoT bootloader; they only differ by git_commit and version (and that's mainly because the bootloader is updated much less frequently than the RoT).

This doesn't seem great. If the version and git commit of the RoT/RoT bootloader cabooses start matching this could become troublesome 🤔 Is this something we may want to address in the future?

Contributor Author

> > In dublin, name, board, sign always match for a given sled's RoT and RoT bootloader; they only differ by git_commit and version (and that's mainly because the bootloader is updated much less frequently than the RoT).
>
> This doesn't seem great. If the version and git commit of the RoT/RoT bootloader cabooses start matching this could become troublesome 🤔 Is this something we may want to address in the future?

Can you expand on what doesn't seem great? I think it's fine if these match entirely; we still also record CabooseWhich so we know which kind / slot these fields came from.

Contributor Author

I went ahead and merged this to land the db migration, but if there's more to do here (or an issue to file) I'm happy to do so.

Contributor

No worries! My concern is unrelated to this PR.

> Can you expand on what doesn't seem great?

The line above the cabooses field in the code above says /// In practice, these will be inserted into the sw_caboose table..

That would mean that in a scenario like this, where all of the fields in two or more Cabooses are the same, not every existing caboose would have its own row.

Is there a possibility that this could break something down the line? It's common all over the codebase to check the length of things to make a decision (e.g. verifying we have the correct amount of retrieved items before proceeding to do something else).

Contributor Author

> The line above the cabooses field in the code above says /// In practice, these will be inserted into the sw_caboose table..
>
> That would mean that in a scenario like this, where all of the fields in two or more Cabooses are the same, not every existing caboose would have its own row.

Yeah, but this is intentional and by design. We expect to see a bunch of identical Cabooses in practice (e.g., every gimlet-d SP with the same version will have an identical caboose), and this set and its associated table exist specifically to deduplicate those. If we happen to have identical Cabooses across two different kinds of device, that's fine too.

> Is there a possibility that this could break something down the line? It's common all over the codebase to check the length of things to make a decision (e.g. verifying we have the correct amount of retrieved items before proceeding to do something else).

I think "no", given the above - anything checking lengths of this map or the associated table already has to account for the fact that it dedups.

Contributor

Fair enough! Seems good to me then :)

@@ -79,7 +79,7 @@ archive_a = { kind = "fake", size = "512KiB" }
archive_b = { kind = "fake", size = "512KiB" }

[[artifact.gimlet_rot_bootloader]]
-name = "SimRotStage0"
+name = "SimRot"
Contributor

@karencfv karencfv Aug 21, 2025

I was under the impression that the name of the artifact needed to be unique 🤔

Contributor Author

Artifact IDs are the triple of (name, version, kind), so I think duplicate names are okay as long as one of the other fields (kind in this case) is different.

That said: these names are not the same in prod, so I changed them in ef839f7. I think I changed these early in this branch before I fully understood the name vs board thing; nice catch.
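
As a small illustration of why duplicate names are acceptable when identity is the (name, version, kind) triple, here is a sketch. `ArtifactKey` and `register` are hypothetical names, not the real tufaceous types, and the string-typed version and kind fields are a simplification.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an artifact ID: identity is the whole triple.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct ArtifactKey {
    pub name: String,
    pub version: String,
    pub kind: String,
}

impl ArtifactKey {
    pub fn new(name: &str, version: &str, kind: &str) -> Self {
        ArtifactKey {
            name: name.to_string(),
            version: version.to_string(),
            kind: kind.to_string(),
        }
    }
}

/// Try to register an artifact; returns `false` only when the exact same
/// (name, version, kind) triple is already present.
pub fn register(ids: &mut BTreeSet<ArtifactKey>, key: ArtifactKey) -> bool {
    ids.insert(key)
}
```

Under this model, registering "SimRot" at the same version under both gimlet_rot_bootloader and gimlet_rot_image_a succeeds, and only a repeat of an identical triple is rejected.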

Contributor

@karencfv karencfv left a comment

Thanks!

@jgallagher jgallagher merged commit 599481c into main Aug 22, 2025
17 checks passed
@jgallagher jgallagher deleted the john/tuf-artifact-explicit-board branch August 22, 2025 19:52