[Reconfigurator] Fix planner skipping RoT / bootloader updates (add `TufArtifactMeta::board`) #8872

jgallagher · 2025-08-20T19:54:34Z

When trying Reconfigurator-based updates on a racklette today with a real TUF repo, the planner skipped all the RoT bootloader and RoT updates because none of the artifact names matched the boards reported in inventory. Our tests were assuming these were the same (and in prod they are the same on the SP), but in real TUF repos these don't match: all the RoT and bootloaders report a board of oxide-rot-1, but the artifact names are pretty varied:

                    name                   |          version           |          kind          |                               sign
-------------------------------------------+----------------------------+------------------------+-------------------------------------------------------------------
  gimlet_rot_bootloader-production-release | 1.4.0                      | gimlet_rot_bootloader  | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  gimlet_rot_bootloader-bart               | 1.4.0                      | gimlet_rot_bootloader  | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  gimlet_rot_bootloader-staging-devel      | 1.4.0                      | gimlet_rot_bootloader  | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-staging-devel                | 1.0.35                     | gimlet_rot_image_a     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | gimlet_rot_image_a     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-production-release           | 1.0.35                     | gimlet_rot_image_a     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | gimlet_rot_image_a     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | gimlet_rot_image_b     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  oxide-rot-1-production-release           | 1.0.35                     | gimlet_rot_image_b     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | gimlet_rot_image_b     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-staging-devel                | 1.0.35                     | gimlet_rot_image_b     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  psc_rot_bootloader-staging-devel         | 1.4.0                      | psc_rot_bootloader     | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  psc_rot_bootloader-bart                  | 1.4.0                      | psc_rot_bootloader     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  psc_rot_bootloader-production-release    | 1.4.0                      | psc_rot_bootloader     | 31942f8d53dc908c5cb338bdcecb204785fa87834e8b18f706fc972a42886c8b
  oxide-rot-1-production-release           | 1.0.35                     | psc_rot_image_a        | 31942f8d53dc908c5cb338bdcecb204785fa87834e8b18f706fc972a42886c8b
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | psc_rot_image_a        | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | psc_rot_image_a        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  oxide-rot-1-staging-devel                | 1.0.35                     | psc_rot_image_a        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  oxide-rot-1-production-release           | 1.0.35                     | psc_rot_image_b        | 31942f8d53dc908c5cb338bdcecb204785fa87834e8b18f706fc972a42886c8b
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | psc_rot_image_b        | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | psc_rot_image_b        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  oxide-rot-1-staging-devel                | 1.0.35                     | psc_rot_image_b        | f592d8f109b81881221eed5af6438abad9b5df8c220b9129c03763e7e10b22c7
  switch_rot_bootloader-production-release | 1.4.0                      | switch_rot_bootloader  | 5c69a42ee1f1e6cd5f356d14f81d46f8dbee783bb28777334226c689f169c0eb
  switch_rot_bootloader-bart               | 1.4.0                      | switch_rot_bootloader  | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  switch_rot_bootloader-staging-devel      | 1.4.0                      | switch_rot_bootloader  | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-staging-devel                | 1.0.35                     | switch_rot_image_a     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | switch_rot_image_a     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | switch_rot_image_a     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-production-release           | 1.0.35                     | switch_rot_image_a     | 5c69a42ee1f1e6cd5f356d14f81d46f8dbee783bb28777334226c689f169c0eb
  oxide-rot-1-selfsigned-staging-devel     | 1.0.35                     | switch_rot_image_b     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-production-release           | 1.0.35                     | switch_rot_image_b     | 5c69a42ee1f1e6cd5f356d14f81d46f8dbee783bb28777334226c689f169c0eb
  oxide-rot-1-staging-devel                | 1.0.35                     | switch_rot_image_b     | 1432cc4cfe5688c51b55546fe37837c753cfbc89e8c3c6aabcf977fdf0c41e27
  oxide-rot-1-selfsigned-bart              | 1.0.35                     | switch_rot_image_b     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d

A couple things of note here:

1. The bootloader artifact names do not contain the string oxide-rot-1 at all.
2. The oxide-rot-1-staging-devel and oxide-rot-1-selfsigned-staging-devel artifacts of each kind have the same SIGN value.

The first point means we can't (easily) determine which artifact matches the reported BORD, because we don't always have any artifact metadata that contains the board name. The second point means we can't ignore the BORDs entirely and work off just the SIGN.

Instead, this PR adds a new board value to the tuf_artifact table and the TufArtifactMeta struct. Outside of tests, the changes are almost identical to those from #8729 that added a sign field in this same spot. This allows us to stop assuming that artifact.name == inventory_caboose.board, and instead check artifact.board == inventory_caboose.board. It depends on oxidecomputer/tufaceous#40, which changes the way our fake data is generated to be more consistent with the state of production devices and artifacts. This perturbs some of the reconfigurator-cli tests a bit, which significantly inflates the diff; the actual code changes here are pretty small.
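To make the two points above concrete, here is a minimal sketch of the three matching strategies. The `Artifact` and `Caboose` structs and the three `match_by_*` functions are hypothetical, simplified stand-ins (not the real `TufArtifactMeta` or planner code), the sign is abbreviated, and the sample rows are taken from the gimlet artifacts shown in this PR.

```rust
// Hypothetical, simplified stand-ins for the real types; the field layout is
// an assumption made for illustration only.
#[derive(Debug)]
struct Artifact {
    name: &'static str,
    kind: &'static str,
    sign: &'static str,
    board: Option<&'static str>,
}

struct Caboose {
    board: &'static str,
    sign: &'static str,
}

// Old assumption: the artifact name equals the board reported in the caboose.
fn match_by_name<'a>(arts: &'a [Artifact], kind: &str, c: &Caboose) -> Vec<&'a Artifact> {
    arts.iter()
        .filter(|a| a.kind == kind && a.sign == c.sign && a.name == c.board)
        .collect()
}

// Ignoring the board entirely: kind and sign only.
fn match_by_sign_only<'a>(arts: &'a [Artifact], kind: &str, c: &Caboose) -> Vec<&'a Artifact> {
    arts.iter()
        .filter(|a| a.kind == kind && a.sign == c.sign)
        .collect()
}

// New check: compare an explicit board field against the caboose board.
fn match_by_board<'a>(arts: &'a [Artifact], kind: &str, c: &Caboose) -> Vec<&'a Artifact> {
    arts.iter()
        .filter(|a| a.kind == kind && a.sign == c.sign && a.board == Some(c.board))
        .collect()
}

// A few staging-devel gimlet rows, with the sign shortened to its prefix.
fn sample_artifacts() -> Vec<Artifact> {
    vec![
        Artifact {
            name: "gimlet_rot_bootloader-staging-devel",
            kind: "gimlet_rot_bootloader",
            sign: "11594bb5",
            board: Some("oxide-rot-1"),
        },
        Artifact {
            name: "oxide-rot-1-staging-devel",
            kind: "gimlet_rot_image_a",
            sign: "11594bb5",
            board: Some("oxide-rot-1"),
        },
        Artifact {
            name: "oxide-rot-1-selfsigned-staging-devel",
            kind: "gimlet_rot_image_a",
            sign: "11594bb5",
            board: Some("oxide-rot-1-selfsigned"),
        },
    ]
}
```

With a caboose reporting board oxide-rot-1 and this sign, matching by name finds nothing for either kind, matching by sign alone finds two gimlet_rot_image_a candidates, and matching by board finds exactly one artifact per kind.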

jgallagher · 2025-08-20T19:54:53Z

Cargo.toml

-tufaceous-artifact = { git = "https://github.com/oxidecomputer/tufaceous", branch = "main", features = ["proptest", "schemars"] }
-tufaceous-brand-metadata = { git = "https://github.com/oxidecomputer/tufaceous", branch = "main" }
-tufaceous-lib = { git = "https://github.com/oxidecomputer/tufaceous", branch = "main" }
+tufaceous = { git = "https://github.com/oxidecomputer/tufaceous", branch = "john/fake-rot-distinct-sign" }


TODO: change this back to main

jgallagher · 2025-08-20T19:56:45Z

dev-tools/reconfigurator-cli/tests/input/cmds-mupdate-update-flow.txt

@@ -124,30 +124,42 @@ blueprint-diff latest
 sled-set serial1 mupdate-override unset
 inventory-generate

-# This will attempt to update the first sled's host OS. Walk through that update
-# and the host OS of the two other sleds.
+# This will attempt to update the RoT bootloader on the first sled.


This is kind of a followup to #8832 (comment). On that PR, I had to change this test to update the host OS so that we could move on to the planner doing things with zones, but in review, the question came up of "why don't we also have to do the same thing for the bootloader, RoT, and SP?". The superficial answer was "the fake TUF repo doesn't have any matching artifacts for those so the planner is skipping them", but I didn't dig into why that is. With the changes on this PR, the fake TUF repo now does have matching artifacts. So instead of just updating the host OS here, we have to do all the MGS-based updates for all three fake sleds.

jgallagher · 2025-08-20T20:06:44Z

nexus/reconfigurator/planning/src/mgs_updates/rot.rs

            // - "kind" matching one of the known RoT kinds
            // - "sign" matching the rkth (found above from caboose)

-            if a.id.name != *board {
+            if a.board.as_ref() != Some(board) {


This (and the corresponding change in rot_bootloader.rs) is the actual bugfix.
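One consequence of the new comparison is worth spelling out: an artifact with no board recorded can never match. The sketch below isolates just that expression, assuming the field is an `Option<String>` and the inventory board is a `String`; `skip_artifact` is a hypothetical helper name, not a function in the planner.

```rust
// Hypothetical helper isolating the comparison from the diff above.
// Assumes the artifact's board is an Option<String>; returns true when the
// planner-style filter would skip the artifact for this board.
fn skip_artifact(artifact_board: &Option<String>, board: &String) -> bool {
    // `as_ref()` turns &Option<String> into Option<&String>, so the
    // comparison borrows instead of cloning either string.
    artifact_board.as_ref() != Some(board)
}
```

Under these assumptions, `None` is skipped for every board, and `Some("oxide-rot-1-selfsigned")` is skipped when the caboose reports oxide-rot-1.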

jgallagher · 2025-08-20T20:06:56Z

nexus/reconfigurator/planning/src/mgs_updates/sp.rs

            // - "kind" matching one of the known SP kinds

-            if a.id.name != *board {
+            if a.board.as_ref() != Some(board) {


Changing this for the SP is not strictly required, but seems cleaner.

davepacheco · 2025-08-20T20:55:31Z

nexus/reconfigurator/planning/src/mgs_updates/test_helpers.rs

                None,
            ),
            make_artifact(
-                "oxide-rot-1",
+                "oxide-rot-1-fake-key",


I'm guessing you changed this to verify that we're not accidentally depending on the target's name?

Yep, exactly.

jgallagher · 2025-08-21T21:26:18Z

Testing on dublin looks good; we see the board column matches what we expect (e.g., for gimlet; sidecar and psc are similar):

root@[fd00:1122:3344:104::4]:32221/omicron> select id,name,board,kind,version,sign from tuf_artifact order by kind,name asc;
                   id                  |                   name                   |         board          |          kind          |          version           |                               sign
---------------------------------------+------------------------------------------+------------------------+------------------------+----------------------------+-------------------------------------------------------------------
  8b445290-da59-45c7-ab9a-546e3a770277 | gimlet_rot_bootloader-bart               | oxide-rot-1            | gimlet_rot_bootloader  | 1.4.0                      | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  ecff03d3-bf37-493f-aa70-1af43c7f81f3 | gimlet_rot_bootloader-production-release | oxide-rot-1            | gimlet_rot_bootloader  | 1.4.0                      | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  ca011026-7ac1-412b-a7a9-38b55e73b2df | gimlet_rot_bootloader-staging-devel      | oxide-rot-1            | gimlet_rot_bootloader  | 1.4.0                      | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  eb2ae451-c6cb-43ff-9a1e-83d1714e60a3 | oxide-rot-1-production-release           | oxide-rot-1            | gimlet_rot_image_a     | 1.0.35                     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  ea91648a-a43b-46fa-8ff2-c55f178be6c1 | oxide-rot-1-selfsigned-bart              | oxide-rot-1-selfsigned | gimlet_rot_image_a     | 1.0.35                     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  e3de3af7-7dd0-4675-8c91-5b70c8bba0dc | oxide-rot-1-selfsigned-staging-devel     | oxide-rot-1-selfsigned | gimlet_rot_image_a     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  417ba792-71d8-4436-b91c-de4e2518d5cc | oxide-rot-1-staging-devel                | oxide-rot-1            | gimlet_rot_image_a     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  e49bd3ec-c533-4470-9bec-f264954f9edd | oxide-rot-1-production-release           | oxide-rot-1            | gimlet_rot_image_b     | 1.0.35                     | 5796ee3433f840519c3bcde73e19ee82ccb6af3857eddaabb928b8d9726d93c0
  701b9e4b-861c-47de-bd96-574f3e4e487c | oxide-rot-1-selfsigned-bart              | oxide-rot-1-selfsigned | gimlet_rot_image_b     | 1.0.35                     | 84332ef8279df87fbb759dc3866cbc50cd246fbb5a64705a7e60ba86bf01c27d
  581efa75-0144-4372-b370-54ebe7728cb7 | oxide-rot-1-selfsigned-staging-devel     | oxide-rot-1-selfsigned | gimlet_rot_image_b     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf
  7f2a5f60-3c59-4584-a809-65102693866f | oxide-rot-1-staging-devel                | oxide-rot-1            | gimlet_rot_image_b     | 1.0.35                     | 11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf

and the planner performed both bootloader and RoT updates.

karencfv

Thanks a bunch for catching and fixing this bug!!!!

karencfv · 2025-08-21T21:26:22Z

nexus/inventory/tests/output/collector_basic.txt

-    board "SimRotStage0" name "SimGimletRot" version "0.0.200" git_commit "dadadadad" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
-    board "SimRotStage0" name "SimSidecarRot" version "0.0.200" git_commit "dadadadad" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
-    board "SimRotStage0" name "SimGimletRot" version "0.0.200" git_commit "ddddddddd" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")
-    board "SimRotStage0" name "SimSidecarRot" version "0.0.200" git_commit "ddddddddd" sign Some("11594bb5548a757e918e6fe056e2ad9e084297c9555417a025d8788eacf55daf")


It appears that the cabooses for the RoT bootloader are being removed from these tests, I'm not sure this is something we want?

Hmm, I'm confused; I'll look at this tomorrow. We do still see the RoT bootloader cabooses down below.

Ok, I think I get this, and I believe the change is correct albeit surprising. Inventory collections contain two maps that are relevant here:

    /// unique caboose contents that were found in this collection
    ///
    /// In practice, these will be inserted into the `sw_caboose` table.
    pub cabooses: BTreeSet<Arc<Caboose>>,

    /// all caboose contents found, keyed first by the kind of caboose
    /// (`CabooseWhich`), then the baseboard id of the sled where they were
    /// found
    ///
    /// In practice, these will be inserted into the `inv_caboose` table.
    #[serde_as(as = "BTreeMap<_, Vec<(_, _)>>")]
    pub cabooses_found:
        BTreeMap<CabooseWhich, BTreeMap<Arc<BaseboardId>, CabooseFound>>,

The table lower in this file is dumping cabooses_found, and it confirms we did still collect all of the RoTs and RoT bootloaders. This table that appears to have lost the bootloader cabooses is because now the RoT and RoT bootloader cabooses are identical. Caboose only contains these fields:

    pub struct Caboose {
        pub board: String,
        pub git_commit: String,
        pub name: String,
        pub version: String,
        pub sign: Option<String>,
    }

In dublin, name, board, sign always match for a given sled's RoT and RoT bootloader; they only differ by git_commit and version (and that's mainly because the bootloader is updated much less frequently than the RoT). In this test data, we use a fixed git commit and version 1.0.0 for both, so all five fields match.
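The deduplication described here can be reproduced in a small sketch. `Which` is a hypothetical two-variant stand-in for `CabooseWhich`, sleds are keyed by a plain string rather than `BaseboardId`, and `collect` and `sim_caboose` are illustrative helpers, not the collector's real code; the `Caboose` fields are the ones listed above.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Caboose {
    board: String,
    git_commit: String,
    name: String,
    version: String,
    sign: Option<String>,
}

// Hypothetical, reduced stand-in for `CabooseWhich`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Which {
    Rot,
    RotBootloader,
}

type Found = BTreeMap<Which, BTreeMap<String, Arc<Caboose>>>;

// Builds both views: the set of unique contents, and the per-kind,
// per-sled record of where each caboose was found.
fn collect(found: &[(Which, &str, Caboose)]) -> (BTreeSet<Arc<Caboose>>, Found) {
    let mut unique = BTreeSet::new();
    let mut by_which: Found = BTreeMap::new();
    for (which, sled, caboose) in found {
        let c = Arc::new(caboose.clone());
        // Arc<T> orders by the pointed-to value, so identical contents
        // collapse into a single set entry.
        unique.insert(Arc::clone(&c));
        by_which.entry(*which).or_default().insert(sled.to_string(), c);
    }
    (unique, by_which)
}

// Illustrative caboose with made-up field values.
fn sim_caboose() -> Caboose {
    Caboose {
        board: "SimRot".to_string(),
        git_commit: "eeeeeeee".to_string(),
        name: "SimGimletRot".to_string(),
        version: "0.0.1".to_string(),
        sign: Some("11594bb5".to_string()),
    }
}
```

Feeding in the same caboose once as `Which::Rot` and once as `Which::RotBootloader` for one sled yields one entry in the unique set but two entries in the per-kind map, which mirrors the test output change discussed in this thread.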

Ahhh, gotcha! That makes sense.

In dublin, name, board, sign always match for a given sled's RoT and RoT bootloader; they only differ by git_commit and version (and that's mainly because the bootloader is updated much less frequently than the RoT).

This doesn't seem great. If the version and git commit of the RoT/RoT bootloader cabooses start matching this could become troublesome 🤔 Is this something we may want to address in the future?

Can you expand on what doesn't seem great? I think it's fine if these match entirely; we still also record CabooseWhich so we know which kind / slot these fields came from.

I went ahead and merged this to land the db migration, but if there's more to do here (or an issue to file) I'm happy to do so.

No worries! My concern is unrelated to this PR.

Can you expand on what doesn't seem great?

The line above the cabooses field in the code above says /// In practice, these will be inserted into the sw_caboose table..

That would mean that in a scenario like this, where all of the fields in two or more Cabooses are the same, not every existing caboose would have its own row.

Is there a possibility that this could break something down the line? It's common all over the codebase to check the length of things to make a decision (e.g. verifying we have the correct amount of retrieved items before proceeding to do something else).

The line above the cabooses field in the code above says /// In practice, these will be inserted into the sw_caboose table..

That would mean that in a scenario like this, where all of the fields in two or more Cabooses are the same, not every existing caboose would have its own row.

Yeah, but this is intentional and by design. We expect to see a bunch of identical Cabooses in practice (e.g., every gimlet-d SP with the same version will have an identical caboose), and this map and its associated table is to intentionally deduplicate those. If we happen to have identical Cabooses across two different kinds of device, that's fine too.

Is there a possibility that this could break something down the line? It's common all over the codebase to check the length of things to make a decision (e.g. verifying we have the correct amount of retrieved items before proceeding to do something else).

I think "no", given the above - anything checking lengths of this map or the associated table already has to account for the fact that it dedups.

Fair enough! Seems good to me then :)

karencfv · 2025-08-21T21:31:21Z

update-common/manifests/fake.toml

@@ -79,7 +79,7 @@ archive_a = { kind = "fake", size = "512KiB" }
 archive_b = { kind = "fake", size = "512KiB" }

 [[artifact.gimlet_rot_bootloader]]
-name = "SimRotStage0"
+name = "SimRot"


I was under the impression that the name of the artifact needed to be unique 🤔

Artifact IDs are the triple of (name, version, kind), so I think duplicate names are okay as long as one of the other fields (kind in this case) is different.

That said: these names are not the same in prod, so I changed them in ef839f7. I think I changed these early in this branch before I fully understood the name vs board thing; nice catch.
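The identity rule mentioned above, that an artifact ID is the (name, version, kind) triple, can be sketched as follows. `ArtifactId` here is a hypothetical plain-string stand-in for the real type, and `id` and `has_duplicate_ids` are illustrative helpers.

```rust
use std::collections::BTreeSet;

// Hypothetical plain-string stand-in for an artifact ID; equality and
// ordering cover all three fields.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct ArtifactId {
    name: String,
    version: String,
    kind: String,
}

fn id(name: &str, version: &str, kind: &str) -> ArtifactId {
    ArtifactId {
        name: name.to_string(),
        version: version.to_string(),
        kind: kind.to_string(),
    }
}

// Returns true if any two IDs agree on name, version, and kind.
fn has_duplicate_ids(ids: &[ArtifactId]) -> bool {
    let mut seen = BTreeSet::new();
    // `insert` returns false when the value was already present.
    ids.iter().any(|i| !seen.insert(i))
}
```

Two entries named SimRot with the same version do not collide as long as their kinds differ; they collide only when all three fields are equal.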

karencfv

Thanks!

jgallagher added 5 commits August 20, 2025 15:43

add TufArtifactMeta::board

91a77aa

schema migration

dd59ec4

use board field in planner

3912365

fixup reconfigurator-cli tests

8a5f72c

point to tufaceous branch

056faa4

jgallagher requested review from iliana, davepacheco and karencfv August 20, 2025 19:54

jgallagher commented Aug 20, 2025

tufaceous bump

1df04d1

jgallagher commented Aug 20, 2025

davepacheco approved these changes Aug 20, 2025

jgallagher added 3 commits August 21, 2025 10:29

openapi

a95504d

expectorate

da02ebd

trigger rebuild for a different TUF repo

300baec

karencfv reviewed Aug 21, 2025

jgallagher added 3 commits August 22, 2025 08:19

back to tufaceous main

570c210

don't duplicate artifact names in fake TUF repo manifest

ef839f7

Merge remote-tracking branch 'origin/main' into john/tuf-artifact-exp…

311f4c9

…licit-board

karencfv approved these changes Aug 22, 2025

jgallagher merged commit 599481c into main Aug 22, 2025
17 checks passed

jgallagher deleted the john/tuf-artifact-explicit-board branch August 22, 2025 19:52
