Commit 65a968e

nexus-mgs-updates: Cleanup and refactoring to support host OS updates (#8736)
This is mostly a "refactoring things a bit" PR to reduce diff churn with the upcoming host OS updates PR, but there are some small behavioral changes, mostly but not entirely affecting tests:

* In tests, simulated sled-agent now reports valid boot partition contents instead of an error
* Fix a small bug in calculating elapsed time that caused `reconfigurator-sp-updater` to always show "updating for 0s" instead of the actual elapsed time
* Replace `RESET_TIMEOUT` with a function so we can have different values for different kinds of updates. (This is mostly important because the "reset a host" timeout needs to be _much_ longer than "reset an SP/RoT" timeouts, but also allows us to go back to 60s for sled/psc SPs while keeping 120s for switch SPs.)
* In `nexus-mgs-updates` tests, create MGS clients that have longer timeouts. (Reason for this is in comments; this definitely affects the host OS tests that don't exist as of this PR, but other tests that are on main are already racy and can fail the same way on a busier system.)
* In `nexus-mgs-updates` tests, make the `deployed_caboose` field an `Option`, because host OS artifacts won't have one. I don't love this in that it's an option dependent on other fields (i.e., SP/RoT/RoT bootloader updates always fill this with `Some(_)`, and host OS updates always have `None`). Usually I'd use an enum to squash the impossible states out, but (a) this is test-only code and (b) using that enum in tests seems slightly more onerous than just unwrapping. Happy to change this if others feel differently though.
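The `RESET_TIMEOUT` change described above could look roughly like the following. This is a minimal sketch, not the code from this PR: the `UpdateTarget` enum and the `reset_timeout` name are hypothetical, the 60s and 120s values come from the commit message, and the host OS value is a placeholder because the message only says it must be "much longer".

```rust
use std::time::Duration;

/// Hypothetical stand-in for the kinds of component an update can target.
/// The real code may model this differently.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum UpdateTarget {
    SledSp,
    PscSp,
    SwitchSp,
    HostOs,
}

/// Returns how long to wait for a component to come back after a reset.
/// Replacing a single constant with a function lets each kind of update
/// pick its own value.
fn reset_timeout(target: UpdateTarget) -> Duration {
    match target {
        // Values stated in the commit message.
        UpdateTarget::SledSp | UpdateTarget::PscSp => Duration::from_secs(60),
        UpdateTarget::SwitchSp => Duration::from_secs(120),
        // Placeholder assumption: the commit message gives no number,
        // only that resetting a host needs a much longer timeout.
        UpdateTarget::HostOs => Duration::from_secs(30 * 60),
    }
}
```

A caller would pass the kind of component it just reset, for example `reset_timeout(UpdateTarget::SwitchSp)`, instead of reading a shared constant.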
1 parent 4025747 commit 65a968e

19 files changed: +435 -235 lines changed


Cargo.lock

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

dev-tools/reconfigurator-cli/tests/output/cmds-example-stdout

Lines changed: 27 additions & 9 deletions
@@ -1194,9 +1194,15 @@ LEDGERED SLED CONFIG
 path on boot disk: /fake/path/install/mupdate_override.json
 no override on boot disk
 no non-boot disks
-boot disk slot: FAILED TO DETERMINE: constructed via debug_assume_success()
-slot A details UNAVAILABLE: constructed via debug_assume_success()
-slot B details UNAVAILABLE: constructed via debug_assume_success()
+boot disk slot: A
+slot A details:
+artifact: 0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0000000000000000000000000000000000000000000000000000000000000000
+slot B details:
+artifact: 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0101010101010101010101010101010101010101010101010101010101010101
 last reconciled config: matches ledgered config
 no mupdate override to clear
 no orphaned datasets
@@ -1302,9 +1308,15 @@ LEDGERED SLED CONFIG
 path on boot disk: /fake/path/install/mupdate_override.json
 no override on boot disk
 no non-boot disks
-boot disk slot: FAILED TO DETERMINE: constructed via debug_assume_success()
-slot A details UNAVAILABLE: constructed via debug_assume_success()
-slot B details UNAVAILABLE: constructed via debug_assume_success()
+boot disk slot: A
+slot A details:
+artifact: 0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0000000000000000000000000000000000000000000000000000000000000000
+slot B details:
+artifact: 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0101010101010101010101010101010101010101010101010101010101010101
 last reconciled config: matches ledgered config
 no mupdate override to clear
 no orphaned datasets
@@ -1503,9 +1515,15 @@ LEDGERED SLED CONFIG
 path on boot disk: /fake/path/install/mupdate_override.json
 no override on boot disk
 no non-boot disks
-boot disk slot: FAILED TO DETERMINE: constructed via debug_assume_success()
-slot A details UNAVAILABLE: constructed via debug_assume_success()
-slot B details UNAVAILABLE: constructed via debug_assume_success()
+boot disk slot: A
+slot A details:
+artifact: 0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0000000000000000000000000000000000000000000000000000000000000000
+slot B details:
+artifact: 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0101010101010101010101010101010101010101010101010101010101010101
 last reconciled config: matches ledgered config
 no mupdate override to clear
 no orphaned datasets

dev-tools/reconfigurator-cli/tests/output/cmds-mupdate-update-flow-stdout

Lines changed: 27 additions & 9 deletions
@@ -213,9 +213,15 @@ LEDGERED SLED CONFIG
 path on boot disk: /fake/path/install/mupdate_override.json
 error obtaining override on boot disk: reconfigurator-cli simulated mupdate-override error
 no non-boot disks
-boot disk slot: FAILED TO DETERMINE: constructed via debug_assume_success()
-slot A details UNAVAILABLE: constructed via debug_assume_success()
-slot B details UNAVAILABLE: constructed via debug_assume_success()
+boot disk slot: A
+slot A details:
+artifact: 0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0000000000000000000000000000000000000000000000000000000000000000
+slot B details:
+artifact: 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0101010101010101010101010101010101010101010101010101010101010101
 last reconciled config: matches ledgered config
 error reading mupdate override, so sled agent didn't attempt to clear it
 no orphaned datasets
@@ -320,9 +326,15 @@ LEDGERED SLED CONFIG
 path on boot disk: /fake/path/install/mupdate_override.json
 override on boot disk: 6123eac1-ec5b-42ba-b73f-9845105a9971
 no non-boot disks
-boot disk slot: FAILED TO DETERMINE: constructed via debug_assume_success()
-slot A details UNAVAILABLE: constructed via debug_assume_success()
-slot B details UNAVAILABLE: constructed via debug_assume_success()
+boot disk slot: A
+slot A details:
+artifact: 0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0000000000000000000000000000000000000000000000000000000000000000
+slot B details:
+artifact: 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0101010101010101010101010101010101010101010101010101010101010101
 last reconciled config: matches ledgered config
 mupdate override present, but sled agent was not instructed to clear it
 no orphaned datasets
@@ -416,9 +428,15 @@ LEDGERED SLED CONFIG
 path on boot disk: /fake/path/install/mupdate_override.json
 override on boot disk: 203fa72c-85c1-466a-8ed3-338ee029530d
 no non-boot disks
-boot disk slot: FAILED TO DETERMINE: constructed via debug_assume_success()
-slot A details UNAVAILABLE: constructed via debug_assume_success()
-slot B details UNAVAILABLE: constructed via debug_assume_success()
+boot disk slot: A
+slot A details:
+artifact: 0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a0a (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0000000000000000000000000000000000000000000000000000000000000000
+slot B details:
+artifact: 0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b (1000 bytes)
+image name: fake from debug_assume_success()
+phase 2 hash: 0101010101010101010101010101010101010101010101010101010101010101
 last reconciled config: matches ledgered config
 mupdate override present, but sled agent was not instructed to clear it
 no orphaned datasets

dev-tools/reconfigurator-sp-updater/Cargo.toml

Lines changed: 1 addition & 0 deletions
@@ -25,6 +25,7 @@ oxide-tokio-rt.workspace = true
 qorb.workspace = true
 serde_json.workspace = true
 slog.workspace = true
+swrite.workspace = true
 tokio = { workspace = true, features = [ "full" ] }
 tufaceous-artifact.workspace = true
 omicron-workspace-hack.workspace = true

dev-tools/reconfigurator-sp-updater/src/main.rs

Lines changed: 24 additions & 21 deletions
@@ -29,10 +29,11 @@ use qorb::resolver::Resolver;
 use qorb::resolvers::fixed::FixedResolver;
 use slog::{info, o, warn};
 use std::collections::BTreeMap;
-use std::fmt::Write;
 use std::net::SocketAddr;
 use std::sync::Arc;
 use std::time::Duration;
+use swrite::SWrite;
+use swrite::swriteln;
 use tokio::sync::watch;
 use tufaceous_artifact::ArtifactHash;
 use tufaceous_artifact::ArtifactVersion;
@@ -295,33 +296,34 @@ fn cmd_config(
     let configured = updater_state.requests_tx.borrow();
 
     let mut s = String::new();
-    writeln!(&mut s, "configured updates ({}):", configured.len())?;
+    swriteln!(s, "configured updates ({}):", configured.len());
     for update in &*configured {
         let baseboard_id = &update.baseboard_id;
-        writeln!(
-            &mut s,
+        swriteln!(
+            s,
             " part {} serial {} (type {:?} slot {}):",
             baseboard_id.part_number,
             baseboard_id.serial_number,
             update.sp_type,
             update.slot_id,
-        )?;
-        writeln!(&mut s, " artifact hash: {}", update.artifact_hash)?;
-        writeln!(
-            &mut s,
+        );
+        swriteln!(s, " artifact hash: {}", update.artifact_hash);
+        swriteln!(
+            s,
             " user-provided artifact version: {}",
             update.artifact_version,
-        )?;
+        );
         match &update.details {
             PendingMgsUpdateDetails::Sp {
                 expected_active_version,
                 expected_inactive_version,
             } => {
-                writeln!(
-                    &mut s,
+                swriteln!(
+                    s,
                     " preconditions: active slot {:?}, inactive slot {:?}",
-                    expected_active_version, expected_inactive_version,
-                )?;
+                    expected_active_version,
+                    expected_inactive_version,
+                );
             }
             PendingMgsUpdateDetails::Rot {
                 expected_active_slot,
@@ -330,8 +332,8 @@ fn cmd_config(
                 expected_pending_persistent_boot_preference,
                 expected_transient_boot_preference,
             } => {
-                writeln!(
-                    &mut s,
+                swriteln!(
+                    s,
                     " preconditions: expected active slot {:?}
 expected active version {:?}
 expected inactive version {:?}
@@ -342,21 +344,22 @@ fn cmd_config(
                     expected_inactive_version, expected_persistent_boot_preference,
                     expected_pending_persistent_boot_preference,
                     expected_transient_boot_preference,
-                )?;
+                );
             }
             PendingMgsUpdateDetails::RotBootloader {
                 expected_stage0_version,
                 expected_stage0_next_version,
             } => {
-                writeln!(
-                    &mut s,
+                swriteln!(
+                    s,
                     " preconditions: stage 0 {:?}, stage 0 next {:?}",
-                    expected_stage0_version, expected_stage0_next_version,
-                )?;
+                    expected_stage0_version,
+                    expected_stage0_next_version,
+                );
             }
         }
 
-        writeln!(&mut s)?;
+        swriteln!(s);
     }
 
     Ok(Some(s))
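The diff above drops the trailing `?` from every write because formatting into a `String` cannot fail. The sketch below illustrates that idea with a hypothetical `push_line!` macro built only on the standard library; it is not the `swrite` crate's implementation, and `render_config` is a simplified stand-in for `cmd_config`.

```rust
/// Hypothetical macro that appends a formatted line to a `String`.
/// It hides the `Result` returned by `std::fmt::Write::write_fmt`,
/// since `String`'s implementation of that trait never returns an error.
macro_rules! push_line {
    // No format arguments: append just a newline.
    ($dst:expr) => {
        $dst.push('\n')
    };
    // Format arguments: append the formatted text, then a newline.
    ($dst:expr, $($arg:tt)*) => {{
        std::fmt::Write::write_fmt(&mut $dst, format_args!($($arg)*))
            .expect("writing to a String cannot fail");
        $dst.push('\n');
    }};
}

/// Simplified stand-in for the report built in `cmd_config`.
fn render_config(hashes: &[&str]) -> String {
    let mut s = String::new();
    push_line!(s, "configured updates ({}):", hashes.len());
    for hash in hashes {
        push_line!(s, "    artifact hash: {}", hash);
        // Blank line between entries, mirroring the final write in the loop.
        push_line!(s);
    }
    s
}
```

Because the macro returns nothing, call sites need neither `?` nor a `Result`-returning context just to build a string.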

nexus-sled-agent-shared/src/inventory.rs

Lines changed: 27 additions & 6 deletions
@@ -222,13 +222,34 @@ impl ConfigReconcilerInventory {
             orphaned_datasets: IdOrdMap::new(),
             zones,
             boot_partitions: {
-                // None of our callers care about this; if that changes, we
-                // could pass in boot partition contents.
-                let err = "constructed via debug_assume_success()".to_string();
                 BootPartitionContents {
-                    boot_disk: Err(err.clone()),
-                    slot_a: Err(err.clone()),
-                    slot_b: Err(err),
+                    boot_disk: Ok(M2Slot::A),
+                    slot_a: Ok(BootPartitionDetails {
+                        header: BootImageHeader {
+                            flags: 0,
+                            data_size: 1000,
+                            image_size: 1000,
+                            target_size: 1000,
+                            sha256: [0; 32],
+                            image_name: "fake from debug_assume_success()"
+                                .to_string(),
+                        },
+                        artifact_hash: ArtifactHash([0x0a; 32]),
+                        artifact_size: 1000,
+                    }),
+                    slot_b: Ok(BootPartitionDetails {
+                        header: BootImageHeader {
+                            flags: 0,
+                            data_size: 1000,
+                            image_size: 1000,
+                            target_size: 1000,
+                            sha256: [1; 32],
+                            image_name: "fake from debug_assume_success()"
+                                .to_string(),
+                        },
+                        artifact_hash: ArtifactHash([0x0b; 32]),
+                        artifact_size: 1000,
+                    }),
                 }
             },
             clear_mupdate_override,
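The fixed byte arrays in this hunk explain the repeated hex strings in the updated test output: `[0x0a; 32]` renders as `0a0a…` and `[1; 32]` as `0101…`. The sketch below shows that mapping with a hypothetical `FakeHash` type; the real `ArtifactHash` type lives in `tufaceous_artifact` and its `Display` implementation is assumed, not shown in this commit.

```rust
use std::fmt;

/// Hypothetical stand-in for a 32-byte hash that displays as lowercase hex.
struct FakeHash([u8; 32]);

impl fmt::Display for FakeHash {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Each byte becomes exactly two hex digits, so 32 bytes
        // always produce a 64-character string.
        for byte in &self.0 {
            write!(f, "{:02x}", byte)?;
        }
        Ok(())
    }
}
```

Under this assumption, `FakeHash([0x0b; 32]).to_string()` yields the 64-character `0b0b…0b` value shown for slot B's artifact.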

nexus/inventory/tests/output/collector_basic.txt

Lines changed: 16 additions & 4 deletions
@@ -95,8 +95,14 @@ sled agents found:
 host_phase_2.slot_a: CurrentContents
 host_phase_2.slot_b: CurrentContents
 zone 8b88a56f-3eb6-4d80-ba42-75d867bc427d type oximeter
-no completed reconciliation
-reconciler task not yet run
+last reconciled config:
+generation: 3
+remove_mupdate_override: None
+host_phase_2.slot_a: CurrentContents
+host_phase_2.slot_b: CurrentContents
+zone 8b88a56f-3eb6-4d80-ba42-75d867bc427d type oximeter
+result for zone 8b88a56f-3eb6-4d80-ba42-75d867bc427d: Ok
+reconciler task idle
 sled 9cb9b78f-5614-440c-b66d-e8e81fab69b0 (Scrimlet)
 baseboard Some(BaseboardId { part_number: "sim-gimlet", serial_number: "sim-9cb9b78f-5614-440c-b66d-e8e81fab69b0" })
 ledgered sled config:
@@ -105,7 +111,13 @@ sled agents found:
 host_phase_2.slot_a: CurrentContents
 host_phase_2.slot_b: CurrentContents
 zone 5125277f-0988-490b-ac01-3bba20cc8f07 type oximeter
-no completed reconciliation
-reconciler task not yet run
+last reconciled config:
+generation: 3
+remove_mupdate_override: None
+host_phase_2.slot_a: CurrentContents
+host_phase_2.slot_b: CurrentContents
+zone 5125277f-0988-490b-ac01-3bba20cc8f07 type oximeter
+result for zone 5125277f-0988-490b-ac01-3bba20cc8f07: Ok
+reconciler task idle
 
 errors:

nexus/inventory/tests/output/collector_sled_agent_errors.txt

Lines changed: 8 additions & 2 deletions
@@ -94,8 +94,14 @@ sled agents found:
 host_phase_2.slot_a: CurrentContents
 host_phase_2.slot_b: CurrentContents
 zone 5125277f-0988-490b-ac01-3bba20cc8f07 type oximeter
-no completed reconciliation
-reconciler task not yet run
+last reconciled config:
+generation: 3
+remove_mupdate_override: None
+host_phase_2.slot_a: CurrentContents
+host_phase_2.slot_b: CurrentContents
+zone 5125277f-0988-490b-ac01-3bba20cc8f07 type oximeter
+result for zone 5125277f-0988-490b-ac01-3bba20cc8f07: Ok
+reconciler task idle
 
 errors:
 error: Sled Agent "http://[100::1]:45678": inventory: Communication Error <<redacted>>
