Commit a3ee742

[reconfigurator] MGS updates to skip sled boards with zones that should not be shut down (#9044)
SP and Host OS updates now have an additional check to verify if the sled they're going to update contains a zone that is unsafe to shut down. If so, they will return an error and those updates will be marked as blocked.

Closes: #8482
Closes: #9067
1 parent f33aa73 commit a3ee742
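
The check described in the commit message can be pictured with a minimal sketch. Every name here (`Zone`, `UpdateDecision`, `check_sled`, the `safe_to_shut_down` flag) is hypothetical and chosen for illustration only; the commit's actual types and its rule for deciding that a zone is unsafe are not shown on this page.

```rust
// Hypothetical types for illustration; not the names used in the commit.
#[derive(Debug, Clone, PartialEq)]
struct Zone {
    id: String,
    // Assumption: safety is precomputed per zone. The real planner
    // presumably derives this from blueprint and inventory state.
    safe_to_shut_down: bool,
}

#[derive(Debug, PartialEq)]
enum UpdateDecision {
    // The SP or Host OS update may be planned for this sled.
    Pending { sled: String },
    // The update is refused and recorded as blocked.
    Blocked { sled: String, unsafe_zone: String },
}

// Return `Blocked` if any zone on the sled is unsafe to shut down,
// otherwise `Pending`.
fn check_sled(sled: &str, zones: &[Zone]) -> UpdateDecision {
    match zones.iter().find(|z| !z.safe_to_shut_down) {
        Some(z) => UpdateDecision::Blocked {
            sled: sled.to_string(),
            unsafe_zone: z.id.clone(),
        },
        None => UpdateDecision::Pending { sled: sled.to_string() },
    }
}
```

In this sketch a single unsafe zone is enough to block the whole sled, which matches the wording "contains a zone that is unsafe to shut down".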

12 files changed: +2815 -570 lines changed

Lines changed: 105 additions & 0 deletions
@@ -0,0 +1,105 @@
+# Load example system
+load-example --nsleds 3 --ndisks-per-sled 3
+
+# Create a TUF repository from a fake manifest. (The output TUF repo is
+# written to a temporary directory that this invocation of `reconfigurator-cli`
+# is running out of as its working directory.)
+#
+# This is used to simulate the initial version of the system.
+tuf-assemble ../../update-common/manifests/fake-0.0.1.toml
+
+# Load the target release from the assembled TUF repository.
+set target-release repo-0.0.1.zip
+
+# Print the default target release.
+show
+
+# Update the install dataset on all sleds to the target release.
+# This will cause zones to be noop converted over to Artifact,
+# unblocking upgrades.
+sled-update-install-dataset serial0 --to-target-release
+sled-update-install-dataset serial1 --to-target-release
+sled-update-install-dataset serial2 --to-target-release
+
+# Generate inventory, then do a planning run to ensure that all zones
+# are set to Artifact.
+inventory-generate
+blueprint-plan latest latest
+blueprint-diff latest
+# The above blueprint includes a pending MGS update, which we should delete
+# (we want to start from a fresh state).
+blueprint-edit latest delete-sp-update serial0
+# Also set the Omicron config for all sleds to reflect the
+# corresponding image sources.
+sled-set serial0 omicron-config latest
+sled-set serial1 omicron-config latest
+sled-set serial2 omicron-config latest
+# Generate inventory once more to reflect the omicron config changes.
+inventory-generate
+inventory-show latest
+
+# Setup is now done -- create another TUF repository which will act as the
+# target release being updated to.
+tuf-assemble ../../update-common/manifests/fake.toml
+
+# Load the target release from the assembled TUF repository.
+set target-release repo-1.0.0.zip
+
+# First, print out sled information.
+sled-list
+
+# Retrieve blueprint information to know the ID of the zone to expunge
+blueprint-show latest
+
+# We expunge a single internal_dns zone
+#
+# Note that since we are not running the command to emulate the sled-agent
+# performing an inventory collection and seeing the DNS zone has gone away, the
+# planner will not attempt to restore the internal DNS zone. This is intentional
+# as we want the zone to stay in this state for the purposes of this test
+blueprint-edit latest expunge-zones 99e2f30b-3174-40bf-a78a-90da8abba8ca
+blueprint-diff latest
+
+# Attempt to upgrade one RoT bootloader. This should successfully plan the
+# pending RoT bootloader update
+blueprint-plan latest latest
+blueprint-diff latest
+
+# We generate another plan; there should be no changes
+blueprint-plan latest latest
+blueprint-diff latest
+
+# Now, forcibly update the simulated RoT bootloader in all sleds to reflect that
+# the update completed.
+# Collect inventory from it and use that collection for another planning step.
+# This should report that the update completed, and successfully plan a
+# pending RoT update
+sled-update-rot-bootloader serial0 --stage0 1.0.0
+sled-update-rot-bootloader serial1 --stage0 1.0.0
+sled-update-rot-bootloader serial2 --stage0 1.0.0
+inventory-generate
+blueprint-plan latest latest
+blueprint-diff latest
+
+# We repeat the same process with the RoT, but now expecting to see a single
+# blocked SP update due to serial0 containing an unsafe zone. We also see a
+# pending SP update on serial1 which has no unsafe zones.
+sled-update-rot serial0 --slot-b 1.0.0 --active-slot b --persistent-boot-preference b
+sled-update-rot serial1 --slot-b 1.0.0 --active-slot b --persistent-boot-preference b
+sled-update-rot serial2 --slot-b 1.0.0 --active-slot b --persistent-boot-preference b
+inventory-generate
+blueprint-plan latest latest
+blueprint-diff latest
+
+# Since there was a blocked update on serial0, the planner has moved on to
+# serial1. This sled has no unsafe zones, so we forcibly update it along with
+# the SPs to see the errors with the Host OS on serial2 and serial0, and plan no
+# pending updates
+sled-update-sp serial0 --active 1.0.0
+sled-update-sp serial1 --active 1.0.0
+sled-update-sp serial2 --active 1.0.0
+sled-update-host-phase2 serial1 --boot-disk B --slot-b d944ae205b61ccf4322448f7d0311a819c53d9844769de066c5307c1682abb47
+sled-update-host-phase1 serial1 --active B --slot-b b99d5273ba1418bebb19d74b701d716896409566d41de76ada71bded4c9b166b
+inventory-generate
+blueprint-plan latest latest
+blueprint-diff latest
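
The "blocked on serial0, pending on serial1" outcome that the script's comments expect can be modelled with a rough sketch. It assumes the planner walks sleds in order, records each blocked sled, and stops at the first sled it can update. `PlanOutcome` and `plan_next_update` are hypothetical names, and the real planner's ordering and bookkeeping may differ.

```rust
// Hypothetical sketch of a "skip blocked sleds, plan at most one update" pass.
#[derive(Debug, PartialEq)]
struct PlanOutcome {
    // Serial of the sled chosen for the next update, if any.
    pending: Option<String>,
    // Serials skipped because they host a zone that is unsafe to shut down.
    blocked: Vec<String>,
}

// Each input pair is (sled serial, whether the sled has an unsafe zone).
fn plan_next_update(sleds: &[(&str, bool)]) -> PlanOutcome {
    let mut blocked = Vec::new();
    for &(serial, has_unsafe_zone) in sleds {
        if has_unsafe_zone {
            // Record the block and move on to the next sled.
            blocked.push(serial.to_string());
            continue;
        }
        // First sled without an unsafe zone becomes the pending update.
        return PlanOutcome { pending: Some(serial.to_string()), blocked };
    }
    // Every sled was blocked: nothing can be planned.
    PlanOutcome { pending: None, blocked }
}
```

With `[("serial0", true), ("serial1", false), ("serial2", false)]` this yields a pending update on `serial1` and a single blocked entry for `serial0`. If every sled is flagged, the result has no pending update.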

dev-tools/reconfigurator-cli/tests/output/cmds-unsafe-zone-mgs-stderr

Whitespace-only changes.

dev-tools/reconfigurator-cli/tests/output/cmds-unsafe-zone-mgs-stdout

Lines changed: 1727 additions & 0 deletions
Large diffs are not rendered by default.
