This commit introduces a Nexus lockstep API and omdb support for
triggering an upgrade from LRTQ.
In order to do so, we need to enable the `trust_quorum::NodeTask` to
read the LRTQ state from disk if it exists and there is no existing
trust quorum state on disk.
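As a hedged sketch of that startup decision (the real `trust_quorum::NodeTask` types differ; `select_state` and `PersistentState` are hypothetical names used here for illustration):

```rust
// Hypothetical model of the ledger-selection logic described above.
#[derive(Debug, PartialEq)]
enum PersistentState {
    TrustQuorum,
    Lrtq,
    Empty,
}

// Prefer real trust quorum state; fall back to the LRTQ share ledger
// only when no trust quorum ledger exists on disk.
fn select_state(tq_on_disk: bool, lrtq_on_disk: bool) -> PersistentState {
    if tq_on_disk {
        PersistentState::TrustQuorum
    } else if lrtq_on_disk {
        PersistentState::Lrtq
    } else {
        PersistentState::Empty
    }
}

fn main() {
    // An LRTQ ledger is only used when no trust quorum state exists.
    assert_eq!(select_state(false, true), PersistentState::Lrtq);
    assert_eq!(select_state(true, true), PersistentState::TrustQuorum);
    println!("ok");
}
```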
Notably, LRTQ upgrade did not require modifying the trust quorum
background task as prepare and commit operations are polled identically
between a normal reconfiguration and an LRTQ upgrade.
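To illustrate why the background task needed no change, here is a hedged sketch (not the actual task code; `poll_phase` and `Phase` are invented for this example) showing polling that never branches on the kind of operation in flight:

```rust
// The task only inspects prepare/commit acknowledgement counts; it
// does not care whether the in-flight operation is a normal
// reconfiguration or an LRTQ upgrade.
#[derive(Debug, PartialEq)]
enum Phase {
    Preparing,
    Committing,
    Committed,
}

fn poll_phase(prepared: usize, committed: usize, members: usize, threshold: usize) -> Phase {
    if committed == members {
        Phase::Committed
    } else if prepared >= threshold {
        Phase::Committing
    } else {
        Phase::Preparing
    }
}

fn main() {
    // With 3 members and a threshold of 2:
    assert_eq!(poll_phase(1, 0, 3, 2), Phase::Preparing);
    assert_eq!(poll_phase(2, 0, 3, 2), Phase::Committing);
    assert_eq!(poll_phase(3, 3, 3, 2), Phase::Committed);
    println!("ok");
}
```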
This was tested on a4x2. However, testing it required a slight deviation
from what would be required for a real upgrade in production. There
are no LRTQ ledgers until RSS runs, and the `trust_quorum::NodeTask`
starts before RSS. Therefore it doesn't see the shares, because it
only loads them on startup. On a real system in the field the shares
would exist as soon as the sled-agent was upgraded, and the `NodeTask`
would see them. To work around this during manual testing, I waited until
RSS completed and then restarted sled-agent on all nodes. I then issued an
lrtq-upgrade via omdb and it worked.
Note that for the above strategy to work, trust quorum RSS must be
disabled so that a real trust quorum configuration is not generated
during RSS. This is done via the following constant, which remains
set to `false`:
```rust
pub const TRUST_QUORUM_INTEGRATION_ENABLED: bool = false;
```
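A hedged illustration of what this flag gates (the real RSS code differs; `rss_setup_action` is a hypothetical name for this sketch):

```rust
// Trust quorum itself still runs regardless of this flag; only
// RSS-driven initialization of a trust quorum configuration is
// skipped when it is false.
pub const TRUST_QUORUM_INTEGRATION_ENABLED: bool = false;

fn rss_setup_action() -> &'static str {
    if TRUST_QUORUM_INTEGRATION_ENABLED {
        "initialize a trust quorum configuration during RSS"
    } else {
        "leave LRTQ in place; upgrade later via the Nexus lockstep API"
    }
}

fn main() {
    println!("{}", rss_setup_action());
}
```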
Since that constant doesn't stop trust quorum from running, but
only prevents RSS setup from initializing a trust quorum, we can still
upgrade out of LRTQ. At that point we have a real trust quorum
configuration and can also proceed to add sleds. I did that successfully
as well. What follows are some logged commands showing all of this
working in a4x2. It looks like I lost the scrollback for issuing the
lrtq-upgrade, but it was done with the following command in omdb on
`g0`:
```
omdb nexus trust-quorum lrtq-upgrade -w
```
Here is the trust quorum configuration after the upgrade committed.
```
root@oxz_switch:~# omdb nexus trust-quorum get-config 62c2f638-c330-421e-8b4a-7f097a22281e latest
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
TrustQuorumConfig {
rack_id: 62c2f638-c330-421e-8b4a-7f097a22281e (rack),
epoch: Epoch(
2,
),
last_committed_epoch: None,
state: Committed,
threshold: Threshold(
2,
),
commit_crash_tolerance: 0,
coordinator: BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
},
encrypted_rack_secrets: Some(
EncryptedRackSecrets {
salt: Salt(
[
163,
19,
118,
99,
229,
14,
116,
81,
210,
117,
180,
69,
101,
181,
254,
44,
38,
169,
149,
63,
59,
40,
63,
189,
164,
106,
222,
196,
112,
25,
179,
107,
],
),
data: [
241,
144,
251,
158,
0,
19,
155,
183,
228,
30,
218,
227,
212,
100,
159,
158,
160,
13,
199,
185,
20,
142,
61,
26,
217,
92,
247,
170,
110,
38,
238,
91,
75,
78,
71,
65,
54,
93,
208,
90,
44,
2,
185,
10,
62,
167,
222,
57,
198,
217,
174,
172,
70,
145,
22,
206,
],
},
),
members: {
BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 2a25477bc8d7623c81ac24e970f381636e3f84c46e41f6f36ca14e7f6011cf1,
),
time_prepared: Some(
2026-01-23T21:52:19.347850Z,
),
time_committed: Some(
2026-01-23T21:54:27.774935Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 5c670dba76c8d5248eac3b1cd6e4bfb88ca12c57bc090ad43ce155b11dcc74,
),
time_prepared: Some(
2026-01-23T21:52:19.324154Z,
),
time_committed: Some(
2026-01-23T21:54:27.774935Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000003",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: d21a25bbd233b5dab31469659e4efa8a79a3ca7face94719f2df3e7565877c36,
),
time_prepared: Some(
2026-01-23T21:52:19.337036Z,
),
time_committed: Some(
2026-01-23T21:55:13.633304Z,
),
},
},
time_created: 2026-01-23T21:51:17.122318Z,
time_committing: Some(
2026-01-23T21:52:19.360465Z,
),
time_committed: Some(
2026-01-23T21:55:13.679196Z,
),
time_aborted: None,
abort_reason: None,
}
```
And here is the oxide CLI command to add a sled:
```
➜ oxide.rs git:(main) ✗ echo '{"sled_ids": [{"part_number": "913-0000019","serial_number": "20000002"}]}' | target/debug/oxide --profile recovery api /v1/system/hardware/racks/62c2f638-c330-421e-8b4a-7f097a22281e/membership/add --method POST --input -
{
"members": [
{
"part_number": "913-0000019",
"serial_number": "20000000"
},
{
"part_number": "913-0000019",
"serial_number": "20000001"
},
{
"part_number": "913-0000019",
"serial_number": "20000002"
},
{
"part_number": "913-0000019",
"serial_number": "20000003"
}
],
"rack_id": "62c2f638-c330-421e-8b4a-7f097a22281e",
"state": "in_progress",
"time_aborted": null,
"time_committed": null,
"time_created": "2026-01-23T22:15:53.974119Z",
"unacknowledged_members": [
{
"part_number": "913-0000019",
"serial_number": "20000000"
},
{
"part_number": "913-0000019",
"serial_number": "20000001"
},
{
"part_number": "913-0000019",
"serial_number": "20000002"
},
{
"part_number": "913-0000019",
"serial_number": "20000003"
}
],
"version": 3
}
```
Polling for a bit gives the committed status of this add-sled in the CLI:
```
➜ oxide.rs git:(main) ✗ target/debug/oxide --profile recovery api '/v1/system/hardware/racks/62c2f638-c330-421e-8b4a-7f097a22281e/membership'
{
"members": [
{
"part_number": "913-0000019",
"serial_number": "20000000"
},
{
"part_number": "913-0000019",
"serial_number": "20000001"
},
{
"part_number": "913-0000019",
"serial_number": "20000002"
},
{
"part_number": "913-0000019",
"serial_number": "20000003"
}
],
"rack_id": "62c2f638-c330-421e-8b4a-7f097a22281e",
"state": "committed",
"time_aborted": null,
"time_committed": "2026-01-23T22:17:33.936086Z",
"time_created": "2026-01-23T22:15:53.974119Z",
"unacknowledged_members": [],
"version": 3
}
```
And here is the same thing with more detail in omdb:
```
root@oxz_switch:~# omdb nexus trust-quorum get-config 62c2f638-c330-421e-8b4a-7f097a22281e latest
note: Nexus URL not specified. Will pick one from DNS.
note: using DNS from system config (typically /etc/resolv.conf)
note: (if this is not right, use --dns-server to specify an alternate DNS server)
note: using Nexus URL http://[fd00:17:1:d01::6]:12232
TrustQuorumConfig {
rack_id: 62c2f638-c330-421e-8b4a-7f097a22281e (rack),
epoch: Epoch(
3,
),
last_committed_epoch: Some(
Epoch(
2,
),
),
state: Committed,
threshold: Threshold(
3,
),
commit_crash_tolerance: 1,
coordinator: BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
},
encrypted_rack_secrets: Some(
EncryptedRackSecrets {
salt: Salt(
[
68,
134,
154,
136,
2,
76,
247,
184,
235,
215,
228,
69,
93,
48,
142,
161,
133,
127,
137,
173,
52,
16,
184,
194,
114,
38,
73,
215,
80,
207,
255,
114,
],
),
data: [
6,
67,
96,
7,
231,
106,
134,
234,
229,
116,
209,
76,
162,
172,
175,
139,
200,
74,
202,
28,
127,
55,
44,
61,
166,
60,
135,
156,
53,
42,
66,
189,
92,
56,
7,
93,
205,
125,
98,
20,
233,
99,
128,
208,
223,
134,
64,
32,
137,
248,
119,
159,
192,
57,
142,
127,
109,
162,
254,
177,
86,
112,
21,
115,
251,
94,
51,
24,
135,
242,
113,
127,
71,
241,
50,
32,
185,
218,
240,
1,
178,
200,
71,
173,
88,
120,
254,
177,
146,
205,
16,
133,
246,
184,
212,
118,
],
},
),
members: {
BaseboardId {
part_number: "913-0000019",
serial_number: "20000000",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 829cb1c74cde3390a3d8e0abd81399357fc4fd2d19e7cc4cb6ca582d5c1792d,
),
time_prepared: Some(
2026-01-23T22:16:29.159832Z,
),
time_committed: Some(
2026-01-23T22:17:33.885064Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000001",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 97f0b02f4ab2e1fa54d34c939658e5869c97f3ef417e4a7aecf314697cfa,
),
time_prepared: Some(
2026-01-23T22:16:28.146229Z,
),
time_committed: Some(
2026-01-23T22:17:32.825838Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000002",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: 1c33b428dfb3f345f497868a79a4b73e7252c848ab8ae8d56f2a9137120f8d7,
),
time_prepared: Some(
2026-01-23T22:16:27.764304Z,
),
time_committed: Some(
2026-01-23T22:17:32.825838Z,
),
},
BaseboardId {
part_number: "913-0000019",
serial_number: "20000003",
}: TrustQuorumMemberData {
state: Committed,
share_digest: Some(
sha3 digest: ad6dc4ccc51bf2aeef6fd49bc4c7930d43a946fbc781ecb703428f14f66f11d,
),
time_prepared: Some(
2026-01-23T22:16:28.837407Z,
),
time_committed: Some(
2026-01-23T22:17:32.825838Z,
),
},
},
time_created: 2026-01-23T22:15:53.974119Z,
time_committing: Some(
2026-01-23T22:16:29.183462Z,
),
time_committed: Some(
2026-01-23T22:17:33.936086Z,
),
time_aborted: None,
abort_reason: None,
}
```
```rust
// Read back the real configuration from the database. Importantly this
// includes a chosen coordinator.
let Some(new_config) =
    self.db_datastore.tq_get_config(opctx, rack_id, new_epoch).await?
```
Why do we need to read back the config we just inserted, if insertion was successful?
The `ProposedTrustQuorumConfig` only contains a small subset of the data for a full configuration. Things like the coordinator and threshold are generated at the DB layer. I suppose I could just return those directly, but in other cases (like adding sleds) we want to read back and return the full configuration.
I would have used a `RETURNING` clause, except that the configuration is built from multiple tables. For simplicity/urgency I'd prefer to keep this as-is.
Ah, gotcha. No objection to keeping this as-is; I assume there's no issue with these queries not being in the same transaction since the get takes the explicit epoch from the insert?
> I assume there's no issue with these queries not being in the same transaction since the get takes the explicit epoch from the insert?
Precisely.
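The insert-then-read-back pattern discussed above can be sketched with an in-memory map standing in for the real datastore (all names here are hypothetical; the actual `tq_get_config` query spans multiple tables):

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq)]
struct Config {
    epoch: u64,
    // Chosen at the "DB layer" in this model, as the coordinator and
    // threshold are in the real code.
    coordinator: String,
    threshold: usize,
}

struct Store {
    configs: BTreeMap<u64, Config>,
}

impl Store {
    // Insert returns only the epoch it allocated; the full
    // configuration is generated internally.
    fn insert(&mut self, members: &[&str]) -> u64 {
        let epoch = self.configs.keys().max().copied().unwrap_or(0) + 1;
        let config = Config {
            epoch,
            coordinator: members[0].to_string(),
            threshold: members.len() / 2 + 1,
        };
        self.configs.insert(epoch, config);
        epoch
    }

    // Because this keys on the explicit epoch returned by insert, no
    // transaction is needed: a concurrent later insert allocates a new
    // epoch and cannot change what this epoch maps to.
    fn get(&self, epoch: u64) -> Option<&Config> {
        self.configs.get(&epoch)
    }
}

fn main() {
    let mut store = Store { configs: BTreeMap::new() };
    let epoch = store.insert(&["sled-a", "sled-b", "sled-c"]);
    let config = store.get(epoch).expect("just inserted");
    assert_eq!(config.threshold, 2);
    println!("epoch {} coordinator {}", config.epoch, config.coordinator);
}
```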