Commit 842226d

TQ: Add protocol support for LRTQ upgrade (#9065)
This PR completes the first version of the sans-io trust quorum protocol implementation.

LRTQ upgrade can now be started via `Node::coordinate_upgrade_from_lrtq`. This triggers the coordinating node to start collecting the LRTQ key shares so that they can be used to construct the LRTQ rack secret via the bootstore code. After this occurs, a Prepare message is sent out with this old rack secret encrypted in a manner identical to a normal reconfiguration. The prepare and commit paths remain the same.

The cluster proptest was updated to sometimes start out with an existing LRTQ configuration and then to upgrade from there. Like normal reconfigurations, it allows aborting and pre-empting the LRTQ upgrade with a new attempt at a higher epoch. In production this is how we "retry" if the coordinating node crashes prior to commit, or more accurately, if Nexus can't talk to the coordinating node for some period of time and just moves on. After the LRTQ upgrade commits, normal reconfigurations are run.

We also remove unnecessary config-related messages in this commit. Since a `Configuration` does not contain sensitive information, it can be retrieved when Nexus polls the coordinator before it commits. Nexus can then save this info and send it in `PrepareAndCommit` messages rather than having the receiving node try to find a live peer with the config prior to collecting shares. This is a nice optimization that reduces protocol complexity a bit. This removal allowed removing the TODO in the message `match` statement in `Node::handle` and completing the protocol.
1 parent cf57f89 commit 842226d
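
The upgrade flow described in the commit message (collect LRTQ shares, send Prepare, commit, with pre-emption by a higher epoch) can be pictured with a toy state machine. This is only a sketch under stated assumptions: `Phase`, `UpgradeAttempt`, and every method on it are hypothetical names that do not come from the trust-quorum crate, peers are plain strings, and committing once `threshold` acknowledgements arrive is a simplification rather than the protocol's actual commit rule.

```rust
use std::collections::BTreeSet;

/// Hypothetical phases of one upgrade attempt (not the crate's types).
#[derive(Debug, PartialEq, Eq)]
pub enum Phase {
    CollectingLrtqShares,
    Preparing,
    Committed,
}

/// Toy model of a coordinator driving a single LRTQ upgrade attempt.
#[derive(Debug)]
pub struct UpgradeAttempt {
    pub epoch: u64,
    pub threshold: usize,
    pub phase: Phase,
    // Peers whose LRTQ key shares have been collected so far.
    shares_from: BTreeSet<String>,
    // Peers that have acknowledged the Prepare message.
    prepare_acks: BTreeSet<String>,
}

impl UpgradeAttempt {
    pub fn start(epoch: u64, threshold: usize) -> Self {
        UpgradeAttempt {
            epoch,
            threshold,
            phase: Phase::CollectingLrtqShares,
            shares_from: BTreeSet::new(),
            prepare_acks: BTreeSet::new(),
        }
    }

    /// Record an LRTQ key share. Once `threshold` distinct shares are held,
    /// the old rack secret could be reconstructed and Prepares sent out.
    pub fn on_lrtq_share(&mut self, from: &str) {
        if self.phase != Phase::CollectingLrtqShares {
            return;
        }
        self.shares_from.insert(from.to_string());
        if self.shares_from.len() >= self.threshold {
            self.phase = Phase::Preparing;
        }
    }

    /// Record a Prepare acknowledgement; ignored outside the Preparing phase.
    pub fn on_prepare_ack(&mut self, from: &str) {
        if self.phase == Phase::Preparing {
            self.prepare_acks.insert(from.to_string());
        }
    }

    /// Simplified commit rule (an assumption): enough acks have arrived.
    pub fn try_commit(&mut self) -> bool {
        if self.phase == Phase::Preparing
            && self.prepare_acks.len() >= self.threshold
        {
            self.phase = Phase::Committed;
            true
        } else {
            false
        }
    }

    /// An uncommitted attempt can be replaced by a new attempt at a strictly
    /// higher epoch; otherwise the original attempt is handed back.
    pub fn preempt(self, new_epoch: u64) -> Result<UpgradeAttempt, UpgradeAttempt> {
        if self.phase != Phase::Committed && new_epoch > self.epoch {
            Ok(UpgradeAttempt::start(new_epoch, self.threshold))
        } else {
            Err(self)
        }
    }
}
```

In this model, a coordinator that becomes unreachable before commit simply leaves its attempt stuck in an earlier phase, and the "retry" is a fresh `UpgradeAttempt` at a higher epoch.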

18 files changed: +1166 -193 lines changed


Cargo.lock

Lines changed: 3 additions & 0 deletions

bootstore/src/lib.rs

Lines changed: 6 additions & 0 deletions
@@ -11,3 +11,9 @@ use serde::{Deserialize, Serialize};
     Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize,
 )]
 pub struct Sha3_256Digest([u8; 32]);
+
+impl Sha3_256Digest {
+    pub fn new(bytes: [u8; 32]) -> Self {
+        Sha3_256Digest(bytes)
+    }
+}
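
The inner array of `Sha3_256Digest` is private, so the added `new` constructor is what lets code in other modules or crates build a digest from raw bytes. A minimal standalone sketch of that effect follows; the `digest` module and the `wrap_digest` helper are hypothetical, and the serde derives from the real type are omitted to keep the example dependency-free.

```rust
mod digest {
    /// Newtype over a 32-byte SHA3-256 digest. The inner array is private,
    /// so code outside this module cannot build one with a tuple literal.
    #[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
    pub struct Sha3_256Digest([u8; 32]);

    impl Sha3_256Digest {
        /// Public constructor, matching the one added in the diff.
        pub fn new(bytes: [u8; 32]) -> Self {
            Sha3_256Digest(bytes)
        }
    }
}

use digest::Sha3_256Digest;

/// Hypothetical helper living outside the `digest` module: wraps bytes that
/// were hashed elsewhere. Writing `Sha3_256Digest(bytes)` here would not
/// compile because the field is private.
pub fn wrap_digest(bytes: [u8; 32]) -> Sha3_256Digest {
    Sha3_256Digest::new(bytes)
}
```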

bootstore/src/trust_quorum/rack_secret.rs

Lines changed: 0 additions & 1 deletion
@@ -90,7 +90,6 @@ impl RackSecret {
     }
 
     /// Combine a set of shares and return a RackSecret
-    #[allow(unused)]
     pub fn combine_shares(
         shares: &[Vec<u8>],
     ) -> Result<RackSecret, vsss_rs::Error> {

trust-quorum/src/configuration.rs

Lines changed: 18 additions & 11 deletions
@@ -5,7 +5,6 @@
 //! A configuration of a trust quroum at a given epoch
 
 use crate::crypto::{EncryptedRackSecrets, RackSecret, Sha3_256Digest};
-use crate::validators::ValidatedReconfigureMsg;
 use crate::{Epoch, PlatformId, Threshold};
 use daft::Diffable;
 use gfss::shamir::{Share, SplitError};
@@ -15,7 +14,7 @@ use secrecy::ExposeSecret;
 use serde::{Deserialize, Serialize};
 use serde_with::serde_as;
 use slog_error_chain::SlogInlineError;
-use std::collections::BTreeMap;
+use std::collections::{BTreeMap, BTreeSet};
 
 #[derive(Debug, Clone, thiserror::Error, PartialEq, Eq, SlogInlineError)]
 pub enum ConfigurationError {
@@ -75,22 +74,30 @@ impl IdOrdItem for Configuration {
     id_upcast!();
 }
 
+pub struct NewConfigParams<'a> {
+    pub rack_id: RackUuid,
+    pub epoch: Epoch,
+    pub members: &'a BTreeSet<PlatformId>,
+    pub threshold: Threshold,
+    pub coordinator_id: &'a PlatformId,
+}
+
 impl Configuration {
     /// Create a new configuration for the trust quorum
     ///
     /// `previous_configuration` is never filled in upon construction. A
     /// coordinator will fill this in as necessary after retrieving shares for
     /// the last committed epoch.
     pub fn new(
-        reconfigure_msg: &ValidatedReconfigureMsg,
+        params: NewConfigParams<'_>,
     ) -> Result<(Configuration, BTreeMap<PlatformId, Share>), ConfigurationError>
     {
-        let coordinator = reconfigure_msg.coordinator_id().clone();
+        let coordinator = params.coordinator_id.clone();
         let rack_secret = RackSecret::new();
         let shares = rack_secret.split(
-            reconfigure_msg.threshold(),
-            reconfigure_msg
-                .members()
+            params.threshold,
+            params
+                .members
                 .len()
                 .try_into()
                 .map_err(|_| ConfigurationError::TooManyMembers)?,
@@ -106,19 +113,19 @@ impl Configuration {
         let mut members: BTreeMap<PlatformId, Sha3_256Digest> = BTreeMap::new();
         let mut shares: BTreeMap<PlatformId, Share> = BTreeMap::new();
         for (platform_id, (share, digest)) in
-            reconfigure_msg.members().iter().cloned().zip(shares_and_digests)
+            params.members.iter().cloned().zip(shares_and_digests)
         {
             members.insert(platform_id.clone(), digest);
             shares.insert(platform_id, share);
         }
 
         Ok((
             Configuration {
-                rack_id: reconfigure_msg.rack_id(),
-                epoch: reconfigure_msg.epoch(),
+                rack_id: params.rack_id,
+                epoch: params.epoch,
                 coordinator,
                 members,
-                threshold: reconfigure_msg.threshold(),
+                threshold: params.threshold,
                 encrypted_rack_secrets: None,
             },
             shares,
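
The change above decouples `Configuration::new` from `ValidatedReconfigureMsg` by passing a borrowed parameter struct, which presumably lets callers other than the normal reconfiguration path (such as the LRTQ upgrade) construct a configuration. The sketch below shows that pattern in isolation. Everything except the field layout of `NewConfigParams` is a stand-in: `PlatformId`, `RackUuid`, `Epoch`, and `Threshold` are simplified placeholder types, the member count is assumed to be bounded by `u8`, and a share index replaces the real secret splitting and share digests.

```rust
use std::collections::{BTreeMap, BTreeSet};
use std::convert::TryFrom;

// Stand-in types: the real crate defines richer versions of each of these.
pub type PlatformId = String;
pub type RackUuid = u128;
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Epoch(pub u64);
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Threshold(pub u8);

#[derive(Debug, PartialEq, Eq)]
pub enum ConfigurationError {
    TooManyMembers,
}

/// Same field layout as the struct added in this commit.
pub struct NewConfigParams<'a> {
    pub rack_id: RackUuid,
    pub epoch: Epoch,
    pub members: &'a BTreeSet<PlatformId>,
    pub threshold: Threshold,
    pub coordinator_id: &'a PlatformId,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Configuration {
    pub rack_id: RackUuid,
    pub epoch: Epoch,
    pub coordinator: PlatformId,
    // The real type maps each member to a share digest; a share index
    // stands in for it here.
    pub members: BTreeMap<PlatformId, u8>,
    pub threshold: Threshold,
}

impl Configuration {
    /// Build a configuration from plain parameters rather than from a
    /// specific message type, so any caller can supply them.
    pub fn new(
        params: NewConfigParams<'_>,
    ) -> Result<Configuration, ConfigurationError> {
        // Assumed bound: the member count must fit in a u8.
        let total = u8::try_from(params.members.len())
            .map_err(|_| ConfigurationError::TooManyMembers)?;
        // Pair each member with a 1-based share index, in set order.
        let members = params.members.iter().cloned().zip(1..=total).collect();
        Ok(Configuration {
            rack_id: params.rack_id,
            epoch: params.epoch,
            coordinator: params.coordinator_id.clone(),
            members,
            threshold: params.threshold,
        })
    }
}
```

Borrowing `members` and `coordinator_id` keeps the call cheap for a caller that already owns a validated message, while the `Copy` fields are passed by value.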
