Skip to content

Commit 7adaaf9

Browse files
committed
TQ: Introduce tqdb
This PR introduces a new test tool for the trust-quorum protocol: tqdb. tqdb is a repl that takes event traces produced by the `cluster` proptest and uses them for deterministic replay of actions against test state. The test state includes a "universe" of real protocol nodes, a fake nexus, and fake networks. The proptest and debugging state is shared and contained in the `trust-quorum-test-utils`. The debugger allows a variety of functionality including stepping through individual events, setting breakpoints, snapshotting and diffing states and viewing the event log itself. The purpose of tqdb is twofold: 1. Allow for debugging of failed proptests. This is non-trivial in some cases, even with shrunken tests, because the generated actions are high-level and are all generated up front. The actual operations such as reconfigurations are derived from these high level random generations in conjunction with the current state of the system. Therefore the set of failing generated actions doesn't really tell you much. You have to look at the logs, and the assertion that fired and reason about it with incomplete information. Now, for each concrete action taken, we record the event in a log. In the case of a failure an event log can be loaded into tqdb, with a breakpoint set right before the failure. A snapshot of the state can be taken, and then the failing event can be applied. The diff will tell you what changed and allow you to inspect the actual state of the system. Full visibility into your failure is now possible. 2. The trust quorum protocol is non-trivial. Tqdb allows developers to see in detail how the protocol behaves and understand what is happening in certain situations. Event logs can be created by hand (or script) for particularly interesting scenarios and then run through tqdb. In order to get the diff functionality to work as I wanted, I had to implement `Eq` for types that implemented `subtle::ConstantTimeEq` in both `gfss` (our secret sharing library), and `trust-quorum` crates. However the safety in terms of the compiler breaking the constant time guarantees is unknown. Therefore, a feature flag was added such that only `test-utils` and `tqdb` crates are able to use these implementations. They are not used in the production codebase. Feature unification is not at play here because neither `test-utils` or `tqdb` are part of the product.
1 parent 9465f5a commit 7adaaf9

File tree

27 files changed

+2394
-658
lines changed

27 files changed

+2394
-658
lines changed

Cargo.lock

Lines changed: 42 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -141,6 +141,8 @@ members = [
141141
"test-utils",
142142
"trust-quorum",
143143
"trust-quorum/gfss",
144+
"trust-quorum/test-utils",
145+
"trust-quorum/tqdb",
144146
"typed-rng",
145147
"update-common",
146148
"update-engine",
@@ -298,6 +300,8 @@ default-members = [
298300
"sp-sim",
299301
"trust-quorum",
300302
"trust-quorum/gfss",
303+
"trust-quorum/test-utils",
304+
"trust-quorum/tqdb",
301305
"test-utils",
302306
"typed-rng",
303307
"update-common",
@@ -460,6 +464,8 @@ gateway-test-utils = { path = "gateway-test-utils" }
460464
gateway-types = { path = "gateway-types" }
461465
gethostname = "0.5.0"
462466
gfss = { path = "trust-quorum/gfss" }
467+
trust-quorum = { path = "trust-quorum" }
468+
trust-quorum-test-utils = { path = "trust-quorum/test-utils" }
463469
glob = "0.3.2"
464470
guppy = "0.17.19"
465471
headers = "0.4.1"

dev-tools/reconfigurator-cli/src/lib.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -13,7 +13,7 @@ use iddqd::IdOrdMap;
1313
use indent_write::fmt::IndentWriter;
1414
use internal_dns_types::diff::DnsDiff;
1515
use itertools::Itertools;
16-
use log_capture::LogCapture;
16+
pub use log_capture::LogCapture;
1717
use nexus_inventory::CollectionBuilder;
1818
use nexus_reconfigurator_blippy::Blippy;
1919
use nexus_reconfigurator_blippy::BlippyReportSortKey;

dev-tools/repl-utils/src/lib.rs

Lines changed: 14 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ use anyhow::anyhow;
99
use anyhow::bail;
1010
use camino::Utf8Path;
1111
use clap::Parser;
12+
use reedline::Prompt;
1213
use reedline::Reedline;
1314
use reedline::Signal;
1415
use std::fs::File;
@@ -108,13 +109,24 @@ pub fn run_repl_from_file<C: Parser>(
108109
pub fn run_repl_on_stdin<C: Parser>(
109110
run_one: &mut dyn FnMut(C) -> anyhow::Result<Option<String>>,
110111
) -> anyhow::Result<()> {
111-
let mut ed = Reedline::create();
112+
let ed = Reedline::create();
112113
let prompt = reedline::DefaultPrompt::new(
113114
reedline::DefaultPromptSegment::Empty,
114115
reedline::DefaultPromptSegment::Empty,
115116
);
117+
run_repl_on_stdin_customized(ed, &prompt, run_one)
118+
}
119+
120+
/// Runs a REPL using stdin/stdout with a customized `Reedline` and `Prompt`
121+
///
122+
/// See docs for [`run_repl_on_stdin`]
123+
pub fn run_repl_on_stdin_customized<C: Parser>(
124+
mut ed: Reedline,
125+
prompt: &dyn Prompt,
126+
run_one: &mut dyn FnMut(C) -> anyhow::Result<Option<String>>,
127+
) -> anyhow::Result<()> {
116128
loop {
117-
match ed.read_line(&prompt) {
129+
match ed.read_line(prompt) {
118130
Ok(Signal::Success(buffer)) => {
119131
// Strip everything after '#' as a comment.
120132
let entry = match buffer.split_once('#') {

trust-quorum/Cargo.toml

Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,6 +8,7 @@ license = "MPL-2.0"
88
workspace = true
99

1010
[dependencies]
11+
anyhow.workspace = true
1112
bcs.workspace = true
1213
bootstore.workspace = true
1314
camino.workspace = true
@@ -36,6 +37,18 @@ omicron-workspace-hack.workspace = true
3637

3738
[dev-dependencies]
3839
assert_matches.workspace = true
40+
dropshot.workspace = true
3941
omicron-test-utils.workspace = true
4042
proptest.workspace = true
43+
serde_json.workspace = true
4144
test-strategy.workspace = true
45+
trust-quorum-test-utils.workspace = true
46+
47+
[features]
48+
# Impl `PartialEq` and `Eq` for types implementing `subtle::ConstantTimeEq` when
49+
# this feature is enabled.
50+
#
51+
# This is of unknown risk. The rust compiler may obviate the security of using
52+
# subtle when we do this. On the other hand its very useful for testing and
53+
# debugging outside of production.
54+
danger_partial_eq_ct_wrapper = ["gfss/danger_partial_eq_ct_wrapper"]

trust-quorum/gfss/Cargo.toml

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -21,3 +21,14 @@ omicron-workspace-hack.workspace = true
2121
[dev-dependencies]
2222
proptest.workspace = true
2323
test-strategy.workspace = true
24+
25+
[features]
26+
27+
28+
# Impl `PartialEq` and `Eq` for types implementing `subtle::ConstantTimeEq` when
29+
# this feature is enabled.
30+
#
31+
# This is of unknown risk. The rust compiler may obviate the security of using
32+
# subtle when we do this. On the other hand its very useful for testing and
33+
# debugging outside of production.
34+
danger_partial_eq_ct_wrapper = []

trust-quorum/gfss/src/gf256.rs

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ use zeroize::Zeroize;
3232

3333
/// An element in a finite field of prime power 2^8
3434
///
35-
/// We explicitly don't enable the equality operators to prevent ourselves from
35+
/// We explicitly don't derive the equality operators to prevent ourselves from
3636
/// accidentally using those instead of the constant time ones.
3737
#[repr(transparent)]
3838
#[derive(Debug, Clone, Copy, Zeroize, Serialize, Deserialize)]
@@ -120,6 +120,15 @@ impl ConstantTimeEq for Gf256 {
120120
}
121121
}
122122

123+
#[cfg(feature = "danger_partial_eq_ct_wrapper")]
124+
impl PartialEq for Gf256 {
125+
fn eq(&self, other: &Self) -> bool {
126+
self.ct_eq(&other).into()
127+
}
128+
}
129+
#[cfg(feature = "danger_partial_eq_ct_wrapper")]
130+
impl Eq for Gf256 {}
131+
123132
impl Add for Gf256 {
124133
type Output = Self;
125134

trust-quorum/gfss/src/shamir.rs

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -137,6 +137,16 @@ impl Share {
137137
}
138138
}
139139

140+
#[cfg(feature = "danger_partial_eq_ct_wrapper")]
141+
impl PartialEq for Share {
142+
fn eq(&self, other: &Self) -> bool {
143+
self.x_coordinate == other.x_coordinate
144+
&& self.y_coordinates == other.y_coordinates
145+
}
146+
}
147+
#[cfg(feature = "danger_partial_eq_ct_wrapper")]
148+
impl Eq for Share {}
149+
140150
impl std::fmt::Debug for Share {
141151
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
142152
f.debug_struct("KeyShareGf256").finish()

trust-quorum/src/compute_key_share.rs

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -19,6 +19,7 @@ use std::collections::BTreeMap;
1919

2020
/// In memory state that tracks retrieval of key shares in order to compute
2121
/// this node's key share for a given configuration.
22+
#[derive(Debug, Clone)]
2223
pub struct KeyShareComputer {
2324
log: Logger,
2425

@@ -28,6 +29,17 @@ pub struct KeyShareComputer {
2829
collected_shares: BTreeMap<PlatformId, Share>,
2930
}
3031

32+
#[cfg(feature = "danger_partial_eq_ct_wrapper")]
33+
impl PartialEq for KeyShareComputer {
34+
fn eq(&self, other: &Self) -> bool {
35+
self.config == other.config
36+
&& self.collected_shares == other.collected_shares
37+
}
38+
}
39+
40+
#[cfg(feature = "danger_partial_eq_ct_wrapper")]
41+
impl Eq for KeyShareComputer {}
42+
3143
impl KeyShareComputer {
3244
pub fn new(
3345
log: &Logger,

trust-quorum/src/configuration.rs

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,7 @@
77
use crate::crypto::{EncryptedRackSecrets, RackSecret, Sha3_256Digest};
88
use crate::validators::ValidatedReconfigureMsg;
99
use crate::{Epoch, PlatformId, Threshold};
10+
use daft::Diffable;
1011
use gfss::shamir::{Share, SplitError};
1112
use iddqd::{IdOrdItem, id_upcast};
1213
use omicron_uuid_kinds::RackUuid;
@@ -31,7 +32,15 @@ pub enum ConfigurationError {
3132
///
3233
/// Only valid for non-lrtq configurations
3334
#[derive(
34-
Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize,
35+
Debug,
36+
Clone,
37+
PartialEq,
38+
Eq,
39+
PartialOrd,
40+
Ord,
41+
Serialize,
42+
Deserialize,
43+
Diffable,
3544
)]
3645
pub struct Configuration {
3746
/// Unique Id of the rack

0 commit comments

Comments
 (0)