-
Notifications
You must be signed in to change notification settings - Fork 65
apollo_consensus: add honest nodes consensus simulation #10868
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
apollo_consensus: add honest nodes consensus simulation #10868
Conversation
This stack of pull requests is managed by Graphite. Learn more about stacking. |
fc26f31 to
ceb5ac4
Compare
dafnamatsry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dafnamatsry reviewed 4 files and all commit messages, and made 11 comments.
Reviewable status: all files reviewed, 11 unresolved discussions (waiting on @asmaastarkware and @matanl-starkware).
crates/apollo_consensus/src/simulation_test.rs line 43 at r2 (raw file):
/// A timed event in the discrete event simulation. /// /// Events are ordered by tick (time) in reverse order for the priority queue
This is confusing. They are ordered ascending order right?
Code quote:
reverse ordercrates/apollo_consensus/src/simulation_test.rs line 81 at r2 (raw file):
rng: StdRng, /// The ID of the node being simulated (node 0). node_id: ValidatorId,
If it is always 0, better make it a constant.
Code quote:
/// The ID of the node being simulated (node 0).
node_id: ValidatorId,crates/apollo_consensus/src/simulation_test.rs line 122 at r2 (raw file):
fn get_leader(&self, round: Round) -> ValidatorId { let idx = usize::try_from(round).unwrap() % self.validators.len();
Does that mean that in all tests, node 0 (the node under test) will be the proposer in the first round?
I think that should be randomised too.
crates/apollo_consensus/src/simulation_test.rs line 144 at r2 (raw file):
if leader_id != self.node_id { self.schedule( 1,
Why not random?
Code quote:
1,crates/apollo_consensus/src/simulation_test.rs line 156 at r2 (raw file):
// 2. Votes from other honest validators // Skip index 0 (self) - our votes are handled by the state machine for i in 1..self.validators.len() {
Depending on what you decide to do with node_id:
- If you keep it as a member like now - change the for loop to go over all the indices except
node_id. - If you make it a constant, so you can keep the for loop as is.
Code quote:
// Skip index 0 (self) - our votes are handled by the state machine
for i in 1..self.validators.len() {crates/apollo_consensus/src/simulation_test.rs line 202 at r2 (raw file):
// Start the single height consensus match self.shc.start(&leader_fn) {
Can you pass self.get_leader instead?
Code quote:
&leader_fncrates/apollo_consensus/src/simulation_test.rs line 240 at r2 (raw file):
match req { SMRequest::StartValidateProposal(init) => { let delay = self.rng.gen_range(5..10);
Consider using a larger range for the delay, to simulate a more realistic scenario.
IIUC, votes are scheduled between ticks 2 and 20, we should ensure the request completion covers at least that entire window.
crates/apollo_consensus/src/simulation_test.rs line 249 at r2 (raw file):
} SMRequest::StartBuildProposal(round) => { let delay = self.rng.gen_range(5..10);
Same here.
crates/apollo_consensus/src/simulation_test.rs line 280 at r2 (raw file):
fn verify_honest_success(sim: &DiscreteEventSimulation, result: Option<&Decision>) { let decision = result.expect("FAILURE: Simulation timed out! Honest network should always decide.");
Log the simulation events and history in case of failures, so we can easily debug the tests.
crates/apollo_consensus/src/simulation_test.rs line 304 at r2 (raw file):
} }) .count();
Compare the valid_votes with the returned decision.precommits.
crates/apollo_consensus/src/simulation_test.rs line 307 at r2 (raw file):
// Add 1 for self (our vote is tracked internally by the state machine) let total = valid_votes + 1;
How do we make sure we voted correctly? Both prevote and precommit?
ceb5ac4 to
0e51b51
Compare
asmaastarkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmaastarkware made 2 comments.
Reviewable status: 3 of 4 files reviewed, 11 unresolved discussions (waiting on @dafnamatsry and @matanl-starkware).
crates/apollo_consensus/src/simulation_test.rs line 144 at r2 (raw file):
Previously, dafnamatsry wrote…
Why not random?
in the honest case, we expect the proposal to be received first, followed by the votes
(honest node will send proposal then vote)
crates/apollo_consensus/src/simulation_test.rs line 307 at r2 (raw file):
Previously, dafnamatsry wrote…
How do we make sure we voted correctly? Both prevote and precommit?
I think it's a unit test. checking that the result is true is enough
matanl-starkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matanl-starkware reviewed all commit messages and made 4 comments.
Reviewable status: 3 of 4 files reviewed, 12 unresolved discussions (waiting on @asmaastarkware and @dafnamatsry).
crates/apollo_consensus/src/simulation_test.rs line 122 at r2 (raw file):
Previously, dafnamatsry wrote…
Does that mean that in all tests, node 0 (the node under test) will be the proposer in the first round?
I think that should be randomised too.
As far as I am aware, this function will always return the same value for a given round.
Maybe probability P for node-0 to be the proposer (the one under test), and (1-P) for someone else?
crates/apollo_consensus/src/simulation_test.rs line 240 at r2 (raw file):
Previously, dafnamatsry wrote…
Consider using a larger range for the delay, to simulate a more realistic scenario.
IIUC, votes are scheduled between ticks 2 and 20, we should ensure the request completion covers at least that entire window.
Agree with @dafnamatsry.
In addition, consider putting all these "ranges" as constants somewhere.
crates/apollo_consensus/src/simulation_test.rs line 89 at r3 (raw file):
validators: Vec<ValidatorId>, /// Priority queue of timed events (min-heap by tick). timeline: BinaryHeap<TimedEvent>,
Are these the future events that have yet to be processed (as opposed to processed_history)?
Consider renaming.
Code quote:
timelinecrates/apollo_consensus/src/simulation_test.rs line 100 at r3 (raw file):
let rng = StdRng::seed_from_u64(seed); let validators: Vec<ValidatorId> = (0..total_nodes).map(|i| ValidatorId::from(u64::try_from(i).unwrap())).collect();
We know for sure that total_nodes can fit into u64
Suggestion:
ValidatorId::from(i as u64)0e51b51 to
7e7b39b
Compare
asmaastarkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmaastarkware made 3 comments.
Reviewable status: 3 of 4 files reviewed, 12 unresolved discussions (waiting on @dafnamatsry and @matanl-starkware).
crates/apollo_consensus/src/simulation_test.rs line 122 at r2 (raw file):
Previously, matanl-starkware (Matan Lior) wrote…
As far as I am aware, this function will always return the same value for a given round.
Maybe probability P for node-0 to be the proposer (the one under test), and (1-P) for someone else?
Done.
crates/apollo_consensus/src/simulation_test.rs line 89 at r3 (raw file):
Previously, matanl-starkware (Matan Lior) wrote…
Are these the future events that have yet to be processed (as opposed to
processed_history)?
Consider renaming.
Done.
crates/apollo_consensus/src/simulation_test.rs line 100 at r3 (raw file):
Previously, matanl-starkware (Matan Lior) wrote…
We know for sure that total_nodes can fit into u64
I used the try_from to satisfy clippy::as_conversions lint
matanl-starkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matanl-starkware made 1 comment and resolved 1 discussion.
Reviewable status: 3 of 4 files reviewed, 11 unresolved discussions (waiting on @asmaastarkware and @dafnamatsry).
crates/apollo_consensus/src/simulation_test.rs line 122 at r2 (raw file):
Previously, asmaastarkware (asmaa-starkware) wrote…
Done.
For round=0, this code will always yield the same result
Code snippet:
let seed = SIMULATION_SEED.wrapping_mul(31).wrapping_add(round_u64);
let mut round_rng = StdRng::seed_from_u64(seed);
let random_value: f64 = round_rng.gen();
asmaastarkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmaastarkware made 1 comment.
Reviewable status: 3 of 4 files reviewed, 11 unresolved discussions (waiting on @dafnamatsry).
crates/apollo_consensus/src/simulation_test.rs line 122 at r2 (raw file):
Previously, matanl-starkware (Matan Lior) wrote…
For round=0, this code will always yield the same result
for each seed and round, it will yield the same validator, which is the expected behavior
2b93b33 to
e9629cb
Compare
e9629cb to
5274c34
Compare
dafnamatsry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dafnamatsry reviewed 1 file and all commit messages, made 2 comments, and resolved 8 discussions.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @asmaastarkware).
crates/apollo_consensus/src/simulation_test.rs line 144 at r2 (raw file):
Previously, asmaastarkware (asmaa-starkware) wrote…
in the honest case, we expect the proposal to be received first, followed by the votes
(honest node will send proposal then vote)
But it is still possible that the other honest nodes got the proposal and voted before the tested node got the proposal.
crates/apollo_consensus/src/simulation_test.rs line 32 at r5 (raw file):
lazy_static! { static ref VALIDATOR_ID: ValidatorId = ValidatorId::from(0u64);
Suggestion:
NODE_05274c34 to
e6ff30c
Compare
e6ff30c to
363eca6
Compare
363eca6 to
ec67905
Compare
51b8a2f to
1888781
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matanl-starkware reviewed 4 files and all commit messages.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @asmaastarkware and @dafnamatsry).
59d48c1 to
b631aa8
Compare
81ff6ba to
1d4c9c1
Compare
b631aa8 to
9278d7c
Compare
1d4c9c1 to
2d569b4
Compare
2d569b4 to
75c89e3
Compare
9278d7c to
32a5ce3
Compare
dafnamatsry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dafnamatsry reviewed 2 files, made 4 comments, and resolved 3 discussions.
Reviewable status: all files reviewed (commit messages unreviewed), 5 unresolved discussions (waiting on @asmaastarkware).
crates/apollo_consensus/src/simulation_test.rs line 91 at r9 (raw file):
validators: Vec<ValidatorId>, /// The current maximum round being processed. current_max_round: Option<Round>,
Why max? Isn't it just tracking the current round of the simulation?
Code quote:
/// The current maximum round being processed.
current_max_round: Option<Round>,crates/apollo_consensus/src/simulation_test.rs line 193 at r9 (raw file):
// honest network. Update this once we add invalid votes: enable invalid votes to be // sent anytime. let base = self.build_finish_info.get(&round).map(|(delay, _)| *delay).unwrap_or(0);
Isn't it always None at this point? generate_round_info comes before handle_requests for a specific round.
Code quote:
let base = self.build_finish_info.get(&round).map(|(delay, _)| *delay).unwrap_or(0);crates/apollo_consensus/src/simulation_test.rs line 225 at r9 (raw file):
fn check_and_generate_next_round(&mut self) { if self.current_max_round.is_none() || self.current_max_round.unwrap() < self.shc.current_round()
I don't think the number of tested rounds should depend on the SHC.
I think it should be a configuration of the test to generate traffic for X rounds, where maybe each round has a configured time range for its traffic.
crates/apollo_consensus/src/simulation_test.rs line 328 at r9 (raw file):
peer_precommits: usize, finished_proposal: bool, expected_commitment: ProposalCommitment,
Is this always set to the constant PROPOSAL_COMMITMENT? If so, I think you can remove it.
Code quote:
expected_commitment: ProposalCommitment,
asmaastarkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmaastarkware made 4 comments and dismissed @dafnamatsry from a discussion.
Reviewable status: all files reviewed (commit messages unreviewed), 5 unresolved discussions (waiting on @dafnamatsry).
crates/apollo_consensus/src/simulation_test.rs line 91 at r9 (raw file):
Previously, dafnamatsry wrote…
Why max? Isn't it just tracking the current round of the simulation?
Done.
crates/apollo_consensus/src/simulation_test.rs line 193 at r9 (raw file):
Previously, dafnamatsry wrote…
Isn't it always
Noneat this point?generate_round_infocomes beforehandle_requestsfor a specific round.
we call check_and_generate_next_round after calling shc.start(), if the start returned a start build proposal request, then we will update the info for this round..
this is happeneing every new round, when the SM starts a new round, it will return ScheduleTimeoutPropose or start build request.. so if we are the proposer we will "execute the request + update the info" then check if we should generate another round traffic, yes
crates/apollo_consensus/src/simulation_test.rs line 225 at r9 (raw file):
Previously, dafnamatsry wrote…
I don't think the number of tested rounds should depend on the SHC.
I think it should be a configuration of the test to generate traffic for X rounds, where maybe each round has a configured time range for its traffic.
I think since we can't predict which rounds will be reached in an event-driven simulation with random delays, and the test has a deadline to prevent infinite runs.. the current approach is ok
crates/apollo_consensus/src/simulation_test.rs line 328 at r9 (raw file):
Previously, dafnamatsry wrote…
Is this always set to the constant
PROPOSAL_COMMITMENT? If so, I think you can remove it.
no, will update it the next PRs
32a5ce3 to
a95546c
Compare
a95546c to
877d8d7
Compare
877d8d7 to
4ee399d
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matanl-starkware reviewed 3 files and all commit messages, and made 7 comments.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @asmaastarkware and @dafnamatsry).
crates/apollo_consensus/src/simulation_test.rs line 30 at r11 (raw file):
const NODE_0_LEADER_PROBABILITY: f64 = 0.1; const ROUND_DURATION: u64 = 100; // Each round spans ~100 ticks const ROUND_OVERLAP: u64 = 20; // Small overlap between rounds
Overlap in terms of time (ticks)? Consider defining as percent of ROUND_DURATION
Code quote:
OVERLAPcrates/apollo_consensus/src/simulation_test.rs line 177 at r11 (raw file):
if leader_id == *NODE_0 { self.node_0_proposer_rounds.insert(round); }
Consider returning at this stage, and avoid future NODE_0 checks in this flow.
Code quote:
}crates/apollo_consensus/src/simulation_test.rs line 181 at r11 (raw file):
// 1. Proposal from leader (if not NODE_0) if leader_id != *NODE_0 { let proposal_tick = round_start_tick + self.rng.gen_range(2..20);
Consider defining all ranges near the ROUND_DURATION definition, so it can be easily seen (and possibly modified) in one place.
Code quote:
2..20crates/apollo_consensus/src/simulation_test.rs line 197 at r11 (raw file):
// For rounds where NODE_0 is proposer, votes will be scheduled after build finish if leader_id != *NODE_0 { for i in 1..self.validators.len() {
Why not use schedule_honest_peer_votes_after_build?
Code quote:
for i in 1..self.validators.len()crates/apollo_consensus/src/simulation_test.rs line 202 at r11 (raw file):
let prevote_tick = (round_start_tick + self.rng.gen_range(20..60)).min(round_end_tick);
This theoretically enables PreVote and PreCommit to be sent in the same tick (I'm not sure in which order).
Code quote:
.min(round_end_tick)crates/apollo_consensus/src/simulation_test.rs line 318 at r11 (raw file):
SMRequest::StartValidateProposal(init) => { let delay = self.rng.gen_range(15..30); let finish_tick = self.current_tick + delay;
Suggestion:
validate_finish_tickcrates/apollo_consensus/src/simulation_test.rs line 535 at r11 (raw file):
let mut sim = DiscreteEventSimulation::new(TOTAL_NODES, seed, num_rounds); let deadline_ticks = u64::try_from(num_rounds).unwrap() * 100;
Suggestion:
* ROUND_DURATION8cfadb5 to
996a755
Compare
dafnamatsry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dafnamatsry reviewed 3 files and all commit messages, made 9 comments, and resolved 5 discussions.
Reviewable status: all files reviewed, 12 unresolved discussions (waiting on @asmaastarkware).
crates/apollo_consensus/src/simulation_test.rs line 31 at r12 (raw file):
// Timing configuration (all values in ticks) const ROUND_DURATION: u64 = 100; // Each round spans ~100 ticks
Why ~?
Code quote:
~100crates/apollo_consensus/src/simulation_test.rs line 114 at r12 (raw file):
/// History of all processed events. processed_history: Vec<InputEvent>, /// Tracks what the node actually voted for in each round.
Is it only the latest vote in each round?
crates/apollo_consensus/src/simulation_test.rs line 198 at r12 (raw file):
// 1. Proposal from leader let proposal_tick = round_start_tick + self.rng.gen_range(PROPOSAL_ARRIVAL_DELAY.0..PROPOSAL_ARRIVAL_DELAY.1);
You can define the delays as Ranges, i.e.:
const PROPOSAL_ARRIVAL_DELAY: Range = 2..20;
and then you can call self.rng.gen_range(PROPOSAL_ARRIVAL_DELAY)
crates/apollo_consensus/src/simulation_test.rs line 325 at r12 (raw file):
// If NODE_0 is proposer for this round, schedule peer votes after build finish if self.node_0_proposer_rounds.contains(&round) { self.schedule_peer_votes(round, build_finish_tick);
schedule_peer_votes accepts the round_start_tick. Sending build_finish_tickmay result in scheduling votes after the round duration.
Code quote:
self.schedule_peer_votes(round, build_finish_tick);crates/apollo_consensus/src/simulation_test.rs line 326 at r12 (raw file):
if self.node_0_proposer_rounds.contains(&round) { self.schedule_peer_votes(round, build_finish_tick); }
It has to be a proposer for this round if we got StartBuildProposal right?
Suggestion:
// Schedule peer votes after build finish
assert!(self.node_0_proposer_rounds.contains(&round));
self.schedule_peer_votes(round, build_finish_tick);crates/apollo_consensus/src/simulation_test.rs line 345 at r12 (raw file):
}; let timeout_tick = self.current_tick + delay; self.schedule_at_tick(timeout_tick, InputEvent::Internal(event));
Could it be that this will be scheduled beyond the round duration?
If yes and we are ok with it, please document it.
If no, how are we enforcing it?
crates/apollo_consensus/src/simulation_test.rs line 378 at r12 (raw file):
InputEvent::Vote(v) => { if v.vote_type == VoteType::Precommit { if let Some(proposal_commitment) = v.proposal_commitment {
Suggestion:
if let (VoteType::Precommit, Some(commitemt)) = (v.vote_type, v.proposal_commitment) {crates/apollo_consensus/src/simulation_test.rs line 425 at r12 (raw file):
0 } });
Suggestion:
let self_vote = sim.node_votes.get(&r).filter(|v| {
v.vote_type == VoteType::Precommit
&& v.proposal_commitment == Some(expected_commitment)})
.iter().count();crates/apollo_consensus/src/simulation_test.rs line 488 at r12 (raw file):
HashSet::new() } });
Suggestion:
let expected_diff: HashSet<_> = sim.node_votes.get(&decided_round).filter(|v| {
v.vote_type == VoteType::Precommit
&& v.proposal_commitment == Some(decided_block)})
.iter().cloned().collect()996a755 to
f4e8619
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matanl-starkware reviewed 2 files and all commit messages, and resolved 7 discussions.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @asmaastarkware).
f4e8619 to
dbcb7f4
Compare
asmaastarkware
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@asmaastarkware made 3 comments.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @dafnamatsry).
crates/apollo_consensus/src/simulation_test.rs line 114 at r12 (raw file):
Previously, dafnamatsry wrote…
Is it only the latest vote in each round?
1 vote per round
crates/apollo_consensus/src/simulation_test.rs line 198 at r12 (raw file):
Previously, dafnamatsry wrote…
You can define the delays as
Ranges, i.e.:const PROPOSAL_ARRIVAL_DELAY: Range = 2..20;and then you can call
self.rng.gen_range(PROPOSAL_ARRIVAL_DELAY)
Done.
crates/apollo_consensus/src/simulation_test.rs line 326 at r12 (raw file):
Previously, dafnamatsry wrote…
It has to be a proposer for this round if we got
StartBuildProposalright?
yes
dafnamatsry
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dafnamatsry reviewed 2 files and all commit messages, made 1 comment, and resolved 5 discussions.
Reviewable status:complete! all files reviewed, all discussions resolved (waiting on @asmaastarkware).
22eb9d0

No description provided.