fix(consensus): scheduling rebroadcast timeout as part of recovery #3295
Conversation
I get the idea, but I'm not sure this is the expected behavior when restoring from WAL. IMHO, this should not be opaque to the user; we should trigger this from the outside. Happy to discuss further.
Well, why? Scheduling the rebroadcast timeout is always opaque to the user (because it's always done by Malachite). Also, what is the use case for not triggering this? |
My thought process:
This is why I initially brought it up. Now, going into details:
This PR adds a rebroadcast timeout unconditionally in both cases.
Consider the case of finalized heights: the block is decided and the other nodes have moved on, so rebroadcasting these votes seems pointless.
Yes, but we're mixing two different things here.
This is why I'm advocating to trigger this from outside the restoration function. Something like the following could work:

```rust
internal_consensus.recover_from_wal(entries);
internal_consensus.schedule_rebroadcast_if_needed();
```
No. #3286 describes exactly the case where the block is locally decided, but the other nodes have not moved on (yet).
Got it. In that case, what if we have, in the consensus crate:

```rust
pub fn recover_from_wal(...) -> Option<Round> {
    // ...
    max_round.map(Round::from)
}

pub fn schedule_rebroadcast(&mut self, round: Round) {
    self.timeout_manager.schedule_timeout(Timeout {
        kind: TimeoutKind::Rebroadcast,
        round,
    });
}
```

And then in both call sites:

```rust
let max_round = internal_consensus.recover_from_wal(entries);
if let Some(round) = max_round {
    // Schedule rebroadcast timeout.
    // See https://github.com/eqlabs/pathfinder/issues/3286 for motivation.
    internal_consensus.schedule_rebroadcast(round);
}
```

I think this keeps concerns cleaner. WDYT?
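The split proposed above can be sketched end to end. Everything below is hypothetical scaffolding: `Round`, `Timeout`, `TimeoutKind`, the timeout manager, and the WAL entry representation (plain round numbers) are invented stand-ins for the real consensus-crate types, not the actual pathfinder/Malachite API.

```rust
/// Hypothetical stand-in for the consensus round type.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Round(pub u32);

/// Hypothetical timeout kinds; only `Rebroadcast` matters here.
#[derive(Debug, PartialEq, Eq)]
pub enum TimeoutKind {
    Rebroadcast,
}

#[derive(Debug)]
pub struct Timeout {
    pub kind: TimeoutKind,
    pub round: Round,
}

/// Minimal timeout manager that just records what was scheduled.
#[derive(Default)]
pub struct TimeoutManager {
    pub scheduled: Vec<Timeout>,
}

impl TimeoutManager {
    pub fn schedule_timeout(&mut self, timeout: Timeout) {
        self.scheduled.push(timeout);
    }
}

#[derive(Default)]
pub struct Consensus {
    pub timeout_manager: TimeoutManager,
}

impl Consensus {
    /// Replay WAL entries (here: bare round numbers) and return the
    /// highest round seen, if any. No timers are touched here.
    pub fn recover_from_wal(&mut self, entries: &[u32]) -> Option<Round> {
        entries.iter().copied().max().map(Round)
    }

    /// Separately, schedule a vote rebroadcast for the recovered round.
    pub fn schedule_rebroadcast(&mut self, round: Round) {
        self.timeout_manager.schedule_timeout(Timeout {
            kind: TimeoutKind::Rebroadcast,
            round,
        });
    }
}
```

The point of the split is visible at the call site: recovery itself stays free of timer side effects, and the caller decides whether the recovered round warrants a rebroadcast.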
Well, why not...
that doesn't gossip its final vote.
```rust
    feature = "consensus-integration-tests",
    debug_assertions
))]
pub fn do_not_send_vote(vote_height: u64, inject_failure: Option<InjectFailureConfig>) -> bool {
```

Suggested change:

```diff
-pub fn do_not_send_vote(vote_height: u64, inject_failure: Option<InjectFailureConfig>) -> bool {
+pub fn debug_do_not_send_vote(vote_height: u64, inject_failure: Option<InjectFailureConfig>) -> bool {
```
```rust
    feature = "consensus-integration-tests",
    debug_assertions
)))]
pub fn do_not_send_vote(
```

Suggested change:

```diff
-pub fn do_not_send_vote(
+pub fn debug_do_not_send_vote(
```
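The rename suggested here amounts to a small convention: the helper exists in two `cfg`-gated variants under the same name, and a `debug_` prefix signals at every call site that its behavior depends on the build. A self-contained sketch of that pattern; the body of each variant and the `InjectFailureConfig` shape are invented for illustration:

```rust
/// Hypothetical failure-injection config for illustration only.
#[derive(Clone, Copy)]
pub struct InjectFailureConfig {
    pub fail_at_height: u64,
}

/// Debug/test builds: drop the vote when the injected failure
/// targets this height.
#[cfg(any(feature = "consensus-integration-tests", debug_assertions))]
pub fn debug_do_not_send_vote(
    vote_height: u64,
    inject_failure: Option<InjectFailureConfig>,
) -> bool {
    inject_failure.is_some_and(|cfg| cfg.fail_at_height == vote_height)
}

/// Release builds: never drop votes; the parameters are ignored, so
/// failure injection cannot leak into production behavior.
#[cfg(not(any(feature = "consensus-integration-tests", debug_assertions)))]
pub fn debug_do_not_send_vote(
    _vote_height: u64,
    _inject_failure: Option<InjectFailureConfig>,
) -> bool {
    false
}
```

With this shape, callers compile unchanged in both build modes, and only the debug/test variant can ever return `true`.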
A potential fix for #3286: when there are recent (even finalized) heights in the WAL on startup, schedule a rebroadcast of the votes for them, as other nodes might not have received those votes due to the previous shutdown.