fix(consensus): scheduling rebroadcast timeout as part of recovery #3295
Conversation
I get the idea, but I'm not sure this is the expected behavior when restoring from the WAL. Imho, this should not be opaque to the user; we should trigger this from the outside. Happy to discuss further.
Well, why? Scheduling the rebroadcast timeout is always opaque to the user, because it's always done by Malachite. Also, what is the use case for not triggering this?
My thought process:

This is why I initially brought it up. Now, going into details:

- This PR adds a rebroadcast timeout unconditionally in both cases.
- In the case of finalized heights, the block is decided and the other nodes have moved on, so broadcasting these votes seems pointless.

Yes, but we're mixing two different things here. This is why I'm advocating for triggering this from outside the restoration function. Something like the following could work:

```rust
internal_consensus.recover_from_wal(entries);
internal_consensus.schedule_rebroadcast_if_needed();
```
No. #3286 describes exactly the case where the block is locally decided, but the other nodes did not move on (yet), because of a node that doesn't gossip its final vote.
Got it. In that case, what if we have, in the consensus crate:

```rust
pub fn recover_from_wal(...) -> Option<Round> {
    // ...
    max_round.map(Round::from)
}

pub fn schedule_rebroadcast(&mut self, round: Round) {
    self.timeout_manager.schedule_timeout(Timeout {
        kind: TimeoutKind::Rebroadcast,
        round,
    });
}
```

And then in both call sites:

```rust
let max_round = internal_consensus.recover_from_wal(entries);
if let Some(round) = max_round {
    // Schedule rebroadcast timeout.
    // See https://github.com/eqlabs/pathfinder/issues/3286 for motivation.
    internal_consensus.schedule_rebroadcast(round);
}
```

I think this keeps concerns cleaner. Wdyt?
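The proposed split above can be sketched end to end as a self-contained program. Note that `Round`, `Timeout`, `TimeoutKind`, `TimeoutManager`, and `Consensus` here are hypothetical minimal stand-ins, not Malachite's real types, and `recover_from_wal` is simplified to take round numbers directly instead of WAL entries:

```rust
// Hypothetical stand-ins for the real consensus internals (sketch only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Round(u64);

#[derive(Debug, PartialEq)]
enum TimeoutKind {
    Rebroadcast,
}

#[derive(Debug)]
struct Timeout {
    kind: TimeoutKind,
    round: Round,
}

#[derive(Default)]
struct TimeoutManager {
    scheduled: Vec<Timeout>,
}

impl TimeoutManager {
    fn schedule_timeout(&mut self, timeout: Timeout) {
        self.scheduled.push(timeout);
    }
}

#[derive(Default)]
struct Consensus {
    timeout_manager: TimeoutManager,
}

impl Consensus {
    /// Replays WAL entries (simplified here to bare round numbers)
    /// and returns the highest round seen, if any.
    fn recover_from_wal(&mut self, entries: &[u64]) -> Option<Round> {
        entries.iter().max().copied().map(Round)
    }

    /// Schedules a rebroadcast timeout for the given round. The caller,
    /// not the restoration function, decides whether to invoke this.
    fn schedule_rebroadcast(&mut self, round: Round) {
        self.timeout_manager.schedule_timeout(Timeout {
            kind: TimeoutKind::Rebroadcast,
            round,
        });
    }
}

fn main() {
    let mut consensus = Consensus::default();

    // Call site: recover first, then trigger the rebroadcast from outside.
    // See https://github.com/eqlabs/pathfinder/issues/3286 for motivation.
    if let Some(round) = consensus.recover_from_wal(&[3, 7, 5]) {
        consensus.schedule_rebroadcast(round);
    }

    println!("{:?}", consensus.timeout_manager.scheduled);
    // prints: [Timeout { kind: Rebroadcast, round: Round(7) }]
}
```

This keeps the recovery function free of scheduling side effects, so a caller that restores finalized heights without wanting a rebroadcast can simply skip the second call.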
Well, why not...
Co-authored-by: Abel E <abel.elbaile@gmail.com>
A potential fix for #3286: when recent (even finalized) heights are present in the WAL on startup, schedule a rebroadcast of votes for them, as other nodes might not have received those votes due to the previous shutdown.