
fix(consensus): scheduling rebroadcast timeout as part of recovery #3295

Open
vbar wants to merge 2 commits into main from vbar/consensus-rebroadcast-recovery

Conversation

@vbar vbar (Contributor) commented Mar 24, 2026

A potential fix for #3286: when recent (even finalized) heights are present in the WAL on startup, schedule a rebroadcast of votes for them, as other nodes might not have received those votes due to the previous shutdown.

@vbar vbar requested a review from a team as a code owner March 24, 2026 13:12
@t00ts t00ts (Contributor) commented Mar 24, 2026

I get the idea, but I'm not sure this is the expected behavior when restoring from the WAL.

Imho, this should not be opaque to the user. We should trigger this from the outside.

Happy to discuss further.

@vbar vbar (Contributor, Author) commented Mar 24, 2026

> Imho, this should not be opaque to the user. We should trigger this from the outside.

Well, why? Scheduling the rebroadcast timeout is always opaque to the user (because it's always done by Malachite). Also, what is the use case for not triggering this?

@t00ts t00ts (Contributor) commented Mar 24, 2026

> Well, why?

My thought process:

  1. The "potential fix" already asked for caution.
  2. Then the code: adding a network side effect inside a state-restoration function to fix a liveness issue smelled like bad design.

This is why I initially brought it up.


Now, going into details:

`recover_from_wal` is called in two situations:

  1. Finalized heights within history_depth
  2. Incomplete (in-progress) heights

This PR adds a rebroadcast timeout unconditionally in both cases.

> what is the use case for not triggering this?

The case of finalized heights. The block is decided, and other nodes have moved on. Broadcasting these votes seems pointless here.

> Scheduling the rebroadcast timeout is always opaque to the user (because it's always done by Malachite).

Yes, but we're mixing two different things here.

  1. In live consensus, Malachite scheduling these rebroadcasts is actual protocol behaviour. It's part of the liveness mechanism.
  2. In WAL recovery, scheduling a rebroadcast is more of a "startup policy" decision. It really is "what should the node do AFTER restoring its state".

This is why I'm advocating to trigger this from outside the restoration function. Something like the following could work:

```rust
internal_consensus.recover_from_wal(entries);
internal_consensus.schedule_rebroadcast_if_needed();
```

@vbar vbar (Contributor, Author) commented Mar 24, 2026

> Well, why?

> what is the use case for not triggering this?

> The case of finalized heights. The block is decided, and other nodes have moved on. Broadcasting these votes seems pointless here.

No. #3286 describes exactly the case where the block is locally decided, but the other nodes have not (yet) moved on.

@t00ts t00ts (Contributor) commented Mar 24, 2026

Got it. In that case, what if we have `recover_from_wal` return the max round it found, thus keeping it as pure state restoration (no opaque network side effects), and let the caller schedule the rebroadcast explicitly?

In the consensus crate:

```rust
pub fn recover_from_wal(...) -> Option<Round> {
    // ...
    max_round.map(Round::from)
}

pub fn schedule_rebroadcast(&mut self, round: Round) {
    self.timeout_manager.schedule_timeout(Timeout {
        kind: TimeoutKind::Rebroadcast,
        round,
    });
}
```

And then in both call sites:

```rust
let max_round = internal_consensus.recover_from_wal(entries);
if let Some(round) = max_round {
    // Schedule rebroadcast timeout.
    // See https://github.com/eqlabs/pathfinder/issues/3286 for motivation.
    internal_consensus.schedule_rebroadcast(round);
}
```

I think this keeps concerns cleaner. Wdyt?

@vbar vbar (Contributor, Author) commented Mar 25, 2026

> Got it. In that case, what if we have `recover_from_wal` return the max round it found, thus keeping it as pure state restoration (no opaque network side effects), and let the caller schedule the rebroadcast explicitly?

> I think this keeps concerns cleaner. Wdyt?

well, why not...

t00ts (Contributor) previously approved these changes Mar 27, 2026

LGTM % that doc comment

@vbar vbar force-pushed the vbar/consensus-rebroadcast-recovery branch 2 times, most recently from ce9ba9a to b17bca1 Compare March 30, 2026 06:19
@vbar vbar force-pushed the vbar/consensus-rebroadcast-recovery branch from e9b5d3a to 9904a5f Compare March 31, 2026 12:20
@CHr15F0x CHr15F0x (Contributor) left a comment

LGTM mod the rename.

```rust
feature = "consensus-integration-tests",
debug_assertions
))]
pub fn do_not_send_vote(vote_height: u64, inject_failure: Option<InjectFailureConfig>) -> bool {
```

Suggested change:

```diff
-pub fn do_not_send_vote(vote_height: u64, inject_failure: Option<InjectFailureConfig>) -> bool {
+pub fn debug_do_not_send_vote(vote_height: u64, inject_failure: Option<InjectFailureConfig>) -> bool {
```

```rust
feature = "consensus-integration-tests",
debug_assertions
)))]
pub fn do_not_send_vote(
```

Suggested change:

```diff
-pub fn do_not_send_vote(
+pub fn debug_do_not_send_vote(
```
