feat: Implement Multi-Anchor Consensus (Shoal++ Section 5.2) #552

@grantkee

Summary

Implement multiple anchors per round to increase commit frequency and reduce "anchoring latency" - the delay non-anchor nodes experience while waiting to be referenced by a committed anchor. This optimization designates multiple nodes per round as potential anchors, each operating as an independent consensus instance.

Reference: Shoal++: High Throughput DAG BFT Can Be Fast! - Section 5.2

Depends on: Fast Direct Commit Rule (Section 5.1) - #470

Background

Current Single-Anchor Approach

The existing implementation elects one leader per even round using round-robin selection:

// crates/consensus/primary/src/consensus/leader_schedule.rs - lines 284-303
pub fn leader(&self, round: Round) -> Authority {
    let next_leader = (round as usize / 2).saturating_sub(1) % self.committee.size();
    let leader: Authority = self.committee.authorities().get(next_leader)
        .expect("authority out of bounds!")
        .clone();

    let table = self.leader_swap_table.read();
    table.swap(&leader.id(), round).unwrap_or(leader)
}

Problem: Non-anchor nodes must wait until their batches are referenced by a committed anchor before their transactions are finalized. This delay is the "anchoring latency". With a single anchor per even round, only one of the n committee members is the anchor at any time, so the other n-1 nodes incur it.
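
To make the round-robin schedule concrete, here is a minimal sketch that reproduces only the index arithmetic from `leader()` above. The names `leader_index` and `times_leader`, and the `Round = u32` alias, are assumptions for illustration; the swap-table lookup is deliberately left out.

```rust
/// Placeholder alias; the real `Round` type lives in the types crate.
type Round = u32;

/// Index of the round-robin leader for `round`, using the same arithmetic
/// as `leader()` above. `committee_size` stands in for `self.committee.size()`.
/// The leader swap table is not modelled here.
fn leader_index(round: Round, committee_size: usize) -> usize {
    assert!(committee_size > 0, "committee must not be empty");
    (round as usize / 2).saturating_sub(1) % committee_size
}

/// Counts the even rounds in `2..=last_round` for which `index` is the leader.
fn times_leader(index: usize, committee_size: usize, last_round: Round) -> usize {
    (2..=last_round)
        .step_by(2)
        .filter(|&r| leader_index(r, committee_size) == index)
        .count()
}
```

With a committee of 4, `times_leader(0, 4, 16)` counts how often authority 0 is the anchor across the eight even rounds up to round 16; in every other even round its proposals depend on another node's anchor being committed.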

Multi-Anchor Solution

Designate multiple (or all) nodes per round as potential anchors. Each anchor operates as an independent consensus instance, but commits are ordered deterministically to maintain log consistency across replicas.
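
One possible way to pick and order several anchors per round is to rotate through the committee starting at the current round-robin leader. The sketch below assumes this rotation; `anchors_for_round` and `anchors_per_round` are hypothetical names, and the issue does not prescribe this particular ordering.

```rust
/// Placeholder alias; the real `Round` type lives in the types crate.
type Round = u32;

/// Hypothetical helper: ordered anchor candidates (as committee indices)
/// for an even `round`. The first candidate is the existing round-robin
/// leader; the rest follow in committee order, wrapping around.
fn anchors_for_round(
    round: Round,
    committee_size: usize,
    anchors_per_round: usize,
) -> Vec<usize> {
    assert!(committee_size > 0, "committee must not be empty");
    let first = (round as usize / 2).saturating_sub(1) % committee_size;
    // Never return more candidates than there are committee members.
    let count = anchors_per_round.min(committee_size);
    (0..count)
        .map(|offset| (first + offset) % committee_size)
        .collect()
}
```

Because the result depends only on `round` and the committee size, every replica derives the same candidate order without extra communication.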

Technical Challenges

Challenge 1: Round Timeouts

Problem: A DAG round advances as soon as the fastest 2f+1 nodes have certified. Slower replicas may therefore be consistently left behind and excluded from anchor eligibility.

Solution: Implement small timeouts at round boundaries to keep replicas synchronized:

  • Wait briefly before advancing to allow stragglers to catch up
  • Enables denser DAG connectivity
  • Allows all n nodes to be potential anchors (not just fastest 2f+1)

// Pseudocode for round timeout
before_advancing_round(next_round):
    wait(round_timeout)  // Small delay (e.g., 50-100ms)
    proceed_to_round(next_round)
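
The pseudocode above can be read as a decision rule rather than an unconditional sleep. The sketch below is one interpretation, under these assumptions: equal stake per authority, n = 3f + 1, and a hypothetical `should_advance` check called by the round-advancement logic. The real implementation would likely use stake-weighted quorums.

```rust
use std::time::Duration;

/// Hypothetical check: may this replica advance to the next round?
/// Assumes equal stake and n = 3f + 1, so the quorum is 2f + 1 certificates.
fn should_advance(
    certs_seen: usize,
    committee_size: usize,
    elapsed: Duration,
    round_timeout: Duration,
) -> bool {
    let f = committee_size.saturating_sub(1) / 3;
    let quorum = 2 * f + 1;
    if certs_seen < quorum {
        // Never advance without a quorum, regardless of the timer.
        return false;
    }
    // Advance early once every node is in; otherwise wait out the timeout
    // so that stragglers have a chance to be included.
    certs_seen >= committee_size || elapsed >= round_timeout
}
```

This keeps the added delay bounded by `round_timeout` and removes it entirely when all n certificates arrive early.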

Challenge 2: Skipping Anchor Candidates

Problem: Progress stalls if an anchor candidate lacks sufficient support (weak votes/certificates).

Solution: Dynamic anchor materialization instead of pre-assignment:

  • Only one active consensus instance at a time
  • When an anchor is skipped (confirmed uncommitted), skip all subsequent virtual anchors in that sequence
  • Materialize next instance starting from first node after the last committed anchor

// Dynamic re-interpretation process
on_anchor_skip(skipped_anchor):
    skip_all_subsequent_virtual_anchors(skipped_anchor.sequence)
    next_instance = first_node_after(last_committed_anchor)
    materialize_consensus_instance(next_instance)
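
A minimal sketch of the skip rule, assuming anchor candidates are identified by committee index and that each one has already been classified as committed, skipped, or still undecided. The types `Outcome` and `Resolution` and both function names are hypothetical.

```rust
/// Hypothetical per-candidate status.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Committed,
    Skipped,
    Undecided,
}

/// Result of walking one sequence of virtual anchors.
#[derive(Debug, PartialEq, Eq)]
struct Resolution {
    committed: Vec<usize>,
    skipped: Vec<usize>,
}

/// Walks the candidates in order. Committed candidates are collected until
/// the first skip, at which point that candidate and all later ones in the
/// sequence are skipped. An undecided candidate stops the walk without
/// skipping anything, since only one instance is active at a time.
fn resolve_sequence(candidates: &[usize], outcomes: &[Outcome]) -> Resolution {
    let mut committed = Vec::new();
    let mut skipped = Vec::new();
    for (i, (&candidate, &outcome)) in candidates.iter().zip(outcomes).enumerate() {
        match outcome {
            Outcome::Committed => committed.push(candidate),
            Outcome::Undecided => break,
            Outcome::Skipped => {
                skipped.extend_from_slice(&candidates[i..]);
                break;
            }
        }
    }
    Resolution { committed, skipped }
}

/// First node after the last committed anchor, wrapping around the committee.
/// Starting at index 0 when nothing has been committed is an assumption.
fn next_instance_start(last_committed: Option<usize>, committee_size: usize) -> usize {
    match last_committed {
        Some(index) => (index + 1) % committee_size,
        None => 0,
    }
}
```

Note that a candidate marked `Committed` after a skipped one is still discarded in this sketch, which matches the "skip all subsequent virtual anchors" rule.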

Implementation Approach

Phase 1: Round Timeouts

  1. Add configurable round timeout parameter
  2. Modify round advancement logic in core.rs or synchronizer.rs
  3. Balance timeout duration (too short = no benefit, too long = latency increase)
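
As a starting point for step 1, the timeout could be a small validated config value. Everything in this sketch is an assumption: the `RoundTimeoutConfig` name, the 50 ms default (taken from the low end of the 50-100ms example above), and the 1 s upper bound, which is an arbitrary guard against accidental large values.

```rust
use std::time::Duration;

/// Hypothetical configuration for the round-boundary timeout.
#[derive(Debug, Clone, PartialEq, Eq)]
struct RoundTimeoutConfig {
    round_timeout: Duration,
}

impl Default for RoundTimeoutConfig {
    fn default() -> Self {
        // Assumed default; the right value needs benchmarking.
        Self { round_timeout: Duration::from_millis(50) }
    }
}

impl RoundTimeoutConfig {
    /// Arbitrary upper bound chosen for this sketch.
    const MAX: Duration = Duration::from_millis(1_000);

    /// Rejects timeouts above `MAX`. A zero timeout is allowed and
    /// effectively disables the wait.
    fn new(round_timeout: Duration) -> Result<Self, String> {
        if round_timeout > Self::MAX {
            Err(format!(
                "round timeout {:?} exceeds maximum {:?}",
                round_timeout,
                Self::MAX
            ))
        } else {
            Ok(Self { round_timeout })
        }
    }
}
```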

Phase 2: Multi-Anchor Selection

  1. Extend leader_schedule.rs to return multiple anchors per round
  2. Define deterministic ordering for parallel anchor commits
  3. Modify bullshark.rs to process multiple anchors per commit cycle
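
For step 2, one candidate ordering is "by round, then by position in that round's rotation". The sketch below assumes the rotation starts at the round-robin leader and that authorities are identified by committee index; `AnchorRef`, `rank_in_round`, and `commit_order` are hypothetical names.

```rust
/// Placeholder alias; the real `Round` type lives in the types crate.
type Round = u32;

/// Hypothetical reference to a committed anchor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct AnchorRef {
    round: Round,
    /// Committee index; assumed to be less than the committee size.
    authority: usize,
}

/// Position of the anchor within its round, counted from the round-robin
/// leader of that round.
fn rank_in_round(anchor: AnchorRef, committee_size: usize) -> usize {
    let first = (anchor.round as usize / 2).saturating_sub(1) % committee_size;
    (anchor.authority + committee_size - first) % committee_size
}

/// Sorts anchors into the order they would be applied to the log.
/// The key depends only on the anchor itself, so the result is the same
/// no matter in which order a replica observed the commits.
fn commit_order(mut anchors: Vec<AnchorRef>, committee_size: usize) -> Vec<AnchorRef> {
    anchors.sort_by_key(|a| (a.round, rank_in_round(*a, committee_size)));
    anchors
}
```

Two replicas that observe the same set of committed anchors in different arrival orders produce identical output, which is the property the acceptance criteria ask for.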

Phase 3: Dynamic Anchor Materialization

  1. Track virtual anchor sequences
  2. Implement skip detection and sequence invalidation
  3. Add logic to materialize next consensus instance after skips

Key Data Structures

// Proposed structures (conceptual)

/// Represents a virtual anchor candidate
struct VirtualAnchor {
    round: Round,
    authority: AuthorityIdentifier,
    sequence_id: u64,
    is_materialized: bool,
}

/// Tracks parallel consensus instances
struct MultiAnchorState {
    /// Active consensus instances (ordered by commit priority)
    instances: Vec<VirtualAnchor>,
    /// Last committed anchor for each sequence
    last_committed: HashMap<u64, VirtualAnchor>,
    /// Skipped sequences to avoid retrying
    skipped_sequences: HashSet<u64>,
}
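
To show how these structures might be used, here is a self-contained sketch with two hypothetical methods, `record_commit` and `record_skip`. `Round` and `AuthorityIdentifier` are stand-in definitions so the example compiles on its own; the real types come from the types crate.

```rust
use std::collections::{HashMap, HashSet};

/// Stand-ins for the real types (assumptions for this sketch).
type Round = u32;
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct AuthorityIdentifier(u16);

#[derive(Debug, Clone, PartialEq, Eq)]
struct VirtualAnchor {
    round: Round,
    authority: AuthorityIdentifier,
    sequence_id: u64,
    is_materialized: bool,
}

#[derive(Debug, Default)]
struct MultiAnchorState {
    instances: Vec<VirtualAnchor>,
    last_committed: HashMap<u64, VirtualAnchor>,
    skipped_sequences: HashSet<u64>,
}

impl MultiAnchorState {
    /// Removes the anchor from the active instances and remembers it as
    /// the last committed anchor of its sequence.
    fn record_commit(&mut self, anchor: VirtualAnchor) {
        self.instances.retain(|a| a != &anchor);
        self.last_committed.insert(anchor.sequence_id, anchor);
    }

    /// Marks a whole sequence as skipped and drops its remaining instances.
    fn record_skip(&mut self, sequence_id: u64) {
        self.skipped_sequences.insert(sequence_id);
        self.instances.retain(|a| a.sequence_id != sequence_id);
    }

    /// A sequence is active unless it has been skipped.
    fn is_active(&self, sequence_id: u64) -> bool {
        !self.skipped_sequences.contains(&sequence_id)
    }
}
```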

Files to Modify

  • crates/consensus/primary/src/consensus/leader_schedule.rs - Multi-anchor selection
  • crates/consensus/primary/src/consensus/bullshark.rs - Multi-anchor commit logic
  • crates/consensus/primary/src/consensus/state.rs - Track multi-anchor state
  • crates/consensus/primary/src/core.rs - Round timeout implementation
  • crates/consensus/primary/src/synchronizer.rs - May need timeout coordination
  • crates/types/src/primary/ - New types for virtual anchors

Acceptance Criteria

  • Round timeouts are configurable and implemented
  • Multiple anchors can be designated per round
  • Anchors commit in deterministic order across all replicas
  • Dynamic anchor materialization works when anchors are skipped
  • Anchoring latency is reduced (measure in benchmarks)
  • All existing consensus tests pass
  • New tests for multi-anchor scenarios
  • No safety regressions (linearizability maintained)

Performance Considerations

  • Throughput: More frequent commits should shorten time to finality; any effect on raw throughput should be confirmed in benchmarks
  • Latency: Round timeouts add small delay but enable more anchor opportunities
  • Complexity: Multi-anchor tracking increases state management overhead
  • Network: May increase message complexity during commit phase

Open Questions

  1. What is the optimal round timeout duration? (likely needs tuning based on config)
  2. How many anchors per round provides the best throughput/complexity tradeoff?
  3. Should all n nodes be anchors, or a subset (e.g., 2f+1)?
  4. How does multi-anchor interact with the reputation-based leader swap table?
  5. Should anchor ordering be strictly round-robin or based on reputation scores?
