Summary
Implement multiple anchors per round to increase commit frequency and reduce "anchoring latency" - the delay non-anchor nodes experience while waiting to be referenced by a committed anchor. This optimization designates multiple nodes per round as potential anchors, each operating as an independent consensus instance.
Reference: Shoal++: High Throughput DAG BFT Can Be Fast! - Section 5.2
Depends on: Fast Direct Commit Rule (Section 5.1) - #470
Background
Current Single-Anchor Approach
The existing implementation elects one leader per even round using round-robin selection:
// crates/consensus/primary/src/consensus/leader_schedule.rs - lines 284-303
pub fn leader(&self, round: Round) -> Authority {
    let next_leader = (round as usize / 2).saturating_sub(1) % self.committee.size();
    let leader: Authority = self.committee.authorities().get(next_leader)
        .expect("authority out of bounds!")
        .clone();
    let table = self.leader_swap_table.read();
    table.swap(&leader.id(), round).unwrap_or(leader)
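}
Problem: Non-anchor nodes must wait until their batches are referenced by a committed anchor before their transactions are finalized. Because only one node is the anchor per even round, each node's proposals are anchors in roughly 1 of every n leader rounds (n is the committee size), so most proposals incur this "anchoring latency".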
Multi-Anchor Solution
Designate multiple (or all) nodes per round as potential anchors. Each anchor operates as an independent consensus instance, but commits are ordered deterministically to maintain log consistency across replicas.
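As a rough illustration of what "multiple anchors per round, ordered deterministically" could look like, here is a minimal sketch. The function name `anchors_for_round`, the plain-index authority type, and the rotation rule (start at `round % n`) are assumptions made for this example; they are not taken from the codebase or the paper.

```rust
// Hypothetical stand-ins for the crate's `Round` and authority index types.
type Round = u64;
type AuthorityIndex = usize;

/// Returns every committee member as an anchor candidate for `round`.
/// The list is rotated so that the first candidate changes each round.
/// Every replica derives the same list from the same inputs, so the
/// order is deterministic. The rotation rule is an assumption.
fn anchors_for_round(round: Round, committee_size: usize) -> Vec<AuthorityIndex> {
    if committee_size == 0 {
        return Vec::new();
    }
    let start = (round as usize) % committee_size;
    (0..committee_size)
        .map(|offset| (start + offset) % committee_size)
        .collect()
}
```

With a committee of four, round 5 would yield the candidates in the order 1, 2, 3, 0. How this interacts with the leader swap table is left open here.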
Technical Challenges
Challenge 1: Round Timeouts
Problem: DAG rounds advance when the fastest 2f+1 nodes certify. Slower replicas may be consistently excluded from anchor eligibility.
Solution: Implement small timeouts at round boundaries to keep replicas synchronized:
- Wait briefly before advancing to allow stragglers to catch up
- Enables denser DAG connectivity
- Allows all n nodes to be potential anchors (not just fastest 2f+1)
// Pseudocode for round timeout
before_advancing_round(next_round):
    wait(round_timeout)  // Small delay (e.g., 50-100ms)
    proceed_to_round(next_round)
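The round-advancement decision could be expressed as a pure function, which keeps it easy to unit test. This is a sketch under assumptions: the name `can_advance` is hypothetical, the quorum is counted in nodes rather than stake, and advancing early once all n certificates are present is an added shortcut that the pseudocode above does not state.

```rust
use std::time::Duration;

/// Decides whether a node may move to the next round.
///
/// Assumptions for this sketch: every authority has equal voting power,
/// the committee has n = 3f + 1 members, and a node may advance early
/// once it holds certificates from all n members.
fn can_advance(
    certs_seen: usize,
    committee_size: usize,
    elapsed: Duration,
    round_timeout: Duration,
) -> bool {
    let f = committee_size.saturating_sub(1) / 3;
    let quorum = 2 * f + 1;
    if certs_seen < quorum {
        // Never advance without a quorum, regardless of the timeout.
        return false;
    }
    // With a quorum in hand, wait for stragglers until the timeout expires.
    certs_seen >= committee_size || elapsed >= round_timeout
}
```

With four nodes and a 100ms timeout, three certificates at 50ms would not advance the round, while three certificates at 100ms or four certificates at any time would.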
Challenge 2: Skipping Anchor Candidates
Problem: Progress stalls if an anchor candidate lacks sufficient support (weak votes/certificates).
Solution: Dynamic anchor materialization instead of pre-assignment:
- Only one active consensus instance at a time
- When an anchor is skipped (confirmed uncommitted), skip all subsequent virtual anchors in that sequence
- Materialize next instance starting from first node after the last committed anchor
// Dynamic re-interpretation process
on_anchor_skip(skipped_anchor):
    skip_all_subsequent_virtual_anchors(skipped_anchor.sequence)
    next_instance = first_node_after(last_committed_anchor)
    materialize_consensus_instance(next_instance)
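One possible reading of the skip rule is sketched below as a small cursor that tracks the single active instance. `AnchorCursor` and its methods are hypothetical names. The sketch also assumes that authorities are plain indices, that every round carries anchors, and that a skip moves the cursor to the next round.

```rust
// Hypothetical stand-in for the crate's `Round` type.
type Round = u64;

/// Tracks the single active anchor instance (illustrative only).
#[derive(Debug)]
struct AnchorCursor {
    committee_size: usize,
    round: Round,
    /// Authority index that opens the virtual anchor sequence of `round`.
    start: usize,
    /// Number of anchors already committed in `round`.
    offset: usize,
    /// Authority index of the most recently committed anchor, if any.
    last_committed: Option<usize>,
}

impl AnchorCursor {
    fn new(committee_size: usize, first_round: Round) -> Self {
        assert!(committee_size > 0, "committee must not be empty");
        Self { committee_size, round: first_round, start: 0, offset: 0, last_committed: None }
    }

    /// The (round, authority) pair of the currently materialized instance.
    fn active(&self) -> (Round, usize) {
        (self.round, (self.start + self.offset) % self.committee_size)
    }

    /// The active anchor committed: move to the next virtual anchor.
    fn on_commit(&mut self) {
        let (_, authority) = self.active();
        self.last_committed = Some(authority);
        self.offset += 1;
        if self.offset == self.committee_size {
            self.start_next_round();
        }
    }

    /// The active anchor is confirmed uncommitted: drop the rest of its
    /// sequence and materialize a new one in the next round.
    fn on_skip(&mut self) {
        self.start_next_round();
    }

    /// Starts the next sequence at the first node after the last committed anchor.
    fn start_next_round(&mut self) {
        self.round += 1;
        self.start = self.last_committed.map_or(0, |a| (a + 1) % self.committee_size);
        self.offset = 0;
    }
}
```

Keeping only a cursor, rather than a list of live instances, mirrors the "only one active consensus instance at a time" constraint.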
Implementation Approach
Phase 1: Round Timeouts
- Add configurable round timeout parameter
- Modify round advancement logic in core.rs or synchronizer.rs
- Balance timeout duration (too short = no benefit, too long = latency increase)
Phase 2: Multi-Anchor Selection
- Extend leader_schedule.rs to return multiple anchors per round
- Define deterministic ordering for parallel anchor commits
- Modify bullshark.rs to process multiple anchors per commit cycle
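For the deterministic ordering step, one simple option is to sort committed anchors by a key that every replica can compute locally. The sketch below assumes a hypothetical `CommitKey` of (round, rank), where `rank` is the anchor's position in that round's candidate list. Neither the type nor the function exists in the codebase.

```rust
/// Hypothetical ordering key for a committed anchor.
/// Deriving `Ord` compares fields in declaration order:
/// first `round`, then `rank`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct CommitKey {
    round: u64,
    rank: usize,
}

/// Produces the same sequence on every replica, regardless of the
/// order in which the commits were observed locally.
fn order_commits(mut committed: Vec<CommitKey>) -> Vec<CommitKey> {
    committed.sort();
    committed.dedup();
    committed
}
```

Two replicas that observe the same set of committed anchors in different arrival orders would then emit identical logs.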
Phase 3: Dynamic Anchor Materialization
- Track virtual anchor sequences
- Implement skip detection and sequence invalidation
- Add logic to materialize next consensus instance after skips
Key Data Structures
// Proposed structures (conceptual)
use std::collections::{HashMap, HashSet};

/// Represents a virtual anchor candidate
struct VirtualAnchor {
    round: Round,
    authority: AuthorityIdentifier,
    sequence_id: u64,
    is_materialized: bool,
}
/// Tracks parallel consensus instances
struct MultiAnchorState {
    /// Active consensus instances (ordered by commit priority)
    instances: Vec<VirtualAnchor>,
    /// Last committed anchor for each sequence
    last_committed: HashMap<u64, VirtualAnchor>,
    /// Skipped sequences to avoid retrying
    skipped_sequences: HashSet<u64>,
}
Files to Modify
- crates/consensus/primary/src/consensus/leader_schedule.rs - Multi-anchor selection
- crates/consensus/primary/src/consensus/bullshark.rs - Multi-anchor commit logic
- crates/consensus/primary/src/consensus/state.rs - Track multi-anchor state
- crates/consensus/primary/src/core.rs - Round timeout implementation
- crates/consensus/primary/src/synchronizer.rs - May need timeout coordination
- crates/types/src/primary/ - New types for virtual anchors
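To make the proposed `MultiAnchorState` from Key Data Structures slightly more concrete, the sketch below adds three hypothetical methods (`add_candidate`, `record_commit`, `record_skip`). The `Round` and `AuthorityIdentifier` aliases are stand-ins for the crate's real types, and the method semantics are assumptions, not a settled design.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for the crate's real types.
type Round = u64;
type AuthorityIdentifier = u16;

#[derive(Debug, Clone, PartialEq, Eq)]
struct VirtualAnchor {
    round: Round,
    authority: AuthorityIdentifier,
    sequence_id: u64,
    is_materialized: bool,
}

#[derive(Debug, Default)]
struct MultiAnchorState {
    instances: Vec<VirtualAnchor>,
    last_committed: HashMap<u64, VirtualAnchor>,
    skipped_sequences: HashSet<u64>,
}

impl MultiAnchorState {
    /// Registers a candidate unless its sequence was already invalidated.
    /// Returns whether the candidate was accepted.
    fn add_candidate(&mut self, anchor: VirtualAnchor) -> bool {
        if self.skipped_sequences.contains(&anchor.sequence_id) {
            return false;
        }
        self.instances.push(anchor);
        true
    }

    /// Records a committed anchor and removes it from the pending candidates.
    fn record_commit(&mut self, anchor: VirtualAnchor) {
        self.instances
            .retain(|a| !(a.round == anchor.round && a.authority == anchor.authority));
        self.last_committed.insert(anchor.sequence_id, anchor);
    }

    /// Invalidates a whole sequence after one of its anchors is skipped.
    fn record_skip(&mut self, sequence_id: u64) {
        self.skipped_sequences.insert(sequence_id);
        self.instances.retain(|a| a.sequence_id != sequence_id);
    }
}
```

Rejecting candidates from skipped sequences in `add_candidate` is one way to satisfy the "avoid retrying" note on `skipped_sequences`.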
Acceptance Criteria
- Round timeouts are configurable and implemented
- Multiple anchors can be designated per round
- Anchors commit in deterministic order across all replicas
- Dynamic anchor materialization works when anchors are skipped
- Anchoring latency is reduced (measure in benchmarks)
- All existing consensus tests pass
- New tests for multi-anchor scenarios
- No safety regressions (linearizability maintained)
Performance Considerations
- Throughput: More frequent commits should shorten time to finality
- Latency: Round timeouts add small delay but enable more anchor opportunities
- Complexity: Multi-anchor tracking increases state management overhead
- Network: May increase message complexity during commit phase
Open Questions
- What is the optimal round timeout duration? (likely needs tuning per deployment and should be exposed in config)
- How many anchors per round provides the best throughput/complexity tradeoff?
- Should all n nodes be anchors, or a subset (e.g., 2f+1)?
- How does multi-anchor interact with the reputation-based leader swap table?
- Should anchor ordering be strictly round-robin or based on reputation scores?