Add range sync tests#8989

Open
dapplion wants to merge 8 commits into sigp:unstable from dapplion:range-sync-tests

Conversation


@dapplion dapplion commented Mar 16, 2026

Increase coverage for range sync using the style and pattern for lookup tests:

async fn test_name() {
    let mut r = TestRig::default();
    r.setup_xyz().await;
    r.simulate(SimulateConfig::new().some_condition()).await;
    r.assert_something();
}

This pattern is agnostic to the implementation and achieves two goals:

  • Tests are independent of implementation details, allowing refactors (tree-sync) without rewriting all tests
  • Tests are succinct and readable

Coverage (range sync, merged across all 7 forks)

| File | Before | After | Δ |
| --- | --- | --- | --- |
| chain.rs | 56.0% (382/682) | 79.0% (539/682) | +23.0% |
| chain_collection.rs | 81.9% (285/348) | 80.2% (279/348) | -1.7% |
| range.rs | 76.2% (157/206) | 86.9% (179/206) | +10.7% |
| blocks_by_range.rs | 39.1% (9/23) | 91.3% (21/23) | +52.2% |
| data_columns_by_range.rs | 28.1% (9/32) | 84.4% (27/32) | +56.3% |
| blobs_by_range.rs | 0% (0/30) | 0% (0/30) | — |
| **Total** | 65.3% (842/1321) | 79.1% (1045/1321) | +13.8% |

5 old tests were migrated to 19 new tests using the simulate() pattern. Tests run across all forks: base, altair, bellatrix, capella, deneb, electra, fulu.

Existing Tests Migration

1. head_chain_removed_while_finalized_syncing (regression #2821)

What it does: Add head peer → head chain created → grab head batch request → add finalized peer → finalized chain takes priority → grab finalized batch request → disconnect head peer → assert still in Finalized state.

What it tests: When a head chain exists and a finalized peer arrives, the finalized chain takes priority. Disconnecting the head peer removes the head chain but the finalized chain survives.

Migration: Already covered by finalized_to_head_transition (finalized takes priority, both complete). But the specific "head peer disconnect during finalized sync" scenario is NOT tested. Add:

async fn head_peer_disconnect_during_finalized_sync() {
    let mut r = TestRig::default();
    r.setup_finalized_and_head_sync().await;
    // disconnect head peer, finalized sync should still complete
    r.simulate(SimulateConfig::happy_path().with_disconnect_head_peers()).await;
    r.assert_range_sync_completed();
}

Needs: SimulateConfig::with_disconnect_head_peers() — disconnect head peers mid-simulate but keep finalized peers.

2. state_update_while_purging (regression #2827)

What it tested: When chain targets become known to fork choice during a state update, purge_outdated_chains runs before update_finalized_chains/update_head_chains without crashing.

Removed: The bug was a call ordering issue in ChainCollection::update(). The fix hardcodes purge_outdated_chains (line 223) before update_finalized_chains (line 227) and update_head_chains (line 231) in a single function body. This ordering can't regress without visibly rewriting update().

3. pause_and_resume_on_ee_offline

What it does: Add head peer → EE goes offline → complete head batch → processor empty (paused) → add finalized peer → complete finalized batch → processor still empty → EE back online → assert 2 chain segments in processor queue.

What it tests: When the execution engine goes offline, completed batches queue up and aren't sent to the processor. When EE comes back online, all queued batches are dispatched.

Migration:

async fn pause_and_resume_on_ee_offline() {
    let mut r = TestRig::default();
    r.setup_finalized_sync().await;
    r.simulate(SimulateConfig::happy_path().with_ee_offline_for_n_batches(2)).await;
    r.assert_range_sync_completed();
}

Needs: SimulateConfig::with_ee_offline_for_n_batches(n) — set EE offline before simulate, toggle back online after N batches complete. The simulate loop would need to call update_execution_engine_state at the right time.

4. finalized_sync_enough_global_custody_peers_few_chain_peers

What it does: Add 100 fullnode peers + 1 supernode → assert finalized state → drive sync to completion via complete_and_process_range_sync_until.

What it tests: End-to-end finalized sync with sufficient custody column coverage across many peers. Tests that range sync can complete when no single peer has all columns but the swarm collectively covers them.

Migration: Already covered by finalized_sync_completes — uses setup_finalized_sync() which adds 100 fullnode peers + 1 supernode, builds a real chain, and simulate() drives it to completion with assert_range_sync_completed().

5. finalized_sync_not_enough_custody_peers_on_start (PeerDAS-only)

What it does: Add single fullnode → assert finalized state → assert no network requests (not enough custody coverage) → add 100 fullnodes + 1 supernode → drive sync to completion.

What it tests: When there aren't enough peers to cover all custody columns, range sync creates the chain but doesn't send requests. Once enough peers arrive, sync proceeds.

Migration:

async fn finalized_sync_not_enough_custody_peers_on_start() {
    let mut r = TestRig::default();
    r.setup_finalized_sync_with_insufficient_peers().await;
    r.assert_empty_network(); // no requests sent yet
    r.add_sufficient_peers().await;
    r.simulate(SimulateConfig::happy_path()).await;
    r.assert_range_sync_completed();
}

Needs: setup_finalized_sync_with_insufficient_peers() — adds only 1 fullnode peer. add_sufficient_peers() — adds 100 fullnodes + 1 supernode. This is PeerDAS-only, so it needs an `if !fulu_enabled() { return; }` guard.

@dapplion dapplion added the test improvement Improve tests label Mar 16, 2026
@dapplion dapplion requested a review from jxs as a code owner March 16, 2026 04:14
@dapplion dapplion requested a review from pawanjay176 March 16, 2026 04:19
@eserilev eserilev left a comment

This looks really nice, adds lots of new test coverage and makes it easier to write future tests. Nice work! I have a nit or two but they aren't blockers

Comment on lines +325 to 326
#[allow(dead_code)]
pub fn with_custody_type(node_custody_type: NodeCustodyType) -> Self {
i assume we'll plan on writing tests for nodes with different custody types in the future?

/// Set EE offline at start, bring back online after this many BlocksByRange responses
ee_offline_for_n_range_responses: Option<usize>,
/// Disconnect all peers after responding to this many BlocksByRange requests
disconnect_peers_after_range_requests: Option<usize>,
nit: the name here was slightly confusing to me since I don't think we respond to the Nth request (we disconnect at Some(0)); maybe something like successful_range_responses_before_disconnect is clearer?

and the comment could read

/// Disconnect all peers after this many successful BlocksByRange responses.

Not a blocker at all, feel free to ignore

Comment on lines +753 to +756
if self
.complete_strategy
.return_wrong_range_column_indices_n_times
> 0
just noting in the supernode case, we'd actually end up returning no columns here I think.
