Conversation

@ilitteri
Collaborator

Motivation

ethrex downloads ALL 24.4M headers sequentially (15-20 min on mainnet) before starting state sync. Nethermind runs header download concurrently with state sync, removing headers from the critical path entirely.

Description

Restructure sync_cycle_snap() to overlap header downloads with state sync:

  1. Fetch pivot by hash: request_pivot_header() fetches the sync_head header directly via get_block_header_by_hash() — no need to download the full chain first
  2. Background download: download_headers_background() runs as a tokio task, downloading remaining headers concurrently with state sync
  3. Shared state: block_sync_state wrapped in Arc<tokio::sync::Mutex<>> shared between background download and state sync (for update_pivot calls)
  4. Await before finalization: snap_sync() awaits the header download handle before store_block_bodies and forkchoice_update, which need the full header chain

On mainnet, state sync takes 30+ minutes — headers (15-20 min) complete well before state sync finishes, making headers essentially "free".
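
For illustration, here is a self-contained toy sketch of the resulting control flow (not ethrex code; the placeholder types and dummy loops stand in for the real peers, store, and datadir plumbing). The key point is the ordering: spawn the header download, run state sync immediately, then await the JoinHandle before anything that needs the full header chain.

    use std::sync::Arc;
    use tokio::sync::Mutex;

    // Stand-in for SnapBlockSyncState: just the shared hash list.
    struct BlockSyncState {
        block_hashes: Vec<u64>, // placeholder for Vec<H256>
    }

    // Placeholder for download_headers_background: pushes header "hashes" in batches.
    async fn download_headers_background(state: Arc<Mutex<BlockSyncState>>) -> Result<(), String> {
        for batch_start in (0..3_000u64).step_by(1_000) {
            let batch: Vec<u64> = (batch_start..batch_start + 1_000).collect();
            state.lock().await.block_hashes.extend(batch);
            tokio::time::sleep(std::time::Duration::from_millis(10)).await;
        }
        Ok(())
    }

    // Placeholder for the state-sync half of snap_sync.
    async fn state_sync(state: Arc<Mutex<BlockSyncState>>) {
        for _ in 0..5 {
            let downloaded = state.lock().await.block_hashes.len();
            println!("state sync step, headers so far: {downloaded}");
            tokio::time::sleep(std::time::Duration::from_millis(15)).await;
        }
    }

    #[tokio::main]
    async fn main() -> Result<(), String> {
        let state = Arc::new(Mutex::new(BlockSyncState { block_hashes: vec![] }));

        // Spawn the header download in the background...
        let header_handle = tokio::spawn(download_headers_background(state.clone()));

        // ...and start state sync immediately, sharing the same state.
        state_sync(state.clone()).await;

        // Await the header download before finalization steps that need the full chain.
        header_handle.await.map_err(|e| e.to_string())??;
        println!("headers complete: {}", state.lock().await.block_hashes.len());
        Ok(())
    }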

Files changed

  • snap_sync.rs: Restructured sync_cycle_snap, new request_pivot_header, new download_headers_background, updated snap_sync and update_pivot signatures
  • peer_handler.rs: Added get_block_header_by_hash() method
  • snap/client.rs: Updated request_account_range to accept Arc<Mutex<SnapBlockSyncState>>

How to Test

  • Run snap sync on hoodi and verify that headers download in the background (logs show "Background header download started" followed immediately by state sync progress)
  • Verify "Background header download complete" appears before finalization
  • Compare total sync time vs sequential baseline

Fetch the pivot header by hash directly and start state sync immediately.
Headers download runs as a background tokio task concurrently with state
sync, removing 15-20 minutes of header download from the critical path
on mainnet.

Key changes:
- sync_cycle_snap: fetch pivot early via request_pivot_header, spawn
  download_headers_background, call snap_sync immediately
- snap_sync/update_pivot: block_sync_state is now Arc<Mutex<>> shared
  between background download and state sync
- snap_sync awaits header_download_handle before finalization
- Added PeerHandler::get_block_header_by_hash for single-header requests
@ilitteri ilitteri requested a review from a team as a code owner February 12, 2026 03:41
Copilot AI review requested due to automatic review settings February 12, 2026 03:41
@github-actions github-actions bot added the L1 Ethereum client and performance (Block execution throughput and performance in general) labels Feb 12, 2026
@github-actions

🤖 Kimi Code Review

Review Summary

This PR introduces significant changes to snap sync by decoupling header downloads from state sync. While the approach is sound, there are several critical issues that need addressing:

Critical Issues

  1. Race Condition in download_headers_background (lines 209-211):

    let current_head = {
        let bss = block_sync_state.lock().await;
        bss.store.get_latest_canonical_block_hash().await?
    };

    The lock is dropped immediately after getting the store reference, but bss.store is just a clone of the store. This creates a race condition where the actual latest canonical block hash might change between getting it and using it.

  2. Potential Deadlock in snap_sync (line 642):

    header_download_handle.await??;

    If the background task panics or the channel closes, this could hang indefinitely. Consider adding a timeout.

  3. Incorrect State Access Pattern (multiple locations):
    The PR changes block_sync_state from &mut to &Arc<Mutex<>> but doesn't properly handle the locking pattern. For example, in request_account_range (line 99), the parameter is &Arc<Mutex<>> but the function signature suggests it should probably just take the state directly.

Security & Correctness Issues

  1. Missing Request ID Validation (lines 633-658 in get_block_header_by_hash):
    The response ID is ignored (id: _ in pattern matching). This could allow malicious peers to send responses to wrong requests.

  2. Unbounded Retry Logic (lines 135-155 in request_pivot_header):
    The retry logic uses exponential backoff but has no upper bound on total wait time. A malicious network could cause indefinite blocking.

  3. State Consistency Risk (lines 651-658 in snap_sync):
    After waiting for headers to complete, there's no validation that the downloaded headers actually form a valid chain to the pivot.

Performance & Best Practices

  1. Inefficient Locking:
    The pattern block_sync_state.lock().await is used repeatedly in tight loops. Consider batching operations to reduce lock contention.

  2. Redundant Clones:
    In snap_sync (line 651), bss.block_hashes.iter().rev() creates unnecessary copies of hashes.

  3. Error Handling:
    Several debug! logs should probably be warn! or error! for better observability (lines 642, 647 in peer_handler.rs).

Suggested Fixes

  1. Fix race condition:

    let current_head = store.get_latest_canonical_block_hash().await?.unwrap_or(H256::zero());
  2. Add request ID validation:

    Ok(RLPxMessage::BlockHeaders(BlockHeaders { id, block_headers })) => {
        if id != request_id {
            debug!("Received response with mismatched ID");
            continue;
        }
        // ... rest of processing
  3. Add timeout to header download:

    tokio::time::timeout(Duration::from_secs(300), header_download_handle).await??;
  4. Consider using a channel instead of Arc for progress communication to reduce lock contention.
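
For item 4, a minimal sketch of what a channel-based progress flow could look like (hypothetical HeaderProgress type and names, not code from this PR):

    use tokio::sync::mpsc;

    // Hypothetical progress messages from the background header task.
    enum HeaderProgress {
        Batch { last_number: u64, hashes: Vec<[u8; 32]> },
        Done,
    }

    // Placeholder downloader: sends progress instead of locking shared state.
    async fn download_headers(tx: mpsc::Sender<HeaderProgress>) {
        let _ = tx
            .send(HeaderProgress::Batch { last_number: 1_000, hashes: vec![[0u8; 32]; 3] })
            .await;
        let _ = tx.send(HeaderProgress::Done).await;
    }

    #[tokio::main]
    async fn main() {
        let (tx, mut rx) = mpsc::channel(16);
        tokio::spawn(download_headers(tx));

        // The state-sync side owns block_hashes; no mutex is needed for progress updates.
        let mut block_hashes: Vec<[u8; 32]> = Vec::new();
        while let Some(event) = rx.recv().await {
            match event {
                HeaderProgress::Batch { last_number, hashes } => {
                    block_hashes.extend(hashes);
                    println!("headers downloaded up to #{last_number}");
                }
                HeaderProgress::Done => break,
            }
        }
        println!("total hashes received: {}", block_hashes.len());
    }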

The overall architecture change is good, but these issues need addressing before merge.


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Findings (ordered by severity)

  1. Potential consensus/security issue: pivot header not validated against requested hash
    In request_pivot_header, the header returned by get_block_header_by_hash is accepted without verifying header.hash() == pivot_hash. A malicious or buggy peer could return a different header and mislead the sync. Please explicitly validate the hash before accepting.
    File: crates/networking/p2p/sync/snap_sync.rs around new request_pivot_header (diff hunk near +122).

  2. Tokio mutex held across .await (deadlock/latency risk)
    In download_headers_background, a block_sync_state.lock().await is held while awaiting on get_latest_canonical_block_hash. This can block other tasks needing the same mutex and lead to stalls.
    File: crates/networking/p2p/sync/snap_sync.rs in download_headers_background (diff hunk near +214).
    Suggestion: extract Store or needed data outside the lock, or store Store separately and avoid locking for this read.

  3. Tokio mutex held across .await in header processing
    block_sync_state.lock().await.process_incoming_headers(...).await holds the lock during an async operation. If process_incoming_headers awaits internally, this can serialize unrelated work and create contention.
    File: crates/networking/p2p/sync/snap_sync.rs near +279.
    Suggestion: refactor process_incoming_headers to take data and return updates, or expose a non-async method for mutation to minimize lock hold time.
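
A small self-contained illustration of the shape suggested in findings 2 and 3 (toy types; store_headers stands in for the async store writes):

    use std::sync::Arc;
    use tokio::sync::Mutex;

    struct SharedState {
        block_hashes: Vec<u64>, // placeholder for Vec<H256>
    }

    // Placeholder for an async store write (e.g. persisting a header batch).
    async fn store_headers(_headers: &[u64]) {}

    // Pattern the review flags: the tokio mutex stays locked across the await,
    // so update_pivot and any other task needing the lock stalls behind the DB write.
    async fn process_batch_holding_lock(state: &Arc<Mutex<SharedState>>, headers: Vec<u64>) {
        let mut guard = state.lock().await;
        store_headers(&headers).await;
        guard.block_hashes.extend(headers);
    }

    // Suggested shape: do the async work first, then lock only for the in-memory update.
    async fn process_batch(state: &Arc<Mutex<SharedState>>, headers: Vec<u64>) {
        store_headers(&headers).await; // no lock held here
        state.lock().await.block_hashes.extend(headers); // short critical section
    }

    #[tokio::main]
    async fn main() {
        let state = Arc::new(Mutex::new(SharedState { block_hashes: vec![] }));
        process_batch_holding_lock(&state, (0..5).collect()).await;
        process_batch(&state, (5..10).collect()).await;
        println!("hashes: {}", state.lock().await.block_hashes.len());
    }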

Other notes

  • The concurrent header download + state sync is a good direction. The JoinHandle awaiting in snap_sync is correct, but consider logging the error context if the background task fails to improve diagnosability.
  • get_block_header_by_hash is a solid addition; consider verifying the response’s id if possible (even if peers can be noncompliant, matching on the request ID helps catch cross-talk).

Tests / validation

  • I didn’t see new tests in the diff. At least one test covering request_pivot_header hash mismatch handling would reduce regression risk.

If you want, I can propose a small refactor to avoid holding the mutex across awaits and add a defensive hash check.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review

Code Review: PR #6194 - Overlap header downloads with state sync

Summary

This PR restructures snap sync to download headers in the background concurrently with state sync, removing headers from the critical path. The approach is sound and the implementation is generally well-done. However, there are several issues that need attention.


Critical Issues

1. Potential Race Condition with current_head_number (snap_sync.rs:296-297)

current_head = last_block_hash;
current_head_number = last_block_number;

In download_headers_background(), current_head_number is updated and used in the next iteration's request_block_headers() call. However, there's a logic issue: the function uses current_head_number to fetch headers, but after updating it to last_block_number, there's no guarantee that the next fetch will start from the correct position if headers are non-contiguous.

Recommendation: Verify that request_block_headers(current_head_number, sync_head) correctly handles the case where we're advancing from last_block_number rather than last_block_number + 1.

2. Missing Validation of Pivot Header (snap_sync.rs:124-129)

let pivot_header = request_pivot_header(peers, sync_head).await?;
info!(
    "Obtained pivot header #{} (hash: {:?})",
    pivot_header.number,
    pivot_header.hash()
);

The pivot header fetched by hash is not validated. A malicious peer could return:

  • A header with a different hash than requested
  • A header with incorrect PoW/PoS validation
  • A header that doesn't match the consensus rules

Recommendation: Add validation:

let pivot_header = request_pivot_header(peers, sync_head).await?;
if pivot_header.hash() != sync_head {
    return Err(SyncError::InvalidPivotHeader);
}
// Add additional consensus validation here

3. Inconsistent Error Handling in Background Task (snap_sync.rs:249-251)

if attempts > MAX_HEADER_FETCH_ATTEMPTS {
    warn!("Background header download: failed to find target, aborting");
    return Ok(());  // Returns Ok even on failure!
}

The background task returns Ok(()) when it fails to download headers after max attempts. This result is then unwrapped with ?? at snap_sync.rs:642, so the failure is silently ignored. The main sync then continues without a complete header chain, yet still calls store_block_bodies and forkchoice_update, which require the full chain.

Recommendation: Return an error instead:

if attempts > MAX_HEADER_FETCH_ATTEMPTS {
    return Err(SyncError::NoBlockHeaders);
}

Major Issues

4. PeerHandler Cloning May Be Expensive (snap_sync.rs:159)

let mut bg_peers = peers.clone();

PeerHandler derives Clone, but cloning it means the background task has its own copy of the peer handler. Since PeerHandler contains a PeerTable, this could lead to:

  • Inconsistent peer state between the foreground and background tasks
  • Both tasks competing for the same peers, potentially causing peer reputation issues
  • Doubled request timeouts if both tasks hit the same failing peer

Recommendation: Consider using Arc<Mutex<PeerHandler>> or a shared peer selection mechanism to coordinate peer usage between tasks.

5. Potential Memory Issue: Unbounded block_hashes Vector (snap_sync.rs:54, 304)

pub struct SnapBlockSyncState {
    pub block_hashes: Vec<H256>,
    store: Store,
}

For mainnet (24.4M headers), block_hashes will grow to ~780MB (24.4M * 32 bytes). This is stored in memory and protected by a Mutex that's frequently locked during background downloads.

Recommendation: Consider paginating or using a more memory-efficient structure, or at least document this memory requirement.

6. Missing Cancellation Mechanism (snap_sync.rs:162-173)

The background task has no way to be cancelled if the main sync encounters an error or needs to abort. If snap_sync() returns early due to an error, the background task continues running indefinitely.

Recommendation: Use a CancellationToken:

let cancel_token = CancellationToken::new();
let header_handle = {
    let token = cancel_token.clone();
    tokio::spawn(async move {
        tokio::select! {
            result = download_headers_background(...) => result,
            _ = token.cancelled() => Ok(()),
        }
    })
};

Minor Issues

7. Duplicate Log Message (snap_sync.rs:309, 643)

The message "Background header download complete" appears twice - once from the background task and once in snap_sync() after awaiting the handle. The second is redundant.

8. Unclear Variable Naming (snap_sync.rs:150-154)

let mut block_sync_state = block_sync_state;
block_sync_state.block_hashes.push(pivot_header.hash());
let block_sync_state = Arc::new(tokio::sync::Mutex::new(block_sync_state));

The repeated shadowing of block_sync_state makes the code harder to follow.

Recommendation:

let mut block_sync_state = block_sync_state;
block_sync_state.block_hashes.push(pivot_header.hash());
let shared_block_sync_state = Arc::new(tokio::sync::Mutex::new(block_sync_state));

9. Missing Error Context in peer_handler.rs (peer_handler.rs:652-654)

if let Some(header) = block_headers.into_iter().next() {
    return Ok(Some(header));
}

When a peer returns an empty block_headers vector for a specific hash request, we silently return Ok(None). This makes debugging difficult.

Recommendation: Add logging:

if let Some(header) = block_headers.into_iter().next() {
    return Ok(Some(header));
} else {
    debug!("Peer returned empty headers for hash {block_hash:?}");
}

10. Potential Off-by-One in Header Download (snap_sync.rs:299-300)

if block_headers.len() > 1 {
    let block_headers_iter = block_headers.into_iter().skip(1);

The code skips the first header assuming it's already stored, but in the background download flow:

  1. We fetch the pivot header and store it
  2. We start the background download from current_head_number
  3. The first header in the response might not be a duplicate

Need to verify that request_block_headers always returns a range starting from a header we already have.


Performance Considerations

11. Lock Contention on block_sync_state (snap_sync.rs:301-305, 757-761)

The Arc<Mutex<SnapBlockSyncState>> is locked in both:

  • Background download task (frequently for process_incoming_headers)
  • Main task (in update_pivot)

This could cause contention if both try to lock simultaneously. Consider using RwLock instead since most operations are reads, or batch the writes to reduce lock frequency.
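
A minimal sketch of the RwLock alternative (toy types, not ethrex code):

    use std::sync::Arc;
    use tokio::sync::RwLock;

    struct SharedState {
        block_hashes: Vec<u64>, // placeholder for Vec<H256>
    }

    #[tokio::main]
    async fn main() {
        let state = Arc::new(RwLock::new(SharedState { block_hashes: vec![0] }));

        // Background task: one write per processed header batch.
        let writer = {
            let state = state.clone();
            tokio::spawn(async move {
                state.write().await.block_hashes.extend(1..1_000u64);
            })
        };

        // Readers (e.g. update_pivot peeking at the latest hash) don't block each
        // other; they only wait while a write guard is held.
        let latest = state.read().await.block_hashes.last().copied();
        println!("latest hash placeholder: {latest:?}");

        writer.await.expect("background task panicked");
        println!("total: {}", state.read().await.block_hashes.len());
    }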


Code Quality

12. Inconsistent Comment Style (snap_sync.rs:280, 287)

Some comments use // while code immediately follows, others have newlines. Standardize for consistency.

13. Magic Number in Retry Logic (snap_sync.rs:214-216)

tokio::time::sleep(Duration::from_millis(
    1.1_f64.powf(attempts as f64) as u64,
))

The exponential backoff constant 1.1 should be a named constant for clarity and consistency with other retry logic in the codebase.
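
For example (hypothetical constant name, same formula as the quoted code):

    use std::time::Duration;

    // Hypothetical name; the PR currently uses a bare 1.1 literal.
    const HEADER_FETCH_BACKOFF_BASE: f64 = 1.1;

    // Exponential backoff delay for the nth retry, matching the quoted formula.
    fn backoff_delay(attempts: u32) -> Duration {
        Duration::from_millis(HEADER_FETCH_BACKOFF_BASE.powf(attempts as f64) as u64)
    }

    fn main() {
        // With a 1.1 base: attempt 50 is ~117 ms, attempt 100 is ~13.8 s.
        for attempts in [1u32, 50, 100] {
            println!("attempt {attempts}: {:?}", backoff_delay(attempts));
        }
    }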


Testing Recommendations

  1. Test header download failure scenarios: What happens if the background task fails after state sync completes?
  2. Test peer exhaustion: What if all peers fail during background download?
  3. Test memory usage: Verify 24.4M headers don't cause OOM
  4. Test cancellation: Ensure no resource leaks if sync aborts early

Positive Aspects

✅ Clear separation of concerns with dedicated functions
✅ Good documentation explaining the concurrent execution model
✅ Proper error propagation with ?? operator
✅ Maintains backward compatibility with full sync fallback
✅ The performance improvement is significant (15-20 min saved)


Recommendation

Request Changes - The critical issues (#2 and #3) must be fixed before merge. Issue #4 should be carefully evaluated as it could cause peer management problems in production.


Automated review by Claude (Anthropic) · custom prompt

@github-actions

github-actions bot commented Feb 12, 2026

Lines of code report

Total lines added: 177
Total lines removed: 0
Total lines changed: 177

Detailed view
+------------------------------------------------+-------+------+
| File                                           | Lines | Diff |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/metrics.rs        | 618   | +2   |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/network.rs        | 629   | +23  |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/peer_handler.rs   | 590   | +44  |
+------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/snap_sync.rs | 1053  | +108 |
+------------------------------------------------+-------+------+

@greptile-apps

greptile-apps bot commented Feb 12, 2026

Greptile Overview

Greptile Summary

This PR restructures snap sync to download headers in a background task concurrently with state sync, eliminating 15-20 minutes from the critical path on mainnet.

Key changes:

  • Fetch pivot header directly by hash via new get_block_header_by_hash() method instead of downloading full chain first
  • Spawn background task download_headers_background() that runs concurrently with state sync
  • Share block_sync_state between tasks using Arc<tokio::sync::Mutex<>>
  • Await background task completion before finalization (which needs full header chain)

Critical issue found:

  • download_headers_background() restores the checkpoint hash but doesn't update the corresponding block number, causing the header download to restart from the wrong position (lines 238-240)

Confidence Score: 2/5

  • Critical bug in checkpoint restoration will cause incorrect header download behavior
  • The checkpoint restoration bug on lines 238-240 fails to update current_head_number when restoring from checkpoint, causing the background header download to request headers from the wrong starting block. This will likely cause sync failures or inefficiency when resuming interrupted syncs.
  • Pay close attention to crates/networking/p2p/sync/snap_sync.rs lines 238-240 (checkpoint restoration bug)

Important Files Changed

  • crates/networking/p2p/sync/snap_sync.rs: Refactored to overlap header downloads with state sync using background task and Arc<Mutex<>>. Critical bug: checkpoint restoration doesn't update block number (lines 238-240).
  • crates/networking/p2p/peer_handler.rs: Added get_block_header_by_hash method to fetch single header by hash. Clean implementation, follows existing patterns.
  • crates/networking/p2p/snap/client.rs: Updated request_account_range signature to accept Arc<Mutex> for concurrent access. Minimal change, correct.

Sequence Diagram

sequenceDiagram
    participant Main as sync_cycle_snap
    participant Pivot as request_pivot_header
    participant BG as download_headers_background
    participant State as snap_sync (state download)
    participant Store as Store
    participant Peers as PeerHandler

    Main->>Store: get current_head
    Main->>Pivot: Fetch pivot header by hash
    Pivot->>Peers: get_block_header_by_hash(sync_head)
    Peers-->>Pivot: pivot_header
    Pivot-->>Main: pivot_header
    Main->>Store: add_block_headers([pivot_header])
    Main->>Main: Wrap block_sync_state in Arc<Mutex<>>
    
    par Background Header Download
        Main->>BG: tokio::spawn(download_headers_background)
        BG->>Store: get_header_download_checkpoint()
        loop Until sync_head found
            BG->>Peers: request_block_headers(current_head_number, sync_head)
            Peers-->>BG: block_headers batch
            BG->>Store: process_incoming_headers (via lock)
        end
        BG-->>Main: header_handle (awaited later)
    and Concurrent State Sync
        Main->>State: snap_sync(header_download_handle)
        loop State download & healing
            State->>Peers: request_account_range
            State->>Peers: request_storage_ranges
            State->>Peers: request_bytecodes
            Note over State: May call update_pivot which locks block_sync_state
        end
        State->>State: Await header_download_handle
        State->>Store: store_block_bodies
        State->>Store: forkchoice_update
    end

@greptile-apps greptile-apps bot left a comment

3 files reviewed, 1 comment

Comment on lines +238 to +240
if let Some(head) = store.get_header_download_checkpoint().await? {
    current_head = head;
}

current_head_number not updated after checkpoint restoration

When the checkpoint is restored, current_head (hash) is updated but current_head_number remains at the original value. This causes request_block_headers(current_head_number, sync_head) on line 246 to restart from the wrong block number.

Suggested change
- if let Some(head) = store.get_header_download_checkpoint().await? {
-     current_head = head;
- }
+ if let Some(head) = store.get_header_download_checkpoint().await? {
+     current_head = head;
+     current_head_number = store.get_block_number(head).await?.ok_or(SyncError::BlockNumber(head))?;
+ }

Copilot AI (Contributor) left a comment

Pull request overview

This PR restructures L1 snap sync to overlap header downloads with state sync, aiming to remove 15–20 minutes of sequential header download time from the critical path (mainnet).

Changes:

  • Fetches the pivot header directly by hash before starting state sync.
  • Spawns a Tokio background task to download the remaining headers concurrently with state sync and awaits it before finalization.
  • Updates snap client/state-sync plumbing to share SnapBlockSyncState via Arc<tokio::sync::Mutex<...>>, and adds a peer API for fetching a single header by hash.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

  • crates/networking/p2p/sync/snap_sync.rs: Refactors snap sync cycle to fetch pivot header upfront, download headers in a background task, and share sync state via Arc<Mutex<...>>.
  • crates/networking/p2p/snap/client.rs: Updates request_account_range signature to accept the shared Arc<Mutex<SnapBlockSyncState>>.
  • crates/networking/p2p/peer_handler.rs: Adds get_block_header_by_hash() to request a single header by hash from a specific peer.


Comment on lines +146 to +155
// Store the pivot header and add it to block_sync_state
store
    .add_block_headers(vec![pivot_header.clone()])
    .await?;
let mut block_sync_state = block_sync_state;
block_sync_state.block_hashes.push(pivot_header.hash());

// Wrap in Arc<Mutex<>> for sharing between background download and state sync
let block_sync_state = Arc::new(tokio::sync::Mutex::new(block_sync_state));

Copilot AI Feb 12, 2026

block_hashes is initialized with the pivot hash and then concurrently extended by the background downloader via process_incoming_headers. Since headers are downloaded forward from current_head_number, appending them after the pivot will (a) reorder the chain incorrectly and (b) likely duplicate the pivot header when the background task reaches sync_head. This will break the numbers_and_hashes computation later (duplicate hashes mapped to the wrong block numbers) and can also cause snap sync to select the wrong pivot if the background task appends before snap_sync reads .last(). Consider keeping pivot_hash separate from block_hashes (or ensuring the background download builds block_hashes in canonical order ending with the pivot and never duplicates it).

Comment on lines +231 to +237
let mut current_head = {
    let bss = block_sync_state.lock().await;
    bss.store
        .get_latest_canonical_block_hash()
        .await?
        .unwrap_or(H256::zero())
};
Copilot AI Feb 12, 2026

download_headers_background acquires block_sync_state.lock().await and then performs an async DB call (get_latest_canonical_block_hash().await) while holding the mutex. Holding a Tokio mutex across .await can cause unnecessary contention and can block update_pivot/state sync from making progress. Prefer reading from the store argument (no lock needed) or move the DB await outside the critical section.

Suggested change
- let mut current_head = {
-     let bss = block_sync_state.lock().await;
-     bss.store
-         .get_latest_canonical_block_hash()
-         .await?
-         .unwrap_or(H256::zero())
- };
+ let mut current_head = store
+     .get_latest_canonical_block_hash()
+     .await?
+     .unwrap_or(H256::zero());

Comment on lines 299 to 306
if block_headers.len() > 1 {
    let block_headers_iter = block_headers.into_iter().skip(1);

    block_sync_state
        .lock()
        .await
        .process_incoming_headers(block_headers_iter)
        .await?;
}
Copilot AI Feb 12, 2026

process_incoming_headers(...).await is executed while holding block_sync_state.lock().await. Since process_incoming_headers awaits on multiple store writes, this mutex can be held for a long time, blocking concurrent pivot updates and any reads of block_hashes. Consider restructuring so the mutex is only held for in-memory updates (e.g., compute headers/hashes first, write headers/checkpoint to store outside the lock, then briefly lock to update block_hashes).

Comment on lines 329 to 336
let pivot_hash = block_sync_state
    .lock()
    .await
    .block_hashes
    .last()
    .copied()
    .ok_or(SyncError::NoBlockHeaders)?;
let mut pivot_header = store
Copilot AI Feb 12, 2026

snap_sync derives pivot_hash from block_sync_state.block_hashes.last(), but block_hashes is mutated concurrently by the background header download task. Depending on timing, .last() can become an intermediate header rather than sync_head, causing state sync to target the wrong pivot. Use a stable pivot identifier (e.g., pass sync_head/pivot_hash explicitly into snap_sync, or store the pivot hash in a dedicated field that the background task does not modify).

let bss = block_sync_state.clone();
let mut bg_peers = peers.clone();
let bg_store = store.clone();
let pending_block = store.get_pending_block(sync_head).await.ok().flatten();
Copilot AI Feb 12, 2026

store.get_pending_block(sync_head).await.ok().flatten() drops any storage error and silently treats it as “no pending block”. If the DB read fails, header download will proceed with potentially inconsistent state and the root cause is lost. Propagate the error (or at least log it and fail the sync cycle) instead of swallowing it.

Suggested change
- let pending_block = store.get_pending_block(sync_head).await.ok().flatten();
+ let pending_block = store.get_pending_block(sync_head).await?;

Comment on lines +156 to +176
// Spawn background header download — runs concurrently with state sync
let header_handle = {
    let bss = block_sync_state.clone();
    let mut bg_peers = peers.clone();
    let bg_store = store.clone();
    let pending_block = store.get_pending_block(sync_head).await.ok().flatten();
    tokio::spawn(async move {
        download_headers_background(
            &mut bg_peers,
            &bg_store,
            &bss,
            current_head_number,
            sync_head,
            pending_block.map(|b| b.header),
        )
        .await
    })
};

let mut attempts = 0;
info!("Background header download started — proceeding with state sync");
snap_sync(peers, &store, &block_sync_state, datadir, header_handle).await?;
Copilot AI Feb 12, 2026

If snap_sync(...) returns early with an error before awaiting header_handle, the spawned background header task will keep running and mutating the DB/checkpoints even though the sync cycle failed. Consider adding a cancellation mechanism (e.g., CancellationToken passed into download_headers_background) and aborting/joining the task on error, so failed sync attempts don't leave a stray downloader running.

if attempts > MAX_HEADER_FETCH_ATTEMPTS {
-   warn!("Sync failed to find target block header, aborting");
+   warn!("Background header download: failed to find target, aborting");
    return Ok(());
Copilot AI Feb 12, 2026

When header fetching exceeds MAX_HEADER_FETCH_ATTEMPTS, the background task logs a warning and returns Ok(()). Since snap_sync treats the JoinHandle result as success, finalization may run without a complete header chain. This should return an error (e.g., SyncError::NoBlockHeaders) so the sync cycle fails fast instead of silently proceeding in an invalid state.

Suggested change
- return Ok(());
+ return Err(SyncError::NoBlockHeaders);

Comment on lines 271 to 277
if first_block_hash == last_block_hash
    && first_block_hash == current_head
    && current_head != sync_head
{
    // There is no path to the sync head; this goes back until it finds a common ancestor
-   warn!("Sync failed to find target block header, going back to the previous parent");
+   warn!("Background header download: going back to previous parent");
    current_head = first_block_parent_hash;
    continue;
Copilot AI Feb 12, 2026

In the “no path to sync head” fallback, current_head is set to first_block_parent_hash but current_head_number is not updated. Since the next request is driven by current_head_number (not the hash), this fallback may not actually move the search window and can lead to repeated requests/loops. Use first_block_number to decrement current_head_number (and keep hash/number consistent), or refactor to drive the request by hash instead of number for this recovery path.

Comment on lines +652 to +653
if let Some(header) = block_headers.into_iter().next() {
    return Ok(Some(header));
Copilot AI Feb 12, 2026

get_block_header_by_hash returns the first header the peer sends without verifying it matches block_hash. A malicious/buggy peer could return a different header, and snap sync would accept it as the pivot. Validate header.hash() == block_hash (and ideally that exactly one header was returned) before returning Some(header), otherwise treat it as a failure.

Suggested change
- if let Some(header) = block_headers.into_iter().next() {
-     return Ok(Some(header));
+ if block_headers.len() == 1 {
+     let header = block_headers.into_iter().next().expect("len checked above");
+     if header.hash() == block_hash {
+         return Ok(Some(header));
+     } else {
+         debug!(
+             "Received block header with mismatching hash from peer {peer_id:?}: \
+              expected {block_hash:?}, got {:?}",
+             header.hash()
+         );
+     }
+ } else {
+     debug!(
+         "Expected exactly one block header from peer {peer_id:?} for hash \
+          {block_hash:?}, but received {} headers",
+         block_headers.len()
+     );

Headers now download in background concurrently with state sync, so the
old sequential Phase 1/8 model is wrong. Changes:
- Remove phase numbers from display (phases can repeat due to healing loops)
- Add headers_downloading AtomicBool flag to track background download state
- Show compact one-liner header progress inline with each state sync update
- Log header completion once when background download finishes
- Split download_headers_background into wrapper + inner to ensure the flag
  is always cleared even on error paths
@github-actions

Benchmark Block Execution Results Comparison Against Main

Command   Mean [s]          Min [s]   Max [s]   Relative
base      63.330 ± 0.298    63.034    64.080    1.00
head      63.530 ± 0.255    63.231    63.958    1.00 ± 0.01

    .add_block_headers(vec![pivot_header.clone()])
    .await?;
let mut block_sync_state = block_sync_state;
block_sync_state.block_hashes.push(pivot_header.hash());
Contributor

The pivot hash is pushed here before spawning the background download. But download_headers_background_inner will also push it when it reaches sync_head — the process_incoming_headers call at line 333 extends block_hashes with all received headers including the sync_head.

This means the pivot hash appears twice in block_hashes. The forkchoice_update at line 684 computes pivot_header.number - i for each entry, so the duplicate shifts all preceding block numbers by one — every header before the pivot gets mapped to the wrong block number.

Fix: either skip pushing here and let the background task handle it, or add a guard in process_incoming_headers / the background loop to skip the sync_head hash if it's already present.
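
A minimal sketch of such a guard (toy types; the real code would filter on H256 hashes before process_incoming_headers extends block_hashes):

    // Toy types; in ethrex these would be BlockHeader and H256.
    #[derive(Clone, Copy, PartialEq, Eq, Debug)]
    struct Hash([u8; 32]);

    struct Header {
        hash: Hash,
    }

    // Drop the already-pushed pivot/sync_head hash from an incoming batch before
    // it is appended to block_hashes, so the pivot never appears twice.
    fn skip_pivot(batch: Vec<Header>, pivot: Hash) -> Vec<Header> {
        batch.into_iter().filter(|h| h.hash != pivot).collect()
    }

    fn main() {
        let pivot = Hash([0xaa; 32]);
        let batch = vec![Header { hash: Hash([0x01; 32]) }, Header { hash: pivot }];
        assert_eq!(skip_pivot(batch, pivot).len(), 1);
    }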

let mut current_head = {
    let bss = block_sync_state.lock().await;
    bss.store
        .get_latest_canonical_block_hash()
Contributor

current_head (hash) is re-derived here from get_latest_canonical_block_hash() + checkpoint, but current_head_number was computed by the caller from a different current_head value (the one from block_sync_state.get_current_head()). Between those two reads, sync_cycle_snap stored the pivot header via add_block_headers, which could change what get_latest_canonical_block_hash() returns.

If the re-derived current_head hash doesn't match current_head_number, the first request_block_headers(current_head_number, sync_head) call starts from the wrong position.

Consider passing both current_head and current_head_number from the caller instead of re-deriving the hash.

// and forkchoice_update, which need the full header chain.
info!("Waiting for background header download to complete...");
header_download_handle.await??;
info!("Background header download complete");
Contributor

nit: This duplicates the log at line 338 inside download_headers_background_inner. Since header_download_handle.await?? only succeeds when the background task completed successfully (which already logged "complete"), this second message is redundant. Consider removing one of them or changing this to a debug-level log like debug!("Header download handle joined successfully").

@github-project-automation github-project-automation bot moved this to In Progress in ethrex_l1 Feb 12, 2026