gap_sync/fix: Close gap and peer banning after warp sync on parachains#11309

Closed
lexnv wants to merge 4 commits into master from lexnv/unblock-gap-sync

Conversation


@lexnv lexnv commented Mar 9, 2026

This PR fixes an issue where parachain collators started with warp sync get stuck in a ban loop, effectively ending up with 0 connected peers.

The root causes are:

  • warp sync now imports header-only blocks before the gap is created
  • the DB never updates the gap

Warp Sync

Gap sync fails to advance, causing the sync engine to request the same block multiple times. Because the gap is old, no other collator can provide a response and close it. The downstream effect is that after 3 identical requests, the peer gets banned.

For non-archive nodes (i.e., block_data.block.body.is_none()), gap sync requests blocks without bodies.
When a warp-synced block is provided, the block is already in the chain. That causes the import to return AlreadyInChain before having a chance to advance the gap start.
Then, because the gap start is never advanced, the same request is sent to peers repeatedly, causing a banning loop and disconnects.
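The ban loop above can be sketched as follows. This is a hypothetical, simplified model of the failure mode, not the real sync-engine API; `ImportResult`, `simulate_ban_loop`, and the 3-strike constant are illustrative names standing in for the actual Substrate types:

```rust
// Hypothetical model of the ban loop: every requested block comes back as
// AlreadyInChain (its header was imported by warp sync), so the gap start
// never advances and the same request repeats until the peer is banned.

#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq)]
enum ImportResult {
    Imported,       // freshly imported; the gap may advance
    AlreadyInChain, // header already imported by warp sync
}

/// Returns the final gap start and how many identical requests were sent
/// before the 3-strike ban would trigger.
fn simulate_ban_loop(gap_start: u64) -> (u64, u32) {
    let mut start = gap_start;
    let mut identical_requests = 0;
    for _ in 0..3 {
        let requested = start; // the gap start never moves, so this repeats
        let result = ImportResult::AlreadyInChain;
        // Buggy behavior: only a fresh import advances the gap start.
        if result == ImportResult::Imported && requested == start {
            start += 1;
        }
        identical_requests += 1;
    }
    (start, identical_requests)
}

fn main() {
    // Gap stuck at 100; after 3 identical requests the peer would be banned.
    assert_eq!(simulate_ban_loop(100), (100, 3));
}
```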

DB never closes the gap

The DB gap stalls across node restarts: the DB gap is never updated. Even when a warp-synced block is skipped by advancing the gap's best queued number, the change is not reflected in the DB.

  • The next block is imported at number == gap start + 1
  • The DB only handles an exact match at number == gap start
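The mismatch in the two bullets above can be illustrated with a minimal sketch; `buggy_update_gap` is a hypothetical stand-in for the DB's gap-update path, not the actual sc-client-db code:

```rust
/// Hypothetical sketch: only an import at exactly `gap_start` moves the
/// gap. An import at `gap_start + 1` (as happens after warp sync) falls
/// through, so the persisted gap stalls across restarts.
fn buggy_update_gap(gap_start: u64, imported: u64) -> u64 {
    if imported == gap_start {
        gap_start + 1
    } else {
        gap_start // number == gap_start + 1: not handled, gap unchanged
    }
}

fn main() {
    // The next block arrives at gap_start + 1, so the DB gap never moves.
    assert_eq!(buggy_update_gap(100, 101), 100);
}
```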

To fix this, a while loop is added to the MissingHeaderAndBody case that closes the gap if the block's headers were already imported.
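A minimal sketch of the fix, under the assumption that a `has_header` helper can check whether a header is already in the DB (the names here are illustrative, not the actual sc-client-db code):

```rust
/// Hypothetical sketch: on the MissingHeaderAndBody case, walk the gap
/// start forward past every block whose header was already imported (e.g.
/// by warp sync), instead of only handling an exact gap-start match.
/// Returns the updated gap, or `None` once the gap is fully closed.
fn close_gap(
    mut gap_start: u64,
    gap_end: u64,
    has_header: impl Fn(u64) -> bool,
) -> Option<(u64, u64)> {
    while gap_start <= gap_end && has_header(gap_start) {
        gap_start += 1;
    }
    if gap_start > gap_end {
        None // gap fully closed
    } else {
        Some((gap_start, gap_end))
    }
}

fn main() {
    // Headers up to 103 were imported by warp sync: gap shrinks to 104..=105.
    assert_eq!(close_gap(100, 105, |n| n <= 103), Some((104, 105)));
    // All headers already present: the gap closes entirely.
    assert_eq!(close_gap(100, 105, |_| true), None);
}
```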

The issue surfaced after introducing the following optimization:

Closes:

lexnv added 4 commits March 9, 2026 10:28
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv requested review from lrubasze and skunert March 9, 2026 16:38

skunert commented Mar 9, 2026

Can you explain a bit more clearly what exactly leads to this bug?
Is it only for nodes with existing DB or also fresh ones?

I remember reviewing @lrubasze's PR and thought the case where gap sync encounters a sparse warp block was handled 🤔

@bkchr bkchr left a comment


This requires a regression test.


bkchr commented Mar 10, 2026

Can you explain a bit more clearly what exactly leads to this bug?
Is it only for nodes with existing DB or also fresh ones?

My AI assistant told me that this gap already existed before, with the old nodes. I have not yet verified it. So there may have been a pre-existing bug that now triggers something with the latest node. Clearly we need a test that exactly reproduces the problem.


skunert commented Mar 10, 2026

My main point of confusion is whether this bug triggers for freshly warp-synced nodes. If it only triggers when gap sync is in progress and you upgrade mid-sync, then ignoring this is also an option.


lexnv commented Mar 10, 2026

