[stable2603] Backport #11330 #11359

Merged
EgorPopelyaev merged 2 commits into stable2603 from backport-11330-to-stable2603
Mar 12, 2026

@paritytech-release-backport-bot

Backport #11330 into stable2603 from lexnv.

See the documentation on how to use this bot.

… state (#11330)

This PR skips the execution of blocks when they are propagated to
importing via `StateAction::Skip`.

A bug in the import queue affects collators: non-archive collators
should not execute blocks that are part of Gap Sync, but currently do.

The bug surfaced after `import_existing` was changed from false to
true in:
- #10373

### Issue

The issue manifests for collators that have an unfilled block gap in
their DB.

When restarting with #10373 applied, a collator goes through the following:
- client info has detected a gap at block 5800 with length 1
- collator [X] requests the block 5800 with `fields: HEADER | BODY |
JUSTIFICATION, from: Number(5800)`
- the other 2 collators respond with the full block, including the body,
because by default collators will keep around the canonical chain but
discard the block state
- collator [X] tries to import the block because `import_existing` is
true and we continue execution after the following check:

https://github.com/paritytech/polkadot-sdk/blob/2b9576c163b1c2408291e2b6c98ae0f2465b4818/substrate/client/service/src/client/client.rs#L1809-L1812

- Before the changes, the code hit `return
Ok(ImportResult::AlreadyInChain)` at this point, which short-circuited
the import of the block

- collator [X] imports the block but fails with `State already
discarded`
- the error is propagated back to the sync engine that decides to
restart the sync process with the same block gap `Restarting sync with
client ...`
- This results in a vicious cycle where the collator [X] requests the
same block again, then restarts the sync engine
- Eventually, at the third request, the other collators treat this
behavior as malicious, then ban and disconnect the peer.
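
The cycle above can be modelled with a small, self-contained sketch. Everything here is a simplified stand-in written for illustration, not the actual polkadot-sdk code: `ImportOutcome`, `try_import`, and `requests_until_ban` are hypothetical names, and the ban threshold of 3 is taken from the description above.

```rust
/// Hypothetical, simplified outcome of a single import attempt.
#[derive(Debug, PartialEq)]
enum ImportOutcome {
    AlreadyInChain,
    Imported,
    StateDiscarded,
}

/// Simplified model of the check described above: a known block is only
/// short-circuited when `import_existing` is false. Otherwise the block is
/// executed, which needs the parent state.
fn try_import(
    block_known: bool,
    import_existing: bool,
    parent_state_available: bool,
) -> ImportOutcome {
    if block_known && !import_existing {
        return ImportOutcome::AlreadyInChain;
    }
    if parent_state_available {
        ImportOutcome::Imported
    } else {
        ImportOutcome::StateDiscarded
    }
}

/// Counts how many times the same gap block is requested before peers ban
/// the node. Returns `None` if the import does not fail and no loop occurs.
fn requests_until_ban(import_existing: bool, ban_threshold: u32) -> Option<u32> {
    let mut requests = 0;
    loop {
        requests += 1;
        // The header is already known and the parent state has been pruned.
        match try_import(true, import_existing, false) {
            ImportOutcome::StateDiscarded => {
                if requests >= ban_threshold {
                    return Some(requests);
                }
                // The sync engine restarts with the same gap and asks again.
            }
            _ => return None,
        }
    }
}
```

In this model, `requests_until_ban(true, 3)` ends in a ban at the third request, while `requests_until_ban(false, 3)` never loops because the import short-circuits with `AlreadyInChain`.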

### Fix

The fix is to skip executing blocks when the gap sync has marked blocks
as `StateAction::Skip`.
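
A minimal sketch of the idea, under stated assumptions: the `StateAction` enum below is a two-variant stand-in (the real type may have more variants), and `ImportResult` and `import_block` are hypothetical shapes chosen for illustration rather than the code changed in this PR.

```rust
/// Two-variant stand-in for the state action attached to a block.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StateAction {
    Execute,
    Skip,
}

/// Hypothetical import result used only in this sketch.
#[derive(Debug, PartialEq)]
enum ImportResult {
    Imported { executed: bool },
    Failed(&'static str),
}

/// Blocks marked `Skip` are stored without execution, so the missing
/// parent state is never touched.
fn import_block(action: StateAction, parent_state_available: bool) -> ImportResult {
    match action {
        StateAction::Skip => ImportResult::Imported { executed: false },
        StateAction::Execute if parent_state_available => {
            ImportResult::Imported { executed: true }
        }
        StateAction::Execute => ImportResult::Failed("State already discarded"),
    }
}
```

With this shape, a gap-sync block on a node that has pruned the parent state imports cleanly under `StateAction::Skip`, whereas `StateAction::Execute` reproduces the `State already discarded` failure.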

Please note we are still dealing with the following, which should be
part of a different PR:
- Gap Sync was never closed from the database
- When the node starts with a block gap, the node will always initiate a
block request over the sync protocol to close the gap
- Before, the gap blocks were imported with `import_existing: false`,
which short-circuited the import and returned `AlreadyInChain`
- Effectively, nodes would re-request the gap on every reboot, wasting
network bandwidth to close the gap "in memory" only; the closure was
never committed to the DB
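
The remaining problem can be illustrated with a toy model. This is a sketch under assumptions, not the node's real storage code: `BlockGap` here has only `start` and `end`, and `gap_blocks_requested` with its `persist_closure` flag is hypothetical.

```rust
/// Simplified stand-in for a stored block gap (inclusive range).
#[derive(Debug, Clone, Copy, PartialEq)]
struct BlockGap {
    start: u64,
    end: u64,
}

/// Counts how many gap blocks are requested over a number of boots.
/// If `persist_closure` is false, the gap is only closed in memory and
/// is read back from the "database" on the next boot.
fn gap_blocks_requested(boots: u32, persist_closure: bool) -> u64 {
    let mut db_gap = Some(BlockGap { start: 5800, end: 5800 });
    let mut requested = 0;
    for _ in 0..boots {
        // On boot, the in-memory gap is loaded from the database.
        let mut mem_gap = db_gap;
        if let Some(gap) = mem_gap.take() {
            // The node requests every block in the gap over the network.
            requested += gap.end - gap.start + 1;
            if persist_closure {
                db_gap = None;
            }
        }
    }
    requested
}
```

Over three boots, the model requests the block three times when the closure is not persisted and only once when it is.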

### Full Logs

```text
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, finalized_number: 5883372, finalized_state: Some((0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, 5883372)), number_leaves: 1,
	block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: None)
2026-03-10 13:43:41.138 TRACE                 main sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)

2026-03-10 13:45:17.775 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWRejf1JYYjaaKhHAn28VJJR9ryZqs3wiGPsVjk6eFLLrn, (best:5883362, common:5883362)
	BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(5800), direction: Descending, max: Some(1) }

2026-03-10 13:45:17.784 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks  (5800) (origin: GapSync)
2026-03-10 13:45:17.784 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block 5800 (0x26dc…1cda) has 4 logs (origin: GapSync)
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Error importing block 5800: 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda:
	Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91

2026-03-10 13:45:17.792  WARN tokio-runtime-worker sync: [Parachain] 💔 Error importing block 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: consensus error: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, finalized_number: 5883392, finalized_state: Some((0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, 5883392)), number_leaves: 1,
		block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }

2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: Some((5800, 5800)))
2026-03-10 13:45:17.792 TRACE tokio-runtime-worker sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
```

### Testing Done

- unblocks kusama yap 3392:
https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
- left side of the graph is origin/master, right side is the patch
applied with connected peers

Closes:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 3c93291)
@github-actions

This pull request is amending an existing release. Please proceed with extreme caution,
as to not impact downstream teams that rely on the stability of it. Some things to consider:

  • Backports are only for 'patch' or 'minor' changes. No 'major' or other breaking change.
  • Should be a legit fix for some bug, not adding tons of new features.
  • Must either be already audited or not need an audit.
Emergency Bypass

If you really need to bypass this check: add validate: false to each crate
in the Prdoc where a breaking change is introduced. This will release a new major
version of that crate and all its reverse dependencies and basically break the release.

@EgorPopelyaev EgorPopelyaev enabled auto-merge (squash) March 12, 2026 14:01
@EgorPopelyaev EgorPopelyaev merged commit 62f4364 into stable2603 Mar 12, 2026
237 of 247 checks passed
@EgorPopelyaev EgorPopelyaev deleted the backport-11330-to-stable2603 branch March 12, 2026 14:59

Labels

A3-backport Pull request is already reviewed well in another branch.


2 participants