[stable2506] Backport #11330 #11356
Merged
EgorPopelyaev merged 1 commit into stable2506 on Mar 13, 2026
… state (#11330)

This PR skips the execution of blocks that are passed to the import pipeline with `StateAction::Skip`.

A bug in the import queue affects collators: non-archive collators should not execute blocks that are imported as part of Gap Sync. The bug surfaced when `import_existing` was changed from `false` to `true` in:
- #10373

### Issue

The issue manifests for collators that have an unfilled block gap in their DB. When restarting with #10373, a collator does the following:
- client info detects a gap at block 5800 with length 1
- collator [X] requests block 5800 with `fields: HEADER | BODY | JUSTIFICATION, from: Number(5800)`
- the other 2 collators respond with the full block, including the body, because by default collators keep the canonical chain around but discard the block state
- collator [X] tries to import the block because `import_existing` is true, so execution continues past the following check: https://github.com/paritytech/polkadot-sdk/blob/2b9576c163b1c2408291e2b6c98ae0f2465b4818/substrate/client/service/src/client/client.rs#L1809-L1812
  - Before the changes, the code returned `Ok(ImportResult::AlreadyInChain)` at this point, which short-circuited the import of the block
- collator [X] imports the block but fails with `State already discarded`
- the error is propagated back to the sync engine, which decides to restart the sync process with the same block gap (`Restarting sync with client ...`)
- This results in a vicious cycle where collator [X] requests the same block again, then restarts the sync engine
- Eventually, at the third request, the other collators treat this behavior as malicious, then ban and disconnect the peer.

### Fix

The fix is to skip executing blocks when the gap sync has marked them as `StateAction::Skip`.
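The import decision described above can be sketched as a small, self-contained model. This is only an illustration under stated assumptions: `ImportRequest`, `ImportOutcome`, `import_block`, and the `honor_skip` flag are hypothetical names invented for the sketch, and the `StateAction` enum here is a reduced stand-in, not the actual Substrate type or client code.

```rust
// Hypothetical, simplified model of the import decision; not the real client code.

/// Reduced stand-in for the state action attached to a block import.
#[derive(Debug, Clone, Copy, PartialEq)]
enum StateAction {
    Execute,
    Skip,
}

/// Hypothetical outcome type for this sketch.
#[derive(Debug, PartialEq)]
enum ImportOutcome {
    AlreadyInChain,
    ImportedWithoutExecution,
    Executed,
    Failed(&'static str),
}

/// Hypothetical description of a block handed to the import pipeline.
#[derive(Debug, Clone, Copy)]
struct ImportRequest {
    already_in_chain: bool,
    import_existing: bool,
    parent_state_available: bool,
    state_action: StateAction,
}

/// `honor_skip = false` models the behavior before the fix,
/// `honor_skip = true` models the behavior after it.
fn import_block(req: &ImportRequest, honor_skip: bool) -> ImportOutcome {
    // Known block and no forced re-import: short-circuit.
    if req.already_in_chain && !req.import_existing {
        return ImportOutcome::AlreadyInChain;
    }
    // The fix: blocks marked `Skip` are stored without being executed.
    if honor_skip && req.state_action == StateAction::Skip {
        return ImportOutcome::ImportedWithoutExecution;
    }
    // Execution needs the parent state, which non-archive nodes have pruned.
    if !req.parent_state_available {
        return ImportOutcome::Failed("State already discarded");
    }
    ImportOutcome::Executed
}

fn main() {
    // Gap block 5800 on a non-archive collator: header known, state pruned.
    let gap_block = ImportRequest {
        already_in_chain: true,
        import_existing: true,
        parent_state_available: false,
        state_action: StateAction::Skip,
    };
    println!("before fix: {:?}", import_block(&gap_block, false));
    println!("after fix:  {:?}", import_block(&gap_block, true));

    // With `import_existing: false` the import short-circuits.
    let old_flag = ImportRequest { import_existing: false, ..gap_block };
    println!("old flag:   {:?}", import_block(&old_flag, false));

    // A regular block with available parent state is still executed.
    let regular = ImportRequest {
        already_in_chain: false,
        import_existing: false,
        parent_state_available: true,
        state_action: StateAction::Execute,
    };
    println!("regular:    {:?}", import_block(&regular, true));
}
```

In this model the gap block fails only in the combination the description reports: `import_existing` is true, the state has been discarded, and the skip marker is ignored.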
Please note we are still dealing with the following, which should be part of a different PR:
- Gap Sync never closed the gap in the database
- When the node starts with a block gap, it always initiates a block request over the sync protocol to close the gap
- Before, the gap block was imported with `import_existing: false`, which short-circuited the import and returned `AlreadyInChain`
- Effectively, nodes re-requested the gap on every reboot, wasting network bandwidth to close the gap "in memory" only; this was never committed to the DB

### Full Logs

```rust
2026-03-10 13:43:41.138 DEBUG main sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, finalized_number: 5883372, finalized_state: Some((0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, 5883372)), number_leaves: 1, block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:43:41.138 DEBUG main sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: None)
2026-03-10 13:43:41.138 TRACE main sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
2026-03-10 13:45:17.775 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWRejf1JYYjaaKhHAn28VJJR9ryZqs3wiGPsVjk6eFLLrn, (best:5883362, common:5883362) BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(5800), direction: Descending, max: Some(1) }
2026-03-10 13:45:17.784 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks (5800) (origin: GapSync)
2026-03-10 13:45:17.784 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block 5800 (0x26dc…1cda) has 4 logs (origin: GapSync)
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Error importing block 5800: 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 WARN tokio-runtime-worker sync: [Parachain] 💔 Error importing block 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: consensus error: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, finalized_number: 5883392, finalized_state: Some((0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, 5883392)), number_leaves: 1, block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: Some((5800, 5800)))
2026-03-10 13:45:17.792 TRACE tokio-runtime-worker sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
```

### Testing Done

- unblocks kusama yap 3392: https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
  - left side of the graph is origin/master, right side is the patch applied with connected peers

Closes:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 3c93291)
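The restart loop visible in the logs, and the ban that follows it, can be illustrated with a toy model. This is a sketch under loud assumptions: `PeerView`, `on_block_request`, `requests_until_ban`, and the fixed threshold of 3 identical requests are hypothetical and only mirror the "third request" wording of the description. The real peers decide through their own reputation handling, which is not modeled here.

```rust
use std::collections::HashMap;

/// Hypothetical threshold, taken from the "third request" in the description.
const MAX_IDENTICAL_REQUESTS: u32 = 3;

/// Hypothetical view a serving peer keeps of the requests it has answered.
#[derive(Default)]
struct PeerView {
    seen: HashMap<(String, u64), u32>,
}

impl PeerView {
    /// Records a block request and returns `true` once the requester
    /// has asked for the same block too many times.
    fn on_block_request(&mut self, requester: &str, block: u64) -> bool {
        let count = self.seen.entry((requester.to_string(), block)).or_insert(0);
        *count += 1;
        *count >= MAX_IDENTICAL_REQUESTS
    }
}

/// Simulates collator [X] trying to close a gap at block 5800.
/// Returns the request number at which it gets banned, or `None`
/// if the gap closes before that happens.
fn requests_until_ban(import_succeeds: bool) -> Option<u32> {
    let mut peer = PeerView::default();
    for attempt in 1..=10 {
        if peer.on_block_request("collator-x", 5800) {
            return Some(attempt);
        }
        if import_succeeds {
            // Gap closed: the sync engine has no reason to ask again.
            return None;
        }
        // Import failed: sync restarts with the same gap and asks again.
    }
    None
}

fn main() {
    println!("import keeps failing: banned at request {:?}", requests_until_ban(false));
    println!("import succeeds:      banned at request {:?}", requests_until_ban(true));
}
```

The point of the model is only that a failing import turns a single gap request into a repeated one, which is what the serving peers react to.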
This pull request is amending an existing release. Please proceed with extreme caution,
Emergency Bypass
If you really need to bypass this check: add |
Backport #11330 into stable2506 from lexnv. See the documentation on how to use this bot.