aura/import: Skip block execution when collators have no parent block state #11330
Merged
… state Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Contributor
Author
/cmd prdoc --audience node_dev --bump patch
…e_dev --bump patch'
This was referenced Mar 10, 2026
…lexnv/skip-loop-execution
bkchr
reviewed
Mar 10, 2026
Comment on lines +1602 to +1603
/// Regression test: gap sync of a single block that fails to import should not
/// cause an infinite restart loop.
Member
We don't have anything fixed in the sync code?
Contributor
Author
This PR will close the sync gap in memory, while #11332 will close it at the DB level.
I believe that together they should function similarly to the older PR while keeping the code changes to a minimum:
lrubasze
reviewed
Mar 11, 2026
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
bkchr
approved these changes
Mar 11, 2026
// The following states are ignored:
// - `StateAction::ApplyChanges`: means that the node produced the block itself or the
//   block was imported via state sync.
// - `StateAction::Skip`: means that the block should be skipped. This is evident in the
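The rule in the comment above can be pictured with a small Rust sketch. This is only an illustration under stated assumptions: the `StateAction` enum below is a simplified local stand-in without generics or payloads, the `Execute` and `ExecuteIfPossible` variants and their behavior are assumed rather than taken from this PR, and `needs_execution` is a hypothetical helper name, not a function from the code base.

```rust
/// Simplified, hypothetical stand-in for the state actions named in the comment.
/// Only `ApplyChanges` and `Skip` are described in the PR; the other two
/// variants are assumptions added to make the example complete.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StateAction {
    /// The node produced the block itself, or it was imported via state sync.
    ApplyChanges,
    /// Assumed variant: execution is required.
    Execute,
    /// Assumed variant: execute only when the parent state is available.
    ExecuteIfPossible,
    /// The block should be skipped (for example, a gap-sync block).
    Skip,
}

/// Hypothetical helper: decides whether the import path should execute the block.
pub fn needs_execution(action: StateAction, parent_state_available: bool) -> bool {
    match action {
        // The two states that the comment says are ignored.
        StateAction::ApplyChanges | StateAction::Skip => false,
        StateAction::Execute => true,
        StateAction::ExecuteIfPossible => parent_state_available,
    }
}
```

With this shape, a gap-sync block marked `Skip` never reaches execution, so a discarded parent state cannot cause an import error for it.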
paritytech-release-backport-bot bot
pushed a commit
that referenced
this pull request
Mar 12, 2026
… state (#11330)

This PR skips the execution of blocks when they are propagated to importing via `StateAction::Skip`.

There is a bug in the import queue that affects collators: non-archive collators should not execute blocks that are part of Gap Sync. The bug surfaced after `import_existing` was changed from false to true in:
- #10373

### Issue

The issue manifests for collators that have an unfilled block gap in their DB. When restarting with #10373, a collator would go through the following:
- client info has detected a gap at block 5800 with length 1
- collator [X] requests block 5800 with `fields: HEADER | BODY | JUSTIFICATION, from: Number(5800)`
- the other 2 collators respond with the full block, including the body, because by default collators keep the canonical chain around but discard the block state
- collator [X] tries to import the block because `import_existing` is true and execution continues past the following check: https://github.com/paritytech/polkadot-sdk/blob/2b9576c163b1c2408291e2b6c98ae0f2465b4818/substrate/client/service/src/client/client.rs#L1809-L1812
  - Before the changes, the code returned `return Ok(ImportResult::AlreadyInChain)`, which short-circuited the import of the block
- collator [X] imports the block but fails with `State already discarded`
- the error is propagated back to the sync engine, which decides to restart the sync process with the same block gap: `Restarting sync with client ...`
- This results in a vicious cycle where collator [X] requests the same block again, then restarts the sync engine
- Eventually, at the third request, the other collators treat this behavior as malicious, then ban and disconnect the peer.

### Fix

The fix is to skip executing blocks when the gap sync has marked them as `StateAction::Skip`.

Please note we are still dealing with the following, which should be part of a different PR:
- Gap Sync was never closed in the database
- When the node starts with a block gap, the node will always initiate a block request over the sync protocol to close the gap
- Before, the gap was marked as `import_existing: false`, which short-circuited the import and returned `AlreadyInChain`
- Effectively, nodes would re-request the gap on reboot, wasting network bandwidth to close the gap "in memory" only; this was never committed to the DB

### Full Logs

```text
2026-03-10 13:43:41.138 DEBUG main sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, finalized_number: 5883372, finalized_state: Some((0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, 5883372)), number_leaves: 1, block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:43:41.138 DEBUG main sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: None)
2026-03-10 13:43:41.138 TRACE main sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
2026-03-10 13:45:17.775 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWRejf1JYYjaaKhHAn28VJJR9ryZqs3wiGPsVjk6eFLLrn, (best:5883362, common:5883362) BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(5800), direction: Descending, max: Some(1) }
2026-03-10 13:45:17.784 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks (5800) (origin: GapSync)
2026-03-10 13:45:17.784 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block 5800 (0x26dc…1cda) has 4 logs (origin: GapSync)
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Error importing block 5800: 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 WARN tokio-runtime-worker sync: [Parachain] 💔 Error importing block 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: consensus error: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, finalized_number: 5883392, finalized_state: Some((0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, 5883392)), number_leaves: 1, block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: Some((5800, 5800)))
2026-03-10 13:45:17.792 TRACE tokio-runtime-worker sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
```

### Testing Done

- unblocks kusama yap 3392: https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
- left side of the graph is origin/master, right side is the patch applied with connected peers

Closes:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

(cherry picked from commit 3c93291)
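The restart loop described in this commit message can be modeled in a few lines of Rust. This is a toy simulation, not code from the sync engine: `Outcome`, `import_gap_block`, `run_gap_sync`, and `BAN_AT_REQUEST` are hypothetical names, and the ban threshold of 3 is taken only from the "third request" remark in the description.

```rust
/// Result of one simulated gap-sync session for collator [X].
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    /// The gap block was accepted and the in-memory gap is closed.
    GapClosedInMemory { requests: u32 },
    /// Peers saw too many repeated requests and banned the collator.
    Banned { requests: u32 },
}

/// Assumption: peers ban at the third repeated request, as the description states.
pub const BAN_AT_REQUEST: u32 = 3;

/// Hypothetical import step: executing the block needs the parent state,
/// while a skipped block is accepted without execution.
pub fn import_gap_block(
    skip_execution: bool,
    parent_state_available: bool,
) -> Result<(), &'static str> {
    if skip_execution || parent_state_available {
        Ok(())
    } else {
        Err("State already discarded")
    }
}

/// Simulates the request, import, restart cycle until the gap closes or peers ban us.
pub fn run_gap_sync(skip_execution: bool, parent_state_available: bool) -> Outcome {
    let mut requests: u32 = 0;
    loop {
        // Every (re)start of the sync engine requests the same gap block again.
        requests += 1;
        if requests >= BAN_AT_REQUEST {
            return Outcome::Banned { requests };
        }
        match import_gap_block(skip_execution, parent_state_available) {
            Ok(()) => return Outcome::GapClosedInMemory { requests },
            // An import error restarts sync with the same block gap.
            Err(_) => continue,
        }
    }
}
```

In this model, `run_gap_sync(false, false)` corresponds to a non-archive collator before the fix and ends in a ban, while `run_gap_sync(true, false)` corresponds to the patched behavior and closes the gap on the first request.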
Successfully created backport PR for
EgorPopelyaev
pushed a commit
that referenced
this pull request
Mar 12, 2026
Backport #11330 into `stable2509` from lexnv.

See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev
pushed a commit
that referenced
this pull request
Mar 12, 2026
Backport #11330 into `stable2603` from lexnv.

See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev
pushed a commit
that referenced
this pull request
Mar 13, 2026
Backport #11330 into `stable2506` from lexnv.

See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev
pushed a commit
that referenced
this pull request
Mar 13, 2026
Backport #11330 into `stable2512` from lexnv.

See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Mar 19, 2026
This PR closes missing body gaps in the database for non-archive nodes.

Effectively, a missing body gap cannot be closed on the DB side if the node is non-archive. Since execution is already skipped, the node will close the memory gap in the sync engine; however, the gap remains open in the DB.

This leads to wasting resources at every startup:
- client info contains a gap that cannot be filled (since we don't have the state around for execution)
- blocks are fetched from the connected peers
- gap is filled by ignoring blocks in the sync engine

Further, for collators on origin master this causes an infinite loop of sync engine restarts that get punished via banning and disconnecting.

For more details and root cause check:
- #11330

Part of:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
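The startup cost described in this commit message can be sketched as a toy Rust model. Everything here is hypothetical and simplified: `Db`, `BlockGap`, and `start_node` are illustrative names, and the `persist_gap_closure` flag only stands in for "the gap closure is also written to the database", not for any real configuration option.

```rust
/// Simplified stand-in for the gap recorded in client info.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BlockGap {
    pub start: u64,
    pub end: u64,
}

/// Hypothetical database holding only the persisted gap marker.
#[derive(Debug, Default)]
pub struct Db {
    pub block_gap: Option<BlockGap>,
}

/// Simulates one node start and returns how many gap blocks were requested
/// from peers during that start.
pub fn start_node(db: &mut Db, persist_gap_closure: bool) -> u64 {
    // No gap in the DB means nothing is requested from peers.
    let Some(gap) = db.block_gap else {
        return 0;
    };
    let requested = gap.end - gap.start + 1;
    // The sync engine closes the gap in memory by ignoring the fetched blocks.
    // Only when the closure is persisted does the next start see no gap.
    if persist_gap_closure {
        db.block_gap = None;
    }
    requested
}
```

Calling `start_node` repeatedly with `persist_gap_closure` set to `false` requests the same block on every start, which is the wasted work the message describes; with `true`, only the first start pays that cost.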
paritytech-release-backport-bot bot
pushed a commit
that referenced
this pull request
Mar 20, 2026
This PR closes missing body gaps in the database for non-archive nodes.

Effectively, a missing body gap cannot be closed on the DB side if the node is non-archive. Since execution is already skipped, the node will close the memory gap in the sync engine; however, the gap remains open in the DB.

This leads to wasting resources at every startup:
- client info contains a gap that cannot be filled (since we don't have the state around for execution)
- blocks are fetched from the connected peers
- gap is filled by ignoring blocks in the sync engine

Further, for collators on origin master this causes an infinite loop of sync engine restarts that get punished via banning and disconnecting.

For more details and root cause check:
- #11330

Part of:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>

(cherry picked from commit 0f64cfc)
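This PR skips the execution of blocks when they are propagated to importing via `StateAction::Skip`.

There is a bug in the import queue that affects collators: non-archive collators should not execute blocks that are part of Gap Sync. The bug surfaced after `import_existing` was changed from false to true in:
- #10373

### Issue

The issue manifests for collators that have an unfilled block gap in their DB. When restarting with #10373, a collator would go through the following:
- client info has detected a gap at block 5800 with length 1
- collator [X] requests block 5800 with `fields: HEADER | BODY | JUSTIFICATION, from: Number(5800)`
- the other 2 collators respond with the full block, including the body, because by default collators keep the canonical chain around but discard the block state
- collator [X] tries to import the block because `import_existing` is true and execution continues past the following check: https://github.com/paritytech/polkadot-sdk/blob/2b9576c163b1c2408291e2b6c98ae0f2465b4818/substrate/client/service/src/client/client.rs#L1809-L1812
  - Before the changes, the code returned `return Ok(ImportResult::AlreadyInChain)`, which short-circuited the import of the block
- collator [X] imports the block but fails with `State already discarded`
- the error is propagated back to the sync engine, which decides to restart the sync process with the same block gap: `Restarting sync with client ...`
- This results in a vicious cycle where collator [X] requests the same block again, then restarts the sync engine
- Eventually, at the third request, the other collators treat this behavior as malicious, then ban and disconnect the peer.

### Fix

The fix is to skip executing blocks when the gap sync has marked them as `StateAction::Skip`.

Please note we are still dealing with the following, which should be part of a different PR:
- Gap Sync was never closed in the database
- When the node starts with a block gap, the node will always initiate a block request over the sync protocol to close the gap
- Before, the gap was marked as `import_existing: false`, which short-circuited the import and returned `AlreadyInChain`
- Effectively, nodes would re-request the gap on reboot, wasting network bandwidth to close the gap "in memory" only; this was never committed to the DB

### Testing Done

- unblocks kusama yap 3392: https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
- left side of the graph is origin/master, right side is the patch applied with connected peers

Closes:
- #11299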
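The role of `import_existing` in the description can be illustrated with a minimal Rust sketch of the pre-import check. This is an assumption-laden model, not the code at the linked lines of `client.rs`: `BlockStatus`, `Precheck`, and `precheck` are hypothetical names chosen for the example, and the real check may distinguish more block states.

```rust
/// Hypothetical, reduced view of what the client knows about a block.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BlockStatus {
    Unknown,
    InChain,
}

/// Hypothetical outcome of the check that runs before a block is imported.
#[derive(Debug, PartialEq, Eq)]
pub enum Precheck {
    /// Import is short-circuited; nothing is executed.
    AlreadyInChain,
    /// Import continues and may reach block execution.
    ContinueImport,
}

/// With `import_existing == false`, a known block short-circuits the import.
/// With `import_existing == true`, the import continues even for a known block,
/// which is how a gap block could reach execution without its parent state.
pub fn precheck(status: BlockStatus, import_existing: bool) -> Precheck {
    match status {
        BlockStatus::InChain if !import_existing => Precheck::AlreadyInChain,
        _ => Precheck::ContinueImport,
    }
}
```

Under this model, the change from `false` to `true` turns `precheck(BlockStatus::InChain, ..)` from `AlreadyInChain` into `ContinueImport`, which matches the behavior change the Issue section attributes to #10373.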