aura/import: Skip block execution when collators have no parent block state #11330

Merged
lexnv merged 9 commits into master from lexnv/skip-loop-execution
Mar 12, 2026

lexnv commented Mar 10, 2026

This PR skips the execution of blocks that reach the import pipeline marked as StateAction::Skip.

There is a bug in the import queue that affects collators: non-archive collators should not execute blocks that are imported as part of Gap Sync.

The bug surfaced after import_existing was changed from false to true in #10373.

Issue

The issue manifests for collators that have an unfilled block gap in their DB.

When restarting with #10373 applied, a collator would go through the following:

  • client info has detected a gap at block 5800 with length 1
  • collator [X] requests the block 5800 with fields: HEADER | BODY | JUSTIFICATION, from: Number(5800)
  • the other 2 collators respond with the full block, including the body, because by default collators will keep around the canonical chain but discard the block state
  • collator [X] tries to import the block because import_existing is true and we continue execution after the following check:

    BlockStatus::InChainPruned if !import_existing => {
        return Ok(ImportResult::AlreadyInChain)
    },
    BlockStatus::InChainPruned => {},

  • Before the changes, the code returned Ok(ImportResult::AlreadyInChain), which short-circuited the import of the block

  • collator [X] attempts to import the block but fails with State already discarded

  • the error is propagated back to the sync engine, which decides to restart the sync process with the same block gap (Restarting sync with client ...)

  • This results in a vicious cycle where collator [X] requests the same block again, then restarts the sync engine

  • Eventually, at the third request, the other collators treat this behavior as malicious, then ban and disconnect the peer.
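
The loop described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: the enums are simplified local stand-ins (not the real client types), `import_gap_block` and `requests_until_ban` are hypothetical helpers, and the request limit is passed in as a parameter rather than taken from the networking code.

```rust
/// Simplified stand-in for the client's block status (assumption: only the
/// variants needed for this illustration are modelled).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockStatus {
    Unknown,
    InChainPruned,
}

#[derive(Debug, PartialEq, Eq)]
enum ImportResult {
    AlreadyInChain,
    Imported,
}

#[derive(Debug, PartialEq, Eq)]
enum ImportError {
    /// Models the "State already discarded" error from the logs.
    StateDiscarded,
}

/// Hypothetical model of the pre-fix import path: with `import_existing == true`
/// a pruned block falls through the status check and is then executed, which
/// fails when the parent state is no longer available.
fn import_gap_block(
    status: BlockStatus,
    import_existing: bool,
    parent_state_available: bool,
) -> Result<ImportResult, ImportError> {
    match status {
        BlockStatus::InChainPruned if !import_existing => {
            return Ok(ImportResult::AlreadyInChain)
        },
        BlockStatus::InChainPruned | BlockStatus::Unknown => {},
    }
    if !parent_state_available {
        return Err(ImportError::StateDiscarded);
    }
    Ok(ImportResult::Imported)
}

/// Returns the request number at which the (hypothetical) limit `max_requests`
/// is reached, or `None` if the import succeeds before that.
fn requests_until_ban(import_existing: bool, max_requests: u32) -> Option<u32> {
    for attempt in 1..=max_requests {
        // Every failed import restarts sync, which re-requests the same block.
        if import_gap_block(BlockStatus::InChainPruned, import_existing, false).is_ok() {
            return None;
        }
        if attempt == max_requests {
            return Some(attempt);
        }
    }
    None
}
```

In this model, `requests_until_ban(true, 3)` reaches the limit on the third request, while `requests_until_ban(false, 3)` never does because the import short-circuits with `AlreadyInChain`.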

Fix

The fix is to skip executing blocks when the gap sync has marked blocks as StateAction::Skip.
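
As a rough illustration of the idea, the decision can be pictured as a predicate on the state action. This is a sketch, not the code from this PR: the enum is a local, payload-free mirror of the state actions, `should_execute` is a hypothetical helper, and the handling of `Execute` and `ExecuteIfPossible` is an assumption made only to keep the example complete.

```rust
/// Local, simplified mirror of the state actions an import can carry
/// (assumption: payloads are omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StateAction {
    ApplyChanges,
    Execute,
    ExecuteIfPossible,
    Skip,
}

/// Hypothetical helper: decides whether the import path should execute the block.
fn should_execute(action: StateAction, parent_state_available: bool) -> bool {
    match action {
        // Changes are already known (own block or state sync): nothing to execute.
        StateAction::ApplyChanges => false,
        // Gap sync marks blocks as `Skip`: never execute them.
        StateAction::Skip => false,
        // Assumption for this sketch: an explicit request is always honoured.
        StateAction::Execute => true,
        // Assumption for this sketch: execute only when the parent state exists.
        StateAction::ExecuteIfPossible => parent_state_available,
    }
}
```

With such a check in front of execution, a gap-sync block marked `Skip` never reaches the runtime call that fails with State already discarded.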

Please note we are still dealing with the following, which should be part of a different PR:

  • The block gap was never closed in the database
  • When the node starts with a block gap, it will always initiate a block request over the sync protocol to close the gap
  • Previously, gap blocks were imported with import_existing: false, which short-circuited the import and returned AlreadyInChain
  • Effectively, nodes would re-request the gap on every reboot, wasting network bandwidth to close the gap "in memory" only; this was never committed to the DB
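
The remaining problem can be pictured with a toy model in which the gap lives in two places. This is a sketch under assumptions: `Node`, `restart`, and `close_gap` are hypothetical names, and the model only captures that the persisted gap is reloaded on every start and triggers one block request.

```rust
/// A block gap as shown in the logs (`start` and `end` are inclusive).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockGap {
    start: u64,
    end: u64,
}

/// Hypothetical node model: `db_gap` survives restarts, `mem_gap` does not.
struct Node {
    db_gap: Option<BlockGap>,
    mem_gap: Option<BlockGap>,
    requests_sent: u32,
}

impl Node {
    fn new(db_gap: Option<BlockGap>) -> Self {
        Node { db_gap, mem_gap: None, requests_sent: 0 }
    }

    /// On start, the gap is re-read from the DB; a present gap triggers a block request.
    fn restart(&mut self) {
        self.mem_gap = self.db_gap;
        if self.mem_gap.is_some() {
            self.requests_sent += 1;
        }
    }

    /// Closing the gap always clears the in-memory copy; the DB copy is cleared
    /// only when `persist` is true.
    fn close_gap(&mut self, persist: bool) {
        self.mem_gap = None;
        if persist {
            self.db_gap = None;
        }
    }
}
```

Calling `restart` followed by `close_gap(false)` twice sends two requests for the same gap, whereas a single `close_gap(true)` stops further requests after the next restart.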

Full Logs

2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, finalized_number: 5883372, finalized_state: Some((0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, 5883372)), number_leaves: 1,
	block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: None)    
2026-03-10 13:43:41.138 TRACE                 main sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)


2026-03-10 13:45:17.775 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWRejf1JYYjaaKhHAn28VJJR9ryZqs3wiGPsVjk6eFLLrn, (best:5883362, common:5883362)
	BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(5800), direction: Descending, max: Some(1) } 

2026-03-10 13:45:17.784 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks  (5800) (origin: GapSync)    
2026-03-10 13:45:17.784 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block 5800 (0x26dc…1cda) has 4 logs (origin: GapSync)
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Error importing block 5800: 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda:
	Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91

2026-03-10 13:45:17.792  WARN tokio-runtime-worker sync: [Parachain] 💔 Error importing block 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: consensus error: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91    
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, finalized_number: 5883392, finalized_state: Some((0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, 5883392)), number_leaves: 1,
		block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }

	
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: Some((5800, 5800)))    
2026-03-10 13:45:17.792 TRACE tokio-runtime-worker sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)    

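Testing Done

  • unblocks kusama yap 3392: https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
  • left side of the graph is origin/master, right side is the patch applied with connected peers

Closes:

  • #11299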

… state

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
lexnv self-assigned this Mar 10, 2026
lexnv added the T0-node (This PR/Issue is related to the topic “node”.) label Mar 10, 2026
lexnv commented Mar 10, 2026

/cmd prdoc --audience node_dev --bump patch

lexnv added 2 commits March 10, 2026 15:19
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
paritytech-review-bot requested a review from a team March 10, 2026 15:20
Comment on lines +1602 to +1603
/// Regression test: gap sync of a single block that fails to import should not
/// cause an infinite restart loop.
Member

We don't have anything fixed in the sync code?

Contributor Author

This PR will close the sync gap in memory, while #11332 will close it at the DB level.

I believe that together they should function similarly to the older PR while keeping the code changes to a minimum:

lexnv added 4 commits March 11, 2026 15:11
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
lrubasze left a comment

LGTM!

// The following states are ignored:
// - `StateAction::ApplyChanges`: means that the node produced the block itself or the
// block was imported via state sync.
// - `StateAction::Skip`: means that the block should be skipped. The is evident in the
Member

"The is evident"?

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
lexnv added this pull request to the merge queue Mar 12, 2026
Merged via the queue into master with commit 3c93291 Mar 12, 2026
221 of 227 checks passed
lexnv deleted the lexnv/skip-loop-execution branch March 12, 2026 12:28
lexnv added the A4-backport-stable2506, A4-backport-stable2509, A4-backport-stable2512 and A4-backport-stable2603 labels (pull request must be backported to the corresponding stable release branch) Mar 12, 2026
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Mar 12, 2026
… state (#11330)

This PR skips the execution of blocks that reach the import pipeline
marked as `StateAction::Skip`.

There is a bug in the import queue that affects collators: non-archive
collators should not execute blocks that are imported as part of Gap
Sync.

The bug surfaced after `import_existing` was changed from false to
true in:
- #10373

### Issue

The issue manifests for collators that have an unfilled block gap in
their DB.

When restarting with #10373 applied, a collator would go through the following:
- client info has detected a gap at block 5800 with length 1
- collator [X] requests the block 5800 with `fields: HEADER | BODY |
JUSTIFICATION, from: Number(5800)`
- the other 2 collators respond with the full block, including the body,
because by default collators will keep around the canonical chain but
discard the block state
- collator [X] tries to import the block because `import_existing` is
true and we continue execution after the following check:

https://github.com/paritytech/polkadot-sdk/blob/2b9576c163b1c2408291e2b6c98ae0f2465b4818/substrate/client/service/src/client/client.rs#L1809-L1812

- Before the changes, the code returned
`Ok(ImportResult::AlreadyInChain)`, which short-circuited the import of
the block

- collator [X] attempts to import the block but fails with `State already
discarded`
- the error is propagated back to the sync engine, which decides to
restart the sync process with the same block gap (`Restarting sync with
client ...`)
- This results in a vicious cycle where collator [X] requests the
same block again, then restarts the sync engine
- Eventually, at the third request, the other collators treat this
behavior as malicious, then ban and disconnect the peer.

### Fix

The fix is to skip executing blocks when the gap sync has marked blocks
as `StateAction::Skip`.

Please note we are still dealing with the following, which should be
part of a different PR:
- The block gap was never closed in the database
- When the node starts with a block gap, it will always initiate a
block request over the sync protocol to close the gap
- Previously, gap blocks were imported with `import_existing: false`,
which short-circuited the import and returned `AlreadyInChain`
- Effectively, nodes would re-request the gap on every reboot, wasting
network bandwidth to close the gap "in memory" only; this was
never committed to the DB

### Full Logs

```rust
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, finalized_number: 5883372, finalized_state: Some((0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, 5883372)), number_leaves: 1,
	block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: None)
2026-03-10 13:43:41.138 TRACE                 main sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)

2026-03-10 13:45:17.775 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWRejf1JYYjaaKhHAn28VJJR9ryZqs3wiGPsVjk6eFLLrn, (best:5883362, common:5883362)
	BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(5800), direction: Descending, max: Some(1) }

2026-03-10 13:45:17.784 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks  (5800) (origin: GapSync)
2026-03-10 13:45:17.784 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block 5800 (0x26dc…1cda) has 4 logs (origin: GapSync)
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Error importing block 5800: 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda:
	Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91

2026-03-10 13:45:17.792  WARN tokio-runtime-worker sync: [Parachain] 💔 Error importing block 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: consensus error: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, finalized_number: 5883392, finalized_state: Some((0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, 5883392)), number_leaves: 1,
		block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }

2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: Some((5800, 5800)))
2026-03-10 13:45:17.792 TRACE tokio-runtime-worker sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
```

### Testing Done

- unblocks kusama yap 3392:
https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
- left side of the graph is origin/master, right side is the patch
applied with connected peers

Closes:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 3c93291)
@paritytech-release-backport-bot

Successfully created backport PR for stable2506:

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Mar 12, 2026
… state (#11330)

@paritytech-release-backport-bot

Successfully created backport PR for stable2509:

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Mar 12, 2026
… state (#11330)

@paritytech-release-backport-bot

Successfully created backport PR for stable2512:

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Mar 12, 2026
… state (#11330)

This PR skips the execution of blocks when they are propagated to
importing via `StateAction::Skip`.

There is a bug in the import queue that is affecting collators, which is
that they should not execute blocks for non-archive collators that are
part of Gap Sync.

The bug has surfaced by changing the `import_existing` from false to
true in:
- #10373

### Issue

The issue manifests for collators that have an unfilled block gap in
their DB.

During restarting with #10373, a collator would try the following:
- client info has detected a gap at block 5800 with length 1
- collator [X] requests the block 5800 with `fields: HEADER | BODY |
JUSTIFICATION, from: Number(5800)`
- the other 2 collators respond with the full block, including the body,
because by default collators will keep around the canonical chain but
discard the block state
- collator [X] tries to import the block because `import_existing` is
true and we continue execution after the following check:

https://github.com/paritytech/polkadot-sdk/blob/2b9576c163b1c2408291e2b6c98ae0f2465b4818/substrate/client/service/src/client/client.rs#L1809-L1812

- Before the changes, the code returned `return
Ok(ImportResult::AlreadyInChain)` which short-circuited the importing of
the block

- collator [X] imports the block but fails with `State already
discarded`
- the error is propagated back to the sync engine that decides to
restart the sync process with the same block gap `Restarting sync with
client ...`
- This results in a vicious cycle where the collator [X] requests the
same block again, then restarts the sync engine
- Eventually at the 3 request the other collators will notice that this
behavior is malicious and ban and disconnect the peers.

### Fix

The fix is to skip executing blocks when the gap sync has marked blocks
as `StateAction::Skip`.

Please note we are still dealing with the following, which should be
part of a different PR:
- Gap Sync was never closed from the database
- When the node starts with a block gap, the node will always initiate a
block request over the sync protocol to close the gap
- Before the gap was marked as `import_existing: false` which short
ciruited the circuit and returned `AlreadyInChain`
- Effectively nodes would re-request the gap on reboot wasting
networking bandwidth to close the gap "in memory" only, but this was
never commited to the DB

### Full Logs

```rust
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, finalized_number: 5883372, finalized_state: Some((0x43664710059a72b37c11db9f99a0f38323b478fbdc82afac058c530c7b002e4d, 5883372)), number_leaves: 1,
	block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }
2026-03-10 13:43:41.138 DEBUG                 main sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: None)
2026-03-10 13:43:41.138 TRACE                 main sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)

2026-03-10 13:45:17.775 TRACE tokio-runtime-worker sync: [Parachain] New gap block request for 12D3KooWRejf1JYYjaaKhHAn28VJJR9ryZqs3wiGPsVjk6eFLLrn, (best:5883362, common:5883362)
	BlockRequest { id: 0, fields: HEADER | BODY | JUSTIFICATION, from: Number(5800), direction: Descending, max: Some(1) }

2026-03-10 13:45:17.784 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Starting import of 1 blocks  (5800) (origin: GapSync)
2026-03-10 13:45:17.784 TRACE tokio-runtime-worker sync::import-queue: [Parachain] Block 5800 (0x26dc…1cda) has 4 logs (origin: GapSync)
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync::import-queue: [Parachain] Error importing block 5800: 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda:
	Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91

2026-03-10 13:45:17.792  WARN tokio-runtime-worker sync: [Parachain] 💔 Error importing block 0x26dca166cfefe439262d201b10a8d2679edc4bd98ae59fe12d7f7eef9b871cda: consensus error: Api called for an unknown Block: State already discarded for 0x4739cf07649d6383bb19d2adccbe9d3f5b1ed91ef5fd6530bc8e69e560b5be91
2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Restarting sync with client info Info { best_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, best_number: 5883392, genesis_hash: 0x8692fdabb7e55c3347c0f887343e3c0f3fbb560c5f52c9cdc1f7660a1f183c5d, finalized_hash: 0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, finalized_number: 5883392, finalized_state: Some((0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102, 5883392)), number_leaves: 1,
		block_gap: Some(BlockGap { start: 5800, end: 5800, gap_type: MissingBody }) }

2026-03-10 13:45:17.792 DEBUG tokio-runtime-worker sync: [Parachain] Starting gap sync #5800 - #5800 (old gap best and target: Some((5800, 5800)))
2026-03-10 13:45:17.792 TRACE tokio-runtime-worker sync: [Parachain] Restarted sync at #5883392 (0xcb03c2aa7dd61f84b27d4c7db42ab848d2eaee9da77ddedc827e070ece063102)
```
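
The loop in the log above comes down to one decision: whether a block whose parent state was discarded is executed or not. A minimal sketch in Rust of that decision, assuming simplified stand-in types. `ParentState`, `choose_state_action`, and `import_block` are hypothetical names used only for illustration. Only the `Skip` action name and the error text come from this PR and its logs; this is not the actual import-queue code.

```rust
/// Hypothetical stand-in: whether the node still holds the parent block's state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ParentState {
    Available,
    Discarded,
}

/// Simplified stand-in for the import decision (illustration only).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StateAction {
    Execute,
    Skip,
}

/// Pick the action for an incoming block: skip execution when the
/// parent state is gone, since execution cannot succeed without it.
fn choose_state_action(parent: ParentState) -> StateAction {
    match parent {
        ParentState::Available => StateAction::Execute,
        ParentState::Discarded => StateAction::Skip,
    }
}

/// Toy import: executing on top of a discarded parent state fails,
/// mirroring the "State already discarded" error seen in the log.
fn import_block(parent: ParentState, action: StateAction) -> Result<(), &'static str> {
    match (action, parent) {
        (StateAction::Execute, ParentState::Discarded) => Err("State already discarded"),
        _ => Ok(()),
    }
}
```

In this model, forcing `Execute` on a discarded parent reproduces the import error that restarts the sync, while routing the block through `choose_state_action` lets the import return `Ok(())` and breaks the cycle.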

### Testing Done

- Unblocks Kusama yap 3392: https://grafana.teleport.parity.io/goto/KBKfuhKDR?orgId=1
- In the graph, the left side is origin/master; the right side is the patch applied, with connected peers.

Closes:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
(cherry picked from commit 3c93291)
@paritytech-release-backport-bot

Successfully created backport PR for stable2603:

EgorPopelyaev pushed a commit that referenced this pull request Mar 12, 2026
Backport #11330 into `stable2509` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev pushed a commit that referenced this pull request Mar 12, 2026
Backport #11330 into `stable2603` from lexnv.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev pushed a commit that referenced this pull request Mar 13, 2026
Backport #11330 into `stable2506` from lexnv.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
EgorPopelyaev pushed a commit that referenced this pull request Mar 13, 2026
Backport #11330 into `stable2512` from lexnv.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
github-merge-queue bot pushed a commit that referenced this pull request Mar 19, 2026
This PR closes missing body gaps in the database for non-archive nodes.


In effect, a missing body gap cannot be closed on the DB side if the
node is non-archive. Since execution is already skipped, the node closes
the in-memory gap in the sync engine; however, the gap remains open in
the DB.

This leads to wasting resources at every startup:
- client info contains a gap that cannot be filled (since we don't have
the state around for execution)
- blocks are fetched from the connected peers
- gap is filled by ignoring blocks in the sync engine
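
The idea of closing the gap in the DB can be sketched as follows. This is a rough Rust illustration under stated assumptions: `BlockGap` and the `MissingBody` gap type mirror the shape printed in the sync logs, while `BlockGapType::Other`, `is_archive`, and `gap_after_startup` are hypothetical names, and the real database code may look quite different.

```rust
/// Gap kinds; `MissingBody` appears in the logs, `Other` is a placeholder.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockGapType {
    MissingBody,
    Other,
}

/// Local stand-in for the gap record reported in the client info.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockGap {
    start: u64,
    end: u64,
    gap_type: BlockGapType,
}

/// Decide which gap (if any) should stay recorded in the DB.
/// A non-archive node cannot execute the missing blocks, so a
/// `MissingBody` gap is dropped instead of being retried at every startup.
fn gap_after_startup(gap: Option<BlockGap>, is_archive: bool) -> Option<BlockGap> {
    match gap {
        Some(g) if g.gap_type == BlockGapType::MissingBody && !is_archive => None,
        other => other,
    }
}
```

With the gap `{ start: 5800, end: 5800, gap_type: MissingBody }`, a non-archive node in this sketch ends up with no recorded gap, so it has nothing to request from peers on the next start. An archive node keeps the gap and can still fill it.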

Further, for collators on origin master this causes an infinite loop of
sync engine restarts, which peers punish by banning and disconnecting.
For more details and the root cause, see:
- #11330

Part of:
- #11299

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Mar 20, 2026
(cherry picked from commit 0f64cfc)

Labels

- A4-backport-stable2506: Pull request must be backported to the stable2506 release branch
- A4-backport-stable2509: Pull request must be backported to the stable2509 release branch
- A4-backport-stable2512: Pull request must be backported to the stable2512 release branch
- A4-backport-stable2603: Pull request must be backported to the stable2603 release branch
- T0-node: This PR/Issue is related to the topic “node”.
