statement-store: reject statement peers during major sync by peter941221 · Pull Request #11449 · paritytech/polkadot-sdk

peter941221 · 2026-03-20T09:00:09Z

Summary

This PR fixes #11411 and the adjacent statement-handler lifecycle holes it exposed by preventing peers from remaining in a half-connected statement-protocol state while the node is unavailable.

The core repair is still the original major-sync fix:

Reject inbound statement substreams while the handler is unavailable.
Disconnect statement streams that still open while unavailable.
Disconnect already-connected peers that send statements while unavailable, and clean up their local statement-handler state.

Follow-up reviewer passes then closed the same half-connected state in nearby paths:

offline transitions,
async send failures,
outbound interruption,
duplicate and stale reopen events,
initial-sync fetch failures,
and false completion metrics on aborted cleanup paths.

Why this is needed

Today a peer can end up in a broader half-connected state:

the statement stream is open,
live statements are rejected or interrupted while the node is unavailable,
peers opened during that window may never get a clean initial-sync replay path if cleanup is incomplete,
and there is no later replay trigger.

That means statements can be missed permanently even though the peer still looks connected.

A key detail here is that gating only newly opening peers is not sufficient. Peers that were already connected before major sync starts can still hit the same loss window once sync state changes, and later reviewer passes found equivalent stale-state variants after send failures and reopen races.

Why disconnect instead of silently ignoring

Silently ignoring statements preserves the broken state: the peer stays connected on the statement protocol, but cannot contribute live statements and does not get a clean catch-up path.

Disconnecting is the smallest fix that reuses the existing lifecycle:

reject statement traffic while unavailable,
clean up local statement-handler state eagerly,
let the peer reconnect once the node is ready again,
and reuse the normal stream-open path to queue initial sync.

Tests

This branch now adds regression coverage for:

rejecting inbound statement substreams during major sync,
rejecting inbound statement substreams while offline,
disconnecting streams that open during major sync,
disconnecting streams that open while offline,
reconnecting after sync exit and re-queuing initial sync,
preserving the normal open path outside major sync,
disconnecting already-open peers that send while unavailable,
send failure retry and cleanup,
interrupted outbound propagation cleanup,
duplicate stream-open state replacement,
stale unavailable reopen cleanup,
initial-sync fetch failure cleanup,
and ensuring aborted initial sync cleanup does not inflate completion metrics.

Validation

rustfmt --edition 2021 substrate/client/network/statement/src/lib.rs
git diff --check -- substrate/client/network/statement/src/lib.rs
cargo check -p sc-network-statement --lib
cargo test -p sc-network-statement --lib -- --nocapture

Validation was completed in a Linux-native WSL checkout with real symlinks preserved. Earlier Windows-backed checkout failures were environment-specific and are no longer the limiting factor for this branch.

- reject inbound statement substreams while major syncing - disconnect statement streams that open during the sync window - disconnect already-connected peers that send during major sync - add regression tests for reconnect and initial-sync recovery paths

cla-bot-2021 · 2026-03-20T09:00:16Z

User @peter941221, please sign the CLA here.

peter941221 · 2026-03-20T15:45:30Z

Quick update: I pushed two more reviewer-driven fixes on this branch:

4b69bf3 — disconnect and cleanly retry if initial-sync statement fetch fails
19bf2c0 — only record initial_sync_duration_seconds on true initial-sync completion

Both are covered by the current Linux-native validation:

cargo test -p sc-network-statement --lib -- --nocapture
cargo check -p sc-network-statement --lib

If you'd prefer the later hardening commits split into follow-up PRs, I'm happy to do that.

github-project-automation bot added this to [preview] release tracker Mar 20, 2026

github-project-automation bot moved this to Todo in [preview] release tracker Mar 20, 2026

peter941221 added 9 commits March 20, 2026 17:24

statement-store: snapshot sync state on stream open

46d42fa

statement-store: pause statement protocol while offline

1d26dba

statement-store: stabilize sustained flooding test

7bfd50e

statement-store: harden statement send failures

7bac3c4

statement-store: disconnect peers on outbound interruption

4173826

statement-store: tolerate duplicate stream-open events

95907ec

statement-store: cleanup stale peers on unavailable reopen

b56fe2e

statement-store: retry initial sync after fetch failure

4b69bf3

statement-store: only count completed initial syncs

19bf2c0

pgherveou removed this from [preview] release tracker Mar 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

statement-store: reject statement peers during major sync#11449

statement-store: reject statement peers during major sync#11449
peter941221 wants to merge 10 commits intoparitytech:masterfrom
peter941221:peter/statement-store-major-sync

peter941221 commented Mar 20, 2026 •

edited

Loading

Uh oh!

cla-bot-2021 bot commented Mar 20, 2026

Uh oh!

peter941221 commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

peter941221 commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why this is needed

Why disconnect instead of silently ignoring

Tests

Validation

Uh oh!

cla-bot-2021 bot commented Mar 20, 2026

Uh oh!

peter941221 commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

peter941221 commented Mar 20, 2026 •

edited

Loading