-
Notifications
You must be signed in to change notification settings - Fork 1.2k
client/db: Close missing body gaps for non archive nodes #11332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
+202
−0
Merged
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
e34f544
client/db: Remove missing body gaps for non archive nodes
lexnv 13fcc0f
client/db: Close missing body gaps for non archive nodes
lexnv a6c9cd1
client/db: Add tests
lexnv 8604871
client/db: Test with multi block gaps
lexnv 5312a18
Merge branch 'master' into lexnv/nuke-gaps
lexnv 221cb99
Update from github-actions[bot] running command 'prdoc --audience nod…
github-actions[bot] File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| title: 'client/db: Close missing body gaps for non archive nodes' | ||
| doc: | ||
| - audience: Node Dev | ||
| description: |- | ||
| This PR closes missing body gaps in the database for non-archive nodes. | ||
|
|
||
|
|
||
| Effectively, a missing body gap cannot be closed on the DB side if the node is non-archive. Since execution is already skipped, the node will close the memory gap in the sync engine; however, the gap remains open in the db. | ||
|
|
||
| This leads to wasting resources at every startup: | ||
| - client info contains a gap that cannot be filled (since we don't have the state around for execution) | ||
| - blocks are fetched from the connected peers | ||
| - gap is filled by ignoring blocks in the sync engine | ||
|
|
||
| Further, for collators on origin master this causes an infinite loop of sync engine restarts that get punished via banning and disconnecting. For more details and root cause check: | ||
| - https://github.com/paritytech/polkadot-sdk/pull/11330 | ||
|
|
||
| Part of: | ||
| - https://github.com/paritytech/polkadot-sdk/issues/11299 | ||
| crates: | ||
| - name: sc-client-db | ||
| bump: patch |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did it come to this? Because someone restarted the node using a different pruning mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, it was just the collator functioning "normally" and no params were changed other than switching to the master PR (which now has
import_existing: true).I believe I had a look over the client info a few years ago, reporting gaps that should be filled or that should have never existed, but I dismissed that as stopping the node at the wrong time or switching params.
I think this code was silently filling the memory gap but never informed the DB layer about not caring for the block execution:
polkadot-sdk/substrate/client/service/src/client/client.rs
Lines 1809 to 1812 in 2b9576c
And now with the optimization of not even fetching the
BODY, its assumed that the DB will never close it:polkadot-sdk/substrate/client/service/src/client/client.rs
Lines 1809 to 1812 in 2b9576c
Not entirely sure what might have caused the gap in the first place, since the nodes were running without the optimization of stripping BODY initially 🤔 So i would have expected that all bodies were fetched as well