fix: continue pruning if version is not found #1063
Walkthrough
The changes enhance the
Changes
Sequence Diagram(s)
sequenceDiagram
participant NodeDB
participant Cache as Cache.getRootKey
participant OrphanTraversal as traverseOrphansWithRootkeyCache
NodeDB->>Cache: getRootKey(version)
alt Version does not exist
Cache-->>NodeDB: ErrVersionDoesNotExist
NodeDB->>NodeDB: Log error and continue
else Valid rootKey returned
Cache-->>NodeDB: rootKey
NodeDB->>OrphanTraversal: traverseOrphansWithRootkeyCache(rootKey)
alt traverse error is ErrVersionDoesNotExist
OrphanTraversal-->>NodeDB: Ignored error
else Other error occurs
OrphanTraversal-->>NodeDB: Return error
end
end
Actionable comments posted: 0
🧹 Nitpick comments (1)
nodedb.go (1)
466-468: Fix typo in error log message. There's a typo in the error message: "moving on the the next version" (duplicate "the").
- ndb.logger.Error("Error while pruning, moving on the the next version in the store", "version missing", version, "next version", version+1, "err", err)
+ ndb.logger.Error("Error while pruning, moving on to the next version in the store", "version missing", version, "next version", version+1, "err", err)
📒 Files selected for processing (1)
nodedb.go (2 hunks)
🧰 Additional context used
🧬 Code Definitions (1)
nodedb.go (3)
mutable_tree.go (1): ErrVersionDoesNotExist (18-18)
node.go (1): Node (59-75)
cache/cache.go (1): Node (10-12)
🔇 Additional comments (3)
nodedb.go (3)
462-464: Improved error handling for ErrVersionDoesNotExist. This change modifies the error handling to specifically check for ErrVersionDoesNotExist and continue execution in that case, rather than immediately returning the error. This aligns with the PR's objective to allow pruning to continue when versions are missing.
470-493: Added conditional traversal and ErrVersionDoesNotExist handling. This change adds a nil check on rootKey before traversing orphans, which prevents potential nil pointer dereferences. It also modifies the error handling to ignore ErrVersionDoesNotExist errors during traversal, consistent with the other changes in this PR.
506-508: Consistent error handling for next version root key. This change applies the same improved error handling pattern for the next version's root key check, maintaining consistency with the earlier changes.
nodedb.go
Outdated
  // check if the version is referred by the next version
  nextRootKey, err := cache.getRootKey(ndb, version+1)
- if err != nil {
+ if err != nil && err != ErrVersionDoesNotExist {
Could nextRootKey be nil above if ErrVersionDoesNotExist?
Yeah it can if both the current version and the next version are missing
we don't have a check for it being nil right?
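The concern in this thread can be illustrated with a minimal sketch. It assumes the "referred by the next version" check is a byte-wise comparison of the two root keys; `lookupRootKey` and `referencedByNext` are hypothetical stand-ins, not functions from nodedb.go, and `ErrVersionDoesNotExist` is redefined locally so the example compiles on its own. Once the not-found error is tolerated, `nextRootKey` can be nil, and `bytes.Equal(nil, nil)` reports true, so an explicit nil guard avoids treating two missing versions as sharing a root.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// ErrVersionDoesNotExist is a local stand-in for the sentinel error named in the PR.
var ErrVersionDoesNotExist = errors.New("version does not exist")

// lookupRootKey is a hypothetical stand-in for cache.getRootKey:
// it returns the stored root key, or ErrVersionDoesNotExist if the version is missing.
func lookupRootKey(roots map[int64][]byte, version int64) ([]byte, error) {
	key, ok := roots[version]
	if !ok {
		return nil, ErrVersionDoesNotExist
	}
	return key, nil
}

// referencedByNext reports whether version+1 reuses rootKey.
// A missing next version is tolerated, so nextRootKey may be nil here.
func referencedByNext(roots map[int64][]byte, version int64, rootKey []byte) (bool, error) {
	nextRootKey, err := lookupRootKey(roots, version+1)
	if err != nil && !errors.Is(err, ErrVersionDoesNotExist) {
		return false, err
	}
	// Guard explicitly: bytes.Equal treats a nil slice like an empty one,
	// so two missing versions would otherwise compare as equal.
	if rootKey == nil || nextRootKey == nil {
		return false, nil
	}
	return bytes.Equal(rootKey, nextRootKey), nil
}

func main() {
	roots := map[int64][]byte{5: []byte("root-5")}
	ok, err := referencedByNext(roots, 5, roots[5])
	fmt.Println(ok, err) // prints: false <nil>
}
```

Whether the real code path needs such a guard depends on what it does with `nextRootKey` afterwards, which this thread does not show.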
@Mergifyio backport release/v1.2.x
@Mergifyio backport release/v1.3.x
✅ Backports have been created
✅ Backports have been created
(cherry picked from commit 8a2e2fe)
(cherry picked from commit 8a2e2fe)
Description
We found a case on an Osmosis node where a root key points to a node that doesn't exist. This hangs the pruning process, because it fails when getting the root key (which returns ErrVersionDoesNotExist).
There is already code to clean up the dangling reference node, but it is never reached, because pruning returns ErrVersionDoesNotExist early.
This means pruning gets stuck on that version of the store and cannot prune it. This PR moves on to the next version in the store if pruning returns a not-found error.
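The behavior described above can be sketched in a few lines. This is a simplified in-memory model, not the code from nodedb.go: `store`, `getRootKey`, and `pruneUpTo` are hypothetical names, and `ErrVersionDoesNotExist` is redefined locally so the example is self-contained. It shows the loop skipping a missing version instead of returning early.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrVersionDoesNotExist is a local stand-in for the sentinel error named in the PR.
var ErrVersionDoesNotExist = errors.New("version does not exist")

// store is a hypothetical in-memory model mapping version -> root key.
type store map[int64][]byte

// getRootKey returns the root key for a version, or ErrVersionDoesNotExist.
func (s store) getRootKey(version int64) ([]byte, error) {
	key, ok := s[version]
	if !ok {
		return nil, ErrVersionDoesNotExist
	}
	return key, nil
}

// pruneUpTo removes versions first..to. A missing version is recorded
// and skipped; any other error still aborts the loop.
func pruneUpTo(s store, first, to int64) ([]int64, []int64, error) {
	var pruned, skipped []int64
	for v := first; v <= to; v++ {
		_, err := s.getRootKey(v)
		if errors.Is(err, ErrVersionDoesNotExist) {
			skipped = append(skipped, v) // move on to the next version
			continue
		}
		if err != nil {
			return pruned, skipped, err
		}
		delete(s, v)
		pruned = append(pruned, v)
	}
	return pruned, skipped, nil
}

func main() {
	s := store{1: []byte("r1"), 3: []byte("r3"), 4: []byte("r4")}
	pruned, skipped, err := pruneUpTo(s, 1, 3)
	fmt.Println(pruned, skipped, err) // prints: [1 3] [2] <nil>
}
```

With an early return on the not-found error, the same loop would stop at version 2 and never reach version 3, which matches the stuck pruning described here.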
Notes about legacy nodes
first to legacyLatestVersion+1, see:
Downloading state
https://snapshots.testnet.osmosis.zone/
or, right now, the Polkachu snapshots have an issue with
bank and concentratedliquidity: https://polkachu.com/tendermint_snapshots/osmosis
I ran this PR on this state on Osmosis mainnet and it fixed the issue; see osmosis-labs/osmosis#9333
Checking broken stores
Use this PR and run:
osmosis-labs/cosmprund#2
Pruning broken stores
Use this PR and run:
osmosis-labs/cosmprund#2
State will then be fixed
Things we don't know
Why are there states deleted outside of pruning? Why does this become more apparent with async pruning?
Another version of the fix
#1048
This fix works in the same way and just continues after a version-not-found error; it moves past both checks, version and version+1.
Why this is needed
Currently if pruning breaks with this error the chain state will start to grow quickly.
What the fix will look like
Osmosis mainnet with broken state:
Before this fix, pruning would have stopped here and the state would bloat
This is osmosis testnet with broken state
This represents a large backlog as pruning is on
720388727208281