FCU to canonical ancestor silently ignored, leaving descendants canonical #10876
svlachakis wants to merge 12 commits into master
@claude review
Claude finished @smartprogrammer93's task in 3m 53s
Claude's Review
Summary
The fix is correct and minimal. Removing the 5-line early-return restores the semantics demanded by the Engine API spec: an FCU to any already-known block must always update chain state, not just return
The Fix (
Pull request overview
Fixes a forkchoice update edge case where FCU targeting an already-canonical ancestor below the current head was previously short-circuited, leaving the higher descendants incorrectly marked canonical and the head pointer unchanged. This aligns ForkchoiceUpdatedHandler.ApplyForkchoiceUpdate behavior with the intended canonical reorg semantics and adds regression/unit coverage around canonical marking.
Changes:
- Remove the early-return optimization in ForkchoiceUpdatedHandler so UpdateMainChain runs even when FCU targets a canonical ancestor below head.
- Add an Engine API regression test to ensure FCU to an ancestor moves head back and de-canonicalizes descendants.
- Add a suite of BlockTree unit tests covering canonical marking/unmarking across reorg and lookup scenarios (including persistence/reload).
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/Nethermind/Nethermind.Merge.Plugin/Handlers/ForkchoiceUpdatedHandler.cs | Removes the processed-branch early return so FCU to a canonical ancestor triggers main-chain update/de-canonicalization. |
| src/Nethermind/Nethermind.Merge.Plugin.Test/EngineModuleTests.V1.cs | Adds an Engine API regression test reproducing and preventing the “ancestor FCU ignored” canonical mismatch. |
| src/Nethermind/Nethermind.Blockchain.Test/BlockTreeTests.cs | Adds canonical-marking/unit tests for reorg, lookup consistency, and persistence/reload edge cases. |
@claude review
Claude finished @smartprogrammer93's task in 5m 56s
Claude's Review
Context
This is a different implementation from what the previous Claude review described. The current branch does not touch
Summary
The fix is correct and well-targeted.
The Core Fix (
BTW in ePBS we will get an FCU for a block and then we can get an FCU to its ancestor (not sure if it is always the parent), and we need to support that. How does this change affect that scenario? Is it compatible with it during sync?
Resolves #9906 & #10861
Changes
Fix: Stale canonical markers after beacon sync cause wrong eth_getBlockByNumber results
Problem
BlockTree.UpdateMainChain uses Head?.Number as the baseline for its unmark loop:
During beacon sync, BlockDownloader calls:
This marks downloaded blocks as canonical (HasBlockOnMainChain = true) without updating Head. The head update in MoveToMain is gated on
wasProcessed = true, so Head remains at whatever height the last FCU set it to.
After syncing blocks at heights H+1..Hn this way, Head remains stale at H.
When a reorg then arrives — FCU(B) where B is a sibling of the canonical block at height H — UpdateMainChain is called with:
Since H > H is false, the unmark loop is skipped entirely. As a result, orphaned blocks at H+1..Hn retain HasBlockOnMainChain = true.
This causes eth_getBlockByNumber to return the wrong block at those heights — the canonical mismatch observed on Gnosis.
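The failure mode described above can be reproduced in a small Python toy model. This is an illustrative sketch, not Nethermind's C# code: the `ToyBlockTree` class, its method names, and the exact shape of the `if` guard are assumptions chosen to mirror the description (a map of canonical markers plus a head number that beacon sync does not advance).

```python
from dataclasses import dataclass, field


@dataclass
class ToyBlockTree:
    """Hypothetical toy model: height -> hash of the block marked canonical."""

    head_number: int
    canonical: dict = field(default_factory=dict)

    def mark_canonical_without_head(self, number: int, block_hash: str) -> None:
        # Models beacon sync: the level is marked canonical, Head is not moved.
        self.canonical[number] = block_hash

    def update_main_chain_before_fix(self, number: int, block_hash: str) -> None:
        # Assumed shape of the old logic: unmark only when the (possibly stale)
        # head number is strictly above the new block's number.
        if self.head_number > number:
            for level in range(self.head_number, number, -1):
                self.canonical.pop(level, None)
        self.canonical[number] = block_hash
        self.head_number = number


def reproduce_stale_markers() -> dict:
    tree = ToyBlockTree(head_number=10, canonical={10: "A10"})
    # Beacon sync marks H+1..Hn canonical while Head stays at H = 10.
    tree.mark_canonical_without_head(11, "A11")
    tree.mark_canonical_without_head(12, "A12")
    # Reorg: FCU(B), where B is a sibling of the canonical block at height 10.
    tree.update_main_chain_before_fix(10, "B10")
    return tree.canonical


if __name__ == "__main__":
    print(reproduce_stale_markers())
```

In this model heights 11 and 12 still point at the orphaned A branch after the reorg, because `10 > 10` is false and the unmark loop never runs.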
A second related case exists in ePBS, where the CL can issue FCU to an ancestor rather than a sibling. If Head is stale at H and beacon sync has
marked H+1..Hn canonical, a subsequent FCU(H-1) triggers the downward unmark loop which correctly clears H but cannot see above the stale head —
leaving H+1..Hn canonical.
Fix
Replaced the if/else with a downward unmark (unchanged) followed by an unconditional upward scan:
The IsOnMainChainBehindHead early-return in ForkchoiceUpdatedHandler is not changed — it is spec-compliant, and geth's forkchoiceUpdated has
identical behaviour.
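The downward unmark followed by an unconditional upward scan can be sketched in Python as a toy model. This is not the C# change itself: the function name, the dictionary representation, the `max(...) + 1` starting point (inferred from the Math.Max mention in the tests), and the rule that the scan stops at the first unmarked level are all assumptions made for illustration.

```python
def update_main_chain_sketch(canonical, head_number, new_number, new_hash):
    """Toy model of the fixed unmark logic; returns the new head number.

    `canonical` maps height -> hash of the block marked canonical.
    """
    # Downward unmark: clear levels from the old head down to just above
    # the new block. An empty range when head_number <= new_number.
    for level in range(head_number, new_number, -1):
        canonical.pop(level, None)

    # Unconditional upward scan (assumed start and stop conditions): begin
    # above the higher of old head and new block, and clear markers until
    # the first level that is not marked canonical.
    level = max(head_number, new_number) + 1
    while level in canonical:
        del canonical[level]
        level += 1

    canonical[new_number] = new_hash
    return new_number


def sibling_case():
    # Head stale at 10, beacon sync marked 11 and 12; FCU to sibling B10.
    canonical = {9: "A9", 10: "A10", 11: "A11", 12: "A12"}
    update_main_chain_sketch(canonical, 10, 10, "B10")
    return canonical


def ancestor_case():
    # Head stale at 10, beacon sync marked 11 and 12; FCU to ancestor at 9.
    canonical = {8: "A8", 9: "A9", 10: "A10", 11: "A11", 12: "A12"}
    update_main_chain_sketch(canonical, 10, 9, "A9")
    return canonical


if __name__ == "__main__":
    print(sibling_case())
    print(ancestor_case())
```

Both the sibling reorg and the ePBS-style ancestor FCU leave no markers above the new head in this model, since the upward scan no longer depends on the comparison between the stale head and the target.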
Startup healing flag for nodes already affected
Nodes that hit this bug before the fix may have corrupted canonical markers persisted to disk. The fix prevents new corruption from occurring,
but does not repair existing state.
A new startup flag triggers a one-time repair pass at boot:
--Init.HealCanonicalChainOnStartup true
--Init.HealCanonicalChainDepth 8192 (default, adjustable)
When enabled, ReviewBlockTree invokes BlockTree.HealCanonicalChain(head, depth) before the normal startup traversal. The heal runs in two phases
within a single BatchWrite (atomic, deferred flush):
but no longer reachable from the true head.
HasBlockOnMainChain = true and sits at index 0 in its ChainLevelInfo slot. Stops early at genesis, a missing parent (pruned chain), or a block
hash not found in any slot.
HealCanonicalChain is a void command on IBlockTree. ReadOnlyBlockTree and StatelessBlockTree throw on invocation; BlockTreeOverlay delegates to
the inner tree.
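A rough Python sketch of a two-phase heal, under stated assumptions, may help make the stop conditions concrete. It is not the HealCanonicalChain implementation: the `Level` class, the `parents` map, and in particular the phase 1 behaviour (clearing markers on consecutive levels above the head) are hypothetical, and the atomic BatchWrite is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Level:
    """Hypothetical stand-in for a chain level: block hashes plus the canonical flag."""

    hashes: list
    on_main_chain: bool = False


def heal_canonical_chain(levels, parents, head_hash, head_number, depth):
    """Toy two-phase heal; returns how many levels phase 2 confirmed."""
    # Phase 1 (assumed behaviour): clear canonical flags on levels above the head.
    number = head_number + 1
    while number in levels and levels[number].on_main_chain:
        levels[number].on_main_chain = False
        number += 1

    # Phase 2: walk back from the head, at most `depth` levels.
    confirmed = 0
    current, number = head_hash, head_number
    for _ in range(depth):
        level = levels.get(number)
        if level is None or current not in level.hashes:
            break  # block hash not found in any slot
        level.hashes.remove(current)
        level.hashes.insert(0, current)  # canonical block sits at index 0
        level.on_main_chain = True
        confirmed += 1
        if number == 0:
            break  # reached genesis
        parent = parents.get(current)
        if parent is None:
            break  # missing parent, e.g. a pruned chain
        current, number = parent, number - 1
    return confirmed


def demo():
    levels = {
        0: Level(["g"], True),
        1: Level(["a1"], True),
        2: Level(["a2", "b2"], True),  # wrong block at index 0
        3: Level(["a3"], True),        # stale marker above the head
    }
    parents = {"a1": "g", "a2": "a1", "b2": "a1", "a3": "a2"}
    confirmed = heal_canonical_chain(levels, parents, "b2", 2, depth=8192)
    return confirmed, levels
```

In the `demo` data the true head is `b2` at height 2, so the sketch clears the marker at height 3 and moves `b2` to index 0 of its level before walking back to genesis.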
Tests
BlockTreeTests.cs:
the upward scan, passes after.
above the stale head. The downward loop clears the stale head level; the upward scan clears the rest. Failed before Math.Max, passes after.
call.