Prefetch Transactions from Pool & PIP-66 Back#2031

Merged
lucca30 merged 36 commits into develop from lmartins/prefetch-on-bp
Feb 11, 2026
Conversation

@lucca30
Contributor

@lucca30 lucca30 commented Feb 2, 2026

Description

  • Implements state prefetching from the transaction pool during block production to improve cache hit rates and reduce block building latency.
  • Brings PIP-66 back, with a fix that avoids premature span checks caused by early block announcements.
  • Moves the block-production wait from the Seal phase to the Prepare phase.

Key Changes

1. State Prefetching from Transaction Pool

  • Location: miner/worker.go, core/state_prefetcher.go
  • Prefetches state (accounts, storage) for pending transactions before block production starts
  • Runs prefetch in parallel with block building to warm up the state cache

2. Configuration Flags

  • --miner.prefetch: Enable/disable transaction pool prefetching (default: disabled)
  • --miner.prefetch.gaslimit.percent: Percentage of the block gas limit used as an internal gas budget for prefetched transactions.
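Assuming the usual bor server invocation, enabling the feature might look like the following (the flag names are from this PR; the 50% value is only an illustrative choice, not a recommendation):

```shell
# Illustrative invocation — the percentage value is arbitrary.
bor server \
  --miner.prefetch \
  --miner.prefetch.gaslimit.percent 50
```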

3. Enhanced Metrics

  • Location: miner/worker.go, core/state/reader.go, core/blockchain.go
  • Separated worker cache metrics from blockchain import metrics (prefix: worker/chain/...)
  • Added prefetch attribution metrics to measure effectiveness:
    • hit_from_prefetch: Cache hits attributed to prefetch
    • prefetch_used_unique: Unique accounts loaded by prefetch and used during processing

Technical Details

Prefetch Flow

  1. When commitWork is called, two parallel operations start:
    • State prefetch for pending pool transactions
    • Current block production (which now waits in the Prepare phase)
  2. Block building uses the warmed cache from prefetch
  3. Metrics track which cache hits came from prefetch vs. prior execution

Results

In a local test with polycli running both random and Uniswap modes, we observed that many storage slots which were never prefetched before are now being prefetched, along with a slight increase in accounts being prefetched early.

(Charts: cache_storage and cache_account hit rates)

Worker Commit Transactions Time

Instance 1 has prefetch enabled, results in ms

(Screenshot: worker commit-transactions timing comparison)

PS: the second instance is the one with prefetchFromPool enabled

Changes

  • Bugfix (non-breaking change that solves an issue)
  • Hotfix (change that solves an urgent issue, and requires immediate attention)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (change that is not backwards-compatible and/or changes current functionality)
  • Changes only for a subset of nodes

Breaking changes

Please complete this section if any breaking changes have been made, otherwise delete it

Nodes audience

In case this PR includes changes that must be applied only to a subset of nodes, please specify how you handled it (e.g. by adding a flag with a default value...)

Checklist

  • I have added at least 2 reviewers or the whole pos-v1 team
  • I have added sufficient documentation in code
  • I will be resolving comments - if any - by pushing each fix in a separate commit and linking the commit hash in the comment reply
  • Created a task in Jira and informed the team for implementation in Erigon client (if applicable)
  • Includes RPC methods changes, and the Notion documentation has been updated

Cross repository changes

  • This PR requires changes to heimdall
    • In case link the PR here:
  • This PR requires changes to matic-cli
    • In case link the PR here:

Testing

  • I have added unit tests
  • I have added tests to CI
  • I have tested this code manually on local environment
  • I have tested this code manually on remote devnet using express-cli
  • I have tested this code manually on amoy
  • I have created new e2e tests into express-cli

Manual tests

Please complete this section with the steps you performed if you ran manual tests for this functionality, otherwise delete it

Additional comments

Please post additional comments in this section if you have them, otherwise delete it

@claude

claude bot commented Feb 2, 2026

Code Review

Bug found in consensus/bor/bor.go at line 1080

The delay calculation delay = time.Until(parentTime) computes time until the PARENT block timestamp. Since the parent block has already been mined, its timestamp is in the past, making this a negative duration. When time.After(delay) is called with a negative duration at line 1102 (

bor/consensus/bor/bor.go

Lines 1101 to 1103 in 0077186

// Wait before start the block production if needed (previsously this wait was on Seal)
if c.config.IsBhilai(header.Number) && successionNumber == 0 && waitOnPrepare {
<-time.After(delay)
), it returns immediately without any actual waiting.

The old Seal logic correctly waited until header.GetActualTime() (the new block target time, which is in the future):

bor/consensus/bor/bor.go

Lines 1367 to 1370 in 0077186

var delay time.Duration
// Sweet, the protocol permits us to sign the block, wait for our time
if c.config.IsBhilai(header.Number) && successionNumber == 0 {

Suggested fix at line 1080 (

bor/consensus/bor/bor.go

Lines 1078 to 1082 in 0077186

parentTime := time.Unix(int64(parent.Time), 0)
producerDelay := CalcProducerDelay(number, succession, c.config)
header.Time = parent.Time + producerDelay
delay = time.Until(parentTime)
}
):

delay = time.Until(time.Unix(int64(header.Time), 0))

Note: The test at line 1741 passes because it sets genesis time 3 seconds in the future, which makes parentTime positive. In production, the parent block is always in the past.


Checked for bugs and CLAUDE.md compliance.

@codecov

codecov bot commented Feb 2, 2026

Codecov Report

❌ Patch coverage is 79.90074% with 81 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.34%. Comparing base (96df644) to head (06b773a).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
miner/worker.go 75.96% 36 Missing and 14 partials ⚠️
consensus/bor/bor.go 73.46% 8 Missing and 5 partials ⚠️
core/blockchain_reader.go 50.00% 3 Missing and 1 partial ⚠️
core/state/reader.go 94.52% 4 Missing ⚠️
core/state_prefetcher.go 88.00% 2 Missing and 1 partial ⚠️
internal/cli/server/config.go 66.66% 2 Missing and 1 partial ⚠️
consensus/beacon/consensus.go 0.00% 2 Missing ⚠️
consensus/clique/clique.go 0.00% 1 Missing ⚠️
consensus/ethash/consensus.go 0.00% 1 Missing ⚠️

❌ Your patch status has failed because the patch coverage (79.90%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #2031      +/-   ##
===========================================
+ Coverage    50.15%   50.34%   +0.18%     
===========================================
  Files          871      871              
  Lines       150614   150917     +303     
===========================================
+ Hits         75546    75973     +427     
+ Misses       70023    69889     -134     
- Partials      5045     5055      +10     
Files with missing lines Coverage Δ
core/blockchain.go 62.24% <100.00%> (+0.12%) ⬆️
core/state/database.go 62.96% <100.00%> (+3.70%) ⬆️
internal/cli/server/flags.go 100.00% <100.00%> (ø)
miner/miner.go 72.35% <ø> (ø)
consensus/clique/clique.go 41.11% <0.00%> (ø)
consensus/ethash/consensus.go 38.12% <0.00%> (ø)
consensus/beacon/consensus.go 11.35% <0.00%> (ø)
core/state_prefetcher.go 91.46% <88.00%> (-2.57%) ⬇️
internal/cli/server/config.go 63.14% <66.66%> (-0.04%) ⬇️
core/blockchain_reader.go 42.28% <50.00%> (-0.19%) ⬇️
... and 3 more

... and 18 files with indirect coverage changes



@lucca30
Contributor Author

lucca30 commented Feb 2, 2026

Quoting the Claude review above:

> Bug found in consensus/bor/bor.go at line 1080 — the delay calculation delay = time.Until(parentTime) computes time until the PARENT block timestamp, making it a negative duration in production. [full review quoted verbatim above]

The delay calculation delay = time.Until(parentTime) is intentional and correct for the early announcement feature introduced in Rio.

Design Intent

The Rio fork allows early block announcement to improve block propagation speed. The primary producer (succession == 0) can produce and announce blocks as soon as parent.Time has passed, rather than waiting until header.Time.

Key timing relationships:

  • header.Time = parent.Time + producerDelay (where producerDelay ≥ 2s)
  • Old behavior: Wait until header.Time before sealing
  • New behavior: Wait until parent.Time before preparing (allowing earlier production)

Why negative delays are correct

In production, when the parent block is in the past:

  1. delay = time.Until(parentTime) is negative
  2. time.After(negative_duration) returns immediately
  3. This is the intended fast path - produce the block immediately without artificial delay

@lucca30 lucca30 changed the title Lmartins/prefetch on bp Prefetch Transactions from Pool & PIP-66 Back Feb 2, 2026
@lucca30 lucca30 requested a review from a team February 2, 2026 17:21
Contributor

@cffls cffls left a comment


This PR is quite large to review, and it essentially contains two features. If possible, could you split it into two, one is about PIP-66 and the other is txns prefetch?

@lucca30
Copy link
Contributor Author

lucca30 commented Feb 3, 2026

This PR is quite large to review, and it essentially contains two features. If possible, could you split it into two, one is about PIP-66 and the other is txns prefetch?

@cffls

Yeah, I also agree it got big. But it's important to point out that Transaction Prefetching benefits from, and partially depends on, PIP-66, so it would be hard to split those changes now. What I'll do instead is provide a summary of the PIP-66-related snippets below.

1. Early Announcement Check

This section introduces a new, more flexible early-announcement check. It is needed because the newly introduced BlockTime is not part of consensus, so the old strict check could reject valid announcements as too early (better explained in another PR comment).

bor/consensus/bor/bor.go

Lines 405 to 438 in 6acb9e1

if c.config.IsRio(header.Number) {
// Rio HF introduced flexible blocktime (can be set larger than consensus without approval).
// Using strict CalcProducerDelay would reject valid blocks, so we just ensure announcement
// time comes after parent time to allow for flexible blocktime.
var parent *types.Header
if len(parents) > 0 {
parent = parents[len(parents)-1]
} else {
parent = chain.GetHeader(header.ParentHash, number-1)
}
if parent == nil || now < parent.Time {
log.Error("Block announced too early post rio", "number", number, "headerTime", header.Time, "now", now)
return consensus.ErrFutureBlock
}
} else if c.config.IsBhilai(header.Number) {
// Allow early blocks if Bhilai HF is enabled
// Don't waste time checking blocks from the future but allow a buffer of block time for
// early block announcements. Note that this is a loose check and would allow early blocks
// from non-primary producer. Such blocks will be rejected later when we know the succession
// number of the signer in the current sprint.
// Uses CalcProducerDelay instead of block period to account for producer delay on sprint start blocks.
// We assume succession 0 (primary producer) to not be much restrictive for early block announcements.
if header.Time-CalcProducerDelay(number, 0, c.config) > now {
log.Error("Block announced too early post bhilai", "number", number, "headerTime", header.Time, "now", now)
return consensus.ErrFutureBlock
}
} else {
// Don't waste time checking blocks from the future
if header.Time > now {
log.Error("Block announced too early", "number", number, "headerTime", header.Time, "now", now)
return consensus.ErrFutureBlock
}
}

2. Accounting the Announcement Time into timeout for span check

As we discussed in chat, this addresses the original issue we hit when introducing PIP-66: the TTL was being based on the announcement time instead of the block time itself.

bor/consensus/bor/bor.go

Lines 496 to 508 in 6acb9e1

// Calculate TTL for the header cache entry
// If the header time is in the future (early announced block), add extra time to TTL
cacheTTL := veblopBlockTimeout
nowTime := time.Now()
headerTime := time.Unix(int64(header.Time), 0)
if headerTime.After(nowTime) {
// Add the time from now until header time as extra to the base timeout
extraTime := headerTime.Sub(nowTime)
cacheTTL = veblopBlockTimeout + extraTime
}
c.recentVerifiedHeaders.Set(header.Hash(), header, cacheTTL)
return nil

3. Wait on Prepare

This is somewhat new, but it fits this feature: the wait we previously did in Seal is moved to Prepare, which gives the background task better conditions to build the new block.

func (c *Bor) Prepare(chain consensus.ChainHeaderReader, header *types.Header, waitOnPrepare bool) error {

@lucca30 lucca30 mentioned this pull request Feb 3, 2026
18 tasks
@claude

claude bot commented Feb 10, 2026

Code Review

Issue 1/2: Ignored Error from SubGas

File: miner/worker.go line 1954 (link)

Issue: Ignored error return

The error returned from totalGasPool.SubGas(result.TotalGasUsed) is ignored. Per CLAUDE.md:

Handle Errors: Never ignore errors with _

While the first gaspool.SubGas(tx.Gas()) call at line 1945 has a preceding gas check, this second call at line 1954 has no such validation. If result.TotalGasUsed exceeds the remaining gas in totalGasPool, the error is silently ignored and gas tracking becomes inaccurate.

Suggested fix: Check the error return and handle appropriately:

if err := totalGasPool.SubGas(result.TotalGasUsed); err != nil {
    // Log and continue, or return early if gas exhaustion should stop prefetching
    log.Debug("Prefetch gas pool exhausted", "err", err)
    return
}

Issue 2/2: Data Race on w.extra

File: miner/worker.go line 1869 (link)

Issue: Race condition

The prefetchFromPool function calls w.makeHeader(genParams, false) without holding w.mu. However, makeHeader accesses w.extra (inherited from the original prepareWork code), which can be concurrently written by setExtra under w.mu.Lock().

When makeHeader is called from prepareWork, it's protected by w.mu.RLock() (line 1550). But when called from prefetchFromPool in a separate goroutine, no lock is held, creating a data race.

Suggested fix: Either:

  1. Acquire w.mu.RLock() before calling makeHeader in prefetchFromPool
  2. Pass the extra value as a parameter instead of reading from w.extra
  3. Make a copy of w.extra under lock at the start of prefetchFromPool

Contributor

@cffls cffls left a comment


LGTM, thanks!

@lucca30
Contributor Author

lucca30 commented Feb 11, 2026

The latest Claude review pointed out two issues. I agree with the second one and fixed it; the first one is a false positive.

@claude

claude bot commented Feb 11, 2026

Code Review

Found 2 issues in the new code:

1. Ignored error from SubGas() (miner/worker.go:1967)

The totalGasPool.SubGas() call returns an error that is being silently ignored. Per CLAUDE.md: "Handle Errors: Never ignore errors with _". While this doesn't use _, the returned error is still being discarded. If gas accounting is incorrect, SubGas could return an error indicating a problem.

Location:

bor/miner/worker.go

Lines 1966 to 1968 in e04854e

if result != nil {
totalGasPool.SubGas(result.TotalGasUsed)
for _, txHash := range result.SuccessfulTxs {


2. Potential goroutine leak when prepareWork() fails (miner/worker.go:1798)

When prepareWork() fails and returns early on line 1798, interruptPrefetch.Store(true) on line 1803 is never executed. The prefetch goroutine started at line 1777 will continue running until it naturally exhausts its gas pool. If the transaction pool is empty, no gas is consumed and the goroutine runs indefinitely.

Location:

bor/miner/worker.go

Lines 1796 to 1803 in e04854e

work, err := w.prepareWork(genParams, w.makeWitness)
if err != nil {
return
}
// Starts accounting time after prepareWork, since it includes the wait we have on Prepare phase of Bor
start := time.Now()
interruptPrefetch.Store(true)

Suggested fix: Set interruptPrefetch.Store(true) before the early return:

if err != nil {
    interruptPrefetch.Store(true)
    return
}

Checked for bugs and CLAUDE.md compliance.

@sonarqubecloud

Quality Gate failed

Failed conditions
5.3% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

@claude

claude bot commented Feb 11, 2026

Code Review

No issues found. Checked for bugs and CLAUDE.md compliance.


Review details:

  • Reviewed all 29 modified files
  • Verified CLAUDE.md compliance (error handling, resource management, race conditions)
  • Checked for nil pointer dereferences, logic errors, and security issues

Notes on items verified as false positives:

  • ltx, _ := txs.Peek() in miner/worker.go: Not an ignored error - Peek() returns (*LazyTransaction, *uint256.Int), the second value is a fee/tip, not an error
  • Prefetch reader nil access: Already guarded by explicit nil check on line 2039 in commit()

@lucca30 lucca30 merged commit 6f40e51 into develop Feb 11, 2026
17 of 19 checks passed
kamuikatsurgi pushed a commit that referenced this pull request Feb 13, 2026
* enabling pip-66 back and wait moved from seal to prepare

* bring subsecond extra time

* prefetch from pool

* more prefetch metrics

* address early announcement for rio blocks

* separate worker cache metrics

* optional flag to disable prefetch

* prefetch gas limit flag

* tests and lint

* small fixes

* remove logs

* address duplicates

* small fix on integration test

* prefetch coverage histogram and fmt

* verify headers coverage and bor test resiliency

* address duplicates

* duplicates and lint

* addressing comments

* small fix

* address comments

* fix push tx for rpc nodes

* rename meter and intermediateroot prefetch

* address lint

* address succession number check

* fixing concurrency issues and tests for prefetch state being thrown away by gc

* make lint

* e2e worker tests

* benchmark tests

* worker tests fixed and interrupt watch while waiting

* remove parallel from tests

3 participants