
feat(levm): parallel block execution via BAL #6233

Open
edg-l wants to merge 41 commits into main from bal-parallel-exec

Conversation

@edg-l
Contributor

@edg-l edg-l commented Feb 20, 2026

Closes #6209

Summary

Implements BAL-based parallel transaction execution for Amsterdam+ blocks using EIP-7928 Block Access Lists.

Approach: BAL State Seeding

Each transaction runs independently on its own database, pre-seeded with BAL-derived intermediate state (same approach as geth). No conflict detection or grouping needed — the BAL provides the complete state dependency graph.

Pipeline (3 concurrent threads):

  1. Warmer — prefetches all account states, storage slots, and contract codes listed in the BAL (parallel batch fetch)
  2. Executor — runs all txs in parallel via rayon, each with its own GeneralizedDatabase seeded from BAL intermediate values
  3. Merkleizer — computes state trie from bal_to_account_updates (BAL → AccountUpdates, no execution needed)
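As a rough illustration of the three-thread layout, the warmer/executor/merkleizer split can be wired with plain channels. Everything below is invented for the sketch: `Update` stands in for `AccountUpdate`, and the merkleizer's fold is a toy "root", not a state trie.

```rust
use std::sync::mpsc;
use std::thread;

// Illustrative stand-in for AccountUpdate: (account id, post-block balance).
pub type Update = (u64, u64);

// Hypothetical miniature of the pipeline: warmer, executor, merkleizer.
pub fn run_pipeline(batches: Vec<Vec<Update>>) -> u64 {
    let (tx, rx) = mpsc::channel::<Vec<Update>>();

    // Warmer: would prefetch accounts/slots/codes listed in the BAL.
    let warmer = thread::spawn(|| { /* batch prefetch elided */ });

    // Merkleizer: folds incoming updates into a toy "root".
    let merkleizer = thread::spawn(move || {
        let mut root = 0u64;
        for batch in rx {
            for (id, bal) in batch {
                root = root.wrapping_add(id ^ bal);
            }
        }
        root
    });

    // Executor: sends BAL-derived updates first, then would run txs via rayon.
    for batch in batches {
        tx.send(batch).expect("merkleizer alive");
    }
    drop(tx); // closing the channel lets the merkleizer finish

    warmer.join().expect("warmer");
    merkleizer.join().expect("merkleizer")
}

fn main() {
    println!("{}", run_pipeline(vec![vec![(1, 100), (2, 200)]]));
}
```

The key property the sketch preserves is that the merkleizer never waits on execution: its input comes straight from the BAL.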

Key functions (crates/vm/backends/levm/mod.rs):

  • execute_block_parallel — orchestrates the parallel path: sends BAL-derived AccountUpdates to merkleizer, then executes all txs in parallel
  • bal_to_account_updates(bal, store) — converts BAL final values into Vec<AccountUpdate> for the merkleizer (last entry per field = post-block state)
  • seed_db_from_bal(db, bal, max_idx) — pre-seeds a per-tx DB with cumulative BAL state through index max_idx (system calls + previous txs)
  • warm_block_from_bal(bal, store) — 3-phase prefetch: accounts → storage slots → contract codes

BAL indexing (for a block with N transactions): 0 = system calls, 1 = tx 0, 2 = tx 1, ..., N = tx N-1, N+1 = withdrawals. For the tx at transaction index i, seed_db_from_bal applies all changes with block_access_index <= i, i.e. the system-call changes plus those of every earlier tx.
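The cumulative-seeding rule can be sketched for a single field. `seeded_value` and the `(index, value)` pairs are illustrative stand-ins for the real BAL structures, which the PR scans with `partition_point` over index-sorted change lists:

```rust
/// Sketch: `changes` is one field's change list from a BAL entry, sorted by
/// block_access_index (0 = system calls, 1 = tx 0, ...). Names are invented.
pub fn seeded_value(changes: &[(u16, u64)], max_idx: u16) -> Option<u64> {
    // partition_point = number of entries with index <= max_idx;
    // the last of those is the state the tx starts from.
    let n = changes.partition_point(|&(idx, _)| idx <= max_idx);
    n.checked_sub(1).map(|i| changes[i].1)
}

fn main() {
    // Balance written by a system call (idx 0), tx 0 (idx 1), tx 2 (idx 3).
    let changes = [(0u16, 10u64), (1, 25), (3, 40)];
    // Seeding tx 1 (max_idx = 1) applies system calls + tx 0:
    assert_eq!(seeded_value(&changes, 1), Some(25));
    // Seeding tx 0 (max_idx = 0) applies only the system-call change:
    assert_eq!(seeded_value(&changes, 0), Some(10));
}
```

Because the list is sorted, the lookup is O(log n) per field rather than a reverse linear scan.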

The parallel path is only triggered when header_bal is Some (Amsterdam+ blocks via engine_newPayloadV4). All other callers pass None and use the existing sequential loop unchanged.

EF Test Verification (Two-Pass Parallel Check)

Amsterdam EF tests now exercise the parallel execution path as a correctness check. After the normal sequential run succeeds, a second two-pass run is performed:

  1. Pass 1 (sequential): Re-executes all blocks on a fresh blockchain via add_block_pipeline_returning_bal(block, None), collecting the produced BAL for each block.
  2. Pass 2 (parallel): Re-executes all blocks on another fresh blockchain via add_block_pipeline(block, Some(&bal)), using the BAL from pass 1 to drive the parallel execution path.
  3. Post-state verification: The final state from pass 2 is checked against the expected post-state from the test fixture.

This ensures that the parallel execution path produces identical results to the sequential path across the entire Amsterdam EF test suite. Non-Amsterdam tests are unaffected. All Amsterdam EF tests pass both sequential and parallel execution.
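The two-pass check reduces to a toy model. The `exec_*` helpers below are invented for the sketch; the real passes go through `add_block_pipeline_returning_bal` and `add_block_pipeline` as described above:

```rust
use std::collections::BTreeMap;

type State = BTreeMap<&'static str, u64>;
// Toy BAL: final (account, balance) pairs recorded during sequential execution.
type Bal = Vec<(&'static str, u64)>;

// Pass 1: sequential execution that also returns the BAL it produced.
pub fn exec_sequential(txs: &[(&'static str, u64)]) -> (State, Bal) {
    let mut state = State::new();
    for &(acct, amount) in txs {
        *state.entry(acct).or_insert(0) += amount;
    }
    let bal = state.iter().map(|(k, v)| (*k, *v)).collect();
    (state, bal)
}

// Pass 2: execution driven purely by the BAL; post-state must match pass 1.
pub fn exec_parallel_from_bal(bal: &Bal) -> State {
    bal.iter().copied().collect()
}

fn main() {
    let txs = [("alice", 1), ("alice", 2), ("bob", 5)];
    let (sequential_state, bal) = exec_sequential(&txs);
    // Post-state verification: both passes must agree.
    assert_eq!(exec_parallel_from_bal(&bal), sequential_state);
}
```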

Test plan

  • 6 unit tests for bal_to_account_updates — all pass (cargo test -p ethrex-vm bal_tests)
  • EF blockchain tests — sequential path (existing CI)
  • EF blockchain tests — parallel path via two-pass BAL verification (Amsterdam tests)
  • Hive regression on devnets/bal/2 fixture suite with parallel path enabled
  • Kurtosis devnet benchmark (see results below)

Notes

  • No new crate dependencies (rayon and FxHashMap already used)
  • Per-tx BAL validation: After each transaction executes in parallel, its actual state mutations are validated against the BAL claims. GeneralizedDatabase::compute_tx_diff() computes the diff between initial and final account state for the tx, then BlockAccessList::validate_tx_diff(tx_idx, &diff) checks that balances, nonces, storage values, and code changes match exactly what the BAL declares for that tx index. Blocks with mismatched state mutations are rejected. This matches geth's per-tx validation approach.
  • BAL recording is disabled in the parallel path — the header-embedded BAL is trusted (validated by validate_block_access_list_hash) and used directly for both state root computation and per-tx state seeding. This mirrors geth's approach: BALStateTransition.IntermediateRoot() computes the state root entirely from BAL diffs (via readAccountDiff + ModifiedAccounts) without re-executing transactions, and BALReader.getStateObject / initObjFromDiff seeds per-tx state from BAL intermediate values. No BAL re-recording happens during validation in either implementation.
  • Per-tx DBs skip initial_accounts_state tracking (state transitions come from BAL, not from diffing)
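The per-tx validation rule, that every actual mutation must match the BAL claim exactly, can be sketched with balances only. `tx_diff_matches` and the map shapes are illustrative, not the real `compute_tx_diff` / `validate_tx_diff` signatures:

```rust
use std::collections::HashMap;

type Balances = HashMap<&'static str, u64>;

// Toy per-tx BAL validation: every account the tx actually changed must be
// declared in the BAL with exactly the same post-tx value.
pub fn tx_diff_matches(claimed: &Balances, initial: &Balances, after: &Balances) -> bool {
    after.iter().all(|(acct, value)| {
        initial.get(acct) == Some(value)        // unchanged: nothing to declare
            || claimed.get(acct) == Some(value) // changed: BAL must agree
    })
}

fn main() {
    let initial = Balances::from([("alice", 10)]);
    let after = Balances::from([("alice", 10), ("bob", 5)]);
    // Accepted: the only real mutation (bob -> 5) is declared.
    assert!(tx_diff_matches(&Balances::from([("bob", 5)]), &initial, &after));
    // Rejected: execution produced a value the BAL did not declare.
    assert!(!tx_diff_matches(&Balances::from([("bob", 6)]), &initial, &after));
}
```

A failed check rejects the whole block, since a mismatch means the header BAL does not describe the execution it claims to.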

Benchmark Results

Benchmarked on a Kurtosis devnet (bal-devnet-2-ethrex.yaml) comparing parallel execution (via BAL) vs sequential (main), both running on identical hardware. Nodes are in consensus (matching block hashes).

Spamoor Setup (mainnet-like workload, ~700 tx/s target)

| Scenario | tx/s | Share | Mainnet Analog |
|---|---|---|---|
| erc20tx | 280 | 40% | USDT/USDC token transfers — hot storage (balances, allowances) |
| uniswap-swaps | 175 | 25% | DEX swaps — complex contract calls, hot pool state |
| eoatx | 105 | 15% | Simple ETH transfers |
| storagespam | 70 | 10% | Storage-heavy contract interactions |
| erc721tx | 35 | 5% | NFT mints/transfers |
| blobs | 35 | 5% | L2 sequencer-like blob transactions |

Blocks average ~231 txs and ~60 Mgas each.

Results (43 non-empty blocks sampled)

| Metric | Parallel (bal-parallel-exec) | Sequential (main) |
|---|---|---|
| Avg Ggas/s | 2.00 | 1.20 |
| Avg exec time | 31 ms | 51 ms |
| Peak Ggas/s | 2.61 | 1.40 |

~67% faster with BAL parallel execution on realistic mainnet-like workloads.

@github-actions bot added the levm (Lambda EVM implementation) label Feb 20, 2026
@github-actions

github-actions bot commented Feb 20, 2026

Lines of code report

Total lines added: 876
Total lines removed: 0
Total lines changed: 876

Detailed view
+-------------------------------------------------+-------+------+
| File                                            | Lines | Diff |
+-------------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs          | 2188  | +24  |
+-------------------------------------------------+-------+------+
| ethrex/crates/common/types/block_access_list.rs | 889   | +82  |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/backends/levm/mod.rs           | 1503  | +674 |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/backends/mod.rs                | 195   | +8   |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/db/gen_db.rs          | 545   | +42  |
+-------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/db/mod.rs             | 148   | +46  |
+-------------------------------------------------+-------+------+

@edg-l edg-l changed the base branch from main to bal-optimizations February 20, 2026 11:57
@edg-l edg-l moved this to In Progress in ethrex_l1 Feb 20, 2026
@lambdaclass lambdaclass deleted a comment from github-actions bot Feb 20, 2026
@lambdaclass lambdaclass deleted a comment from github-actions bot Feb 20, 2026
@lambdaclass lambdaclass deleted a comment from github-actions bot Feb 20, 2026
@github-actions

github-actions bot commented Feb 20, 2026

Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_BubbleSort | 2.959 ± 0.019 | 2.938 | 2.991 | 1.08 ± 0.01 |
| main_levm_BubbleSort | 2.729 ± 0.020 | 2.708 | 2.769 | 1.00 |
| pr_revm_BubbleSort | 3.081 ± 0.025 | 3.053 | 3.138 | 1.13 ± 0.01 |
| pr_levm_BubbleSort | 2.759 ± 0.039 | 2.703 | 2.849 | 1.01 ± 0.02 |

Benchmark Results: ERC20Approval

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Approval | 999.6 ± 10.4 | 988.0 | 1018.7 | 1.00 |
| main_levm_ERC20Approval | 1043.0 ± 26.8 | 1026.5 | 1108.2 | 1.04 ± 0.03 |
| pr_revm_ERC20Approval | 1016.1 ± 10.0 | 999.9 | 1029.6 | 1.02 ± 0.01 |
| pr_levm_ERC20Approval | 1047.9 ± 10.6 | 1032.7 | 1067.7 | 1.05 ± 0.02 |

Benchmark Results: ERC20Mint

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Mint | 132.0 ± 1.3 | 130.5 | 134.0 | 1.00 |
| main_levm_ERC20Mint | 160.2 ± 5.3 | 156.5 | 174.4 | 1.21 ± 0.04 |
| pr_revm_ERC20Mint | 134.5 ± 1.4 | 132.6 | 136.3 | 1.02 ± 0.01 |
| pr_levm_ERC20Mint | 161.0 ± 2.7 | 158.8 | 168.2 | 1.22 ± 0.02 |

Benchmark Results: ERC20Transfer

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Transfer | 235.5 ± 2.5 | 232.7 | 241.2 | 1.00 |
| main_levm_ERC20Transfer | 270.8 ± 3.1 | 266.8 | 275.7 | 1.15 ± 0.02 |
| pr_revm_ERC20Transfer | 241.0 ± 4.8 | 236.7 | 252.3 | 1.02 ± 0.02 |
| pr_levm_ERC20Transfer | 271.9 ± 1.9 | 268.4 | 275.2 | 1.15 ± 0.01 |

Benchmark Results: Factorial

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Factorial | 229.8 ± 1.3 | 228.4 | 233.0 | 1.00 |
| main_levm_Factorial | 249.5 ± 5.4 | 245.2 | 261.6 | 1.09 ± 0.02 |
| pr_revm_Factorial | 230.6 ± 1.5 | 229.3 | 234.6 | 1.00 ± 0.01 |
| pr_levm_Factorial | 250.6 ± 3.6 | 246.3 | 256.9 | 1.09 ± 0.02 |

Benchmark Results: FactorialRecursive

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_FactorialRecursive | 1.732 ± 0.034 | 1.651 | 1.768 | 1.00 |
| main_levm_FactorialRecursive | 9.609 ± 0.026 | 9.564 | 9.651 | 5.55 ± 0.11 |
| pr_revm_FactorialRecursive | 1.752 ± 0.039 | 1.706 | 1.814 | 1.01 ± 0.03 |
| pr_levm_FactorialRecursive | 9.631 ± 0.031 | 9.592 | 9.699 | 5.56 ± 0.11 |

Benchmark Results: Fibonacci

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Fibonacci | 206.2 ± 8.1 | 201.9 | 229.1 | 1.00 ± 0.04 |
| main_levm_Fibonacci | 227.3 ± 8.9 | 220.1 | 243.1 | 1.11 ± 0.04 |
| pr_revm_Fibonacci | 205.7 ± 1.8 | 204.3 | 209.6 | 1.00 |
| pr_levm_Fibonacci | 230.9 ± 5.2 | 226.3 | 241.0 | 1.12 ± 0.03 |

Benchmark Results: FibonacciRecursive

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_FibonacciRecursive | 904.2 ± 12.6 | 891.2 | 924.5 | 1.30 ± 0.03 |
| main_levm_FibonacciRecursive | 693.6 ± 11.2 | 676.4 | 711.2 | 1.00 |
| pr_revm_FibonacciRecursive | 919.1 ± 9.6 | 909.6 | 938.1 | 1.33 ± 0.03 |
| pr_levm_FibonacciRecursive | 700.0 ± 12.6 | 684.2 | 727.8 | 1.01 ± 0.02 |

Benchmark Results: ManyHashes

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ManyHashes | 8.6 ± 0.1 | 8.5 | 8.8 | 1.00 |
| main_levm_ManyHashes | 9.8 ± 0.1 | 9.7 | 9.9 | 1.13 ± 0.01 |
| pr_revm_ManyHashes | 8.7 ± 0.0 | 8.6 | 8.8 | 1.01 ± 0.01 |
| pr_levm_ManyHashes | 9.9 ± 0.1 | 9.8 | 10.1 | 1.14 ± 0.02 |

Benchmark Results: MstoreBench

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_MstoreBench | 255.9 ± 5.5 | 252.6 | 269.9 | 1.16 ± 0.04 |
| main_levm_MstoreBench | 222.3 ± 3.8 | 217.5 | 229.0 | 1.01 ± 0.03 |
| pr_revm_MstoreBench | 254.4 ± 1.5 | 252.6 | 257.9 | 1.15 ± 0.03 |
| pr_levm_MstoreBench | 220.9 ± 4.9 | 216.9 | 232.8 | 1.00 |

Benchmark Results: Push

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Push | 289.4 ± 1.8 | 287.9 | 292.8 | 1.06 ± 0.03 |
| main_levm_Push | 273.2 ± 7.0 | 269.9 | 293.1 | 1.00 |
| pr_revm_Push | 290.9 ± 0.9 | 289.3 | 292.1 | 1.06 ± 0.03 |
| pr_levm_Push | 277.2 ± 14.3 | 269.3 | 317.0 | 1.01 ± 0.06 |

Benchmark Results: SstoreBench_no_opt

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_SstoreBench_no_opt | 165.4 ± 2.8 | 163.0 | 171.0 | 1.53 ± 0.03 |
| main_levm_SstoreBench_no_opt | 107.9 ± 0.5 | 107.1 | 108.6 | 1.00 ± 0.01 |
| pr_revm_SstoreBench_no_opt | 163.3 ± 4.6 | 157.3 | 168.9 | 1.51 ± 0.04 |
| pr_levm_SstoreBench_no_opt | 107.9 ± 0.5 | 107.1 | 108.6 | 1.00 |

@edg-l edg-l force-pushed the bal-parallel-exec branch from d1597f5 to 5633b93 on February 23, 2026 09:12
Base automatically changed from bal-optimizations to main February 23, 2026 16:01
@edg-l edg-l force-pushed the bal-parallel-exec branch from a87b72e to d2f646e on February 24, 2026 07:50
@edg-l edg-l changed the title feat(levm): parallel block execution via BAL dependency graph feat(levm): parallel block execution via BAL Feb 24, 2026
@edg-l edg-l force-pushed the bal-parallel-exec branch from 2f8ab0b to 7a4cd2d on February 24, 2026 15:17
edg-l added 15 commits February 24, 2026 17:21
Implement Phase 1 parallel transaction execution using EIP-7928 Block
Access List (BAL) write sets to detect conflicts and assign txs to
parallel execution groups.

- Add `build_parallel_groups`: builds conflict groups from BAL write
  sets. Conflicting txs are serialized in the same group; independent
  txs get separate groups for parallel execution. Same-sender txs are
  chained into the same group to preserve nonce order. Coinbase is
  excluded from conflict detection.

- Add `execute_block_parallel`: executes groups via rayon, each with
  its own `GeneralizedDatabase` seeded from post-system-call state.
  Coinbase fees are accumulated as deltas and applied to the main db
  after merge. System call updates and merged tx updates are sent to
  the merkleizer in two batches.

- Thread `header_bal: Option<&BlockAccessList>` through
  `Evm::execute_block_pipeline` and `LEVM::execute_block_pipeline`.
  When `Some(bal)` is provided (Amsterdam fork, engine API path), the
  parallel path is taken; otherwise falls back to the existing
  sequential loop.

- Add 10 unit tests for `build_parallel_groups` covering: empty block,
  single tx, same-sender chains, conflicting/non-conflicting pairs,
  coinbase exclusion, transitive conflict graphs, and mixed scenarios.
When a group contains multiple transactions, get_state_transitions_tx
promotes the coinbase balance to initial_accounts_state after each tx.
This means subsequent per-tx coinbase AccountUpdates show an accumulated
absolute balance, not an incremental delta.

Subtracting coinbase_initial_balance from each per-tx update was
double-counting fees from earlier txs in the same group, producing a
wrong state root.

Fix: read the final coinbase balance from initial_accounts_state once
per group (after all txs have been drained), and compute a single delta
per group instead of summing per-tx deltas.
…equential

Write-only conflict detection misses read-after-write (RAW) hazards:
if tx_j reads account X without writing it, and tx_i (i < j) writes X,
they end up in separate parallel groups — tx_j reads the pre-block value
instead of tx_i's write, producing a wrong state root.

Fix:
- Add `reads: Option<FxHashSet<Address>>` to GeneralizedDatabase, populated
  in load_account on first access (initial_accounts_state or store). Only
  enabled in parallel group dbs (None in all other paths, no overhead).
- Add execute_txs_sequential helper that runs all txs in order on the main
  db and returns receipts + merged AccountUpdates.
- After parallel execution, check each group's read set against all other
  groups' write sets. If any intersection is found (RAW conflict), discard
  the parallel results and re-run sequentially on the main db, which is
  already in the correct post-system-call state.

This is conservative (falls back on any read-write overlap regardless of
tx ordering) but guarantees correctness. False positives only mean
unnecessary sequential fallback — never a wrong state root.
Replace the address-level W-W greedy grouping + post-hoc sequential fallback
with a correct upfront conflict graph using Union-Find.

New algorithm in `build_parallel_groups`:
- Resource-level (slot-level) write sets from BAL: Balance, Nonce, Code,
  Storage(addr, slot)
- Per-tx read sets approximated from static metadata: sender balance/nonce,
  call target code/balance, EIP-2930 access list entries
- Union-Find for transitive grouping handles same-sender, W-W, and RAW conflicts
- RAW: if tx_j reads resource R and any earlier tx_i writes R, union(i,j)
- WAR (reader before writer) is safe and not serialized
- Coinbase excluded from all conflict detection

Remove the rejected sequential fallback (`execute_txs_sequential`) and the
post-hoc RAW check that re-ran the entire block sequentially on conflict.

Also remove the `reads: Option<FxHashSet<Address>>` field from
`GeneralizedDatabase` (was used only by the removed fallback).
Two correctness fixes from code review:

1. Coinbase delta: replace saturating_sub with explicit signed accounting
   using separate credit/debit U256 accumulators. Previously, if coinbase
   was a tx sender spending more ETH than it received in fees (rare but
   valid), saturating_sub(0) silently discarded the negative delta.

2. EIP-7702 authorization list: add Resource::Code(auth.address) to the
   read set for EIP-7702 txs. The delegate target's code is loaded at
   call time via the delegation pointer, so if an earlier tx deploys code
   to that address, this RAW hazard must be detected upfront.
   The authority address itself cannot be added (requires ecrecover at
   runtime); W-W detection via the BAL handles the authority code-write case.
The static read set (sender/to/access_list) misses cases where a called
contract internally reads a storage slot written by an earlier tx, without
declaring it in the EIP-2930 access list.

Fix: after building per-tx write sets from the BAL, build a map of
address → all written storage slots. When approximating read sets in
Phase 2, for any address that tx_j directly accesses (to or access_list),
add all of that address's written storage slots to tx_j's read set.

This catches the common RAW pattern where tx_j calls a contract whose
storage was modified by tx_i (direct-call case). Multi-hop internal
calls through addresses not in tx metadata remain an inherent limitation.
Any CALL transaction may transitively read ANY written storage slot in the
block through sub-calls to other contracts. Since we cannot determine the
full call graph statically from BAL metadata, we conservatively add ALL
block-level written storage slots to the read set of every call transaction.

This supersedes the previous per-address approach (adding only written slots
of the direct `to` address), which missed multi-hop patterns:
  tx_i writes Storage(A, s) → tx_j calls B → B calls A → A reads slot s

With this fix, tx_j.reads includes Storage(A, s), triggering the RAW
union with tx_i and ensuring sequential execution within the group.

The conservative approach reduces parallelism: all call txs touching written
storage are grouped together. ETH transfers and CREATE txs unaffected by
storage writes can still parallelize. Correctness takes priority here;
Block-STM or call-graph analysis could recover parallelism later.
… txs

The previous test suite only used balance writes and CREATE transactions,
so the new conservative multi-hop RAW detection was completely untested.

New tests cover:
- CALL tx to the same address as a storage writer → same group (W-W + CALL)
- CALL tx to a different address than the storage writer → same group (multi-hop RAW)
- Three txs with two storage writers and one unrelated CALL → all one group
- CREATE txs with disjoint storage writes → still parallel (no CALL branch triggered)
- WAR ordering (reader before writer) → no spurious serialization
… sequential fallback

The BAL (EIP-7928) only records writes, not reads. Read sets for parallel
grouping must be approximated statically, which previously only included
written storage slots for CALL txs. This missed RAW conflicts when a
contract reads an account balance (BALANCE opcode) or code
(EXTCODESIZE/DELEGATECALL) modified by an earlier tx not in the same group.

- Extend conservative read set to include written Code and non-sender
  Balance resources (sender balances excluded to avoid mass serialization
  since every tx writes Balance(sender) via gas fees)
- Add sequential execution fallback in add_block_pipeline: if parallel
  produces a gas/receipts/state mismatch, retry with a fresh VM without
  BAL to guarantee correctness for any remaining edge cases
Instead of deep-cloning the post-system-call CacheDB into each parallel
group, wrap it in Arc and add a shared_base field to GeneralizedDatabase.
Accounts are lazily cloned into initial_accounts_state on first access,
making get_state_transitions_tx transparent to the change.
- Binary search (partition_point) in seed_db_from_bal instead of reverse linear scan
- Batch prefetch_accounts/prefetch_storage on CachingDatabase with parallel inner fetch + single write-lock
- mem::take for system_seed to avoid cloning initial_accounts_state
- Cache chain_config in CachingDatabase via OnceLock
- Add rayon to ethrex-levm for parallel batch prefetch
- Add bal-devnet-2-light and bal-devnet-2-ethrex kurtosis fixtures
- Update ethereum-package revision
Skip initial_accounts_state cloning in parallel per-tx DBs (never
diffed), consolidate HashMap lookups in seed_db_from_bal, batch
prefetch in bal_to_account_updates, eliminate intermediate Vec
allocations, and streamline warm_block_from_bal code prefetch.
@edg-l edg-l force-pushed the bal-parallel-exec branch from 7a4cd2d to d4808c2 on February 24, 2026 16:21
edg-l added 14 commits February 24, 2026 17:23
Validate each parallel tx's execution results against the header BAL
claims, rejecting blocks with mismatched state mutations. Matches
geth's validation approach.
Amsterdam EF tests now exercise the parallel execution path as a
correctness check. After the normal sequential run succeeds, a second
two-pass run is performed: pass 1 re-executes sequentially to collect
the produced BAL, pass 2 re-executes on a fresh blockchain using that
BAL to drive the parallel code path, then verifies the post-state
matches.

Also threads the produced BAL through BlockExecutionPipelineResult and
adds add_block_pipeline_returning_bal to Blockchain.
- Extract add_block_pipeline_inner to deduplicate add_block_pipeline and add_block_pipeline_bal
- Use binary search (partition_point) for storage slot lookup in validate_tx_execution
- Rename _db to db in execute_block_parallel
- Add clarifying comments for stack_pool capacity, any_storage heuristic, and has_storage safety
- Remove blanket #![allow(dead_code)] from block_access_list.rs
- Downgrade [PARALLEL] log from info! to debug! to avoid flooding production logs
- Add doc comment explaining nested Result semantics in add_block_pipeline_inner
@edg-l edg-l marked this pull request as ready for review February 25, 2026 14:14
@github-actions

🤖 Kimi Code Review

⚠️ Review failed: Kimi API request failed with status 429


Automated review by Kimi (Moonshot AI)

@greptile-apps

greptile-apps bot commented Feb 25, 2026

Greptile Summary

This PR implements BAL-based parallel transaction execution for Amsterdam+ blocks, achieving a 67% performance improvement (2.0 Ggas/s vs 1.2 Ggas/s) on realistic workloads.

Key Changes

  • Parallel execution pipeline (execute_block_parallel): Each tx runs independently on its own database seeded with BAL-derived intermediate state, matching geth's approach
  • Three-phase BAL warming (warm_block_from_bal): Prefetches accounts → storage slots → contract codes in parallel batches
  • Per-tx validation (validate_tx_execution): Post-execution state is verified against BAL claims using a pre-built index for O(1) lookups
  • State root computation (bal_to_account_updates): Merkleizer receives state directly from BAL without re-executing transactions
  • Two-pass EF test verification: Amsterdam tests run both sequential and parallel paths to ensure identical results

Implementation Quality

The implementation is well-architected with:

  • Clean separation between sequential and parallel paths (no BAL = sequential, Some(BAL) = parallel)
  • Proper error handling with non-fatal warming failures logged but not blocking
  • Efficient data structures (FxHashMap, pre-sized allocations, binary search on sorted BAL entries)
  • Comprehensive unit tests (6 tests for bal_to_account_updates)
  • Production validation via Hive regression and Kurtosis devnet benchmarks

Architecture Highlights

The parallel path uses embarrassingly parallel execution via rayon with no conflict detection needed - the BAL provides complete state dependencies. Each transaction gets its own GeneralizedDatabase instance with:

  1. Shared read-only base state (post-system-call snapshot)
  2. BAL-seeded intermediate values for previous txs
  3. Skip of initial_accounts_state tracking (not needed since state comes from BAL)

All Amsterdam EF tests pass with both sequential and parallel execution producing identical post-state.

Confidence Score: 4/5

  • Safe to merge with high confidence - comprehensive testing and production validation demonstrate correctness
  • Score of 4 reflects solid implementation with extensive testing (EF tests, Hive, Kurtosis benchmarks) and clean architecture following geth's proven approach. Minor score reduction due to complexity of parallel execution and BAL validation logic, though validation is thorough. The two-pass test verification for Amsterdam blocks provides strong correctness guarantees.
  • No files require special attention - implementation is well-structured with appropriate error handling and validation throughout

Important Files Changed

| Filename | Overview |
|---|---|
| crates/vm/backends/levm/mod.rs | Implements BAL-based parallel execution pipeline with state seeding, validation, and warming. Core logic appears sound with proper tx-level validation. |
| crates/common/types/block_access_list.rs | Adds validation index and binary search helpers for efficient BAL lookups. Clean implementation with no issues found. |
| crates/vm/levm/src/db/gen_db.rs | Extends GeneralizedDatabase with shared base state support and skip_initial_tracking flag for parallel execution. Well-designed for the parallel use case. |
| crates/blockchain/blockchain.rs | Adds BAL parameter threading and warming failure logging. Clean integration with existing pipeline. |
| tooling/ef_tests/blockchain/test_runner.rs | Implements two-pass parallel testing for Amsterdam blocks. Excellent correctness verification approach. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Block arrives with BAL] --> B{Amsterdam+ block?}
    B -->|No| C[Sequential execution path]
    B -->|Yes| D[Parallel execution path]
    
    D --> E[Thread 1: Warmer<br/>warm_block_from_bal]
    D --> F[Thread 2: Executor<br/>execute_block_parallel]
    D --> G[Thread 3: Merkleizer]
    
    E --> E1[Phase 1: Prefetch accounts]
    E1 --> E2[Phase 2: Prefetch storage slots]
    E2 --> E3[Phase 3: Prefetch contract codes]
    
    F --> F1[bal_to_account_updates<br/>Convert BAL to state updates]
    F1 --> F2[Send updates to merkleizer]
    F2 --> F3[Execute txs in parallel via rayon]
    
    F3 --> F4[For each tx:<br/>1. seed_db_from_bal<br/>2. execute_tx_in_block<br/>3. validate_tx_execution]
    
    F4 --> F5{Validation passed?}
    F5 -->|No| H[Reject block]
    F5 -->|Yes| I[Build receipts]
    
    G --> G1[Compute state root from updates]
    
    I --> J[Process withdrawals]
    J --> K[Extract requests]
    K --> L[Return BlockExecutionResult]

Last reviewed commit: 81de430

@github-actions

🤖 Codex Code Review

Findings

  1. Consensus validation gap: BAL hash not checked when header BAL is provided
    In execute_block_pipeline, BAL hash validation only runs when a produced_bal exists, but the parallel path returns None, so the header-provided BAL is never validated against the header hash. This allows a peer-supplied BAL to drive execution/merkleization without proving it matches block_access_list_hash.
    File: crates/blockchain/blockchain.rs:443-461
    Suggestion: validate header_bal (when present) against the header hash in the execution thread, or validate before spawning execution.

  2. Fork-guard missing: BAL parallel path can run pre-Amsterdam
    The parallel BAL path is selected purely by header_bal.is_some(); it does not check is_amsterdam. If a caller mistakenly supplies a BAL for a pre-Amsterdam block, execution and merkleization will be driven by BAL data, which is a consensus violation.
    File: crates/vm/backends/levm/mod.rs:191-248
    Suggestion: gate the BAL path with is_amsterdam, and either ignore header_bal pre-fork or return an error.

  3. Potential spec gap: accounts missing from BAL are silently skipped
    In BAL validation, any modified account not present in the BAL is skipped, with the comment that the state root will catch discrepancies. This is true only if final state differs; if a tx touches an account but leaves it unchanged, the BAL might still be invalid under EIP-7928 requirements, yet would pass validation.
    File: crates/vm/backends/levm/mod.rs:861-872
    Suggestion: consider stricter handling for “modified but not in BAL” (e.g., verify against store/shared_base or treat as error if account.status indicates mutation beyond a read).

Performance note

  • seed_db_from_bal iterates all BAL accounts for every tx (O(N_accounts * N_txs)). This is likely a hotspot for large blocks.
    File: crates/vm/backends/levm/mod.rs:501-632
    Suggestion: consider precomputing per-account prefix values or per-tx slices to reduce repeated scans.
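One way to realize the suggested precomputation, sketched with toy types (the real per-account BAL structures differ): fold the index-sorted change list once into per-index snapshots, so seeding tx i reads `snapshots[i]` instead of rescanning every account. This trades memory for the repeated scan.

```rust
use std::collections::HashMap;

// `changes` must be sorted by block_access_index. Emits one cumulative
// snapshot per index; illustrative only, not the ethrex data layout.
pub fn prefix_snapshots(
    changes: &[(u16, &'static str, u64)],
    n_indices: u16,
) -> Vec<HashMap<&'static str, u64>> {
    let mut snapshots = Vec::with_capacity(n_indices as usize);
    let mut acc: HashMap<&'static str, u64> = HashMap::new();
    let mut rest = changes;
    for idx in 0..n_indices {
        // Consume every change belonging to this index, then snapshot.
        while let Some((&(i, key, val), tail)) = rest.split_first() {
            if i > idx {
                break;
            }
            acc.insert(key, val);
            rest = tail;
        }
        snapshots.push(acc.clone());
    }
    snapshots
}

fn main() {
    let snaps = prefix_snapshots(&[(0, "a", 1), (1, "b", 2)], 2);
    assert_eq!(snaps[1].get("b"), Some(&2));
}
```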

If you address Items 1–2, the rest looks structurally solid. The parallel BAL execution path is a thoughtful optimization and the validation index design is a good direction.


Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review



Code Review: PR #6233 — feat(levm): parallel block execution via BAL

This is a well-structured and significant feature that implements EIP-7928 BAL-based parallel execution. The overall design mirrors geth's approach correctly. Below are findings ordered by severity.


Critical / High

1. rayon added unconditionally to ethrex-levm, breaking ZK guest environments

crates/vm/levm/Cargo.toml adds rayon as an unconditional dependency, and this is reflected in the Cargo.lock changes for risc0, sp1, openvm, and zisk guest programs. Rayon spawns OS threads, which are unavailable in ZK prover environments (they're typically no-std or single-threaded). This will either fail to compile or silently fall back incorrectly depending on each backend's threading model.

The parallel execution code should be feature-gated (e.g., behind a parallel or std feature), and the rayon dependency should be conditional on that feature. If these guest programs already have a mechanism for disabling threading (e.g., a zk feature), this should use that.

2. Silent validation gap in validate_tx_execution Part B when seeded_pos == 0

In crates/vm/backends/levm/mod.rs, Part B validates that execution didn't modify accounts beyond what BAL claims. However, when an account was loaded from the store/shared_base (not from a BAL entry), and execution modifies it, the check is explicitly skipped with a comment:

// If seeded_pos == 0, balance was never seeded (loaded from store/shared_base).
// We can't cheaply verify without store access. Skip.

This means a tx that incorrectly mutates an account whose initial state came from the store (e.g., an account with no prior BAL changes in the block) would pass per-tx validation. The state root would catch a true global discrepancy, but that check happens AFTER the BAL-derived account_updates have already been sent to the merkleizer — so by the time the root mismatch is detected, the block is validated against the wrong state.

Consider snapshotting the pre-seed account state into the per-tx DB so the comparison can be done without a store lookup, or validate the balance/nonce against the store value when seeded_pos == 0.


Medium

3. tx_idx as u16 truncation with no bounds check

In execute_block_parallel and execute_block_pipeline:

#[allow(clippy::cast_possible_truncation)]
Self::seed_db_from_bal(&mut tx_db, bal, tx_idx as u16)?;

This silently wraps for blocks with more than 65,535 transactions, producing an incorrect max_idx that would cause the wrong BAL entries to be applied. EIP-7928 should define a maximum, but a defensive assert!(tx_idx <= u16::MAX as usize) or an explicit error return would prevent subtle corruption if a malformed BAL is accepted.
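The suggested defensive check amounts to a `u16::try_from` conversion; this sketch is illustrative and not the PR's code:

```rust
// Defensive alternative to `tx_idx as u16`: fail loudly on overflow
// instead of silently truncating the BAL seeding index.
pub fn tx_index_to_u16(tx_idx: usize) -> Result<u16, String> {
    u16::try_from(tx_idx).map_err(|_| format!("tx index {tx_idx} exceeds u16::MAX"))
}

fn main() {
    assert_eq!(tx_index_to_u16(41), Ok(41));
    assert!(tx_index_to_u16(70_000).is_err());
}
```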

4. Code::from_bytecode(seeded_code.clone()).hash in per-tx validation hot path

In validate_tx_execution Part B, validating whether execution changed an account's code recomputes the keccak256 hash of the entire bytecode:

let seeded_hash = if seeded_code.is_empty() {
    *EMPTY_KECCACK_HASH
} else {
    Code::from_bytecode(seeded_code.clone()).hash
};

This runs in the parallel validation loop for every modified account in every tx that doesn't have a code change at bal_idx. For large contracts (several KB of bytecode), this is a non-trivial allocation + hash per tx. The hash should be derivable from the BAL's CodeChange without recomputing it if Code caches the hash, or the comparison should be hash-only (comparing account.info.code_hash against the precomputed hash from the BAL entry, which can be computed once in build_validation_index).

5. has_storage: false for newly seeded accounts may silently skip storage trie reads

In seed_db_from_bal, when all account info fields are covered by the BAL (has_all_info == true), a new LevmAccount is inserted with has_storage: false:

let acc = db.current_accounts_state.entry(addr).or_insert_with(|| LevmAccount {
    info: AccountInfo::default(),
    storage: FxHashMap::default(),
    has_storage: false,    // <-- even if the account has on-chain storage
    status: AccountStatus::Modified,
});

The comment warns against reuse but doesn't address whether has_storage is consulted within the parallel execution itself. If any EVM opcode handler (e.g., SLOAD for a slot not yet seeded by BAL) checks has_storage to decide whether to fall back to the storage trie, it would incorrectly conclude "no storage" and return zero. This is distinct from the BAL-seeded slots (which are in acc.storage); it applies to slots accessed at runtime that weren't in the BAL's read or write list. If such a slot exists, the EVM would see 0 instead of the on-chain value.
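One possible fix, sketched with stand-in types (`LevmAccountLite`, `StorageProbe`, and `seed_account` are illustrative, not the PR's API): derive `has_storage` from a store lookup at seed time instead of hardcoding `false`, so a runtime SLOAD on a slot absent from the BAL can still fall back to the trie.

```rust
use std::collections::HashMap;

type Address = [u8; 20];

/// Minimal stand-in for LevmAccount; only the fields relevant here.
pub struct LevmAccountLite {
    pub storage: HashMap<[u8; 32], [u8; 32]>,
    pub has_storage: bool,
}

/// Stand-in for the store lookup this sketch assumes is available.
pub trait StorageProbe {
    fn account_has_storage(&self, addr: &Address) -> bool;
}

/// Seed a fresh account, consulting the store for `has_storage`.
pub fn seed_account<S: StorageProbe>(
    accounts: &mut HashMap<Address, LevmAccountLite>,
    store: &S,
    addr: Address,
) {
    accounts.entry(addr).or_insert_with(|| LevmAccountLite {
        storage: HashMap::new(),
        has_storage: store.account_has_storage(&addr),
    });
}

/// Trivial probe used for the sketch/tests.
pub struct ConstProbe(pub bool);
impl StorageProbe for ConstProbe {
    fn account_has_storage(&self, _addr: &Address) -> bool {
        self.0
    }
}
```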

6. add_block_pipeline_inner returns a nested Result

fn add_block_pipeline_inner(
    &self,
    block: Block,
    bal: Option<&BlockAccessList>,
) -> Result<(Option<BlockAccessList>, Result<(), ChainError>), ChainError>

The outer Ok can wrap an inner Err. This is unusual Rust — callers must double-unwrap, and it's easy to accidentally check only the outer Result. The motivation (returning the BAL even on storage failure) is understandable, but consider restructuring so the BAL is extracted before the storage step, or use a dedicated struct rather than nested Results.
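A sketch of the dedicated-struct option (names are illustrative): "BAL recovered, storage step failed" becomes an explicit field rather than an inner `Err` hiding inside an outer `Ok`, so callers cannot accidentally check only one layer.

```rust
/// Hypothetical replacement for the nested Result. The BAL is carried as a
/// normal field regardless of whether the storage step succeeded.
pub struct PipelineOutcome<Bal, Err> {
    /// Populated once execution produced a BAL, even on storage failure.
    pub bal: Option<Bal>,
    pub storage_result: Result<(), Err>,
}

impl<Bal, Err> PipelineOutcome<Bal, Err> {
    pub fn is_fully_ok(&self) -> bool {
        self.storage_result.is_ok()
    }
}
```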


Low / Nits

7. Duplicate has_writes computation in bal_to_account_updates

The has_writes predicate is evaluated twice per account: once as a .filter() over write_addrs, and again inside the for acct_changes in bal.accounts() loop. The filter's result (the set of accounts with writes) could be reused for the loop to avoid redundancy:

let write_addrs: Vec<Address> = bal.accounts().iter()
    .filter(|ac| { /* has_writes logic */ })
    .map(|ac| ac.address)
    .collect();
// ... prefetch ...
for acct_changes in bal.accounts().iter().filter(|ac| { /* duplicated */ }) { ... }
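The de-duplication could look like the following sketch (`AccountChanges` and `has_writes` are simplified stand-ins for the PR's types): evaluate the predicate once, keep the matching changes, and derive both the prefetch address list and the loop input from the same collection.

```rust
type Address = [u8; 20];

/// Simplified stand-in for the BAL's per-account change set.
pub struct AccountChanges {
    pub address: Address,
    pub balance_changes: Vec<u128>,
}

/// Stand-in for the has_writes predicate; evaluated exactly once per account.
fn has_writes(ac: &AccountChanges) -> bool {
    !ac.balance_changes.is_empty()
}

pub fn split_writes(accounts: &[AccountChanges]) -> (Vec<Address>, Vec<&AccountChanges>) {
    // Filter once...
    let with_writes: Vec<&AccountChanges> =
        accounts.iter().filter(|ac| has_writes(ac)).collect();
    // ...then reuse the result for both the prefetch and the update loop.
    let write_addrs = with_writes.iter().map(|ac| ac.address).collect();
    (write_addrs, with_writes)
}
```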

8. capacity / 4 for code map pre-allocation truncates to 0 for small inputs

In new_with_shared_base_and_capacity:

codes: FxHashMap::with_capacity_and_hasher(capacity / 4, Default::default()),

Integer division truncates: for capacity < 4, this allocates zero slots. Not a correctness issue, but a no-op pre-allocation.
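A one-line fix is to round up instead of down, e.g. with `usize::div_ceil` (stable since Rust 1.73):

```rust
/// Rounding up avoids the zero-slot pre-allocation for capacity < 4,
/// while still reserving roughly a quarter of the account capacity.
pub fn code_map_capacity(capacity: usize) -> usize {
    capacity.div_ceil(4)
}
```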

9. Silent code prefetch errors in warm_block_from_bal Phase 3

code_hashes.par_iter().for_each(|&h| {
    let _ = store.get_account_code(h);
});

Errors are silently dropped. Other warming phases use map_err(|e| EvmError::...)?. Consider at least a debug! log on error here, consistent with the pattern established in blockchain.rs for warming failures.
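A sketch of the suggested logging, kept best-effort: `eprintln!` stands in for the debug-level macro used for warming failures in blockchain.rs, and `fetch_code` stands in for `store.get_account_code` (the real code iterates with rayon's par_iter; a plain loop is used here for simplicity).

```rust
/// Prefetch contract codes; failures are logged instead of silently dropped.
/// Returns the failure count so callers can surface it in metrics if desired.
pub fn warm_code(
    code_hashes: &[[u8; 32]],
    fetch_code: impl Fn(&[u8; 32]) -> Result<(), String>,
) -> usize {
    let mut failures = 0;
    for h in code_hashes {
        if let Err(e) = fetch_code(h) {
            // Warming stays best-effort, but failures are now visible.
            eprintln!("code prefetch failed for {:x?}…: {e}", &h[..4]);
            failures += 1;
        }
    }
    failures
}
```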


Positive Observations

  • The BAL indexing semantics (0 = system calls, 1 = tx_0, ...) are consistently applied throughout seed_db_from_bal and validate_tx_execution, and are well-documented.
  • The build_validation_index() precomputation (one FxHashMap per block, shared read-only) is a good design choice — avoids per-tx linear scans of the BAL.
  • The two-pass EF test harness (run_two_pass_parallel) is a solid correctness check that exercises the parallel path against the sequential path's output without modifying existing test infrastructure.
  • prefetch_accounts/prefetch_storage defaulting to sequential fallback in the Database trait ensures backward compatibility for non-caching backends.
  • The 67% throughput improvement on realistic workloads is a strong result that justifies the complexity.

Automated review by Claude (Anthropic) · custom prompt

…l path

After merging main (which added the disable_balance_check param in #6259),
the BAL parallel execution call site was missing the argument. Normal
execution uses false (balance checks enabled).
