Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results: BubbleSort, ERC20Approval, ERC20Mint, ERC20Transfer, Factorial, FactorialRecursive, Fibonacci, FibonacciRecursive, ManyHashes, MstoreBench, Push, SstoreBench_no_opt
Implement Phase 1 parallel transaction execution using EIP-7928 Block Access List (BAL) write sets to detect conflicts and assign txs to parallel execution groups.

- Add `build_parallel_groups`: builds conflict groups from BAL write sets. Conflicting txs are serialized in the same group; independent txs get separate groups for parallel execution. Same-sender txs are chained into the same group to preserve nonce order. Coinbase is excluded from conflict detection.
- Add `execute_block_parallel`: executes groups via rayon, each with its own `GeneralizedDatabase` seeded from post-system-call state. Coinbase fees are accumulated as deltas and applied to the main db after merge. System call updates and merged tx updates are sent to the merkleizer in two batches.
- Thread `header_bal: Option<&BlockAccessList>` through `Evm::execute_block_pipeline` and `LEVM::execute_block_pipeline`. When `Some(bal)` is provided (Amsterdam fork, engine API path), the parallel path is taken; otherwise it falls back to the existing sequential loop.
- Add 10 unit tests for `build_parallel_groups` covering: empty block, single tx, same-sender chains, conflicting/non-conflicting pairs, coinbase exclusion, transitive conflict graphs, and mixed scenarios.
When a group contains multiple transactions, get_state_transitions_tx promotes the coinbase balance to initial_accounts_state after each tx. This means subsequent per-tx coinbase AccountUpdates show an accumulated absolute balance, not an incremental delta. Subtracting coinbase_initial_balance from each per-tx update was double-counting fees from earlier txs in the same group, producing a wrong state root. Fix: read the final coinbase balance from initial_accounts_state once per group (after all txs have been drained), and compute a single delta per group instead of summing per-tx deltas.
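The double-counting described above can be illustrated with a toy sketch (hypothetical numbers, plain `u64` instead of `U256`, invented helper names): when per-tx updates carry accumulated absolute coinbase balances, subtracting the group-initial balance from each one re-counts earlier fees, whereas reading the final balance once yields the correct single delta.

```rust
/// Correct: one delta per group, computed from the final absolute balance.
fn group_delta(initial: u64, per_tx_absolute: &[u64]) -> u64 {
    per_tx_absolute.last().copied().unwrap_or(initial) - initial
}

/// Buggy variant: subtracts the group-initial balance from EVERY per-tx
/// absolute balance, re-counting fees from earlier txs in the same group.
fn buggy_summed_delta(initial: u64, per_tx_absolute: &[u64]) -> u64 {
    per_tx_absolute.iter().map(|b| b - initial).sum()
}
```

With a coinbase starting at 100 and two txs paying 5 then 7 in fees, the per-tx absolute balances are 105 and 112: the group delta is 12, while the buggy sum yields 5 + 12 = 17.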
…equential

Write-only conflict detection misses read-after-write (RAW) hazards: if tx_j reads account X without writing it, and tx_i (i < j) writes X, they end up in separate parallel groups — tx_j reads the pre-block value instead of tx_i's write, producing a wrong state root.

Fix:
- Add `reads: Option<FxHashSet<Address>>` to GeneralizedDatabase, populated in load_account on first access (initial_accounts_state or store). Only enabled in parallel group dbs (None in all other paths, no overhead).
- Add execute_txs_sequential helper that runs all txs in order on the main db and returns receipts + merged AccountUpdates.
- After parallel execution, check each group's read set against all other groups' write sets. If any intersection is found (RAW conflict), discard the parallel results and re-run sequentially on the main db, which is already in the correct post-system-call state.

This is conservative (it falls back on any read-write overlap regardless of tx ordering) but guarantees correctness. False positives only mean an unnecessary sequential fallback, never a wrong state root.
Replace the address-level W-W greedy grouping + post-hoc sequential fallback with a correct upfront conflict graph using Union-Find.

New algorithm in `build_parallel_groups`:
- Resource-level (slot-level) write sets from the BAL: Balance, Nonce, Code, Storage(addr, slot)
- Per-tx read sets approximated from static metadata: sender balance/nonce, call target code/balance, EIP-2930 access list entries
- Union-Find for transitive grouping handles same-sender, W-W, and RAW conflicts
- RAW: if tx_j reads resource R and any earlier tx_i writes R, union(i, j)
- WAR (reader before writer) is safe and not serialized
- Coinbase excluded from all conflict detection

Remove the rejected sequential fallback (`execute_txs_sequential`) and the post-hoc RAW check that re-ran the entire block sequentially on conflict. Also remove the `reads: Option<FxHashSet<Address>>` field from `GeneralizedDatabase` (it was used only by the removed fallback).
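The grouping algorithm above can be sketched as follows. This is a simplified, hypothetical model (small integer ids instead of `Address`, a toy `Resource` enum, invented names), not the actual `build_parallel_groups` implementation; it shows how same-sender, W-W, and RAW edges are unioned transitively while WAR pairs stay separate.

```rust
use std::collections::{HashMap, HashSet};

/// Simplified stand-in for the BAL-derived resource keys (slot level).
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
enum Resource {
    Balance(u8),     // address abbreviated to a small id
    Storage(u8, u8), // (address, slot)
}

struct Tx {
    sender: u8,
    writes: HashSet<Resource>,
    reads: HashSet<Resource>,
}

/// Union-Find with path compression.
struct Dsu(Vec<usize>);

impl Dsu {
    fn new(n: usize) -> Self {
        Dsu((0..n).collect())
    }
    fn find(&mut self, i: usize) -> usize {
        if self.0[i] != i {
            let root = self.find(self.0[i]);
            self.0[i] = root;
        }
        self.0[i]
    }
    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        self.0[ra] = rb;
    }
}

/// Group txs so same-sender, W-W, and RAW conflicts end up together
/// (transitively). Independent txs land in separate groups.
fn build_groups(txs: &[Tx]) -> Vec<Vec<usize>> {
    let mut dsu = Dsu::new(txs.len());
    let mut last_writer: HashMap<Resource, usize> = HashMap::new();
    let mut last_sender_tx: HashMap<u8, usize> = HashMap::new();
    for (j, tx) in txs.iter().enumerate() {
        if let Some(&i) = last_sender_tx.get(&tx.sender) {
            dsu.union(i, j); // same-sender chain preserves nonce order
        }
        last_sender_tx.insert(tx.sender, j);
        for r in &tx.reads {
            if let Some(&i) = last_writer.get(r) {
                dsu.union(i, j); // RAW: j reads what an earlier tx wrote
            }
        }
        for r in &tx.writes {
            if let Some(&i) = last_writer.get(r) {
                dsu.union(i, j); // W-W conflict
            }
            last_writer.insert(*r, j);
        }
        // Note: no union for WAR (earlier reader, later writer); it is safe.
    }
    // Collect members per root, preserving tx order within each group.
    let mut groups: HashMap<usize, Vec<usize>> = HashMap::new();
    for j in 0..txs.len() {
        let root = dsu.find(j);
        groups.entry(root).or_default().push(j);
    }
    let mut out: Vec<Vec<usize>> = groups.into_values().collect();
    out.sort_by_key(|g| g[0]);
    out
}
```

Chaining each writer to the previous writer of the same resource is enough for transitivity: all writers of a resource, and any later reader, end up under one Union-Find root.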
Two correctness fixes from code review:

1. Coinbase delta: replace saturating_sub with explicit signed accounting using separate credit/debit U256 accumulators. Previously, if the coinbase was a tx sender spending more ETH than it received in fees (rare but valid), saturating_sub clamped the negative delta to zero and silently discarded it.
2. EIP-7702 authorization list: add Resource::Code(auth.address) to the read set for EIP-7702 txs. The delegate target's code is loaded at call time via the delegation pointer, so if an earlier tx deploys code to that address, this RAW hazard must be detected upfront. The authority address itself cannot be added (it requires ecrecover at runtime); W-W detection via the BAL handles the authority code-write case.
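The signed-accounting fix can be illustrated with a toy sketch (plain `u128` instead of `U256`, hypothetical function name): credits and debits accumulate separately as unsigned values, and the net delta is applied with an explicit sign instead of via `saturating_sub`, which would silently clamp a negative delta to zero.

```rust
/// Apply the net coinbase delta with explicit sign handling. `initial` is
/// the coinbase balance before the group, `credits` the fees received,
/// `debits` the ETH it spent as a tx sender.
fn apply_coinbase_delta(initial: u128, credits: u128, debits: u128) -> u128 {
    if credits >= debits {
        initial + (credits - debits)
    } else {
        // Coinbase spent more than it earned in fees: subtract the net
        // debit instead of discarding it.
        initial - (debits - credits)
    }
}
```

By contrast, `credits.saturating_sub(debits)` returns 0 whenever debits exceed credits, so the coinbase balance would never decrease.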
The static read set (sender/to/access_list) misses cases where a called contract internally reads a storage slot written by an earlier tx without declaring it in the EIP-2930 access list.

Fix: after building per-tx write sets from the BAL, build a map of address → all written storage slots. When approximating read sets in Phase 2, for any address that tx_j directly accesses (to or access_list), add all of that address's written storage slots to tx_j's read set. This catches the common RAW pattern where tx_j calls a contract whose storage was modified by tx_i (the direct-call case). Multi-hop internal calls through addresses not in tx metadata remain an inherent limitation.
Any CALL transaction may transitively read any written storage slot in the block through sub-calls to other contracts. Since we cannot determine the full call graph statically from BAL metadata, we conservatively add all block-level written storage slots to the read set of every call transaction.

This supersedes the previous per-address approach (adding only the written slots of the direct `to` address), which missed multi-hop patterns: tx_i writes Storage(A, s) → tx_j calls B → B calls A → A reads slot s. With this fix, tx_j's read set includes Storage(A, s), triggering the RAW union with tx_i and ensuring sequential execution within the group.

The conservative approach reduces parallelism: all call txs touching written storage are grouped together. ETH transfers and CREATE txs unaffected by storage writes can still parallelize. Correctness takes priority here; Block-STM or call-graph analysis could recover parallelism later.
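A minimal sketch of the conservative widening rule, with hypothetical simplified types (a toy `Slot` key standing in for the BAL-derived `Resource`s): every CALL tx's read set absorbs all block-level written storage slots, while non-CALL txs keep their narrow static read sets.

```rust
use std::collections::HashSet;

#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
struct Slot {
    addr: u8, // abbreviated address id
    key: u8,  // abbreviated storage key
}

/// Conservative read-set widening: a CALL can transitively reach any
/// contract, so assume it may read every storage slot written anywhere in
/// the block. ETH transfers / CREATEs keep their static read sets.
fn widen_read_set(
    is_call: bool,
    static_reads: HashSet<Slot>,
    all_written_slots: &HashSet<Slot>,
) -> HashSet<Slot> {
    if is_call {
        static_reads.union(all_written_slots).copied().collect()
    } else {
        static_reads
    }
}
```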
… txs

The previous test suite only used balance writes and CREATE transactions, so the new conservative multi-hop RAW detection was completely untested. New tests cover:
- CALL tx to the same address as a storage writer → same group (W-W + CALL)
- CALL tx to a different address than the storage writer → same group (multi-hop RAW)
- Three txs with two storage writers and one unrelated CALL → all one group
- CREATE txs with disjoint storage writes → still parallel (no CALL branch triggered)
- WAR ordering (reader before writer) → no spurious serialization
… sequential fallback

The BAL (EIP-7928) only records writes, not reads. Read sets for parallel grouping must be approximated statically; previously they only included written storage slots for CALL txs. This missed RAW conflicts when a contract reads an account balance (BALANCE opcode) or code (EXTCODESIZE/DELEGATECALL) modified by an earlier tx not in the same group.

- Extend the conservative read set to include written Code and non-sender Balance resources (sender balances are excluded to avoid mass serialization, since every tx writes Balance(sender) via gas fees)
- Add a sequential execution fallback in add_block_pipeline: if the parallel path produces a gas/receipts/state mismatch, retry with a fresh VM without a BAL to guarantee correctness for any remaining edge cases
Instead of deep-cloning the post-system-call CacheDB into each parallel group, wrap it in Arc and add a shared_base field to GeneralizedDatabase. Accounts are lazily cloned into initial_accounts_state on first access, making get_state_transitions_tx transparent to the change.
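The lazy-clone idea can be sketched like this (hypothetical, heavily simplified types; the real `GeneralizedDatabase` has more fields and tracks `initial_accounts_state`): each parallel group DB holds an `Arc` to the read-only post-system-call state and copies accounts into its private map on first access, instead of deep-cloning the whole cache up front.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Clone)]
struct Account {
    balance: u64,
    nonce: u64,
}

/// Toy per-group database: `shared_base` is immutable and shared by all
/// groups; `local` is populated lazily (copy-on-first-access).
struct GroupDb {
    shared_base: Arc<HashMap<u8, Account>>,
    local: HashMap<u8, Account>,
}

impl GroupDb {
    fn load_account(&mut self, addr: u8) -> &mut Account {
        let base = Arc::clone(&self.shared_base);
        self.local.entry(addr).or_insert_with(|| {
            // Lazy clone from the shared base, or a fresh empty account.
            base.get(&addr)
                .cloned()
                .unwrap_or(Account { balance: 0, nonce: 0 })
        })
    }
}
```

Mutations stay confined to each group's `local` map, so the shared base can back any number of groups without copying untouched accounts.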
- Binary search (partition_point) in seed_db_from_bal instead of a reverse linear scan
- Batch prefetch_accounts/prefetch_storage on CachingDatabase with parallel inner fetch + a single write-lock
- mem::take for system_seed to avoid cloning initial_accounts_state
- Cache chain_config in CachingDatabase via OnceLock
- Add rayon to ethrex-levm for parallel batch prefetch
- Add bal-devnet-2-light and bal-devnet-2-ethrex kurtosis fixtures
- Update the ethereum-package revision
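The `partition_point` lookup can be sketched as follows (simplified `(index, value)` pairs standing in for BAL change entries; not the actual `seed_db_from_bal` code): per-field BAL changes are sorted by `block_access_index`, so the latest change at or before a seeding index is found by binary search instead of a reverse linear scan.

```rust
/// Return the value of the latest change whose index is <= max_idx.
/// `changes` must be sorted ascending by index, as BAL entries are.
fn latest_change_at_or_before(changes: &[(u16, u64)], max_idx: u16) -> Option<u64> {
    // partition_point counts the leading elements satisfying the predicate,
    // i.e. those with index <= max_idx (O(log n)).
    let n = changes.partition_point(|&(idx, _)| idx <= max_idx);
    changes[..n].last().map(|&(_, value)| value)
}
```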
Skip initial_accounts_state cloning in parallel per-tx DBs (never diffed), consolidate HashMap lookups in seed_db_from_bal, batch prefetch in bal_to_account_updates, eliminate intermediate Vec allocations, and streamline warm_block_from_bal code prefetch.
Validate each parallel tx's execution results against the header BAL claims, rejecting blocks with mismatched state mutations. Matches geth's validation approach.
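A toy sketch of the per-tx validation idea (hypothetical types: a flat map of account id → balance standing in for the real per-field diffs of balances, nonces, storage, and code): the diff a tx actually produced must match the header BAL's claims for that tx index exactly, and any extra, missing, or mismatched mutation rejects the block.

```rust
use std::collections::HashMap;

/// Compare produced state mutations against the BAL's claimed mutations.
fn validate_tx_diff(
    claimed: &HashMap<u8, u64>,  // header BAL claims for this tx index
    produced: &HashMap<u8, u64>, // what execution actually produced
) -> Result<(), String> {
    for (addr, val) in produced {
        match claimed.get(addr) {
            Some(c) if c == val => {}
            Some(c) => return Err(format!("account {addr}: produced {val}, BAL claims {c}")),
            None => return Err(format!("account {addr}: mutation not declared in BAL")),
        }
    }
    // A claimed change the tx did not make is also a mismatch.
    for addr in claimed.keys() {
        if !produced.contains_key(addr) {
            return Err(format!("account {addr}: BAL claims a change the tx did not make"));
        }
    }
    Ok(())
}
```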
Amsterdam EF tests now exercise the parallel execution path as a correctness check. After the normal sequential run succeeds, a second two-pass run is performed: pass 1 re-executes sequentially to collect the produced BAL, pass 2 re-executes on a fresh blockchain using that BAL to drive the parallel code path, then verifies the post-state matches. Also threads the produced BAL through BlockExecutionPipelineResult and adds add_block_pipeline_returning_bal to Blockchain.
- Extract add_block_pipeline_inner to deduplicate add_block_pipeline and add_block_pipeline_bal
- Use binary search (partition_point) for the storage slot lookup in validate_tx_execution
- Rename _db to db in execute_block_parallel
- Add clarifying comments for stack_pool capacity, the any_storage heuristic, and has_storage safety
- Remove the blanket #![allow(dead_code)] from block_access_list.rs
- Downgrade the [PARALLEL] log from info! to debug! to avoid flooding production logs
- Add a doc comment explaining the nested Result semantics in add_block_pipeline_inner
🤖 Kimi Code Review

Automated review by Kimi (Moonshot AI)
Greptile Summary

This PR implements BAL-based parallel transaction execution for Amsterdam+ blocks, achieving a 67% performance improvement (2.0 Ggas/s vs 1.2 Ggas/s) on realistic workloads.

Key Changes
Implementation Quality

The implementation is well-architected with:
Architecture Highlights

The parallel path uses embarrassingly parallel execution via rayon with no conflict detection needed - the BAL provides complete state dependencies. Each transaction gets its own `GeneralizedDatabase` seeded from BAL intermediate values.
All Amsterdam EF tests pass with both sequential and parallel execution producing identical post-state.

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| crates/vm/backends/levm/mod.rs | Implements BAL-based parallel execution pipeline with state seeding, validation, and warming. Core logic appears sound with proper tx-level validation. |
| crates/common/types/block_access_list.rs | Adds validation index and binary search helpers for efficient BAL lookups. Clean implementation with no issues found. |
| crates/vm/levm/src/db/gen_db.rs | Extends GeneralizedDatabase with shared base state support and skip_initial_tracking flag for parallel execution. Well-designed for the parallel use case. |
| crates/blockchain/blockchain.rs | Adds BAL parameter threading and warming failure logging. Clean integration with existing pipeline. |
| tooling/ef_tests/blockchain/test_runner.rs | Implements two-pass parallel testing for Amsterdam blocks. Excellent correctness verification approach. |
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Block arrives with BAL] --> B{Amsterdam+ block?}
    B -->|No| C[Sequential execution path]
    B -->|Yes| D[Parallel execution path]
    D --> E[Thread 1: Warmer<br/>warm_block_from_bal]
    D --> F[Thread 2: Executor<br/>execute_block_parallel]
    D --> G[Thread 3: Merkleizer]
    E --> E1[Phase 1: Prefetch accounts]
    E1 --> E2[Phase 2: Prefetch storage slots]
    E2 --> E3[Phase 3: Prefetch contract codes]
    F --> F1[bal_to_account_updates<br/>Convert BAL to state updates]
    F1 --> F2[Send updates to merkleizer]
    F2 --> F3[Execute txs in parallel via rayon]
    F3 --> F4[For each tx:<br/>1. seed_db_from_bal<br/>2. execute_tx_in_block<br/>3. validate_tx_execution]
    F4 --> F5{Validation passed?}
    F5 -->|No| H[Reject block]
    F5 -->|Yes| I[Build receipts]
    G --> G1[Compute state root from updates]
    I --> J[Process withdrawals]
    J --> K[Extract requests]
    K --> L[Return BlockExecutionResult]
```
Last reviewed commit: 81de430
🤖 Codex Code Review

Findings
Performance note
If you address Items 1–2, the rest looks structurally solid. The parallel BAL execution path is a thoughtful optimization and the validation index design is a good direction.

Automated review by OpenAI Codex · custom prompt
🤖 Claude Code Review

Code Review: PR #6233 — feat(levm): parallel block execution via BAL

This is a well-structured and significant feature that implements EIP-7928 BAL-based parallel execution. The overall design mirrors geth's approach correctly. Below are findings ordered by severity.

Critical / High

1. The parallel execution code should be feature-gated (e.g., behind a dedicated flag).

2. Silent validation gap when a balance was never seeded. In the per-tx validation:

```rust
// If seeded_pos == 0, balance was never seeded (loaded from store/shared_base).
// We can't cheaply verify without store access. Skip.
```

This means a tx that incorrectly mutates an account whose initial state came from the store (e.g., an account with no prior BAL changes in the block) would pass per-tx validation. The state root would catch a true global discrepancy, but that check happens after the BAL-derived updates have already been sent to the merkleizer. Consider snapshotting the pre-seed account state into the per-tx DB so the comparison can be done without a store lookup, or validate the balance/nonce against the store value when `seeded_pos == 0`.

Medium

3. In the parallel execution loop:

```rust
#[allow(clippy::cast_possible_truncation)]
Self::seed_db_from_bal(&mut tx_db, bal, tx_idx as u16)?;
```

This silently wraps for blocks with more than 65,535 transactions, producing an incorrect seed index.

4. In the per-tx validation:

```rust
let seeded_hash = if seeded_code.is_empty() {
    *EMPTY_KECCACK_HASH
} else {
    Code::from_bytecode(seeded_code.clone()).hash
};
```

This runs in the parallel validation loop for every modified account in every tx that doesn't have a code change at that index.

5. In the BAL seeding path:

```rust
let acc = db.current_accounts_state.entry(addr).or_insert_with(|| LevmAccount {
    info: AccountInfo::default(),
    storage: FxHashMap::default(),
    has_storage: false, // <-- even if the account has on-chain storage
    status: AccountStatus::Modified,
});
```

The comment warns against reuse but doesn't address whether `has_storage: false` is safe for accounts that do have on-chain storage.

6. The pipeline entry point:

```rust
fn add_block_pipeline_inner(
    &self,
    block: Block,
    bal: Option<&BlockAccessList>,
) -> Result<(Option<BlockAccessList>, Result<(), ChainError>), ChainError>
```

The outer and nested `Result` semantics are non-obvious and deserve a doc comment.

Low / Nits

7. Duplicate filtering logic in the BAL warming code:

```rust
let write_addrs: Vec<Address> = bal.accounts().iter()
    .filter(|ac| { /* has_writes logic */ })
    .map(|ac| ac.address)
    .collect();
// ... prefetch ...
for acct_changes in bal.accounts().iter().filter(|ac| { /* duplicated */ }) { ... }
```

8. In the warming capacity estimate:

```rust
codes: FxHashMap::with_capacity_and_hasher(capacity / 4, Default::default()),
```

Integer division truncates: for small capacities this reserves fewer entries than intended.

9. Silent code prefetch errors:

```rust
code_hashes.par_iter().for_each(|&h| {
    let _ = store.get_account_code(h);
});
```

Errors are silently dropped. Other warming phases log failures.

Positive Observations

Automated review by Claude (Anthropic) · custom prompt
…l path

After merging main (which added the disable_balance_check param in #6259), the BAL parallel execution call site was missing the argument. Normal execution uses false (balance checks enabled).
Closes #6209
Summary
Implements BAL-based parallel transaction execution for Amsterdam+ blocks using EIP-7928 Block Access Lists.
Approach: BAL State Seeding
Each transaction runs independently on its own database, pre-seeded with BAL-derived intermediate state (same approach as geth). No conflict detection or grouping needed — the BAL provides the complete state dependency graph.
Pipeline (3 concurrent threads):
- Warmer — prefetches BAL-referenced accounts, storage slots, and contract codes (`warm_block_from_bal`)
- Executor — runs each tx on its own `GeneralizedDatabase` seeded from BAL intermediate values
- Merkleizer — fed by `bal_to_account_updates` (BAL → AccountUpdates, no execution needed)

Key functions (`crates/vm/backends/levm/mod.rs`):
- `execute_block_parallel` — orchestrates the parallel path: sends BAL-derived AccountUpdates to the merkleizer, then executes all txs in parallel
- `bal_to_account_updates(bal, store)` — converts BAL final values into `Vec<AccountUpdate>` for the merkleizer (last entry per field = post-block state)
- `seed_db_from_bal(db, bal, max_idx)` — pre-seeds a per-tx DB with cumulative BAL state through index `max_idx` (system calls + previous txs)
- `warm_block_from_bal(bal, store)` — 3-phase prefetch: accounts → storage slots → contract codes

BAL indexing: 0 = system calls, 1 = tx 0, 2 = tx 1, ..., N+1 = withdrawals. For tx at index `i`, `seed_db_from_bal` applies all changes with `block_access_index <= i`.

The parallel path is only triggered when `header_bal` is `Some` (Amsterdam+ blocks via `engine_newPayloadV4`). All other callers pass `None` and use the existing sequential loop unchanged.

EF Test Verification (Two-Pass Parallel Check)

Amsterdam EF tests now exercise the parallel execution path as a correctness check. After the normal sequential run succeeds, a second two-pass run is performed:
- Pass 1: `add_block_pipeline_returning_bal(block, None)`, collecting the produced BAL for each block.
- Pass 2: `add_block_pipeline(block, Some(&bal))`, using the BAL from pass 1 to drive the parallel execution path.

This ensures that the parallel execution path produces identical results to the sequential path across the entire Amsterdam EF test suite. Non-Amsterdam tests are unaffected. All Amsterdam EF tests pass both sequential and parallel execution.
Test plan
- Unit tests for `bal_to_account_updates` — all pass (`cargo test -p ethrex-vm bal_tests`)
- `devnets/bal/2` fixture suite with parallel path enabled

Notes

- Per-tx validation: `GeneralizedDatabase::compute_tx_diff()` computes the diff between initial and final account state for the tx, then `BlockAccessList::validate_tx_diff(tx_idx, &diff)` checks that balances, nonces, storage values, and code changes match exactly what the BAL declares for that tx index. Blocks with mismatched state mutations are rejected. This matches geth's per-tx validation approach.
- The header BAL is hash-validated (`validate_block_access_list_hash`) and used directly for both state root computation and per-tx state seeding. This mirrors geth's approach: `BALStateTransition.IntermediateRoot()` computes the state root entirely from BAL diffs (via `readAccountDiff` + `ModifiedAccounts`) without re-executing transactions, and `BALReader.getStateObject`/`initObjFromDiff` seeds per-tx state from BAL intermediate values. No BAL re-recording happens during validation in either implementation.
- Per-tx DBs skip `initial_accounts_state` tracking (state transitions come from BAL, not from diffing)

Benchmark Results

Benchmarked on a Kurtosis devnet (`bal-devnet-2-ethrex.yaml`) comparing parallel execution (via BAL) vs sequential (main), both running on identical hardware. Nodes are in consensus (matching block hashes).

Spamoor Setup (mainnet-like workload, ~700 tx/s target): `erc20tx`, `uniswap-swaps`, `eoatx`, `storagespam`, `erc721tx`, `blobs`. Blocks average ~231 txs and ~60 Mgas each.

Results (43 non-empty blocks sampled):
- Parallel (`bal-parallel-exec`): 2.0 Ggas/s
- Sequential (`main`): 1.2 Ggas/s

~67% faster with BAL parallel execution on realistic mainnet-like workloads.