
perf(levm): remove Result<OpcodeResult, VMError> to avoid expensive matches#4791

Merged: ilitteri merged 34 commits into main from refactor-opcode-handlers on Mar 4, 2026.
Conversation

@azteca1998
Contributor

Motivation

Matching on the result is slow in tight loops.

Description

Remove the Result so that the hot loop has nothing to match on.
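A minimal sketch of the idea, with hypothetical simplified types standing in for LEVM's real ones: handlers return a bare two-variant OpcodeResult and stash any failure in an error cell, so the hot loop only ever branches on the enum and defers error inspection until a halt.

```rust
use std::cell::OnceCell;

// Hypothetical, simplified stand-ins for LEVM's real types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpcodeResult {
    Continue,
    Halt,
}

#[derive(Debug, Clone, PartialEq)]
enum VMError {
    StackUnderflow,
}

struct Vm {
    // Error slot written at most once; inspected only after a halt.
    error: OnceCell<VMError>,
    stack: Vec<u64>,
}

impl Vm {
    fn new() -> Self {
        Vm { error: OnceCell::new(), stack: Vec::new() }
    }

    // Handler returns a bare OpcodeResult; on failure it records the
    // error in the cell and signals Halt instead of returning Result.
    fn op_pop(&mut self) -> OpcodeResult {
        match self.stack.pop() {
            Some(_) => OpcodeResult::Continue,
            None => {
                let _ = self.error.set(VMError::StackUnderflow);
                OpcodeResult::Halt
            }
        }
    }

    // The hot loop branches on a two-variant enum, never on Result.
    fn run(&mut self) -> Result<(), VMError> {
        loop {
            match self.op_pop() {
                OpcodeResult::Continue => continue,
                OpcodeResult::Halt => break,
            }
        }
        // Error checking happens once, after the loop exits.
        match self.error.take() {
            Some(e) => Err(e),
            None => Ok(()),
        }
    }
}
```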

@github-actions github-actions bot added the levm (Lambda EVM implementation) and performance (Block execution throughput and performance in general) labels Oct 7, 2025
github-actions bot commented Oct 7, 2025

Lines of code report

Total lines added: 254
Total lines removed: 179
Total lines changed: 433

Detailed view
+------------------------------------------------------------------------+-------+------+
| File                                                                   | Lines | Diff |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/mempool.rs                                    | 390   | +2   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/payload.rs                                    | 719   | +3   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/networking/rpc/eth/block.rs                              | 360   | +3   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/call_frame.rs                                | 367   | +16  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/arithmetic.rs                | 237   | +10  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/bitwise_comparison.rs        | 208   | +7   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/block.rs                     | 181   | +58  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/dup.rs                       | 54    | +4   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/environment.rs               | 350   | +44  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/exchange.rs                  | 116   | +8   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/keccak.rs                    | 30    | +5   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/logging.rs                   | 43    | +6   |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/mod.rs                       | 30    | +19  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/push.rs                      | 49    | +13  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/stack_memory_storage_flow.rs | 305   | +43  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcode_handlers/system.rs                    | 901   | -91  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/opcodes.rs                                   | 565   | +13  |
+------------------------------------------------------------------------+-------+------+
| ethrex/crates/vm/levm/src/vm.rs                                        | 504   | -88  |
+------------------------------------------------------------------------+-------+------+

github-actions bot commented Oct 8, 2025

Benchmark Results Comparison

No significant difference was registered for any benchmark run.

Detailed Results

Benchmark Results: BubbleSort

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_BubbleSort | 2.991 ± 0.016 | 2.969 | 3.014 | 1.00 |
| main_levm_BubbleSort | 3.131 ± 0.013 | 3.111 | 3.153 | 1.05 ± 0.01 |
| pr_revm_BubbleSort | 3.003 ± 0.019 | 2.984 | 3.024 | 1.00 ± 0.01 |
| pr_levm_BubbleSort | 3.140 ± 0.026 | 3.108 | 3.176 | 1.05 ± 0.01 |

Benchmark Results: ERC20Approval

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Approval | 996.6 ± 11.9 | 978.6 | 1017.6 | 1.00 ± 0.01 |
| main_levm_ERC20Approval | 1194.0 ± 7.6 | 1183.7 | 1207.6 | 1.20 ± 0.01 |
| pr_revm_ERC20Approval | 992.3 ± 8.7 | 985.3 | 1014.2 | 1.00 |
| pr_levm_ERC20Approval | 1198.4 ± 11.4 | 1183.5 | 1219.8 | 1.21 ± 0.02 |

Benchmark Results: ERC20Mint

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Mint | 135.1 ± 1.4 | 133.5 | 138.2 | 1.01 ± 0.01 |
| main_levm_ERC20Mint | 185.1 ± 0.7 | 184.5 | 186.4 | 1.38 ± 0.01 |
| pr_revm_ERC20Mint | 134.1 ± 1.1 | 132.5 | 136.2 | 1.00 |
| pr_levm_ERC20Mint | 187.8 ± 3.5 | 184.9 | 196.8 | 1.40 ± 0.03 |

Benchmark Results: ERC20Transfer

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ERC20Transfer | 234.2 ± 1.6 | 231.6 | 236.5 | 1.00 |
| main_levm_ERC20Transfer | 312.4 ± 2.0 | 309.8 | 316.3 | 1.33 ± 0.01 |
| pr_revm_ERC20Transfer | 236.8 ± 2.4 | 232.7 | 241.5 | 1.01 ± 0.01 |
| pr_levm_ERC20Transfer | 312.0 ± 4.8 | 306.7 | 324.0 | 1.33 ± 0.02 |

Benchmark Results: Factorial

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Factorial | 230.1 ± 1.8 | 228.7 | 234.1 | 1.00 |
| main_levm_Factorial | 383.2 ± 18.8 | 369.4 | 433.8 | 1.66 ± 0.08 |
| pr_revm_Factorial | 231.4 ± 2.1 | 229.0 | 236.4 | 1.01 ± 0.01 |
| pr_levm_Factorial | 376.4 ± 2.8 | 371.8 | 382.6 | 1.64 ± 0.02 |

Benchmark Results: FactorialRecursive

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| main_revm_FactorialRecursive | 1.717 ± 0.051 | 1.611 | 1.790 | 1.00 ± 0.04 |
| main_levm_FactorialRecursive | 10.086 ± 0.065 | 10.025 | 10.244 | 5.89 ± 0.17 |
| pr_revm_FactorialRecursive | 1.711 ± 0.048 | 1.674 | 1.824 | 1.00 |
| pr_levm_FactorialRecursive | 10.065 ± 0.025 | 10.028 | 10.122 | 5.88 ± 0.16 |

Benchmark Results: Fibonacci

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Fibonacci | 210.8 ± 6.9 | 208.0 | 230.5 | 1.00 ± 0.03 |
| main_levm_Fibonacci | 359.9 ± 6.0 | 353.3 | 374.9 | 1.71 ± 0.03 |
| pr_revm_Fibonacci | 210.4 ± 1.2 | 209.3 | 213.6 | 1.00 |
| pr_levm_Fibonacci | 364.3 ± 19.8 | 351.1 | 417.3 | 1.73 ± 0.09 |

Benchmark Results: FibonacciRecursive

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_FibonacciRecursive | 883.3 ± 14.5 | 857.8 | 906.5 | 1.00 |
| main_levm_FibonacciRecursive | 976.2 ± 9.3 | 960.7 | 991.1 | 1.11 ± 0.02 |
| pr_revm_FibonacciRecursive | 892.0 ± 23.0 | 867.4 | 949.7 | 1.01 ± 0.03 |
| pr_levm_FibonacciRecursive | 973.6 ± 8.8 | 963.5 | 987.2 | 1.10 ± 0.02 |

Benchmark Results: ManyHashes

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_ManyHashes | 8.5 ± 0.0 | 8.4 | 8.5 | 1.00 ± 0.01 |
| main_levm_ManyHashes | 11.4 ± 0.3 | 11.0 | 12.0 | 1.34 ± 0.04 |
| pr_revm_ManyHashes | 8.5 ± 0.1 | 8.4 | 8.7 | 1.00 |
| pr_levm_ManyHashes | 11.1 ± 0.1 | 11.0 | 11.1 | 1.31 ± 0.02 |

Benchmark Results: MstoreBench

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_MstoreBench | 263.7 ± 5.7 | 259.0 | 274.0 | 1.01 ± 0.02 |
| main_levm_MstoreBench | 461.9 ± 21.0 | 442.5 | 500.1 | 1.78 ± 0.08 |
| pr_revm_MstoreBench | 259.8 ± 1.2 | 258.1 | 262.1 | 1.00 |
| pr_levm_MstoreBench | 447.9 ± 8.1 | 438.8 | 462.1 | 1.72 ± 0.03 |

Benchmark Results: Push

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_Push | 298.4 ± 1.6 | 296.4 | 302.4 | 1.00 |
| main_levm_Push | 464.1 ± 8.3 | 447.5 | 475.0 | 1.56 ± 0.03 |
| pr_revm_Push | 299.3 ± 1.0 | 298.2 | 301.2 | 1.00 ± 0.01 |
| pr_levm_Push | 459.0 ± 14.9 | 437.5 | 475.7 | 1.54 ± 0.05 |

Benchmark Results: SstoreBench_no_opt

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|---|---|---|---|---|
| main_revm_SstoreBench_no_opt | 172.6 ± 6.2 | 164.4 | 185.5 | 1.44 ± 0.06 |
| main_levm_SstoreBench_no_opt | 119.5 ± 2.9 | 116.7 | 124.7 | 1.00 |
| pr_revm_SstoreBench_no_opt | 169.8 ± 3.6 | 163.9 | 176.4 | 1.42 ± 0.05 |
| pr_levm_SstoreBench_no_opt | 122.0 ± 7.1 | 116.6 | 138.8 | 1.02 ± 0.06 |

github-actions bot commented Oct 8, 2025

Benchmark Block Execution Results Comparison Against Main

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|---|---|---|---|---|
| base | 65.522 ± 0.523 | 64.561 | 66.343 | 1.04 ± 0.01 |
| head | 62.800 ± 0.150 | 62.666 | 63.141 | 1.00 |

@azteca1998 azteca1998 marked this pull request as ready for review November 26, 2025 18:15
…atches

Refactor opcode handlers from impl VM methods to a trait-based pattern
(OpcodeHandler trait with struct-per-opcode), replacing method dispatch
with a const function pointer table for improved performance.

Resolve merge conflicts with main branch:
- Integrate EIP-7928 BAL recording into new handler structs
- Add Amsterdam fork opcode table (DUPN, SWAPN, EXCHANGE, SLOTNUM)
- Keep main's U256 types for chain_id and base_fee_per_gas
- Add get_code_length optimization for EXTCODESIZE
- Add access_storage_slot and record_storage_slot_to_bal helpers

Lint fixes:
- Replace all #[allow(clippy::...)] with #[expect(clippy::..., reason)]
- Convert .expect() to proper error propagation with map_err
- Fix type mismatches from U256 field changes
- Remove useless identity conversions
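The trait-plus-const-table pattern described above can be sketched roughly like this. All types are hypothetical simplified stand-ins (the real OpcodeHandler and OpCodeFn also carry gas accounting and error state); only the dispatch shape is the point.

```rust
// Simplified stand-in for the interpreter state.
struct Vm {
    stack: Vec<u64>,
}

// One struct per opcode, all implementing a common trait.
trait OpcodeHandler {
    fn eval(vm: &mut Vm);
}

struct OpAdd;
impl OpcodeHandler for OpAdd {
    fn eval(vm: &mut Vm) {
        let b = vm.stack.pop().unwrap_or(0);
        let a = vm.stack.pop().unwrap_or(0);
        vm.stack.push(a.wrapping_add(b));
    }
}

struct OpStop;
impl OpcodeHandler for OpStop {
    fn eval(_vm: &mut Vm) {}
}

#[derive(Clone, Copy)]
struct OpCodeFn(fn(&mut Vm));

impl OpCodeFn {
    // Monomorphizes T::eval into a plain fn pointer at compile time.
    const fn new<T: OpcodeHandler>() -> Self {
        OpCodeFn(T::eval)
    }
}

// The table is built in a const context: dispatch is one indexed load
// plus an indirect call, with no match over the opcode byte.
const OPCODE_TABLE: [OpCodeFn; 256] = {
    let mut table = [OpCodeFn::new::<OpStop>(); 256];
    table[0x01] = OpCodeFn::new::<OpAdd>();
    table
};

fn step(vm: &mut Vm, opcode: u8) {
    (OPCODE_TABLE[opcode as usize].0)(vm);
}
```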
…handlers

In the opcode handler refactoring, eip7702_gas_consumed was charged via
increase_consumed_gas() before record_bal_call_touch(), reducing
gas_remaining before BAL recording could see the original value. This
caused BlockAccessListHashMismatch in EIP-7928 tests with EIP-7702
delegation + OOG scenarios.

The fix defers the eip7702 gas charge: compute gas_left for the call
gas calculation without modifying gas_remaining, then charge
gas_cost + eip7702_gas_consumed together after BAL recording. This
matches the main branch behavior where get_call_gas_params computed
gas_left internally without modifying gas_remaining.
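The deferred-charge ordering can be sketched as follows. Types and helpers here are hypothetical stand-ins (record_bal_call_touch is a placeholder for the real BAL recording); what matters is that gas_remaining is not mutated until after BAL recording, and both costs are then charged together.

```rust
struct Frame {
    gas_remaining: u64,
}

// Placeholder: BAL recording must observe the pre-charge gas value.
fn record_bal_call_touch(gas_remaining: u64) -> u64 {
    gas_remaining
}

fn execute_call(
    frame: &mut Frame,
    gas_cost: u64,
    eip7702_gas_consumed: u64,
) -> Result<(u64, u64), &'static str> {
    // Compute gas_left for the call-gas calculation WITHOUT mutating
    // gas_remaining, so BAL recording below still sees the original value.
    let gas_left = frame
        .gas_remaining
        .checked_sub(eip7702_gas_consumed)
        .ok_or("out of gas")?;

    let observed_by_bal = record_bal_call_touch(frame.gas_remaining);

    // Only now charge gas_cost + eip7702_gas_consumed together.
    let total = gas_cost
        .checked_add(eip7702_gas_consumed)
        .ok_or("gas overflow")?;
    frame.gas_remaining = frame
        .gas_remaining
        .checked_sub(total)
        .ok_or("out of gas")?;

    Ok((gas_left, observed_by_bal))
}
```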
@azteca1998 azteca1998 marked this pull request as ready for review February 27, 2026 12:20
@github-actions

🤖 Kimi Code Review

Review Summary

This PR refactors LEVM's opcode dispatch mechanism to eliminate expensive match expressions, replacing them with a table-driven approach using the new OpcodeHandler trait. The change is performance-focused and touches many files.

Key Changes

  1. Opcode dispatch refactor: Replaced large match statements with a lookup table of handler structs implementing OpcodeHandler
  2. Type changes: Reduced U256 usage in favor of native u64 for block numbers, timestamps, and blob gas values where appropriate
  3. Memory safety: Added proper bounds checking and overflow handling throughout

Issues Found

1. Critical: Potential overflow in blob gas calculations (crates/common/types/block.rs:1075)

assert!(fake_exponential(57532635.into(), 3145728.into(), 3338477).is_ok());

The fake_exponential function can overflow with large inputs. While this is a test, the production usage in l1_committer.rs needs overflow protection.

2. Security: Unsafe transmute usage (crates/vm/levm/src/opcode_handlers/block.rs:67-75)

#[expect(unsafe_code, reason = "safe")]
vm.current_call_frame.stack.push(unsafe {
    let mut bytes = vm.db.store.get_block_hash(block_number)?.0;
    bytes.reverse();
    mem::transmute_copy::<[u8; 32], [u64; 4]>(&bytes)
})?;

This transmute assumes little-endian byte order which may not hold on all platforms. Consider using U256::from_big_endian() instead.
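One safe, endianness-explicit alternative is sketched below. It assumes the target U256 representation is four u64 limbs in little-endian limb order (as in the ethereum-types crate); the helper name is hypothetical.

```rust
// Build [u64; 4] limbs from big-endian bytes with explicit byte order,
// which is well-defined on any platform (unlike transmute_copy of
// reversed bytes, which bakes in a little-endian assumption).
fn be_bytes_to_limbs(bytes: [u8; 32]) -> [u64; 4] {
    let mut limbs = [0u64; 4];
    for (i, chunk) in bytes.chunks_exact(8).enumerate() {
        // limbs[0] holds the least-significant 64 bits, so the first
        // (most-significant) big-endian chunk lands in limbs[3].
        limbs[3 - i] = u64::from_be_bytes(chunk.try_into().unwrap());
    }
    limbs
}
```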

3. Performance: Unnecessary bounds checking (crates/vm/levm/src/opcode_handlers/arithmetic.rs:219-230)

The OpSignExtendHandler uses complex bit manipulation that could be simplified:

// Current complex implementation
if value.bit(8 * x + 7) {
    value |= U256::MAX << (8 * (x + 1));
} else if x != 31 {
    value &= (U256::one() << (8 * (x + 1))) - 1;
}

// Simpler approach
let mask = (U256::one() << (8 * (x + 1))) - U256::one();
value = if value.bit(8 * x + 7) {
    value | !mask
} else {
    value & mask
};

4. Correctness: Missing overflow check (crates/vm/levm/src/opcode_handlers/system.rs:1138-1144)

In get_call_gas_params, the subtraction gas_left - eip7702_gas_consumed should use checked_sub to prevent underflow.
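For illustration, a minimal stand-in showing the suggested checked_sub shape (call_gas_left is a hypothetical helper, not the real get_call_gas_params):

```rust
// A plain `gas_left - eip7702_gas_consumed` would wrap in release
// builds (or panic in debug builds) on underflow; checked_sub surfaces
// the condition as an explicit error instead.
fn call_gas_left(gas_left: u64, eip7702_gas_consumed: u64) -> Result<u64, &'static str> {
    gas_left
        .checked_sub(eip7702_gas_consumed)
        .ok_or("out of gas")
}
```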

5. Maintainability: Magic numbers (crates/vm/levm/src/opcode_handlers/system.rs:129-135)

The constants REMOVE_SLOT_COST, RESTORE_EMPTY_SLOT_COST, etc. should be extracted to constants.rs for better maintainability.

Positive Aspects

  • Performance: The table-driven dispatch eliminates branch misprediction from large match statements
  • Type safety: Moving from U256 to u64 for block numbers and timestamps reduces allocation overhead
  • Code organization: Handler structs provide better separation of concerns
  • Error handling: Consistent use of Result types throughout

Recommendations

  1. Add overflow checks for all arithmetic operations in gas calculations
  2. Replace unsafe transmutes with safe conversion functions
  3. Extract magic constants to named constants
  4. Consider adding micro-benchmarks to verify the performance improvement

Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

  1. High: U256 -> u64 narrowing can break blob-fee handling in runtime paths (liveness regression).
    The PR narrows blob base-fee filtering to u64, then errors when the value does not fit. base_fee_per_blob_gas is computed as U256 by design and can grow beyond u64 under extreme blob congestion.

  2. High: estimate_blob_gas now fails on large-but-valid values due to new u64 conversion.
    Previously the computation stayed in U256; now it force-converts headroom-adjusted value to u64 and returns overflow error. That introduces avoidable failures in high-demand blob markets.

  3. Medium: RPC behavior regresses by narrowing blob fee fields to u64 (can return internal errors for valid values).
    RPC quantities are hex-encoded integers and should not be artificially capped to 64-bit.

  4. Low: test runners now contain unchecked unwrap() conversions that can panic on out-of-range EF fixtures.
    This is test-only, but it makes tooling less robust and masks proper error reporting.

The opcode-handler/table refactor itself looks structurally clean and aligned with the performance goal; main concerns are the numeric narrowing changes above.

I could not run tests in this environment because cargo attempted a rustup toolchain sync and failed with filesystem permissions.


Automated review by OpenAI Codex · custom prompt

@greptile-apps

greptile-apps bot commented Feb 27, 2026

Greptile Summary

This PR implements a significant performance optimization for the LEVM by removing Result<OpcodeResult, VMError> from the hot execution loop. The key changes:

  • Opcode handlers now return OpcodeResult directly and store errors in an OnceCell<VMError>, deferring error checking until a halt occurs
  • Introduced OpcodeHandler trait with generic OpCodeFn::wrap to bridge between the new pattern and handler implementations
  • All opcode implementations refactored from VM methods to standalone handler structs implementing the trait
  • Semantic inversion of add_accessed_slot/add_accessed_address return values (now return cold status directly instead of warm status)
  • Type optimizations: block_number, timestamp, and blob gas fields changed from U256 to u64 where appropriate
  • Improved BLOCKHASH implementation with better error handling (returns zero for out-of-range values instead of throwing)

The refactoring is clean and systematic across all opcode categories. All call sites have been properly updated for the semantic changes.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Well-structured performance optimization with systematic refactoring across all opcode handlers. The semantic change to add_accessed_slot/add_accessed_address has been correctly propagated to all call sites. Type optimizations are sound and appropriate. The error handling pattern using OnceCell is correct and maintains proper error propagation.
  • No files require special attention - the refactoring is consistent and thorough

Important Files Changed

| Filename | Overview |
|---|---|
| crates/vm/levm/src/vm.rs | Core execution loop refactored to use OnceCell for error handling, avoiding Result matches in hot path; semantic changes to add_accessed_slot/address return values |
| crates/vm/levm/src/opcodes.rs | OpCodeFn refactored to wrap OpcodeHandler trait, enabling error storage in OnceCell instead of Result returns |
| crates/vm/levm/src/opcode_handlers/mod.rs | New OpcodeHandler trait introduced with eval method; provides common interface for all opcode implementations |
| crates/vm/levm/src/opcode_handlers/system.rs | System operations (CALL, CREATE, etc.) converted to handler structs; maintains correct error propagation and gas accounting |
| crates/vm/levm/src/environment.rs | Type optimizations: block_number and timestamp changed from U256 to u64; blob gas fields changed to Option<u64> |
| crates/vm/levm/src/opcode_handlers/stack_memory_storage_flow.rs | Storage operations updated for inverted return value semantics of add_accessed_slot (now returns cold status) |

Last reviewed commit: d5ee807

@ElFantasma ElFantasma (Contributor) left a comment


The core architectural change — replacing Result<OpcodeResult, VMError> with OnceCell<VMError> + bare OpcodeResult to eliminate Result matching in the interpreter hot loop — is sound, and the trait-based OpcodeHandler pattern with OpCodeFn::new::<T>() const construction is a clean abstraction. The add_accessed_address/add_accessed_slot return semantics inversion (now returns true = cold) is applied consistently across the opcode handlers but creates an issue in one unmodified file (see below). See inline comments for tracing regressions in DELEGATECALL/CALLCODE.

crates/vm/levm/src/db/gen_db.rs:634: access_storage_slot() still has let storage_slot_was_cold = !self.substate.add_accessed_slot(address, key); with the old ! negation. Since this PR inverts add_accessed_slot to return true = cold, the ! now gives storage_slot_was_cold = false when the slot IS cold. The method has no remaining callers after this PR (SLOAD/SSTORE call add_accessed_slot directly), so it's dead code — but it's pub and would silently give wrong warm/cold gas costs to any future caller. Either delete it or remove the !.
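The inversion hazard can be reproduced in miniature with simplified stand-in types. HashSet::insert returns true on fresh insertion, which mirrors the new true-means-cold contract; the stale call site's negation then reports exactly the wrong answer.

```rust
use std::collections::HashSet;

// Simplified stand-in for the real substate.
struct Substate {
    accessed_slots: HashSet<(u64, u64)>,
}

impl Substate {
    // New semantics per this PR: returns true when the slot was COLD
    // (freshly inserted), not true-when-warm as before.
    fn add_accessed_slot(&mut self, address: u64, key: u64) -> bool {
        self.accessed_slots.insert((address, key))
    }
}

// The stale call-site pattern: the `!` made sense under the old
// true-when-warm contract, but now yields `false` for a cold slot.
fn stale_was_cold(substate: &mut Substate, address: u64, key: u64) -> bool {
    !substate.add_accessed_slot(address, key)
}
```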

let data = vm.get_calldata(args_offset, args_len)?;
vm.tracer.enter(
CallType::DELEGATECALL,
vm.current_call_frame.msg_sender,

DELEGATECALL tracer parameters changed from old behavior:

  • Old: tracer.enter(DELEGATECALL, callframe.to /* current contract */, code_address /* delegate */, ...)
  • New: tracer.enter(DELEGATECALL, msg_sender, callframe.to, ...)

This changes `from` from the current contract to the original message sender, and `to` from the delegate's code address to the current contract. The old code had an explicit comment: "In this trace the from is the current contract, we don't want the from to be the EOA that sent the transaction". This will produce different debug_traceTransaction output for DELEGATECALL frames.

// Trace CALL operation.
let data = vm.get_calldata(args_offset, args_len)?;
vm.tracer.enter(
CallType::CALLCODE,

Similar tracer issue: CALLCODE's `to` changed from code_address (the contract whose code is being executed) to callframe.to (the current contract itself). When EIP-7702 delegation is active, these differ — the tracer loses information about which contract's code is actually running.

opcode_table[Opcode::PUSH6 as usize] = OpCodeFn::new::<OpPushHandler<6>>();
opcode_table[Opcode::PUSH7 as usize] = OpCodeFn::new::<OpPushHandler<7>>();
opcode_table[Opcode::PUSH8 as usize] = OpCodeFn::new::<OpPushHandler<8>>();
opcode_table[Opcode::PUSH8 as usize] = OpCodeFn::new::<OpPushHandler<8>>();

nit: PUSH8 is assigned twice (lines 442-443). This is pre-existing from the old code but worth cleaning up while refactoring.

@github-project-automation github-project-automation bot moved this to In Progress in ethrex_l1 Feb 27, 2026
@github-project-automation github-project-automation bot moved this from In Progress to In Review in ethrex_l1 Mar 4, 2026
@ilitteri ilitteri added this pull request to the merge queue Mar 4, 2026
Merged via the queue into main with commit 71fbc0a Mar 4, 2026
66 checks passed
@ilitteri ilitteri deleted the refactor-opcode-handlers branch March 4, 2026 14:37
@github-project-automation github-project-automation bot moved this from Todo to Done in ethrex_performance Mar 4, 2026
@github-project-automation github-project-automation bot moved this from In Review to Done in ethrex_l1 Mar 4, 2026