feat(benchmark): add SLOAD/SSTORE benchmark test with multi-contract support (#2256)

CPerezz · web-flow · commit 54b46ea9c8c9 · 2025-10-23T22:36:19.000+08:00
* feat(benchmark): add SLOAD benchmark test with multi-contract support

Add test_sload_empty_erc20_balanceof to benchmark SLOAD operations on
non-existing storage slots using ERC20 balanceOf() queries.

The idea of this benchmark is to issue balanceOf() calls for non-existing
addresses against a single contract or a series of N contracts. In this way,
we force clients to resolve as many tree branches as possible.

* feat(benchmark): add SSTORE benchmark test using ERC20 approve

Add test_sstore_erc20_approve that benchmarks SSTORE operations by calling
approve(spender, amount) on pre-deployed ERC20 contracts.

Follows the same pattern as the SLOAD benchmark:
- Auto-discovers ERC20 contracts from stubs
- Splits gas budget evenly across all discovered contracts
- Uses counter as both spender address and amount
- Forces SSTOREs to allowance mapping storage slots

The test measures client performance when writing to many storage slots
across multiple contracts, stressing state-handling write operations.

* fix(benchmark): correct SSTORE benchmark gas calculation

Fixed gas calculation for test_sstore_erc20_approve to ensure accurate gas
usage prediction and prevent transaction reverts.

Key fixes:
- Added memory expansion cost (15 gas per contract)
- Corrected G_LOW gas values in comments (5 gas, not 3)
- Separated per-contract overhead from per-iteration costs
- Improved cost calculation clarity with detailed opcode breakdown

Gas calculation (10M gas, 3 contracts):
- Intrinsic: 21,000
- Overhead per contract: 38
- Cost per iteration: 20,226
- Calls per contract: 164
- Expected gas used: 9,972,306 (99.72% utilization)

* feat(benchmark): add mixed SLOAD/SSTORE benchmark with configurable ratios

Add test_mixed_sload_sstore to test_multi_opcode.py that combines SLOAD and
SSTORE operations with parameterized gas distribution ratios (50-50, 70-30,
90-10).

The test stresses clients with mixed read/write workloads by:
- Dividing gas budget evenly across all discovered ERC20 contract stubs
- Splitting each contract's allocation by the specified percentage ratio
- Executing balanceOf (cold SLOAD on empty slots) for the SLOAD portion
- Executing approve (SSTORE to new allowance slots) for the SSTORE portion

Verified gas calculations for 10M gas budget with 3 contracts (50-50 ratio):
- SLOAD operations: ~2,312 gas/iteration → 719 calls per contract
- SSTORE operations: ~20,226 gas/iteration → 82 calls per contract
- Total operations: 2,403 state operations (2,157 SLOADs + 246 SSTOREs)
- Gas usage: 9.98M / 10M (16K buffer, no out-of-gas errors)

This benchmark enables testing different read/write ratios to identify client
performance characteristics under varying state operation mixes.

* refactor(benchmark): optimize SLOAD/SSTORE benchmarks per review feedback

Address review comments by optimizing loop efficiency:

1. Move function selector MSTORE outside loops (Comment #2)
   - BALANCEOF_SELECTOR and APPROVE_SELECTOR now stored once per contract
   - Saves 3 gas (G_VERY_LOW) per iteration
   - Total savings: ~6,471 gas for 50-50 ratio with 10M budget and 3 contracts

2. Remove unused return data from CALL operations (Comment #1)
   - Changed ret_offset=96/128, ret_size=32 to ret_offset=0, ret_size=0
   - Eliminates unnecessary memory expansion
   - Minor gas savings, cleaner implementation

Skipped Comment #3 (use Op.GAS for addresses):
- Would lose determinism (GAS varies per iteration)
- Adds complexity for minimal benefit
- Counter still needed for loop control

Changes applied to:
- test_sload_empty_erc20_balanceof
- test_sstore_erc20_approve
- test_mixed_sload_sstore (both SLOAD and SSTORE loops)

* refactor(benchmark): simplify SLOAD benchmark memory layout and fix calldata encoding

- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0] and MEM[64]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata

* refactor(benchmark): simplify SSTORE benchmark memory layout and fix calldata encoding

- Move selector MSTORE outside for-loop (saves gas per contract)
- Use single counter at MEM[32] instead of duplicate at MEM[0]
- Fix calldata encoding by using args_offset=28 for correct ABI format
- Selector now properly positioned at start of calldata

* refactor(benchmark): simplify mixed SLOAD/SSTORE memory layout and fix calldata encoding

- Move selectors MSTORE outside for-loop (saves gas per contract)
- Use separate memory regions for balanceOf and approve to avoid conflicts
- Fix calldata encoding by using correct args_offset for proper ABI format
- Selectors now properly positioned at start of calldata

* refactor(benchmark): simplify mixed test to reuse memory layout consistently

- Reuse MEM[0] for both selectors (sequential operations, no conflict)
- Reuse MEM[32] for both counters (balanceOf then approve)
- Reuse MEM[64] and MEM[96] for parameters
- Consistent args_offset=28 for both operations (was 28 and 128)
- Matches single-opcode test pattern for easier understanding
- Reduces memory footprint from 196 bytes to 96 bytes

* feat(benchmark): add parametrized contract count and stub filtering to single-opcode tests

- Add parametrization for num_contracts [1, 5, 10, 20, 100]
- Implement stub prefix filtering based on test function name
- Add validation to error if insufficient matching stubs
- Add SSTORE benchmark architecture documentation
- Create README with setup instructions and stubs.json format

* fix(benchmark): add type annotations to test functions

* fix(benchmark): add AddressStubs type annotation to address_stubs parameter

* feat(benchmark): add parametrized contract count, stub filtering, and correct gas calculations

- Add num_contracts parametrization [1, 5, 10, 20, 100] to multi-opcode test
- Implement stub prefix filtering for all benchmarks
- Fix gas cost calculations to account for COLD/WARM account access
- CALL operations: first call to each contract is COLD (2600), subsequent are WARM (100)
- SSTORE operations: add cold storage access cost (2100) for zero-to-non-zero writes
- Update gas calculation formulas to solve for calls per contract correctly
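
The "Gas calculation (10M gas, 3 contracts)" figures quoted in the SSTORE fix commit can be reproduced with a few lines of arithmetic. The following is only a sketch of that calculation, assuming the constants quoted in the message (21,000 intrinsic, 38 gas overhead per contract, 20,226 gas per iteration); the function name `estimate_sstore_budget` is hypothetical and does not exist in the test suite, and the final test code uses a later cold/warm formula instead.

```python
def estimate_sstore_budget(
    gas_limit: int,
    num_contracts: int,
    intrinsic: int = 21_000,  # figure quoted in the commit message
    overhead_per_contract: int = 38,  # figure quoted in the commit message
    cost_per_iteration: int = 20_226,  # figure quoted in the commit message
) -> tuple[int, int]:
    """Return (calls_per_contract, expected_gas_used) for an even gas split."""
    available = gas_limit - intrinsic
    per_contract = available // num_contracts
    # Each contract pays its fixed overhead once, then a fixed cost per call.
    calls = (per_contract - overhead_per_contract) // cost_per_iteration
    used = intrinsic + num_contracts * (
        overhead_per_contract + calls * cost_per_iteration
    )
    return calls, used


calls, used = estimate_sstore_budget(10_000_000, 3)
print(calls, used)  # 164 9972306
```

With a 10M budget and 3 contracts this yields 164 calls per contract and 9,972,306 gas, matching the numbers in the message.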
diff --git a/tests/benchmark/stateful/bloatnet/README.md b/tests/benchmark/stateful/bloatnet/README.md
@@ -0,0 +1,76 @@
+# BloatNet Single-Opcode Benchmarks
+
+This directory contains benchmarks for testing single EVM opcodes (SLOAD, SSTORE) under state-heavy conditions using pre-deployed contracts.
+
+## Test Setup
+
+### Prerequisites
+
+1. Pre-deployed ERC20 contracts on the target network
+2. A JSON file containing contract addresses (stubs)
+
+### Address Stubs Format
+
+Create a JSON file (`stubs.json`) mapping test-specific stub names to deployed contract addresses:
+
+```json
+{
+  "test_sload_empty_erc20_balanceof_USDT": "0x1234567890123456789012345678901234567890",
+  "test_sload_empty_erc20_balanceof_USDC": "0x2345678901234567890123456789012345678901",
+  "test_sload_empty_erc20_balanceof_DAI": "0x3456789012345678901234567890123456789012",
+  "test_sload_empty_erc20_balanceof_WETH": "0x4567890123456789012345678901234567890123",
+  "test_sload_empty_erc20_balanceof_WBTC": "0x5678901234567890123456789012345678901234",
+
+  "test_sstore_erc20_approve_USDT": "0x1234567890123456789012345678901234567890",
+  "test_sstore_erc20_approve_USDC": "0x2345678901234567890123456789012345678901",
+  "test_sstore_erc20_approve_DAI": "0x3456789012345678901234567890123456789012",
+  "test_sstore_erc20_approve_WETH": "0x4567890123456789012345678901234567890123",
+  "test_sstore_erc20_approve_WBTC": "0x5678901234567890123456789012345678901234"
+}
+```
+
+**Naming Convention:**
+- Stub names MUST start with the test function name
+- Format: `{test_function_name}_{identifier}`
+- Example: `test_sload_empty_erc20_balanceof_USDT`
+
+
+### Running the Tests
+
+#### Execute Mode (Against Live Network)
+
+```bash
+# Run with specific number of contracts (e.g., only the 5-contract variant)
+PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run execute \
+  --address-stubs /path/to/stubs.json \
+  --fork=Prague \
+  tests/benchmark/stateful/bloatnet/test_single_opcode.py::test_sload_empty_erc20_balanceof \
+  -k "[5]" \
+  -v
+
+# Run all parametrized variants (1, 5, 10, 20, 100 contracts)
+PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 uv run execute \
+  --address-stubs /path/to/stubs.json \
+  --fork=Prague \
+  tests/benchmark/stateful/bloatnet/test_single_opcode.py \
+  -v
+```
+
+
+## Test Parametrization
+
+Both tests are parametrized with `num_contracts = [1, 5, 10, 20, 100]`, generating 5 test variants each:
+
+- **1 contract**: Baseline single-contract performance
+- **5 contracts**: Small-scale multi-contract scenario
+- **10 contracts**: Medium-scale multi-contract scenario
+- **20 contracts**: Large-scale multi-contract scenario
+- **100 contracts**: Very large-scale multi-contract stress test
+
+### How Stub Filtering Works
+
+1. Test extracts its function name (e.g., `test_sload_empty_erc20_balanceof`)
+2. Filters stubs starting with that name from `stubs.json`
+3. Selects the **first N** matching stubs based on `num_contracts` parameter
+4. Errors if insufficient matching stubs found
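
The four filtering steps listed in the README can be sketched as a small standalone function. This is an illustration only: it assumes the stubs are available as a plain `dict` and raises `ValueError`, whereas the test in this change reads `address_stubs.root` and calls `pytest.fail`. The name `select_stubs` and the `[5]` suffix in the usage line are hypothetical.

```python
def select_stubs(stubs: dict[str, str], node_name: str, num_contracts: int) -> list[str]:
    """Pick the first `num_contracts` stub names that match the test name."""
    # Step 1: strip the parametrization suffix, e.g. "test_x[5]" -> "test_x".
    test_name = node_name.split("[")[0]
    # Step 2: keep only stubs whose name starts with the test function name.
    matching = [name for name in stubs if name.startswith(test_name)]
    # Step 4: fail loudly when there are not enough matching stubs.
    if len(matching) < num_contracts:
        raise ValueError(
            f"Not enough matching stubs for test '{test_name}'. "
            f"Required: {num_contracts}, Found: {len(matching)}."
        )
    # Step 3: select the first N matches (dicts preserve insertion order).
    return matching[:num_contracts]


stubs = {
    "test_sload_empty_erc20_balanceof_USDT": "0x1234567890123456789012345678901234567890",
    "test_sload_empty_erc20_balanceof_USDC": "0x2345678901234567890123456789012345678901",
    "test_sstore_erc20_approve_USDT": "0x1234567890123456789012345678901234567890",
}
selected = select_stubs(stubs, "test_sload_empty_erc20_balanceof[5]", 2)
print(selected)
```

Because selection is by prefix and file order, the order of entries in `stubs.json` determines which contracts a given `num_contracts` variant exercises.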
+
diff --git a/tests/benchmark/stateful/bloatnet/test_multi_opcode.py b/tests/benchmark/stateful/bloatnet/test_multi_opcode.py
@@ -19,6 +19,7 @@
 )
 from ethereum_test_vm import Bytecode
 from ethereum_test_vm import Opcodes as Op
+from pytest_plugins.execute.pre_alloc import AddressStubs
 
 REFERENCE_SPEC_GIT_PATH = "DUMMY/bloatnet.md"
 REFERENCE_SPEC_VERSION = "1.0"
@@ -465,3 +466,257 @@ def test_bloatnet_balance_extcodehash(
         blocks=[Block(txs=[attack_tx])],
         post=post,
     )
+
+
+# ERC20 function selectors
+BALANCEOF_SELECTOR = 0x70A08231  # balanceOf(address)
+APPROVE_SELECTOR = 0x095EA7B3  # approve(address,uint256)
+
+
+@pytest.mark.valid_from("Prague")
+@pytest.mark.parametrize("num_contracts", [1, 5, 10, 20, 100])
+@pytest.mark.parametrize(
+    "sload_percent,sstore_percent",
+    [
+        pytest.param(50, 50, id="50-50"),
+        pytest.param(70, 30, id="70-30"),
+        pytest.param(90, 10, id="90-10"),
+    ],
+)
+def test_mixed_sload_sstore(
+    blockchain_test: BlockchainTestFiller,
+    pre: Alloc,
+    fork: Fork,
+    gas_benchmark_value: int,
+    address_stubs: AddressStubs,
+    num_contracts: int,
+    sload_percent: int,
+    sstore_percent: int,
+    request: pytest.FixtureRequest,
+) -> None:
+    """
+    BloatNet mixed SLOAD/SSTORE benchmark with configurable operation ratios.
+
+    This test:
+    1. Filters stubs matching test name prefix
+       (e.g., test_mixed_sload_sstore_*)
+    2. Uses first N contracts based on num_contracts parameter
+    3. Divides gas budget evenly across all selected contracts
+    4. For each contract, divides gas into SLOAD and SSTORE portions by
+       percentage
+    5. Executes balanceOf (SLOAD) and approve (SSTORE) calls per the ratio
+    6. Stresses clients with combined read/write operations on large
+       contracts
+    """
+    # Extract test function name for stub filtering
+    test_name = request.node.name.split("[")[0]  # Remove parametrization suffix
+
+    # Filter stubs that match the test name prefix
+    matching_stubs = [
+        stub_name for stub_name in address_stubs.root.keys() if stub_name.startswith(test_name)
+    ]
+
+    # Validate we have enough stubs
+    if len(matching_stubs) < num_contracts:
+        pytest.fail(
+            f"Not enough matching stubs for test '{test_name}'. "
+            f"Required: {num_contracts}, Found: {len(matching_stubs)}. "
+            f"Matching stubs: {matching_stubs}"
+        )
+
+    # Select first N stubs
+    selected_stubs = matching_stubs[:num_contracts]
+    gas_costs = fork.gas_costs()
+
+    # Calculate gas costs
+    intrinsic_gas = fork.transaction_intrinsic_cost_calculator()(calldata=b"")
+
+    # Fixed overhead for SLOAD loop
+    sload_loop_overhead = (
+        # Attack contract loop overhead
+        gas_costs.G_VERY_LOW * 2  # MLOAD counter (3*2)
+        + gas_costs.G_VERY_LOW * 2  # MSTORE selector (3*2)
+        + gas_costs.G_VERY_LOW * 3  # MLOAD + MSTORE address (3*3)
+        + gas_costs.G_BASE  # POP (2)
+        + gas_costs.G_BASE * 3  # SUB + MLOAD + MSTORE for counter decrement (2*3)
+        + gas_costs.G_BASE * 2  # ISZERO * 2 for loop condition (2*2)
+        + gas_costs.G_MID  # JUMPI (8)
+    )
+
+    # ERC20 balanceOf internal gas
+    sload_erc20_internal = (
+        gas_costs.G_VERY_LOW  # PUSH4 selector (3)
+        + gas_costs.G_BASE  # EQ selector match (2)
+        + gas_costs.G_MID  # JUMPI to function (8)
+        + gas_costs.G_JUMPDEST  # JUMPDEST at function start (1)
+        + gas_costs.G_VERY_LOW * 2  # CALLDATALOAD arg (3*2)
+        + gas_costs.G_KECCAK_256  # keccak256 static (30)
+        + gas_costs.G_KECCAK_256_WORD * 2  # keccak256 dynamic for 64 bytes (2*6)
+        + gas_costs.G_COLD_SLOAD  # Cold SLOAD - always cold for random addresses (2100)
+        + gas_costs.G_VERY_LOW * 3  # MSTORE result + RETURN setup (3*3)
+    )
+
+    # Fixed overhead for SSTORE loop
+    sstore_loop_overhead = (
+        # Attack contract loop body operations
+        gas_costs.G_VERY_LOW  # MSTORE selector at memory[32] (3)
+        + gas_costs.G_LOW  # MLOAD counter (5)
+        + gas_costs.G_VERY_LOW  # MSTORE amount at memory[64] (3)
+        + gas_costs.G_BASE  # POP call result (2)
+        # Counter decrement
+        + gas_costs.G_LOW  # MLOAD counter (5)
+        + gas_costs.G_VERY_LOW  # PUSH1 1 (3)
+        + gas_costs.G_VERY_LOW  # SUB (3)
+        + gas_costs.G_VERY_LOW  # MSTORE counter back (3)
+        # While loop condition check
+        + gas_costs.G_LOW  # MLOAD counter (5)
+        + gas_costs.G_BASE  # ISZERO (2)
+        + gas_costs.G_BASE  # ISZERO (2)
+        + gas_costs.G_MID  # JUMPI back to loop start (8)
+    )
+
+    # ERC20 approve internal gas
+    # Cold SSTORE: 22100 = 20000 base + 2100 cold access
+    sstore_erc20_internal = (
+        gas_costs.G_VERY_LOW  # PUSH4 selector (3)
+        + gas_costs.G_BASE  # EQ selector match (2)
+        + gas_costs.G_MID  # JUMPI to function (8)
+        + gas_costs.G_JUMPDEST  # JUMPDEST at function start (1)
+        + gas_costs.G_VERY_LOW  # CALLDATALOAD spender (3)
+        + gas_costs.G_VERY_LOW  # CALLDATALOAD amount (3)
+        + gas_costs.G_KECCAK_256  # keccak256 static (30)
+        + gas_costs.G_KECCAK_256_WORD * 2  # keccak256 dynamic for 64 bytes (12)
+        + gas_costs.G_COLD_SLOAD  # Cold SLOAD for allowance check (2100)
+        + gas_costs.G_STORAGE_SET  # SSTORE base cost (20000)
+        + gas_costs.G_COLD_SLOAD  # Additional cold storage access (2100)
+        + gas_costs.G_VERY_LOW  # PUSH1 1 for return value (3)
+        + gas_costs.G_VERY_LOW  # MSTORE return value (3)
+        + gas_costs.G_VERY_LOW  # PUSH1 32 for return size (3)
+        + gas_costs.G_VERY_LOW  # PUSH1 0 for return offset (3)
+    )
+
+    # Calculate gas budget per contract
+    available_gas = gas_benchmark_value - intrinsic_gas
+    gas_per_contract = available_gas // num_contracts
+
+    # For each contract, split gas by percentage
+    sload_gas_per_contract = (gas_per_contract * sload_percent) // 100
+    sstore_gas_per_contract = (gas_per_contract * sstore_percent) // 100
+
+    # Account for cold/warm transitions in CALL costs
+    # First SLOAD call is COLD (2600), rest are WARM (100)
+    sload_warm_cost = sload_loop_overhead + gas_costs.G_WARM_ACCOUNT_ACCESS + sload_erc20_internal
+    cold_warm_diff = gas_costs.G_COLD_ACCOUNT_ACCESS - gas_costs.G_WARM_ACCOUNT_ACCESS
+    sload_calls_per_contract = int((sload_gas_per_contract - cold_warm_diff) // sload_warm_cost)
+
+    # First SSTORE call is COLD (2600), rest are WARM (100)
+    sstore_warm_cost = (
+        sstore_loop_overhead + gas_costs.G_WARM_ACCOUNT_ACCESS + sstore_erc20_internal
+    )
+    sstore_calls_per_contract = int((sstore_gas_per_contract - cold_warm_diff) // sstore_warm_cost)
+
+    # Deploy selected ERC20 contracts using stubs
+    erc20_addresses = []
+    for stub_name in selected_stubs:
+        addr = pre.deploy_contract(
+            code=Bytecode(),
+            stub=stub_name,
+        )
+        erc20_addresses.append(addr)
+
+    # Log test requirements
+    print(
+        f"Total gas budget: {gas_benchmark_value / 1_000_000:.1f}M gas. "
+        f"~{gas_per_contract / 1_000_000:.1f}M gas per contract "
+        f"({sload_percent}% SLOAD, {sstore_percent}% SSTORE). "
+        f"Per contract: {sload_calls_per_contract} balanceOf calls, "
+        f"{sstore_calls_per_contract} approve calls."
+    )
+
+    # Build attack code that loops through each contract
+    attack_code: Bytecode = Op.JUMPDEST  # Entry point
+
+    for erc20_address in erc20_addresses:
+        # For each contract, execute SLOAD operations (balanceOf)
+        attack_code += (
+            # Store balanceOf selector at memory[0] for every contract,
+            # because the approve loop below overwrites that slot
+            Op.MSTORE(offset=0, value=BALANCEOF_SELECTOR)
+            # Initialize counter in memory[32] = number of balanceOf calls
+            + Op.MSTORE(offset=32, value=sload_calls_per_contract)
+            # Loop for balanceOf calls
+            + While(
+                condition=Op.MLOAD(32) + Op.ISZERO + Op.ISZERO,
+                body=(
+                    # Call balanceOf(address) on ERC20 contract
+                    # args_offset=28 reads: selector from MEM[28:32] + address
+                    # from MEM[32:64]
+                    Op.CALL(
+                        address=erc20_address,
+                        value=0,
+                        args_offset=28,
+                        args_size=36,
+                        ret_offset=0,
+                        ret_size=0,
+                    )
+                    + Op.POP  # Discard CALL success status
+                    # Decrement counter
+                    + Op.MSTORE(offset=32, value=Op.SUB(Op.MLOAD(32), 1))
+                ),
+            )
+        )
+
+        # For each contract, execute SSTORE operations (approve)
+        # Reuse the same memory layout as balanceOf
+        attack_code += (
+            # Store approve selector at memory[0] (reusing same slot)
+            Op.MSTORE(offset=0, value=APPROVE_SELECTOR)
+            # Initialize counter in memory[32] = number of approve calls
+            # (reusing same slot)
+            + Op.MSTORE(offset=32, value=sstore_calls_per_contract)
+            # Loop for approve calls
+            + While(
+                condition=Op.MLOAD(32) + Op.ISZERO + Op.ISZERO,
+                body=(
+                    # Store amount at memory[64] (counter doubles as spender and amount)
+                    Op.MSTORE(offset=64, value=Op.MLOAD(32))
+                    # Call approve(spender, amount) on ERC20 contract
+                    # args_offset=28 reads: selector from MEM[28:32] +
+                    # spender from MEM[32:64] + amount from MEM[64:96]
+                    # Note: counter at MEM[32:64] is reused as spender,
+                    # and value at MEM[64:96] serves as the amount
+                    + Op.CALL(
+                        address=erc20_address,
+                        value=0,
+                        args_offset=28,
+                        args_size=68,
+                        ret_offset=0,
+                        ret_size=0,
+                    )
+                    + Op.POP  # Discard CALL success status
+                    # Decrement counter
+                    + Op.MSTORE(offset=32, value=Op.SUB(Op.MLOAD(32), 1))
+                ),
+            )
+        )
+
+    # Deploy attack contract
+    attack_address = pre.deploy_contract(code=attack_code)
+
+    # Run the attack
+    attack_tx = Transaction(
+        to=attack_address,
+        gas_limit=gas_benchmark_value,
+        sender=pre.fund_eoa(),
+    )
+
+    # Post-state
+    post = {
+        attack_address: Account(storage={}),
+    }
+
+    blockchain_test(
+        pre=pre,
+        blocks=[Block(txs=[attack_tx])],
+        post=post,
+    )
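
The `args_offset=28` choice in the CALLs above follows from how MSTORE writes a 32-byte big-endian word: a 4-byte selector stored at offset 0 ends up in `MEM[28:32]`, immediately before the first argument word at `MEM[32:64]`. The sketch below emulates that layout in plain Python to show the resulting calldata; the `mstore` helper is a hypothetical stand-in for the opcode, not part of `ethereum_test_vm`, and the counter value 7 is arbitrary.

```python
BALANCEOF_SELECTOR = 0x70A08231  # balanceOf(address)
APPROVE_SELECTOR = 0x095EA7B3  # approve(address,uint256)


def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Emulate MSTORE: write `value` as a 32-byte big-endian word at `offset`."""
    end = offset + 32
    if len(memory) < end:
        memory.extend(bytes(end - len(memory)))  # zero-fill on expansion
    memory[offset:end] = value.to_bytes(32, "big")


counter = 7  # arbitrary loop-counter value used as the argument

# balanceOf: selector word at MEM[0:32], counter (as address) at MEM[32:64]
memory = bytearray()
mstore(memory, 0, BALANCEOF_SELECTOR)
mstore(memory, 32, counter)
balanceof_calldata = bytes(memory[28 : 28 + 36])  # args_offset=28, args_size=36

# approve: same layout plus the amount word at MEM[64:96]
mstore(memory, 0, APPROVE_SELECTOR)
mstore(memory, 64, counter)
approve_calldata = bytes(memory[28 : 28 + 68])  # args_offset=28, args_size=68

print(balanceof_calldata.hex())
print(approve_calldata.hex())
```

Both calldata blobs start with the 4-byte selector followed by full 32-byte argument words, which is the layout the ABI expects.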
diff --git a/tests/benchmark/stateful/bloatnet/test_single_opcode.py b/tests/benchmark/stateful/bloatnet/test_single_opcode.py