feat(tests): multi opcode bloatnet ext cases (#2186)

CPerezz · gballet · web-flow · commit 675f1a7317b6 · 2025-10-01T17:14:35.000+01:00
* Add BloatNet tests

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

* try building the contract

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

* fix: SSTORE 0 -> 1 match all values in the state

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

* add the tx for 0 -> 1 and 1 -> 2

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

* fix: linter issues

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

* remove more whitespaces

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

remove leftover single whitespace :|

* fix formatting

* move to benchmarks

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>

* fix linter value

* use the gas limit from the environment

* parameterize the written value in SSTORE

* fix linter issues

* update CHANGELOG.md

* fix format

* simplify syntax

* fix: start with an empty contract storage

* more fixes, but the result is still incorrect

* fix: finally fix the tests

* linter fix

* add SLOAD tests

* test(benchmark): implement CREATE2 addressing for bloatnet tests

- Add CREATE2 deterministic address calculation to overcome 24KB bytecode limit
- Fix While loop condition to properly iterate through contracts
- Account for memory expansion costs in gas calculations
- Add safety margins (50k gas reserve, 98% utilization) for stability
- Tests now scale to any gas limit without bytecode constraints
- Achieve 98% gas utilization with 10M and 20M gas limits
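
The memory expansion term mentioned in this commit can be illustrated with the standard EVM memory cost formula (3 gas per 32-byte word plus a quadratic term). This is a minimal sketch, not the code used in the tests; the function names `memory_cost` and `expansion_cost` are hypothetical.

```python
def memory_cost(size_in_bytes: int) -> int:
    """Total gas charged for keeping `size_in_bytes` of EVM memory active."""
    words = (size_in_bytes + 31) // 32  # memory is priced per 32-byte word
    return 3 * words + words * words // 512


def expansion_cost(old_size: int, new_size: int) -> int:
    """Extra gas charged when memory grows from old_size to new_size bytes."""
    return max(0, memory_cost(new_size) - memory_cost(old_size))


# Growing memory from 96 bytes (3 words) to 160 bytes (5 words).
print(expansion_cost(96, 160))
```

The quadratic term only becomes visible once memory spans many words, which is why a loop that writes to an ever-growing offset may need this term in its gas estimate.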

* refactor(benchmark): optimize gas calculations in bloatnet tests

- Remove gas reserve and 98% utilization logic for contract calculations
- Directly calculate the number of contracts based on available gas
- Introduce precise expected gas usage calculations for better accuracy
- Ensure tests scale effectively without unnecessary constraints

* refactor(benchmark): bloatnet tests with unique bytecode for I/O optimization

- Update tests to generate unique bytecode for each contract, maximizing I/O reads during benchmarks.
- Clarify comments regarding bytecode generation and its impact on gas costs.
- Ensure CREATE2 addresses are calculated consistently using a base bytecode template.
- Improve test descriptions to reflect the changes in contract deployment strategy.

* refactor(benchmark): replace custom CREATE2 address calculation with utility function

- Remove the custom `calculate_create2_address` function in favor of the `compute_create2_address` utility.
- Update tests to utilize the new utility for consistent CREATE2 address calculations.
- Simplify code by eliminating unnecessary complexity in address calculation logic.
- Ensure that the CREATE2 prefix is directly set to 0xFF in the memory operation for clarity.
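
For reference, the CREATE2 rule these bullets rely on is `keccak256(0xFF ++ factory ++ salt ++ init_code_hash)[12:]`. The sketch below is an illustration of that rule, not the `compute_create2_address` utility itself; `create2_preimage` and `create2_address` are hypothetical names, and the Keccak-256 function is passed in by the caller because Python's standard library does not provide it (`hashlib.sha3_256` uses different padding).

```python
from typing import Callable


def create2_preimage(factory: bytes, salt: int, init_code_hash: bytes) -> bytes:
    """Build the 85-byte CREATE2 preimage: 0xFF ++ factory ++ salt ++ hash."""
    if len(factory) != 20:
        raise ValueError("factory address must be 20 bytes")
    if len(init_code_hash) != 32:
        raise ValueError("init code hash must be 32 bytes")
    return b"\xff" + factory + salt.to_bytes(32, "big") + init_code_hash


def create2_address(
    factory: bytes,
    salt: int,
    init_code_hash: bytes,
    keccak256: Callable[[bytes], bytes],
) -> bytes:
    """Return the last 20 bytes of keccak256(preimage).

    `keccak256` is an assumption: any callable that maps bytes to a
    32-byte Keccak-256 digest, supplied by the caller.
    """
    return keccak256(create2_preimage(factory, salt, init_code_hash))[12:]
```

Because only the salt changes between deployments, every address in the series can be recomputed from the factory address and one shared init code hash.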

* CREATE2 factory approach working

* Version with EIP-7997 model working

* refactor(benchmark): improve contract deployment script with interactive selection and bytecode generation

- Introduced interactive contract type selection for deploying contracts in the bloatnet benchmark.
- Added support for multiple contract types: max_size_24kb, sload_heavy, storage_heavy, and custom.
- Refactored bytecode generation functions to improve clarity and maintainability.
- Updated README to reflect changes in deployment process and contract types.
- Ensured proper handling of factory deployment and transaction receipt checks.

* delete: remove obsolete test_create2.py script

This was committed unintentionally

* refactor(benchmark): optimize gas calculations for BALANCE + EXTCODECOPY pattern

- Updated the README to reflect the optimized gas cost for the BALANCE + EXTCODECOPY pattern, reducing it from ~5,007 to ~2,710 gas per contract.
- Modified the test_bloatnet_balance_extcodecopy function to read only 1 byte from the end of the bytecode, minimizing gas costs while maximizing contract targeting.
- Adjusted calculations for the number of contracts needed based on the new cost per contract, ensuring accurate benchmarks.
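
To make the effect of the lower per-contract cost concrete, the following sketch compares how many contracts one transaction can touch at ~5,007 and ~2,710 gas per contract. The 30M gas limit and the helper name `contracts_reachable` are assumptions for illustration only; the 21,000 intrinsic cost applies to a transaction with empty calldata.

```python
INTRINSIC_GAS = 21_000  # base cost of a transaction with empty calldata


def contracts_reachable(gas_limit: int, cost_per_contract: int) -> int:
    """Loop iterations that fit in the gas left after the intrinsic cost."""
    available = gas_limit - INTRINSIC_GAS
    if available <= 0:
        return 0
    return available // cost_per_contract


GAS_LIMIT = 30_000_000  # assumed benchmark gas limit, not taken from the commit
before = contracts_reachable(GAS_LIMIT, 5_007)
after = contracts_reachable(GAS_LIMIT, 2_710)
print(before, after)
```

Under these assumptions the cheaper pattern reaches roughly 1.8 times as many contracts per transaction.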

* refactor(benchmark): support non-fixed max_codesize

* chore: Remove all 24kB "hardcoded" refs

* fix: pre-commit lint hooks

* push updated deploy_create2_factory refactored with EEST as dep

* refactor(benchmark): enhance CREATE2 factory deployment and testing

- Updated the deploy_create2_factory_refactored.py script to improve the deployment of a CREATE2 factory with an initcode template, allowing for dynamic contract address generation.

- Modified test_bloatnet.py to support on-the-fly CREATE2 address generation, optimizing gas costs and improving test accuracy.
- Adjusted gas cost calculations in the README to reflect the new deployment approach, ensuring accurate benchmarks for BloatNet tests.

* remove: old_deploy_factory script

* chore: address PR review fixes

* fix(benchmark): correct import path for ethereum_test_vm

* chore(benchmark): update according to review comments

Also renamed the test file so that it contains only multi-opcode tests, which keeps the directory cleaner for future test additions.

* refactor(benchmark): remove hardcoded parameters storing inside factory stub

- Fixed offset at which we COPYCODE
- Removed hardcoded values and added comments for clarity on factory storage layout and contract generation.

* chore: update pyproject.toml configuration

* refactor: rename test_mutiopcode.py to test_muti_opcode.py for consistency

* fix: correct import sorting in test_muti_opcode.py to fix CI lint error

* fix(benchmark): rename test file to fix typo

Rename test_muti_opcode.py to test_multi_opcode.py to fix filename typo

* fix(benchmark): update BloatNet tests to use factory's getConfig() method

Replace direct storage access with STATICCALL to factory's getConfig() method
in both test_bloatnet_balance_extcodesize and test_bloatnet_balance_extcodecopy.

Changes:
- Use STATICCALL to retrieve configuration from factory instead of SLOAD
- Add proper error handling for failed configuration calls
- Remove gas-limiting calculations, allowing tests to run until gas exhaustion
- Store configuration data in memory positions 96 and 128 for cleaner access

This makes the tests more robust and better aligned with the factory's
public interface, avoiding direct storage access assumptions.
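
As an illustration of the memory positions named above, this sketch emulates writing the 64-byte `getConfig()` return data at offset 96 and reading the two words back at 96 and 128. The helper names and the sample values are hypothetical; only the offsets come from the change description.

```python
from typing import Tuple

RET_OFFSET = 96  # where the STATICCALL return data is written
RET_SIZE = 64    # two 32-byte words: (num_deployed, init_code_hash)


def mload(memory: bytearray, offset: int) -> int:
    """Emulate MLOAD: read a 32-byte big-endian word."""
    return int.from_bytes(memory[offset:offset + 32], "big")


def read_config(memory: bytearray) -> Tuple[int, int]:
    """Read num_deployed from memory[96:128] and the hash from memory[128:160]."""
    return mload(memory, RET_OFFSET), mload(memory, RET_OFFSET + 32)


# Hypothetical return data: 5 deployed contracts and a dummy hash.
return_data = (5).to_bytes(32, "big") + bytes([0xAB]) * 32
memory = bytearray(RET_OFFSET + RET_SIZE)
memory[RET_OFFSET:RET_OFFSET + RET_SIZE] = return_data

num_deployed, init_code_hash = read_config(memory)
print(num_deployed, hex(init_code_hash))
```

Keeping the return data at 96 and above leaves memory[0:96] free for the CREATE2 preimage that the loop hashes.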

* refactor(benchmark): enhance BloatNet test documentation and gas cost calculations

* revert: restore pyproject.toml to match main branch

Remove all changes to pyproject.toml to align with upstream main branch.
This ensures CI compatibility and prevents configuration conflicts.

* fix(benchmark): resolve W505 doc line length issues in test_multi_opcode.py

Fixed all documentation and comment lines exceeding 79 characters to comply
with lint requirements.

* refactor(benchmark): simplify STATICCALL usage in BloatNet tests.

* feat(benchmark): add gas exhaustion validation using expected_receipt

Implement solution to address reviewer's concern about test validation by using
EEST's expected_receipt feature to validate that benchmarks consume all gas.

Changes:
- Add TransactionReceipt import
- Add expected_receipt to both test transactions validating gas_used equals gas_limit
- Remove skip_gas_used_validation flag as validation is now explicit

This ensures tests can distinguish between:
- Early failure from invalid jump (~50K gas) indicating setup issues
- Full gas exhaustion (all gas consumed) indicating successful benchmark run

The invalid jump remains as a fail-fast mechanism for STATICCALL failures,
while expected_receipt validates the benchmark actually executed.

* fix(benchmark): restore skip_gas_used_validation flag

Re-add skip_gas_used_validation=True to both blockchain_test calls
as it was accidentally removed. This flag is still needed alongside
the expected_receipt validation.

* refactor(benchmark): improve readability using kwargs syntax for opcodes

Apply reviewer suggestions to use more readable kwargs syntax for memory
and stack operations throughout both test functions.

Changes:
- Use Op.MLOAD(offset) instead of Op.PUSH1(offset) + Op.MLOAD
- Use Op.MSTORE(offset, value) for cleaner memory writes
- Use Op.SHA3(offset, length) for hash operations
- Use Op.POP(Op.BALANCE) and Op.POP(Op.EXTCODESIZE) for cleaner stack ops
- Combine increment operations into single Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1))

This makes the bytecode generation more concise and easier to understand.

* fix(benchmark): shorten comment lines to meet doc length limit

* fix(benchmark): correct MSTORE operation to store init_code_hash properly

* fix(benchmark): address review comments - remove redundant validation and fix ADD syntax

---------

Signed-off-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
Co-authored-by: Guillaume Ballet <3272758+gballet@users.noreply.github.com>
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
@@ -236,7 +236,8 @@ Users can select any of the artifacts depending on their benchmarking or testing
 
 ### 🧪 Test Cases
 
-- ✨ [EIP-7951](https://eips.ethereum.org/EIPS/eip-7951): Add additional test cases for modular comparison and initcode context ([#2023](https://github.com/ethereum/execution-spec-tests/pull/2023), & [#2068](https://github.com/ethereum/execution-spec-tests/pull/2068)).
+- ✨ [BloatNet](https://bloatnet.info)/Multidimensional Metering: Add benchmarks to be used as part of the BloatNet project and also for Multidimensional Metering.
+- ✨ [EIP-7951](https://eips.ethereum.org/EIPS/eip-7951): Add additional test cases for modular comparison.
 - 🔀 Refactored `BLOBHASH` opcode context tests to use the `pre_alloc` plugin in order to avoid contract and EOA address collisions ([#1637](https://github.com/ethereum/execution-spec-tests/pull/1637)).
 - 🔀 Refactored `SELFDESTRUCT` opcode collision tests to use the `pre_alloc` plugin in order to avoid contract and EOA address collisions ([#1643](https://github.com/ethereum/execution-spec-tests/pull/1643)).
 - ✨ EIP-7594: Sanity test cases to send blob transactions and verify `engine_getBlobsVX` using the `execute` command ([#1644](https://github.com/ethereum/execution-spec-tests/pull/1644),[#1884](https://github.com/ethereum/execution-spec-tests/pull/1884)).
diff --git a/tests/benchmark/bloatnet/__init__.py b/tests/benchmark/bloatnet/__init__.py
@@ -0,0 +1 @@
+"""BloatNet benchmark tests for Ethereum execution spec tests."""
diff --git a/tests/benchmark/bloatnet/test_multi_opcode.py b/tests/benchmark/bloatnet/test_multi_opcode.py
@@ -0,0 +1,315 @@
+"""
+abstract: BloatNet bench cases extracted from https://hackmd.io/9icZeLN7R0Sk5mIjKlZAHQ.
+
+   The idea of all these tests is to stress client implementations to find out
+   where the limits of processing are, focusing specifically on state-related
+   operations.
+"""
+
+import pytest
+
+from ethereum_test_forks import Fork
+from ethereum_test_tools import (
+    Account,
+    Alloc,
+    Block,
+    BlockchainTestFiller,
+    Transaction,
+    While,
+)
+from ethereum_test_vm import Bytecode
+from ethereum_test_vm import Opcodes as Op
+
+REFERENCE_SPEC_GIT_PATH = "DUMMY/bloatnet.md"
+REFERENCE_SPEC_VERSION = "1.0"
+
+
+# BLOATNET ARCHITECTURE:
+#
+#   [Initcode Contract]        [Factory Contract]              [24KB Contracts]
+#         (9.5KB)                    (116B)                     (N x 24KB each)
+#           │                          │                              │
+#           │  EXTCODECOPY             │   CREATE2(salt++)            │
+#           └──────────────►           ├──────────────────►     Contract_0
+#                                      ├──────────────────►     Contract_1
+#                                      ├──────────────────►     Contract_2
+#                                      └──────────────────►     Contract_N
+#
+#   [Attack Contract] ──STATICCALL──► [Factory.getConfig()]
+#           │                              returns: (N, hash)
+#           └─► Loop(i=0 to N):
+#                 1. Generate CREATE2 addr: keccak256(0xFF|factory|i|hash)[12:]
+#                 2. BALANCE(addr)    → 2600 gas (cold access)
+#                 3. EXTCODESIZE(addr) → 100 gas (warm access)
+#
+# HOW IT WORKS:
+#   1. Factory uses EXTCODECOPY to load initcode, avoiding PC-relative jumps
+#   2. Each CREATE2 deployment produces unique 24KB bytecode (via ADDRESS)
+#   3. All contracts share same initcode hash for deterministic addresses
+#   4. Attack rapidly accesses all contracts, stressing client's state handling
+
+
+@pytest.mark.valid_from("Prague")
+def test_bloatnet_balance_extcodesize(
+    blockchain_test: BlockchainTestFiller,
+    pre: Alloc,
+    fork: Fork,
+    gas_benchmark_value: int,
+):
+    """
+    BloatNet test using BALANCE + EXTCODESIZE with "on-the-fly" CREATE2
+    address generation.
+
+    This test:
+    1. Assumes contracts are already deployed via the factory (salt 0 to N-1)
+    2. Generates CREATE2 addresses dynamically during execution
+    3. Calls BALANCE (cold) then EXTCODESIZE (warm) on each
+    4. Maximizes cache eviction by accessing many contracts
+    """
+    gas_costs = fork.gas_costs()
+
+    # Calculate gas costs
+    intrinsic_gas = fork.transaction_intrinsic_cost_calculator()(calldata=b"")
+
+    # Cost per contract access with CREATE2 address generation
+    cost_per_contract = (
+        gas_costs.G_KECCAK_256  # SHA3 static cost for address generation (30)
+        + gas_costs.G_KECCAK_256_WORD * 3  # SHA3 dynamic cost (85 bytes = 3 words * 6)
+        + gas_costs.G_COLD_ACCOUNT_ACCESS  # Cold BALANCE (2600)
+        + gas_costs.G_BASE  # POP balance (2)
+        + gas_costs.G_WARM_ACCOUNT_ACCESS  # Warm EXTCODESIZE (100)
+        + gas_costs.G_BASE  # POP code size (2)
+        + gas_costs.G_VERY_LOW  # DUP1 before BALANCE (3)
+        + gas_costs.G_VERY_LOW * 4  # PUSH1 operations (4 * 3)
+        + gas_costs.G_VERY_LOW  # MLOAD for salt (3)
+        + gas_costs.G_VERY_LOW  # ADD for increment (3)
+        + gas_costs.G_VERY_LOW  # MSTORE salt back (3)
+        + 10  # While loop overhead
+    )
+
+    # Calculate how many contracts to access based on available gas
+    available_gas = gas_benchmark_value - intrinsic_gas - 1000  # Reserve for cleanup
+    contracts_needed = int(available_gas // cost_per_contract)
+
+    # Deploy factory using stub contract - NO HARDCODED VALUES
+    # The stub "bloatnet_factory" must be provided via --address-stubs flag
+    # The factory at that address MUST have:
+    # - Slot 0: Number of deployed contracts
+    # - Slot 1: Init code hash for CREATE2 address calculation
+    factory_address = pre.deploy_contract(
+        code=Bytecode(),  # Required parameter, but will be ignored for stubs
+        stub="bloatnet_factory",
+    )
+
+    # Log test requirements - deployed count read from factory storage
+    print(
+        f"Test needs {contracts_needed} contracts for "
+        f"{gas_benchmark_value / 1_000_000:.1f}M gas. "
+        f"Factory storage will be checked during execution."
+    )
+
+    # Build attack contract that reads config from factory and performs attack
+    attack_code = (
+        # Call getConfig() on factory to get num_deployed and init_code_hash
+        Op.STATICCALL(
+            gas=Op.GAS,
+            address=factory_address,
+            args_offset=0,
+            args_size=0,
+            ret_offset=96,
+            ret_size=64,
+        )
+        # Check if call succeeded
+        + Op.ISZERO
+        + Op.PUSH2(0x1000)  # Invalid jump target: fail fast if call failed
+        + Op.JUMPI
+        # Load results from memory
+        # Memory[96:128] = num_deployed_contracts
+        # Memory[128:160] = init_code_hash
+        + Op.MLOAD(96)  # Load num_deployed_contracts
+        + Op.MLOAD(128)  # Load init_code_hash
+        # Setup memory for CREATE2 address generation
+        # Memory layout at 0: 0xFF + factory_addr(20) + salt(32) + hash(32)
+        + Op.MSTORE(0, factory_address)  # Store factory address at memory position 0
+        + Op.MSTORE8(11, 0xFF)  # Store 0xFF prefix at position (32 - 20 - 1)
+        + Op.MSTORE(32, 0)  # Store salt at position 32
+        # Stack now has: [num_contracts, init_code_hash]
+        + Op.PUSH1(64)  # Push memory position
+        + Op.MSTORE  # Store init_code_hash at memory[64]
+        # Stack now has: [num_contracts]
+        # Main attack loop - iterate through all deployed contracts
+        + While(
+            body=(
+                # Generate CREATE2 addr: keccak256(0xFF+factory+salt+hash)
+                Op.SHA3(11, 85)  # Generate CREATE2 address from memory[11:96]
+                # The address is now on the stack
+                + Op.DUP1  # Duplicate for EXTCODESIZE
+                + Op.POP(Op.BALANCE)  # Cold access
+                + Op.POP(Op.EXTCODESIZE)  # Warm access
+                # Increment salt for next iteration
+                + Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1))  # Increment and store salt
+            ),
+            # Continue while we haven't reached the limit
+            condition=Op.DUP1 + Op.PUSH1(1) + Op.SWAP1 + Op.SUB + Op.DUP1 + Op.ISZERO + Op.ISZERO,
+        )
+        + Op.POP  # Clean up counter
+    )
+
+    # Deploy attack contract
+    attack_address = pre.deploy_contract(code=attack_code)
+
+    # Run the attack
+    attack_tx = Transaction(
+        to=attack_address,
+        gas_limit=gas_benchmark_value,
+        sender=pre.fund_eoa(),
+    )
+
+    # Post-state: just verify attack contract exists
+    post = {
+        attack_address: Account(storage={}),
+    }
+
+    blockchain_test(
+        pre=pre,
+        blocks=[Block(txs=[attack_tx])],
+        post=post,
+    )
+
+
+@pytest.mark.valid_from("Prague")
+def test_bloatnet_balance_extcodecopy(
+    blockchain_test: BlockchainTestFiller,
+    pre: Alloc,
+    fork: Fork,
+    gas_benchmark_value: int,
+):
+    """
+    BloatNet test using BALANCE + EXTCODECOPY with on-the-fly CREATE2
+    address generation.
+
+    This test forces actual bytecode reads from disk by:
+    1. Assuming contracts are already deployed via the factory
+    2. Generating CREATE2 addresses dynamically during execution
+    3. Using BALANCE (cold) to warm the account
+    4. Using EXTCODECOPY (warm) to read 1 byte from the END of the bytecode
+    """
+    gas_costs = fork.gas_costs()
+    max_contract_size = fork.max_code_size()
+
+    # Calculate costs
+    intrinsic_gas = fork.transaction_intrinsic_cost_calculator()(calldata=b"")
+
+    # Cost per contract with EXTCODECOPY and CREATE2 address generation
+    cost_per_contract = (
+        gas_costs.G_KECCAK_256  # SHA3 static cost for address generation (30)
+        + gas_costs.G_KECCAK_256_WORD * 3  # SHA3 dynamic cost (85 bytes = 3 words * 6)
+        + gas_costs.G_COLD_ACCOUNT_ACCESS  # Cold BALANCE (2600)
+        + gas_costs.G_BASE  # POP balance (2)
+        + gas_costs.G_WARM_ACCOUNT_ACCESS  # Warm EXTCODECOPY base (100)
+        + gas_costs.G_COPY * 1  # Copy cost for 1 byte (3)
+        + gas_costs.G_VERY_LOW * 2  # DUP1 before BALANCE, DUP4 for address (6)
+        + gas_costs.G_VERY_LOW * 8  # PUSH operations (8 * 3 = 24)
+        + gas_costs.G_VERY_LOW * 2  # MLOAD for salt twice (6)
+        + gas_costs.G_VERY_LOW * 2  # ADD operations (6)
+        + gas_costs.G_VERY_LOW  # MSTORE salt back (3)
+        + gas_costs.G_BASE  # POP after EXTCODECOPY (2)
+        + 10  # While loop overhead
+    )
+
+    # Calculate how many contracts to access
+    available_gas = gas_benchmark_value - intrinsic_gas - 1000
+    contracts_needed = int(available_gas // cost_per_contract)
+
+    # Deploy factory using stub contract - NO HARDCODED VALUES
+    # The stub "bloatnet_factory" must be provided via --address-stubs flag
+    # The factory at that address MUST have:
+    # - Slot 0: Number of deployed contracts
+    # - Slot 1: Init code hash for CREATE2 address calculation
+    factory_address = pre.deploy_contract(
+        code=Bytecode(),  # Required parameter, but will be ignored for stubs
+        stub="bloatnet_factory",
+    )
+
+    # Log test requirements - deployed count read from factory storage
+    print(
+        f"Test needs {contracts_needed} contracts for "
+        f"{gas_benchmark_value / 1_000_000:.1f}M gas. "
+        f"Factory storage will be checked during execution."
+    )
+
+    # Build attack contract that reads config from factory and performs attack
+    attack_code = (
+        # Call getConfig() on factory to get num_deployed and init_code_hash
+        Op.STATICCALL(
+            gas=Op.GAS,
+            address=factory_address,
+            args_offset=0,
+            args_size=0,
+            ret_offset=96,
+            ret_size=64,
+        )
+        # Check if call succeeded
+        + Op.ISZERO
+        + Op.PUSH2(0x1000)  # Invalid jump target: fail fast if call failed
+        + Op.JUMPI
+        # Load results from memory
+        # Memory[96:128] = num_deployed_contracts
+        # Memory[128:160] = init_code_hash
+        + Op.MLOAD(96)  # Load num_deployed_contracts
+        + Op.MLOAD(128)  # Load init_code_hash
+        # Setup memory for CREATE2 address generation
+        # Memory layout at 0: 0xFF + factory_addr(20) + salt(32) + hash(32)
+        + Op.MSTORE(0, factory_address)  # Store factory address at memory position 0
+        + Op.MSTORE8(11, 0xFF)  # Store 0xFF prefix at position (32 - 20 - 1)
+        + Op.MSTORE(32, 0)  # Store salt at position 32
+        # Stack now has: [num_contracts, init_code_hash]
+        + Op.PUSH1(64)  # Push memory position
+        + Op.MSTORE  # Store init_code_hash at memory[64]
+        # Stack now has: [num_contracts]
+        # Main attack loop - iterate through all deployed contracts
+        + While(
+            body=(
+                # Generate CREATE2 address
+                Op.SHA3(11, 85)  # Generate CREATE2 address from memory[11:96]
+                # The address is now on the stack
+                + Op.DUP1  # Duplicate for later operations
+                + Op.POP(Op.BALANCE)  # Cold access
+                # EXTCODECOPY(addr, mem_offset, last_byte_offset, 1)
+                # Read the LAST byte to force full contract load
+                + Op.PUSH1(1)  # size (1 byte)
+                + Op.PUSH2(max_contract_size - 1)  # code offset (last byte)
+                # Use salt as memory offset to avoid overlap
+                + Op.ADD(Op.MLOAD(32), 96)  # Add base memory offset for unique position
+                + Op.DUP4  # address (duplicated earlier)
+                + Op.EXTCODECOPY
+                + Op.POP  # Clean up address
+                # Increment salt for next iteration
+                + Op.MSTORE(32, Op.ADD(Op.MLOAD(32), 1))  # Increment and store salt
+            ),
+            # Continue while counter > 0
+            condition=Op.DUP1 + Op.PUSH1(1) + Op.SWAP1 + Op.SUB + Op.DUP1 + Op.ISZERO + Op.ISZERO,
+        )
+        + Op.POP  # Clean up counter
+    )
+
+    # Deploy attack contract
+    attack_address = pre.deploy_contract(code=attack_code)
+
+    # Run the attack
+    attack_tx = Transaction(
+        to=attack_address,
+        gas_limit=gas_benchmark_value,
+        sender=pre.fund_eoa(),
+    )
+
+    # Post-state
+    post = {
+        attack_address: Account(storage={}),
+    }
+
+    blockchain_test(
+        pre=pre,
+        blocks=[Block(txs=[attack_tx])],
+        post=post,
+    )
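
The memory setup in both tests (`MSTORE(0, factory)`, `MSTORE8(11, 0xFF)`, salt at 32, hash at 64) can be checked outside the EVM. The sketch below emulates those writes on a Python `bytearray` and confirms that `memory[11:96]` is the 85-byte input that `SHA3(11, 85)` hashes. The helper names and sample values are hypothetical, and this is an illustration rather than part of the test suite.

```python
def mstore(memory: bytearray, offset: int, value: int) -> None:
    """Emulate MSTORE: write `value` as a 32-byte big-endian word."""
    memory[offset:offset + 32] = value.to_bytes(32, "big")


def mstore8(memory: bytearray, offset: int, value: int) -> None:
    """Emulate MSTORE8: write only the low byte of `value`."""
    memory[offset] = value & 0xFF


def build_layout(factory: int, salt: int, init_code_hash: int) -> bytearray:
    """Reproduce the memory writes the attack contract performs before the loop."""
    memory = bytearray(96)
    mstore(memory, 0, factory)          # 20-byte address lands in bytes 12..31
    mstore8(memory, 11, 0xFF)           # prefix sits right before the address
    mstore(memory, 32, salt)            # salt occupies bytes 32..63
    mstore(memory, 64, init_code_hash)  # hash occupies bytes 64..95
    return memory


# Hypothetical sample values.
FACTORY = 0x1111111111111111111111111111111111111111
HASH = int.from_bytes(bytes([0xCD]) * 32, "big")

memory = build_layout(FACTORY, salt=3, init_code_hash=HASH)
preimage = bytes(memory[11:96])
print(len(preimage), preimage[:1].hex())
```

Since only the word at offset 32 changes between iterations, the loop can rewrite the salt in place and rehash the same 85-byte window.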
