feat(test-benchmark,test-fill): organize fixtures into gas-limit subdirs #2134

Draft
danceratopz wants to merge 4 commits into ethereum:forks/amsterdam from danceratopz:use-subdirs-for-each-gaslimit

@danceratopz
Member

@danceratopz danceratopz commented Feb 4, 2026

🗒️ Description

  • Split benchmark fixture outputs into per-gas-limit subdirectories when using --gas-benchmark-values.
  • Keep blockchain_tests_engine_x/pre_alloc shared at the output root and reject --output=stdout for gas benchmark runs.
  • Add a regression test and document the new output layout.
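The stdout rejection in the second bullet could look roughly like the following. This is a minimal sketch, not the PR's code: the function name check_output_target and the error message are hypothetical, and the stated reason (stdout has no directories to split into) is an assumption.

```python
from typing import Optional, Sequence


def check_output_target(
    output: str,
    gas_benchmark_values: Optional[Sequence[int]],
) -> None:
    """Raise if a gas benchmark run is asked to write fixtures to stdout.

    Hypothetical helper: a per-gas-limit directory layout presumably
    cannot be expressed on a single output stream.
    """
    if gas_benchmark_values is not None and output == "stdout":
        raise ValueError(
            "--output=stdout cannot be combined with --gas-benchmark-values"
        )
```

Runs without --gas-benchmark-values pass through unchanged, so ordinary fill invocations that write to stdout would not be affected under this sketch.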

The subdir routing happens in the base test parametrizer, right before fixture_collector.add_fixture(...) decides the output path: https://github.com/danceratopz/execution-specs/blob/ade21760e4eb9dccc687b91d623d404bc1ed5ecf/packages/testing/src/execution_testing/cli/pytest_commands/plugins/filler/filler.py#L1554-L1576

The per-gas-limit subdir routing is applied only when both of the following are true:

  1. --gas-benchmark-values is provided:
    • GasBenchmarkValues.from_config(request.config) is not None.
  2. The test is a benchmark test:
    • It has one of these markers: benchmark, stateful, or repricing.
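The two conditions above can be sketched as a single predicate. This is an illustration under stated assumptions, not the code at the linked lines: the function name and its plain-value parameters are hypothetical stand-ins for whatever the parametrizer reads from request.config and the test's markers.

```python
from typing import Iterable, Optional, Sequence

# Marker names taken from the description above.
BENCHMARK_MARKERS = frozenset({"benchmark", "stateful", "repricing"})


def uses_gas_limit_subdir(
    gas_benchmark_values: Optional[Sequence[int]],
    marker_names: Iterable[str],
) -> bool:
    """Return True only when both conditions from the description hold."""
    # Condition 1: --gas-benchmark-values was provided.
    if gas_benchmark_values is None:
        return False
    # Condition 2: the test carries at least one benchmark marker.
    return any(name in BENCHMARK_MARKERS for name in marker_names)
```

A non-benchmark test filled in the same session would return False here and keep its usual output path.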

Before

All fixtures for multiple gas limits were in the same files:

fixtures/
├── blockchain_tests
│   └── benchmark
│       └── compute
│           └── instruction
│               └── arithmetic
│                   ├── arithmetic.json
│                   ├── mod_arithmetic.json
│                   └── mod.json
├── blockchain_tests_engine
│   └── benchmark
│       └── compute
│           └── instruction
│               └── arithmetic
│                   ├── arithmetic.json
│                   ├── mod_arithmetic.json
│                   └── mod.json
└── blockchain_tests_engine_x
    ├── benchmark
    │   └── compute
    │       └── instruction
    │           └── arithmetic
    │               ├── arithmetic.json
    │               ├── mod_arithmetic.json
    │               └── mod.json
    └── pre_alloc
        ├── 0x398fa63f55cfbafc.json
        └── 0xa9149b4b29ac0473.json

E.g., fixtures/blockchain_tests/benchmark/compute/instruction/arithmetic/arithmetic.json contains both:

  • tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic[benchmark-gas-value_1M-fork_Osaka-blockchain_test-opcode_ADD-]
  • tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic[benchmark-gas-value_2M-fork_Osaka-blockchain_test-opcode_ADD-]
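To make the mixing concrete, the gas value can be read off the parametrized test ID. A small sketch, assuming the benchmark-gas-value_<N>M segment seen in the two IDs above is stable; the helper name gas_value_millions is hypothetical.

```python
import re
from typing import Optional

# Matches the "benchmark-gas-value_1M" style segment of a test ID.
_GAS_VALUE_RE = re.compile(r"benchmark-gas-value_(\d+)M")


def gas_value_millions(test_id: str) -> Optional[int]:
    """Return the gas value (in millions) encoded in a test ID, if any."""
    match = _GAS_VALUE_RE.search(test_id)
    return int(match.group(1)) if match else None


ids = [
    "tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic"
    "[benchmark-gas-value_1M-fork_Osaka-blockchain_test-opcode_ADD-]",
    "tests/benchmark/compute/instruction/test_arithmetic.py::test_arithmetic"
    "[benchmark-gas-value_2M-fork_Osaka-blockchain_test-opcode_ADD-]",
]
# Two distinct gas values that previously shared one arithmetic.json file.
distinct_gas_values = {gas_value_millions(test_id) for test_id in ids}
```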

After

Fixtures for different gas limits are written to separate subdirectories:

fixtures/
├── blockchain_tests
│   ├── benchmark_gas_limit_0001M
│   │   └── compute
│   │       └── instruction
│   │           └── arithmetic
│   │               ├── arithmetic.json
│   │               ├── mod_arithmetic.json
│   │               └── mod.json
│   └── benchmark_gas_limit_0002M
│       └── compute
│           └── instruction
│               └── arithmetic
│                   ├── arithmetic.json
│                   ├── mod_arithmetic.json
│                   └── mod.json
├── blockchain_tests_engine
│   ├── benchmark_gas_limit_0001M
│   │   └── compute
│   │       └── instruction
│   │           └── arithmetic
│   │               ├── arithmetic.json
│   │               ├── mod_arithmetic.json
│   │               └── mod.json
│   └── benchmark_gas_limit_0002M
│       └── compute
│           └── instruction
│               └── arithmetic
│                   ├── arithmetic.json
│                   ├── mod_arithmetic.json
│                   └── mod.json
└── blockchain_tests_engine_x
    ├── benchmark_gas_limit_0001M
    │   └── compute
    │       └── instruction
    │           └── arithmetic
    │               ├── arithmetic.json
    │               ├── mod_arithmetic.json
    │               └── mod.json
    ├── benchmark_gas_limit_0002M
    │   └── compute
    │       └── instruction
    │           └── arithmetic
    │               ├── arithmetic.json
    │               ├── mod_arithmetic.json
    │               └── mod.json
    └── pre_alloc
        ├── 0x398fa63f55cfbafc.json
        └── 0xa9149b4b29ac0473.json
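The directory names in the tree above can be reproduced with a zero-padded format string. A minimal sketch, assuming a fixed width of four digits and the benchmark_gas_limit_ prefix as they appear in the tree; both function names are hypothetical and this is not taken from the PR's code.

```python
from pathlib import PurePosixPath


def gas_limit_subdir(gas_millions: int, width: int = 4) -> str:
    """Build a name such as benchmark_gas_limit_0001M (width is assumed)."""
    if gas_millions < 0:
        raise ValueError("gas limit must be non-negative")
    return f"benchmark_gas_limit_{gas_millions:0{width}d}M"


def routed_path(format_dir: str, gas_millions: int, relative: str) -> PurePosixPath:
    """Place a fixture file below its per-gas-limit directory."""
    return PurePosixPath(format_dir) / gas_limit_subdir(gas_millions) / relative
```

One benefit of the padding is that a plain lexicographic sort of the directory names matches numeric order, so 0002M sorts before 0010M.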

Changes

  • fill now writes benchmark fixtures under a per-gas-limit directory with a zero-padded, stable-width value (benchmark_gas_limit_XXXXM/ in the After tree above).
  • Verification logic recognizes the new top-level gas-limit prefix.
  • Help text and docs updated to describe the layout and pre-alloc sharing.
  • Pytester test ensures no mixing of gas-limit keys and no root JSON output for benchmark runs.
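The "no mixing of gas-limit keys" property from the last bullet could be checked along these lines. This is a sketch rather than the PR's pytester test: it assumes each fixture file is a JSON object keyed by test ID, that the IDs carry a benchmark-gas-value_<N>M segment, and that pre_alloc files are exempt because they are shared; the function name is hypothetical.

```python
import json
import re
from pathlib import Path
from typing import List

_GAS_VALUE_RE = re.compile(r"benchmark-gas-value_(\d+)M")


def files_mixing_gas_values(root: Path) -> List[str]:
    """List fixture files whose keys reference more than one gas value."""
    mixed = []
    for path in sorted(root.rglob("*.json")):
        relative = path.relative_to(root)
        # Shared pre-allocation files are not per-gas-limit fixtures.
        if "pre_alloc" in relative.parts:
            continue
        # Assumption: the file is a JSON object keyed by test ID.
        fixture_keys = json.loads(path.read_text())
        gas_values = {
            match.group(1)
            for key in fixture_keys
            if (match := _GAS_VALUE_RE.search(key))
        }
        if len(gas_values) > 1:
            mixed.append(relative.as_posix())
    return mixed
```

An empty result for an output directory would indicate that every fixture file holds cases for at most one gas limit.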

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@danceratopz danceratopz force-pushed the use-subdirs-for-each-gaslimit branch from 57092f9 to 635b120 February 4, 2026 10:09
@danceratopz danceratopz force-pushed the use-subdirs-for-each-gaslimit branch from 635b120 to 070ec15 February 4, 2026 10:35
@danceratopz danceratopz added the labels A-test-fill (Area: execution_testing.cli.pytest_commands.plugins.filler) and A-test-benchmark (Area: execution_testing.benchmark and tests/benchmark) Feb 4, 2026
@danceratopz danceratopz changed the title from "feat(test-benchmark): organize fixtures into gas-limit subdirs" to "feat(test-benchmark,test-fill): organize fixtures into gas-limit subdirs" Feb 4, 2026
@spencer-tb spencer-tb self-requested a review February 4, 2026 10:52
codecov bot commented Feb 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.07%. Comparing base (72addb2) to head (a5b22e4).
⚠️ Report is 20 commits behind head on forks/amsterdam.

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2134   +/-   ##
================================================
  Coverage            86.07%   86.07%           
================================================
  Files                  599      599           
  Lines                39472    39472           
  Branches              3780     3780           
================================================
  Hits                 33977    33977           
  Misses                4862     4862           
  Partials               633      633           
Flag Coverage Δ
unittests 86.07% <ø> (ø)


Contributor

@spencer-tb spencer-tb left a comment

An alternative structure would be the following. It has the benefit of keeping the existing layout, with the test types staying at the same directory level:

  fixtures/
  ├── blockchain_tests
  │   ├── gas_limit_0001M/...
  │   └── gas_limit_0002M/...
  ├── blockchain_tests_engine
  │   ├── gas_limit_0001M/...
  │   └── gas_limit_0002M/...
  └── blockchain_tests_engine_x
      ├── pre_alloc/
      ├── gas_limit_0001M/...
      └── gas_limit_0002M/...

I don't see any other issues here. The hasher would still work. consume enginex might need a tweak in the future, but it is not being used at the moment.

Another point: are the users of the benchmark tests happy with this structure? It might be a breaking change on their end.

@danceratopz
Member Author

Thanks for the review @spencer-tb! Yes, I think this is a better structure!

  fixtures/
  ├── blockchain_tests
  │   ├── gas_limit_0001M/...
  │   └── gas_limit_0002M/...
  ├── blockchain_tests_engine
  │   ├── gas_limit_0001M/...
  │   └── gas_limit_0002M/...
  └── blockchain_tests_engine_x
      ├── pre_alloc/
      ├── gas_limit_0001M/...
      └── gas_limit_0002M/...

@danceratopz
Member Author

Done! I also flattened the structure by one level so it's clean, and updated the description with verbose before and after trees.

@danceratopz
Member Author

danceratopz commented Feb 9, 2026

I don't think we should make this change to the benchmark release layout without considering the corresponding change for forks. The linear growth of our new fixture files with each new fork is the real issue. Let's clear that up first and then come back to this PR (which could increase in scope?).

I opened a discussion with a poll for that here:

@danceratopz danceratopz marked this pull request as draft February 9, 2026 06:11