
Conversation

@chfast (Member) commented Jan 29, 2025

🗒️ Description

Add a worst-case test that runs a block packed with as many MOD instructions as possible, with arguments of the parametrized bit size.
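For illustration, here is a rough standalone sketch of how a block full of MOD instructions with operands of a parametrized bit size could be assembled as raw bytecode (a hypothetical snippet, not the actual EEST test code; the operand values are an assumed choice):

```python
# Hypothetical sketch: repeat "PUSH32 y, PUSH32 x, MOD, POP" so a block can be
# packed with MOD instructions whose operands have `mod_bits` significant bits.
PUSH32, MOD, POP = 0x7F, 0x06, 0x50

def mod_worst_case_code(mod_bits: int, repetitions: int) -> bytes:
    x = (1 << mod_bits) - 1        # dividend with `mod_bits` significant bits
    y = (1 << (mod_bits - 1)) | 1  # odd divisor of roughly the same size
    push = lambda value: bytes([PUSH32]) + value.to_bytes(32, "big")
    unit = push(y) + push(x) + bytes([MOD, POP])  # computes x % y, discards it
    return unit * repetitions

code = mod_worst_case_code(mod_bits=191, repetitions=1000)
```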

🔗 Related Issues

#1571

✅ Checklist

  • All: Set appropriate labels for the changes.
  • All: Considered squashing commits to improve commit history.
  • All: Added an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • Tests: All converted JSON/YML tests from ethereum/tests have been added to converted-ethereum-tests.txt.
  • Tests: A PR with removal of converted JSON/YML tests from ethereum/tests has been opened.
  • Tests: Included the type and version of evm t8n tool used to locally execute test cases: e.g., ref with commit hash or geth 1.13.1-stable-3f40e65.
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.

@marioevz (Member) left a comment

I think we should rather create a marker, e.g.:

@pytest.mark.benchmark

And then you can place these tests in tests/frontier/opcodes/ but still select them with uv run fill -m benchmark.
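A minimal sketch of what such a marked test could look like (illustrative only; a real EEST test would use the framework's fixtures rather than this placeholder body, and the marker would also need to be registered in the pytest configuration):

```python
import pytest

@pytest.mark.benchmark  # selectable with `-m benchmark`, e.g. `uv run fill -m benchmark`
def test_worst_mod_sketch():
    # Placeholder body: a real test would build a block packed with MOD
    # instructions via EEST's state/blockchain test fixtures.
    ...
```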

How do you see the clients consuming these tests for benchmarks? Is there a command in evmone to run benchmark state tests?

@chfast (Member, Author) commented Jan 30, 2025

Sorry, I haven't written up a proper description of this yet, but here are my thoughts about benchmarks in EEST.

  1. Benchmarks are formally regular tests, and the existing types (state tests, blockchain tests, even EOF tests and transaction tests) are already very useful. They should be produced with fill and consumed in the normal way.
  2. We should still separate them from the main tests, maybe by using a marker like @pytest.mark.benchmark and/or a separate directory.
  3. They should not contribute to the test coverage. Therefore, they should not be part of the fixtures release, or they should be physically separated from it.
  4. We want to generate them for a single fork, likely a recent one. It is an open question whether there should be a default fork for all benchmarks and how to upgrade that fork. But in general, we are mostly interested in EVM performance near the tip of the chain, not in historical execution.
  5. That is also why sorting them by fork names is not a good idea. Potential categories are something like: real use cases (e.g. SHA1), micro-benchmarks, worst cases, etc. But I don't think we need to decide on categories right now.
  6. I imagine usage to be like this: if this is the regular test execution:
    evm test benchmark.json
    then benchmarking is:
    evm test --bench benchmark.json
    The difference is just that with --bench the test is executed in a loop and the execution time is reported.
  7. Because of this loop execution we may need to put some additional requirements on the benchmarks. E.g. a benchmark should not modify the state; if it does modify the state, the state diff should be the same in every iteration; and the test itself should execute for at least 1 ms.

@marioevz (Member) commented:

@chfast I like the idea, and basically everything you mention is somewhat easily doable in EEST.

One question, for (6), is there a pass/fail condition? Or is this something that is left to the client test consumer's discretion?

@chfast (Member, Author) commented Feb 3, 2025

> One question, for (6), is there a pass/fail condition? Or is this something that is left to the client test consumer's discretion?

In the example, evm test benchmark.json is a regular test execution: no change in the pass/fail conditions.

The additional --bench will likely execute the benchmark in a loop and report timing statistics. What I try to do in my current tooling is to execute it once and check the assertions, and only then switch to loop execution. In general, you want to make sure the benchmarks are executed correctly; otherwise there is no point in benchmarking an incorrect implementation (or an incorrect test). In practice the exact method may depend on the time of a single execution. It is indeed up to the client teams to decide, but also up to us to audit these decisions.
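As a minimal sketch of this validate-then-loop flow (assuming only a callable that executes the fixture and raises on a pass/fail mismatch; this is not evmone's actual --bench implementation):

```python
import time
from typing import Callable

def run_benchmark(execute_fixture: Callable[[], None], iterations: int = 100) -> float:
    """Run the fixture once as a regular pass/fail test, then time repeated runs."""
    execute_fixture()  # single validated run; raises if the test fails
    start = time.perf_counter()
    for _ in range(iterations):  # timed loop execution, as with --bench
        execute_fixture()
    return (time.perf_counter() - start) / iterations  # mean seconds per run
```

The loop only stays meaningful if the requirements from point (7) above hold, e.g. the state diff is identical in every iteration.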

@chfast chfast changed the title new(tests): add worst-case benchmark for MOD feat(tests): add worst-case benchmark for MOD May 20, 2025
@chfast chfast added the scope:tests, type:feat and feature:zkevm labels May 20, 2025
@chfast (Member, Author) commented May 20, 2025

Adapted to the zkevm task.

@chfast chfast requested review from jsign and marioevz May 20, 2025 20:45
@chfast chfast marked this pull request as ready for review May 20, 2025 20:49
@jsign (Collaborator) left a comment

Left some comments for your consideration, but overall LGTM.

Cycles:

tests/zkevm/test_worst_compute.py::test_worst_mod[fork_Cancun-blockchain_test-mod_bits_63]-1    3918422193
tests/zkevm/test_worst_compute.py::test_worst_mod[fork_Cancun-blockchain_test-mod_bits_255]-1   6321807633
tests/zkevm/test_worst_compute.py::test_worst_mod[fork_Cancun-blockchain_test-mod_bits_127]-1   6559030064
tests/zkevm/test_worst_compute.py::test_worst_mod[fork_Cancun-blockchain_test-mod_bits_191]-1   8734621043

@chfast chfast force-pushed the bench/mod branch 2 times, most recently from ea4c288 to 426b758 Compare May 21, 2025 08:55
@chfast (Member, Author) commented May 21, 2025

Also added an SMOD variant.

@marioevz (Member) left a comment

LGTM, thanks!

@marioevz marioevz merged commit c3e5fb6 into ethereum:main May 21, 2025
14 checks passed
@marioevz marioevz deleted the bench/mod branch May 21, 2025 18:39
kclowes pushed a commit to kclowes/execution-spec-tests that referenced this pull request Oct 20, 2025