Skip to content

Conversation

LouisTsai-Csie
Copy link
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented Jul 24, 2025

🗒️ Description

As EIP-7825 is introduced in Fusaka upgrade, most of the legacy test case would fail. This issue add two test wrappers, benchmark_test and benchmark_state_test, to replace pure blockchain_test and state_test test type.

🔗 Related Issues or PRs

Issue #1896

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

@LouisTsai-Csie LouisTsai-Csie self-assigned this Jul 24, 2025
@LouisTsai-Csie LouisTsai-Csie force-pushed the benchmark-test-type branch 2 times, most recently from 641036c to af00ec2 Compare August 8, 2025 10:07
@LouisTsai-Csie LouisTsai-Csie marked this pull request as ready for review August 11, 2025 09:52
@LouisTsai-Csie
Copy link
Collaborator Author

There are some issue in generating the fixture. I compare to the newly created fixture, and the size is much larger than the original one. This should not happen and there should be the same content, so the same size. But this is not a big problem now.

The major issue now is to resolve the failing test in CI, which I could not reproduce now locally.

@LouisTsai-Csie LouisTsai-Csie marked this pull request as draft August 14, 2025 16:30
@CPerezz
Copy link

CPerezz commented Aug 29, 2025

This can come in handy for benchmark tests as basically they force the consumption of all the gas available. And that condition forces us to implement padding techniques to consume EXACTLY all the gas available in a block.

When in reality, for a benchmark, we don't care about this at all.
PRs affected:

@LouisTsai-Csie
Copy link
Collaborator Author

@CPerezz I think this is still necessary for Nethermind team (Increasing gas limit) and zkEVM team (proving the entire block)? For gas limit testing, I am not sure if they can only run 1 tx and then derive the entire block execution time from it

@CPerezz
Copy link

CPerezz commented Aug 30, 2025

@CPerezz I think this is still necessary for Nethermind team (Increasing gas limit) and zkEVM team (proving the entire block)? For gas limit testing, I am not sure if they can only run 1 tx and then derive the entire block execution time from it

But you can emit a warning if needed. Why does it need to be a failure not spending ALL the gas exactly? I agree it has to be within a bound. Sure. But to the unit in precision is really different. Specially when you have to account for mem expansion and other costs. It's almost impossible to not need padding.

I'm not advocating to remove this completely. But to relax it maybe. Or at least, it would be useful to know why does it need to fail specifically? When and Why was this introduced?

@LouisTsai-Csie
Copy link
Collaborator Author

@CPerezz Thank you for explanation, it is very clear! I will review the features included again and discuss with the team.

As you see this is still a draft and we welcome any feedback, we also want to know what does stateless client team need for benchmarking, what's your consideration when benchmarking?

@CPerezz
Copy link

CPerezz commented Sep 1, 2025

@LouisTsai-Csie So I'm just speaking in regards of "State bottlenecks" project. Which is within the stateless-consensus team. Our goal is to measure how different client impls behave when under heavy load and different state sizes among other things.

For that, we need these kind of benchmarks. But it results quite tricky to match perfectly the gas spent. And it's not required at all to be spent. 1% of wiggle room is enough to consider the benchmark useful even if it doesn't spend all the gas of the block.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants