Commit b3624b5

LouisTsai-Csie, marioevz, and danceratopz authored
docs(benchmark): add a benchmark section and fix fill command (#2093)
Co-authored-by: Mario Vega <[email protected]>
Co-authored-by: danceratopz <[email protected]>
1 parent 6b273bd commit b3624b5

6 files changed: +100 -13 lines changed

docs/navigation.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@
 * [Adding a New Test](writing_tests/adding_a_new_test.md)
 * [Types of Test](writing_tests/types_of_tests.md)
 * [Writing a New Test](writing_tests/writing_a_new_test.md)
+* [Benchmarks](writing_tests/benchmarks.md)
 * [Test Markers](writing_tests/test_markers.md)
 * [Verifying Changes Locally](writing_tests/verifying_changes.md)
 * [Code Standards](writing_tests/code_standards.md)

docs/templates/base.md.j2

Lines changed: 4 additions & 0 deletions
@@ -9,7 +9,11 @@ Documentation for [`{{ pytest_node_id }}@{{ short_git_ref }}`]({{ source_code_ur
 !!! example "Generate fixtures for these test cases for {{ target_or_valid_fork }} with:"

     ```console
+    {% if is_benchmark %}
+    fill -v {{ pytest_node_id }} -m benchmark
+    {% else %}
     fill -v {{ pytest_node_id }} --fork {{ target_or_valid_fork }}
+    {% endif %}
     ```
 {% endif %}
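
Note on the template change: the branch added above can be mirrored in plain Python to see which command each kind of page would show. This is only an illustrative sketch. `fill_command` is a hypothetical helper that is not part of the doc generator, and the node IDs and fork names in the usage lines are placeholders.

```python
def fill_command(pytest_node_id: str, target_or_valid_fork: str, is_benchmark: bool) -> str:
    """Return the fill command a page would display (hypothetical helper)."""
    if is_benchmark:
        # Benchmark tests are deselected by default, so they are selected by marker.
        return f"fill -v {pytest_node_id} -m benchmark"
    # Non-benchmark pages keep the original fork-targeted command.
    return f"fill -v {pytest_node_id} --fork {target_or_valid_fork}"


# Placeholder node IDs, used only to exercise both branches.
benchmark_cmd = fill_command("tests/benchmark/test_example.py", "Prague", True)
regular_cmd = fill_command("tests/berlin/eip2930_access_list/test_acl.py", "Berlin", False)
print(benchmark_cmd)
print(regular_cmd)
```

The benchmark branch drops `--fork` entirely, which matches the rendered template: the fork argument is only emitted for non-benchmark pages.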

docs/writing_tests/benchmarks.md

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+# Benchmark Test Cases
+
+Benchmark tests aim to maximize the usage of a specific opcode, precompile, or operation within a transaction or block. They are located in the `./tests/benchmark` folder, and the available test cases are documented in the [test case reference](../tests/benchmark/index.md).
+
+To fill a benchmark test, you must include the `-m benchmark` flag in addition to the usual test flags. This is necessary because benchmark tests are ignored by default; they must be manually selected via the `benchmark` pytest marker (a pytest "tag"). The framework automatically applies this marker to all tests under `./tests/benchmark/`.
+
+## Setting the Gas Limit for Benchmarking
+
+To consume the full benchmark gas limit, use the `gas_benchmark_value` fixture as the gas limit:
+
+```py
+def test_benchmark(
+    blockchain_test: BlockchainTestFiller,
+    pre: Alloc,
+    gas_benchmark_value: int
+):
+    ...
+```
+
+You can specify the block gas limit used in benchmark tests by setting the `--gas-benchmark-values` flag. This flag accepts a comma-separated list of values (in millions of gas), e.g. `--gas-benchmark-values 1,10,45,60`. This example would run the test 4 times, using a `gas_benchmark_value` of 1M, 10M, 45M, and 60M respectively.
+
+Do not set the transaction or block gas limit to `env.gas_limit`. When running in benchmark mode, the test framework sets this value to a very large number (e.g., `1_000_000_000_000`). This setup allows the framework to reuse a single genesis file for all specified gas limits. For example, the test below is invalid:
+
+```py
+def test_benchmark(
+    blockchain_test: BlockchainTestFiller,
+    pre: Alloc,
+    env: Environment
+):
+    ...
+    tx = Transaction(
+        to=opcode_address,
+        gas_limit=env.gas_limit,  # Invalid: do not use env.gas_limit; use gas_benchmark_value instead.
+        sender=pre.fund_eoa(),
+    )
+    ...
+```
+
+## Expected Gas Usage
+
+In benchmark mode, the developer should set the expected gas consumption using the `expected_benchmark_gas_used` field. Benchmark tests do not need to consume the full gas limit; instead, you can calculate and specify the expected usage. If `expected_benchmark_gas_used` is not set, the test falls back to using `gas_benchmark_value` as the expected value.
+
+```py
+@pytest.mark.valid_from("Prague")
+def test_empty_block(
+    blockchain_test: BlockchainTestFiller,
+    pre: Alloc,
+):
+    """Test running an empty block as a baseline for fixed proving costs."""
+    blockchain_test(
+        pre=pre,
+        post={},
+        blocks=[Block(txs=[])],
+        expected_benchmark_gas_used=0,
+    )
+```
+
+This is a safety check to make sure the benchmark works as expected. For example, if a test uses the `JUMP` instruction but the jump destination is invalid, each transaction will stop early and use less gas than expected.
+
+This check helps catch such issues. As a result, the post-storage comparison method via `SSTORE` is no longer needed, which removes the additional storage cost.
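
Note on the new page: the two rules it describes (flag values are given in millions of gas, and the expected gas falls back to `gas_benchmark_value` when unset) can be sketched in a few lines of Python. The commit does not show how the framework implements either rule, so both helpers below are hypothetical stand-ins written under that assumption.

```python
from typing import List, Optional

GAS_PER_MILLION = 1_000_000


def parse_gas_benchmark_values(raw: str) -> List[int]:
    """Convert a comma-separated list of millions of gas into absolute gas values."""
    return [int(part.strip()) * GAS_PER_MILLION for part in raw.split(",") if part.strip()]


def resolve_expected_gas(
    gas_benchmark_value: int,
    expected_benchmark_gas_used: Optional[int] = None,
) -> int:
    """Prefer the explicit expectation; otherwise fall back to the benchmark gas value."""
    # Compare against None rather than truthiness so an explicit 0 is honored.
    if expected_benchmark_gas_used is None:
        return gas_benchmark_value
    return expected_benchmark_gas_used


values = parse_gas_benchmark_values("1,10,45,60")
fallback = resolve_expected_gas(values[0])
empty_block = resolve_expected_gas(values[0], expected_benchmark_gas_used=0)
print(values, fallback, empty_block)
```

The `is None` check matters for cases like `test_empty_block`, where the expected usage is explicitly `0` and must not be replaced by the fallback.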

src/pytest_plugins/filler/gen_test_doc/gen_test_doc.py

Lines changed: 15 additions & 1 deletion
@@ -423,6 +423,8 @@ def create_function_page_props(self, test_functions: Dict["str", List[Item]]) ->
                 ]
             )

+            is_benchmark = items[0].get_closest_marker("benchmark") is not None
+
             self.function_page_props[function_id] = FunctionPageProps(
                 title=get_test_function_name(items[0]),
                 source_code_url=source_url,
@@ -437,6 +439,7 @@ def create_function_page_props(self, test_functions: Dict["str", List[Item]]) ->
                 docstring_one_liner=get_docstring_one_liner(items[0]),
                 html_static_page_target=f"./{get_test_function_name(items[0])}.html",
                 mkdocs_function_page_target=f"./{get_test_function_name(items[0])}/",
+                is_benchmark=is_benchmark,
             )

     def create_module_page_props(self) -> None:
@@ -451,6 +454,7 @@ def create_module_page_props(self) -> None:
                     path=module_path,
                     pytest_node_id=str(module_path),
                     package_name=get_import_path(module_path),
+                    is_benchmark=function_page.is_benchmark,
                     test_functions=[
                         TestFunction(
                             name=function_page.title,
@@ -462,6 +466,8 @@ def create_module_page_props(self) -> None:
                 )
             else:
                 existing_module_page = self.module_page_props[str(function_page.path)]
+                if function_page.is_benchmark:
+                    existing_module_page.is_benchmark = True
                 existing_module_page.test_functions.append(
                     TestFunction(
                         name=function_page.title,
@@ -493,15 +499,23 @@ def add_directory_page_props(self) -> None:
                 fork = self.target_fork
             else:
                 fork = directory_fork_name
+
+            is_benchmark = any(
+                module_page.is_benchmark
+                for module_page in self.module_page_props.values()
+                if module_page.path.parent == directory
+            )
+
             self.page_props[str(directory)] = DirectoryPageProps(
                 title=sanitize_string_title(str(directory.name)),
                 path=directory,
                 pytest_node_id=str(directory),
                 source_code_url=generate_github_url(directory, branch_or_commit_or_tag=self.ref),
                 # TODO: This won't work in all cases; should be from the development fork
                 # Currently breaks for `tests/unscheduled/eip7692_eof_v1/index.md`  # noqa: SC100
-                target_or_valid_fork=fork.capitalize(),
+                target_or_valid_fork=fork.capitalize() if fork else "Unknown",
                 package_name=get_import_path(directory),  # init.py will be used for docstrings
+                is_benchmark=is_benchmark,
             )

     def find_files_within_collection_scope(self, file_pattern: str) -> List[Path]:
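
Note on the propagation logic: the hunks above carry `is_benchmark` from function pages to module pages, and then to directory pages via `any(...)` over modules whose direct parent is that directory. A reduced sketch of that flow is shown here. `FunctionPage`, `ModulePage`, and both helper functions are simplified, hypothetical stand-ins for the real page-props classes and methods, and the paths are placeholders.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List


@dataclass
class FunctionPage:
    path: Path
    is_benchmark: bool = False


@dataclass
class ModulePage:
    path: Path
    is_benchmark: bool = False


def build_module_pages(function_pages: List[FunctionPage]) -> Dict[str, ModulePage]:
    """A module is a benchmark module if any of its test functions is a benchmark."""
    modules: Dict[str, ModulePage] = {}
    for function_page in function_pages:
        key = str(function_page.path)
        if key not in modules:
            modules[key] = ModulePage(function_page.path, function_page.is_benchmark)
        elif function_page.is_benchmark:
            modules[key].is_benchmark = True
    return modules


def directory_is_benchmark(directory: Path, modules: Dict[str, ModulePage]) -> bool:
    """Only modules whose direct parent is `directory` are considered."""
    return any(m.is_benchmark for m in modules.values() if m.path.parent == directory)


pages = [
    FunctionPage(Path("tests/benchmark/test_a.py"), is_benchmark=False),
    FunctionPage(Path("tests/benchmark/test_a.py"), is_benchmark=True),
    FunctionPage(Path("tests/berlin/test_b.py"), is_benchmark=False),
]
modules = build_module_pages(pages)
print(directory_is_benchmark(Path("tests/benchmark"), modules))
print(directory_is_benchmark(Path("tests"), modules))
```

Because the filter compares `path.parent` with the directory itself, a directory that contains benchmark modules only in nested subdirectories is not flagged in this sketch; the same condition appears in the diff.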

src/pytest_plugins/filler/gen_test_doc/page_props.py

Lines changed: 12 additions & 11 deletions
@@ -12,7 +12,7 @@

 import re
 from abc import abstractmethod
-from dataclasses import asdict, dataclass
+from dataclasses import asdict, dataclass, field
 from pathlib import Path
 from typing import IO, Any, ContextManager, Dict, List, Protocol

@@ -104,6 +104,7 @@ class PagePropsBase:
     path: Path
     pytest_node_id: str
     package_name: str
+    is_benchmark: bool = False

     @property
     @abstractmethod
@@ -137,8 +138,8 @@ def write_page(self, file_opener: FileOpener, jinja2_env: Environment):
 class EipChecklistPageProps(PagePropsBase):
     """Properties used to generate the EIP checklist page."""

-    eip: int
-    lines: List[str]
+    eip: int = 0
+    lines: List[str] = field(default_factory=list)

     @property
     def template(self) -> str:
@@ -174,13 +175,13 @@ class FunctionPageProps(PagePropsBase):
     corresponding static HTML pages.
     """

-    test_case_count: int
-    fixture_formats: List[str]
-    test_type: str
-    docstring_one_liner: str
-    html_static_page_target: str
-    mkdocs_function_page_target: str
-    cases: List[TestCase]
+    test_case_count: int = 0
+    fixture_formats: List[str] = field(default_factory=list)
+    test_type: str = ""
+    docstring_one_liner: str = ""
+    html_static_page_target: str = ""
+    mkdocs_function_page_target: str = ""
+    cases: List[TestCase] = field(default_factory=list)

     @property
     def template(self) -> str:
@@ -229,7 +230,7 @@ class TestFunction:
 class ModulePageProps(PagePropsBase):
     """Definitions used for test modules, e.g., `tests/berlin/eip2930_access_list/test_acl.py`."""

-    test_functions: List[TestFunction]
+    test_functions: List[TestFunction] = field(default_factory=list)

     @property
     def template(self) -> str:
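
Note on the added defaults: the commit message does not say why every subclass field gained a default, but a likely reason is the standard dataclass rule that fields without defaults may not follow fields with defaults, including inherited ones. Once `PagePropsBase` gains `is_benchmark: bool = False`, subclasses that declare plain fields would fail at class-definition time. The sketch below reproduces that rule with hypothetical minimal classes (`Base`, `Broken`, `Child`), not the real page-props classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Base:
    title: str
    is_benchmark: bool = False  # defaulted field on the base class


def define_child_without_defaults() -> str:
    """Try to declare a subclass whose field lacks a default and report the outcome."""
    try:

        @dataclass
        class Broken(Base):
            test_type: str  # no default, but it follows the defaulted base field

    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


@dataclass
class Child(Base):
    test_type: str = ""
    # default_factory gives every instance its own list instead of a shared one.
    cases: List[str] = field(default_factory=list)


outcome = define_child_without_defaults()
first, second = Child(title="a"), Child(title="b")
first.cases.append("case-1")
print(outcome)
print(second.cases)
```

Using `field(default_factory=list)` instead of a literal `[]` is also required by `dataclasses`, which rejects mutable list defaults outright.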

tests/benchmark/__init__.py

Lines changed: 8 additions & 1 deletion
@@ -1 +1,8 @@
-"""abstract: Tests for zkVMs."""
+"""
+abstract: Benchmark tests for EVMs.
+Benchmark tests aim to maximize the usage of a specific opcode,
+precompile, or operation within a transaction or block. These can
+be executed against EVM implementations to ensure they handle
+pathological cases efficiently and correctly, allowing Ethereum to
+safely [Scale the L1](https://protocol.ethereum.foundation/).
+"""
