Skip to content
Open
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ dependencies = [
"ruamel.yaml==0.18.15",
"setuptools==80.9.0",
"trie==3.1.0",
"wrapt==2.0.1",
]

[project.optional-dependencies]
Expand Down
10 changes: 10 additions & 0 deletions tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,16 @@ def execute_test(test_case: TestCase, dumper: Dumper):
for name, kind, data in test_case.case_fn():
if kind == "meta":
meta[name] = data
elif kind == "pydantic":
# new data type for spec traces
outputs += [
# dump trace data for yaml serialization
("trace", "data", data.model_dump(mode="json", exclude_none=True)),
] + [
(name, "ssz", value)
# ssz artifacts are already serialized and will be compressed by the dumper
for name, value in data.artifacts.items()
]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list concatenation [...] + [...] with += creates an unnecessary intermediate list. For better readability and a minor performance improvement, you could use list.append and list.extend in separate steps.

                # new data type for spec traces
                outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
                outputs.extend(
                    (name, "ssz", value)
                    # ssz artifacts are already serialized and will be compressed by the dumper
                    for name, value in data.artifacts.items()
                )

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The list concatenation outputs += [...] + [...] creates two intermediate lists before extending the outputs list. This can be made more readable and slightly more performant by using list.append and list.extend directly.

                # dump trace data for yaml serialization
                outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
                # ssz artifacts are already serialized and will be compressed by the dumper
                outputs.extend(
                    (name, "ssz", value)
                    for name, value in data.artifacts.items()
                )

else:
method = getattr(dumper, f"dump_{kind}", None)
if method is None:
Expand Down
133 changes: 133 additions & 0 deletions tests/infra/trace/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
## Spec trace framework

This is an implementation of #4603 a new testing framework for the Ethereum
consensus spec tests, based on tracing spec method calls and recording them in a
structured trace file.

The basic idea is make tests more simple and linear and hide the minutiae of
dumping data into the test harness (`@spec_trace` decorator) and automate
everything that doesn't have to be manual.

Spec is being wrapped into a transparent proxy object and all method calls are
being tracked including any state mutations before and after. Final state is
recorded and all relevant artifacts including all states are being saved into
SSZ artifacts (hash-addressed to avoid duplication).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This README is very informative! I have a few minor suggestions to improve grammar and clarity for even better readability:

These are just suggestions, feel free to apply them as you see fit.


### Usage and example test

```python
from tests.infra.trace import spec_trace


@with_all_phases
@spec_state_test # keep these like before
@spec_trace # this is the thing that makes the magic happen
def test_linear_sanity_slots_222(
spec, state
): # spec and state can be positional but the name matters
# just use spec methods, they are traced automagically, and state is dumped
spec.process_slot(state)
```

this is for example purposes put into

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's standard to capitalize the first word of a sentence.

Suggested change
this is for example purposes put into
This is for example purposes put into

`tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py` and can be run
with something like

```
make reftests fork=gloas runner=sanity k=linear_sanity_slots_222 verbose=true
```

that produces a trace in
`../consensus-spec-tests/tests/minimal/gloas/sanity/slots_2/pyspec_tests/linear_sanity_slots_222/trace.yaml`

### Spec trace file example

```yaml
default_fork: gloas
trace:
- {op: load_state, state_root:
95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- op: spec_call
method: process_slot
input: {state:
95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- {op: assert_state, state_root:
41f562b491baaa9fdd981973c8aef64bb7c663c4b07f35141c16afc9e11184c1.ssz_snappy}
```

In this example, `process_slot` does not return anything but we can see the
initial state and the final state being dumped automatically and they are
different. In the other more complex example test (omitted here for brevity) we
can examine how complex inputs and outputs being dumped and how out-of-band
state mutations are being tracked with assert and load steps.

A non-normative example of a little more complex inputs and outputs:

```yaml
- op: spec_call
method: get_current_epoch
input: {state:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
assert_output: 2
- op: spec_call
method: get_seed
input:
state:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
epoch: 2
domain_type: '0x00000000'
assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
```

simple primitive types here (e.g. int or bytes, not containers) are serialized
directly into yaml (even in cases where they subclass SSZ `View` and can
technically be dumped as ssz artifacts), bytes are dumped as 0x-prefixed hex
strings which seems appropriate. SSZ artifacts are always referred to by root
hash-based filename so that there's no need to maintain any mappings or
filename-generating logic.

### Implementation details

wrapt is used to wrap spec methods and record their calls, parameters and
results. A decorator is used to set things up. Some simple pydantic models are
used for the trace file structure and some sanitation/formatting.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

A minor wording suggestion: 'sanitization' is the more common term in software engineering for cleaning or filtering data, whereas 'sanitation' typically refers to public health. Using 'sanitization' would be more precise here.

Suggested change
used for the trace file structure and some sanitation/formatting.
used for the trace file structure and some sanitization/formatting.


From a consumer standpoint (i.e. test runner) new tests using this decorator
behave differently and are being detected by a new data type (a pydantic model
instance). Some logic was added to `execute_test` in
`tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py` to catch that
new case and apply new serialization method.

The way `wrapt.ObjectProxy` operates is that it allows you to create a proxy
object for e.g. consensus spec module and override things on it without
affecting any of the underlying logic (unlike monkey-patching). In our case here
we override all lowercase methods in the spec object by wrapping them in a
`wrapt` decorator with tracer function. Whenever a state is detected in any of
the method calls it gets automatically tracked and then it's checked again after
each method call to check for mutations. Everything is saved in a pydantic model
object for further dumping using existing reftest tooling.

### TODO

This is still being cooked.

Integration with test runner is done by yielding pydantic model as a new data
type and very simple change in the runner. It would be more natural to just
return the value but that would require supporting non-generator functions as
tests which would require much more code than seems reasonable here.

I tried my best to separate core logic from the boilerplate needed but it could
be improved upon.

Some cleanup and polishing is still required. Typing could be improved.

### Credits

Thanks to Leo for the initial idea and guidance, and to all the reviewers who
helped refine this.

Thanks to Cristobal for the first prototype of this framework, it's not used
here but I reviewed 4724 and got some inspiration from that.

Thanks to IG organizers, mentors, sponsors and fellow builders for making this
possible!
1 change: 1 addition & 0 deletions tests/infra/trace/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
from .decorator import spec_trace # noqa: F401
53 changes: 53 additions & 0 deletions tests/infra/trace/decorator.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
import functools
import inspect
from collections.abc import Callable, Generator
from typing import Any

from .traced_spec import RecordingSpec


def spec_trace(fn: Callable) -> Callable:
"""
Decorator to wrap a pyspec test and record execution traces.
Usage:
@with_all_phases # or other decorators
@spec_state_test # still needed as before
@spec_trace # new decorator to record trace
def test_my_feature(spec, ...):
...
"""

@functools.wraps(fn)
def wrapper(*args: Any, **kwargs: Any) -> Generator:
# 1. Bind arguments to find 'spec' and fixtures
try:
bound_args = inspect.signature(fn).bind(*args, **kwargs)
bound_args.apply_defaults()
except TypeError as e:
raise ValueError(
f"Failed to bind arguments for test function '{fn.__name__}': {e}"
) from e

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's good practice to raise a more specific error when arguments fail to bind. A TypeError is often raised in this scenario, but wrapping it in a ValueError provides more context about the failure's significance in this specific application. Consider re-raising as a TypeError for better semantic accuracy, as the issue is with the type or number of arguments passed to the function, not their values.

Suggested change
raise ValueError(
f"Failed to bind arguments for test function '{fn.__name__}': {e}"
) from e
raise TypeError(
f"Failed to bind arguments for test function '{fn.__name__}': {e}"
) from e


if "spec" not in bound_args.arguments:
raise ValueError(
f"spec argument not found for test function '{fn.__name__}', cannot proceed"
)

# 2. Get the actual spec instance
real_spec = bound_args.arguments["spec"]

# 3. Inject the recorder
recorder = RecordingSpec(real_spec)
bound_args.arguments["spec"] = recorder

# 4. Run test & Save trace
try:
fn(*bound_args.args, **bound_args.kwargs)
finally:
# we need to do this after execution is done before returning data
recorder._finalize_trace()

# yield data so that runner can pick it up and dump
yield "trace", "pydantic", recorder.model

return wrapper
106 changes: 106 additions & 0 deletions tests/infra/trace/example_trace.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
default_fork: gloas
trace:
- {op: load_state, state_root:
95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- op: spec_call
method: process_slots
input: {state:
95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy,
slot: 8}
- op: spec_call
method: process_slots
input: {state:
5eda5f14e032cc1821dd272f3fcbdf031f1bb7ea0a56eda0b247e9eabb09dd53.ssz_snappy,
slot: 16}
- {op: assert_state, state_root:
94921c903a7696c239dc749d38a1963dfc062fb08885340f39279a5346bdea62.ssz_snappy}
- {op: load_state, state_root:
f766adc187b97a82f93e93d282198353b3aea83a74a91d94c992535cb1a9bb28.ssz_snappy}
- op: spec_call
method: process_slots
input: {state:
f766adc187b97a82f93e93d282198353b3aea83a74a91d94c992535cb1a9bb28.ssz_snappy,
slot: 23}
- op: spec_call
method: process_slot
input: {state:
d0c2247f19af245cf9822678aa3ab52cc076e784f23b1146372cf41433a0a395.ssz_snappy}
- op: spec_call
method: process_justification_and_finalization
input: {state:
6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
- op: spec_call
method: process_inactivity_updates
input: {state:
6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
- op: spec_call
method: process_rewards_and_penalties
input: {state:
6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
- op: spec_call
method: process_registry_updates
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_slashings
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_eth1_data_reset
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_pending_deposits
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_pending_consolidations
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_effective_balance_updates
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_slashings_reset
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_randao_mixes_reset
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_historical_summaries_update
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_participation_flag_updates
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_sync_committee_updates
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_proposer_lookahead
input: {state:
a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
method: process_builder_pending_payments
input: {state:
216dcc57820542ed089ebc0a7cf482272af284e4b750f72c5b217154d4caf425.ssz_snappy}
- op: spec_call
method: get_current_epoch
input: {state:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
assert_output: 2
- op: spec_call
method: get_seed
input:
state:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
epoch: 2
domain_type: '0x00000000'
assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
- {op: assert_state, state_root:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
Loading
Loading