1 change: 1 addition & 0 deletions pyproject.toml
@@ -33,6 +33,7 @@ test = [
    "pytest-html==4.1.1",
    "pytest-xdist==3.8.0",
    "pytest==8.4.2",
    "wrapt==2.0.1",
]
lint = [
    "codespell==2.4.1",
8 changes: 8 additions & 0 deletions tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py
@@ -93,6 +93,14 @@ def execute_test(test_case: TestCase, dumper: Dumper):
        for name, kind, data in test_case.case_fn():
            if kind == "meta":
                meta[name] = data
            elif kind == "pydantic":
                # new data type for spec traces
                outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
                outputs.extend(
                    (name, "ssz", value)
                    # ssz artifacts are already serialized and will be compressed by the dumper
                    for name, value in data.artifacts.items()
                )
            else:
                method = getattr(dumper, f"dump_{kind}", None)
                if method is None:
134 changes: 134 additions & 0 deletions tests/infra/trace/README.md
@@ -0,0 +1,134 @@
# Spec trace framework

This is an implementation of ethereum/consensus-specs#4603, a new testing
framework for the Ethereum consensus spec tests, based on tracing spec method
calls and recording them in a structured trace file.

The basic idea is to make tests simpler and more linear: the `@spec_trace`
decorator hides the minutiae of dumping data into the test harness and
automates everything that doesn't have to be manual.

The spec is wrapped in a transparent proxy object, and all method calls are
tracked, including any state mutations that happen before and after a call. The
final state is recorded, and all relevant artifacts, including all states, are
saved as SSZ artifacts (hash-addressed to avoid duplication).
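
As a rough illustration of what "hash-addressed" buys us, here is a minimal
sketch of a content-addressed store. It is not the code in this PR:
`ArtifactStore` is a hypothetical name, and SHA-256 over raw bytes is an
assumed stand-in for the SSZ `hash_tree_root` used by the real framework.

```python
import hashlib


class ArtifactStore:
    """Toy content-addressed store: identical payloads are kept only once."""

    def __init__(self) -> None:
        self.artifacts: dict[str, bytes] = {}

    def add(self, payload: bytes) -> str:
        # Assumption: sha256 stands in for hash_tree_root; compression is omitted.
        name = f"{hashlib.sha256(payload).hexdigest()}.ssz_snappy"
        # setdefault leaves an existing entry untouched, so duplicates cost nothing.
        self.artifacts.setdefault(name, payload)
        return name


store = ArtifactStore()
first = store.add(b"serialized state A")
second = store.add(b"serialized state A")  # same content, same name
third = store.add(b"serialized state B")
```

Because the name is derived from the content, the same state seen at several
points in a test maps to a single artifact.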

## Usage and example test

You can find this example in `tests/infra/trace/test_example_slots_2.py`:

```python
from eth2spec.test.context import spec_state_test, with_all_phases

from tests.infra.trace.decorator import spec_trace


@with_all_phases
@spec_state_test  # keep these like before
@spec_trace  # this is the thing that makes the magic happen
def test_linear_sanity_slots_222(
    spec, state
):  # spec and state can be positional but the name matters
    # just use spec methods, they are traced automagically, and state is dumped
    spec.process_slot(state)
```

To run the example test through the reftest generator:

```bash
cp -v tests/infra/trace/test_example_slots_2.py tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py
make reftests fork=gloas runner=sanity k=linear_sanity_slots_222 verbose=true
```

This produces a trace in
`../consensus-spec-tests/tests/minimal/gloas/sanity/slots_2/pyspec_tests/linear_sanity_slots_222/trace.yaml`.

## Spec trace file example

```yaml
default_fork: gloas
trace:
- {op: load_state, state_root:
    95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- op: spec_call
  method: process_slot
  input: {state:
      95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- {op: assert_state, state_root:
    41f562b491baaa9fdd981973c8aef64bb7c663c4b07f35141c16afc9e11184c1.ssz_snappy}
```

In this example, `process_slot` does not return anything, but the initial and
final states are dumped automatically, and they differ. A more complex example
test (omitted here for brevity) shows how complex inputs and outputs are dumped
and how out-of-band state mutations are tracked with assert and load steps.
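
To make the step semantics more concrete, here is a minimal sketch of how a
consumer might replay a trace. It is not part of this PR: `replay_trace`,
`FakeSpec` and `FakeState` are hypothetical stand-ins, states live in an
in-memory dict instead of `.ssz_snappy` files, and the reading of `load_state`,
`assert_state` and the optional `assert_output` key is an assumption based on
the examples in this README.

```python
from typing import Any


class FakeState:
    """Hypothetical stand-in for a BeaconState; it only tracks a slot."""

    def __init__(self, slot: int) -> None:
        self.slot = slot

    def root(self) -> str:
        # Assumption: a real client would derive this from hash_tree_root.
        return f"state-{self.slot}.ssz_snappy"


class FakeSpec:
    """Hypothetical stand-in for a client's spec implementation."""

    def process_slot(self, state: FakeState) -> None:
        state.slot += 1

    def get_slot(self, state: FakeState) -> int:
        return state.slot


def replay_trace(spec: Any, trace: list[dict], states: dict[str, FakeState]) -> None:
    current = None
    for step in trace:
        op = step["op"]
        if op == "load_state":
            # Work on a copy so the stored artifact is never mutated.
            current = FakeState(states[step["state_root"]].slot)
        elif op == "spec_call":
            kwargs = dict(step.get("input", {}))
            if "state" in kwargs:
                # Assumption: a state input always refers to the current state.
                kwargs["state"] = current
            result = getattr(spec, step["method"])(**kwargs)
            if "assert_output" in step:
                assert result == step["assert_output"], step["method"]
        elif op == "assert_state":
            assert current is not None and current.root() == step["state_root"]
        else:
            raise ValueError(f"unknown op {op!r}")


EXAMPLE_TRACE = [
    {"op": "load_state", "state_root": "state-0.ssz_snappy"},
    {"op": "spec_call", "method": "process_slot", "input": {"state": "state-0.ssz_snappy"}},
    {
        "op": "spec_call",
        "method": "get_slot",
        "input": {"state": "state-1.ssz_snappy"},
        "assert_output": 1,
    },
    {"op": "assert_state", "state_root": "state-1.ssz_snappy"},
]

replay_trace(FakeSpec(), EXAMPLE_TRACE, {"state-0.ssz_snappy": FakeState(0)})
```

A mismatch in either a return value or the resulting state surfaces as an
`AssertionError` at the step where it happens.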

A non-normative example with slightly more complex inputs and outputs:

```yaml
- op: spec_call
  method: get_current_epoch
  input: {state:
      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
  assert_output: 2
- op: spec_call
  method: get_seed
  input:
    state:
      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
    epoch: 2
    domain_type: '0x00000000'
  assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
```

Simple primitive types (e.g. int or bytes, not containers) are serialized
directly into YAML, even when they subclass SSZ `View` and could technically be
dumped as SSZ artifacts. Bytes are dumped as 0x-prefixed hex strings. SSZ
artifacts are always referred to by a filename based on their root hash, so
there is no need to maintain any mappings or filename-generating logic.
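
The rules above can be sketched roughly as follows. This is an illustration
under assumptions, not the PR's pydantic code: `to_trace_value`, `FakeUint` and
`FakeContainer` are hypothetical names, and anything exposing a
`hash_tree_root()` method is treated as an SSZ object.

```python
from typing import Any


def to_trace_value(value: Any) -> Any:
    """Turn a call argument or result into something YAML-friendly."""
    # Primitives are checked first so that int/bytes subclasses that are also
    # SSZ views end up inline instead of becoming artifacts.
    if isinstance(value, bool):
        return value
    if isinstance(value, int):
        return int(value)  # drop the subclass, keep the number
    if isinstance(value, (bytes, bytearray)):
        return "0x" + bytes(value).hex()
    if hasattr(value, "hash_tree_root"):
        # Containers are referenced by a root-based filename.
        return f"{value.hash_tree_root().hex()}.ssz_snappy"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


class FakeUint(int):
    """Hypothetical int subclass that also looks like an SSZ view."""

    def hash_tree_root(self) -> bytes:
        return int(self).to_bytes(32, "little")


class FakeContainer:
    """Hypothetical container with a fixed root."""

    def hash_tree_root(self) -> bytes:
        return bytes.fromhex("ab" * 32)


epoch = to_trace_value(FakeUint(2))
domain = to_trace_value(b"\x00\x00\x00\x00")
state_ref = to_trace_value(FakeContainer())
```

Checking primitives before `hash_tree_root` is what keeps values such as
`epoch: 2` readable in the trace.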

### Implementation details

`wrapt` is used to wrap spec methods and record their calls, parameters and
results. A decorator sets things up. Some simple pydantic models define the
trace file structure and handle some sanitization and formatting.

From a consumer standpoint (i.e. the test runner), tests using this decorator
behave differently and are detected by the new data type they yield (a pydantic
model instance). Some logic was added to `execute_test` in
`tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py` to catch that
new case and apply the new serialization method.

`wrapt.ObjectProxy` lets you create a proxy object for, e.g., the consensus spec
module and override things on it without affecting any of the underlying logic
(unlike monkey-patching). Here we override all lowercase methods of the spec
object by wrapping them in a `wrapt` decorator with a tracer function. Whenever
a state is detected in a method call, it is tracked automatically and checked
again after each method call to detect mutations. Everything is saved in a
pydantic model object for later dumping with the existing reftest tooling.
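
The proxying idea can be sketched with the standard library alone. Note that
this swaps `wrapt.ObjectProxy` for a plain `__getattr__`-based proxy so that
the example has no dependencies; `TracingProxy`, `ToyState` and the toy spec
are hypothetical, and detecting states by a `hash_tree_root()` method is an
assumption.

```python
import types
from typing import Any


class TracingProxy:
    """Forward attribute access to `target`, recording lowercase callables."""

    def __init__(self, target: Any) -> None:
        self._target = target
        self.calls: list[dict] = []

    def __getattr__(self, name: str) -> Any:
        # Only reached for names not found on the proxy itself.
        attr = getattr(self._target, name)
        if not callable(attr) or not name.islower():
            return attr  # constants and CamelCase types pass through

        def traced(*args: Any, **kwargs: Any) -> Any:
            states = [a for a in (*args, *kwargs.values()) if hasattr(a, "hash_tree_root")]
            before = [s.hash_tree_root() for s in states]
            result = attr(*args, **kwargs)
            after = [s.hash_tree_root() for s in states]
            self.calls.append({"method": name, "mutated": before != after, "result": result})
            return result

        return traced


class ToyState:
    def __init__(self) -> None:
        self.slot = 0

    def hash_tree_root(self) -> bytes:
        return self.slot.to_bytes(32, "little")


def process_slot(state: ToyState) -> None:
    state.slot += 1


def get_slot(state: ToyState) -> int:
    return state.slot


toy_spec = types.SimpleNamespace(
    process_slot=process_slot, get_slot=get_slot, SLOTS_PER_EPOCH=8
)
spec = TracingProxy(toy_spec)
state = ToyState()
spec.process_slot(state)
spec.get_slot(state)
```

The underlying `toy_spec` is never modified, which is the property that
distinguishes this approach from monkey-patching.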

## TODO

This is still being cooked.

I tried my best to separate core logic from the boilerplate needed but it could
be improved upon.

Some cleanup and polishing is still required.

Typing could be improved.

More example tests showcasing new features (or potentially some actually needed
tests that were waiting for this) could be added.

## Credits

Thanks to Leo for the initial idea and guidance, and to all the reviewers who
helped refine this.

Thanks to Cristobal for the first prototype of this framework; it's not used
here, but I reviewed 4724 and drew some inspiration from it.

Thanks to IG organizers, mentors, sponsors and fellow builders for making this
possible!
Empty file added tests/infra/trace/__init__.py
Empty file.
51 changes: 51 additions & 0 deletions tests/infra/trace/decorator.py
@@ -0,0 +1,51 @@
import functools
import inspect
from collections.abc import Callable, Generator
from typing import Any

from .traced_spec import RecordingSpec


def spec_trace(fn: Callable) -> Callable:
    """
    Decorator to wrap a pyspec test and record execution traces.
    Usage:
        @with_all_phases  # or other decorators
        @spec_state_test  # still needed as before
        @spec_trace  # new decorator to record trace
        def test_my_feature(spec, ...):
            ...
    """

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Generator:
        # 1. Bind arguments to find 'spec' and fixtures
        try:
            bound_args = inspect.signature(fn).bind(*args, **kwargs)
            bound_args.apply_defaults()
        except TypeError as e:
            raise TypeError(
                f"Failed to bind arguments for test function '{fn.__name__}': {e}"
            ) from e

        if "spec" not in bound_args.arguments:
            raise ValueError(
                f"spec argument not found for test function '{fn.__name__}', cannot proceed"
            )

        # 2. Get the actual spec instance
        real_spec = bound_args.arguments["spec"]

        # 3. Inject the recorder
        recorder: RecordingSpec = RecordingSpec(real_spec)
        bound_args.arguments["spec"] = recorder

        # 4. Run test & Save trace
        fn(*bound_args.args, **bound_args.kwargs)
        # we need to do this after execution is done before returning data
        recorder.finalize_trace()

        # yield data so that runner can pick it up and dump
        yield "trace", "pydantic", recorder.model

    return wrapper
106 changes: 106 additions & 0 deletions tests/infra/trace/example_trace.yaml
@@ -0,0 +1,106 @@
default_fork: gloas
trace:
- {op: load_state, state_root:
    95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- op: spec_call
  method: process_slots
  input: {state:
      95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy,
    slot: 8}
- op: spec_call
  method: process_slots
  input: {state:
      5eda5f14e032cc1821dd272f3fcbdf031f1bb7ea0a56eda0b247e9eabb09dd53.ssz_snappy,
    slot: 16}
- {op: assert_state, state_root:
    94921c903a7696c239dc749d38a1963dfc062fb08885340f39279a5346bdea62.ssz_snappy}
- {op: load_state, state_root:
    f766adc187b97a82f93e93d282198353b3aea83a74a91d94c992535cb1a9bb28.ssz_snappy}
- op: spec_call
  method: process_slots
  input: {state:
      f766adc187b97a82f93e93d282198353b3aea83a74a91d94c992535cb1a9bb28.ssz_snappy,
    slot: 23}
- op: spec_call
  method: process_slot
  input: {state:
      d0c2247f19af245cf9822678aa3ab52cc076e784f23b1146372cf41433a0a395.ssz_snappy}
- op: spec_call
  method: process_justification_and_finalization
  input: {state:
      6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
- op: spec_call
  method: process_inactivity_updates
  input: {state:
      6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
- op: spec_call
  method: process_rewards_and_penalties
  input: {state:
      6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
- op: spec_call
  method: process_registry_updates
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_slashings
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_eth1_data_reset
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_pending_deposits
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_pending_consolidations
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_effective_balance_updates
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_slashings_reset
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_randao_mixes_reset
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_historical_summaries_update
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_participation_flag_updates
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_sync_committee_updates
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_proposer_lookahead
  input: {state:
      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
- op: spec_call
  method: process_builder_pending_payments
  input: {state:
      216dcc57820542ed089ebc0a7cf482272af284e4b750f72c5b217154d4caf425.ssz_snappy}
- op: spec_call
  method: get_current_epoch
  input: {state:
      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
  assert_output: 2
- op: spec_call
  method: get_seed
  input:
    state:
      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
    epoch: 2
    domain_type: '0x00000000'
  assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
- {op: assert_state, state_root:
    0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}