IvanAnishchuk · IvanAnishchuk · Nov 21, 2025 · Dec 2, 2025 · Dec 2, 2025 · Dec 20, 2025
diff --git a/pyproject.toml b/pyproject.toml
@@ -24,6 +24,7 @@ dependencies = [
   "ruamel.yaml==0.18.15",
   "setuptools==80.9.0",
   "trie==3.1.0",
+  "wrapt==2.0.1",
 ]
 
 [project.optional-dependencies]

diff --git a/tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py b/tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py
@@ -93,6 +93,16 @@ def execute_test(test_case: TestCase, dumper: Dumper):
         for name, kind, data in test_case.case_fn():
             if kind == "meta":
                 meta[name] = data
+            elif kind == "pydantic":
+                # new data type for spec traces
+                outputs += [
+                    # dump trace data for yaml serialization
+                    ("trace", "data", data.model_dump(mode="json", exclude_none=True)),
+                ] + [
+                    (name, "ssz", value)
+                    # ssz artifacts are already serialized and will be compressed by the dumper
+                    for name, value in data.artifacts.items()
+                ]
             else:
                 method = getattr(dumper, f"dump_{kind}", None)
                 if method is None:

diff --git a/tests/infra/trace/README.md b/tests/infra/trace/README.md
@@ -0,0 +1,133 @@
+## Spec trace framework
+
+This is an implementation of #4603 a new testing framework for the Ethereum
+consensus spec tests, based on tracing spec method calls and recording them in a
+structured trace file.
+
+The basic idea is make tests more simple and linear and hide the minutiae of
+dumping data into the test harness (`@spec_trace` decorator) and automate
+everything that doesn't have to be manual.
+
+Spec is being wrapped into a transparent proxy object and all method calls are
+being tracked including any state mutations before and after. Final state is
+recorded and all relevant artifacts including all states are being saved into
+SSZ artifacts (hash-addressed to avoid duplication).
+
+### Usage and example test
+
+```python
+from tests.infra.trace import spec_trace
+
+
+@with_all_phases
+@spec_state_test  # keep these like before
+@spec_trace  # this is the thing that makes the magic happen
+def test_linear_sanity_slots_222(
+    spec, state
+):  # spec and state can be positional but the name matters
+    # just use spec methods, they are traced automagically, and state is dumped
+    spec.process_slot(state)
+```
+
+this is for example purposes put into
-this is for example purposes put into
+This is for example purposes put into
-this is for example purposes put into
+This is for example purposes put into
+`tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py` and can be run
+with something like
+
+```
+make reftests fork=gloas runner=sanity k=linear_sanity_slots_222 verbose=true
+```
+
+that produces a trace in
+`../consensus-spec-tests/tests/minimal/gloas/sanity/slots_2/pyspec_tests/linear_sanity_slots_222/trace.yaml`
+
+### Spec trace file example
+
+```yaml
+default_fork: gloas
+trace:
+- {op: load_state, state_root:
+    95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
+- op: spec_call
+  method: process_slot
+  input: {state:
+      95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
+- {op: assert_state, state_root:
+    41f562b491baaa9fdd981973c8aef64bb7c663c4b07f35141c16afc9e11184c1.ssz_snappy}
+```
+
+In this example, `process_slot` does not return anything but we can see the
+initial state and the final state being dumped automatically and they are
+different. In the other more complex example test (omitted here for brevity) we
+can examine how complex inputs and outputs being dumped and how out-of-band
+state mutations are being tracked with assert and load steps.
+
+A non-normative example of a little more complex inputs and outputs:
+
+```yaml
+- op: spec_call
+  method: get_current_epoch
+  input: {state:
+      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
+  assert_output: 2
+- op: spec_call
+  method: get_seed
+  input:
+    state:
+      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
+    epoch: 2
+    domain_type: '0x00000000'
+  assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
+```
+
+simple primitive types here (e.g. int or bytes, not containers) are serialized
+directly into yaml (even in cases where they subclass SSZ `View` and can
+technically be dumped as ssz artifacts), bytes are dumped as 0x-prefixed hex
+strings which seems appropriate. SSZ artifacts are always referred to by root
+hash-based filename so that there's no need to maintain any mappings or
+filename-generating logic.
+
+### Implementation details
+
+wrapt is used to wrap spec methods and record their calls, parameters and
+results. A decorator is used to set things up. Some simple pydantic models are
+used for the trace file structure and some sanitation/formatting.
-used for the trace file structure and some sanitation/formatting.
+used for the trace file structure and some sanitization/formatting.
-used for the trace file structure and some sanitation/formatting.
+used for the trace file structure and some sanitization/formatting.
+
+From a consumer standpoint (i.e. test runner) new tests using this decorator
+behave differently and are being detected by a new data type (a pydantic model
+instance). Some logic was added to `execute_test` in
+`tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py` to catch that
+new case and apply new serialization method.
+
+The way `wrapt.ObjectProxy` operates is that it allows you to create a proxy
+object for e.g. consensus spec module and override things on it without
+affecting any of the underlying logic (unlike monkey-patching). In our case here
+we override all lowercase methods in the spec object by wrapping them in a
+`wrapt` decorator with tracer function. Whenever a state is detected in any of
+the method calls it gets automatically tracked and then it's checked again after
+each method call to check for mutations. Everything is saved in a pydantic model
+object for further dumping using existing reftest tooling.
+
+### TODO
+
+This is still being cooked.
+
+Integration with test runner is done by yielding pydantic model as a new data
+type and very simple change in the runner. It would be more natural to just
+return the value but that would require supporting non-generator functions as
+tests which would require much more code than seems reasonable here.
+
+I tried my best to separate core logic from the boilerplate needed but it could
+be improved upon.
+
+Some cleanup and polishing is still required. Typing could be improved.
+
+### Credits
+
+Thanks to Leo for the initial idea and guidance, and to all the reviewers who
+helped refine this.
+
+Thanks to Cristobal for the first prototype of this framework, it's not used
+here but I reviewed 4724 and got some inspiration from that.
+
+Thanks to IG organizers, mentors, sponsors and fellow builders for making this
+possible!
diff --git a/tests/infra/trace/__init__.py b/tests/infra/trace/__init__.py
@@ -0,0 +1 @@
+from .decorator import spec_trace  # noqa: F401
diff --git a/tests/infra/trace/decorator.py b/tests/infra/trace/decorator.py
@@ -0,0 +1,53 @@
+import functools
+import inspect
+from collections.abc import Callable, Generator
+from typing import Any
+
+from .traced_spec import RecordingSpec
+
+
+def spec_trace(fn: Callable) -> Callable:
+    """
+    Decorator to wrap a pyspec test and record execution traces.
+    Usage:
+        @with_all_phases  # or other decorators
+        @spec_state_test  # still needed as before
+        @spec_trace  # new decorator to record trace
+        def test_my_feature(spec, ...):
+            ...
+    """
+
+    @functools.wraps(fn)
+    def wrapper(*args: Any, **kwargs: Any) -> Generator:
+        # 1. Bind arguments to find 'spec' and fixtures
+        try:
+            bound_args = inspect.signature(fn).bind(*args, **kwargs)
+            bound_args.apply_defaults()
+        except TypeError as e:
+            raise ValueError(
+                f"Failed to bind arguments for test function '{fn.__name__}': {e}"
+            ) from e
-            raise ValueError(
-                f"Failed to bind arguments for test function '{fn.__name__}': {e}"
-            ) from e
+            raise TypeError(
+                f"Failed to bind arguments for test function '{fn.__name__}': {e}"
+            ) from e
-            raise ValueError(
-                f"Failed to bind arguments for test function '{fn.__name__}': {e}"
-            ) from e
+            raise TypeError(
+                f"Failed to bind arguments for test function '{fn.__name__}': {e}"
+            ) from e
+
+        if "spec" not in bound_args.arguments:
+            raise ValueError(
+                f"spec argument not found for test function '{fn.__name__}', cannot proceed"
+            )
+
+        # 2. Get the actual spec instance
+        real_spec = bound_args.arguments["spec"]
+
+        # 3. Inject the recorder
+        recorder = RecordingSpec(real_spec)
+        bound_args.arguments["spec"] = recorder
+
+        # 4. Run test & Save trace
+        try:
+            fn(*bound_args.args, **bound_args.kwargs)
+        finally:
+            # we need to do this after execution is done before returning data
+            recorder._finalize_trace()
+
+            # yield data so that runner can pick it up and dump
+            yield "trace", "pydantic", recorder.model
+
+    return wrapper
diff --git a/tests/infra/trace/example_trace.yaml b/tests/infra/trace/example_trace.yaml
@@ -0,0 +1,106 @@
+default_fork: gloas
+trace:
+- {op: load_state, state_root: 
+    95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
+- op: spec_call
+  method: process_slots
+  input: {state: 
+      95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy,
+    slot: 8}
+- op: spec_call
+  method: process_slots
+  input: {state: 
+      5eda5f14e032cc1821dd272f3fcbdf031f1bb7ea0a56eda0b247e9eabb09dd53.ssz_snappy,
+    slot: 16}
+- {op: assert_state, state_root: 
+    94921c903a7696c239dc749d38a1963dfc062fb08885340f39279a5346bdea62.ssz_snappy}
+- {op: load_state, state_root: 
+    f766adc187b97a82f93e93d282198353b3aea83a74a91d94c992535cb1a9bb28.ssz_snappy}
+- op: spec_call
+  method: process_slots
+  input: {state: 
+      f766adc187b97a82f93e93d282198353b3aea83a74a91d94c992535cb1a9bb28.ssz_snappy,
+    slot: 23}
+- op: spec_call
+  method: process_slot
+  input: {state: 
+      d0c2247f19af245cf9822678aa3ab52cc076e784f23b1146372cf41433a0a395.ssz_snappy}
+- op: spec_call
+  method: process_justification_and_finalization
+  input: {state: 
+      6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
+- op: spec_call
+  method: process_inactivity_updates
+  input: {state: 
+      6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
+- op: spec_call
+  method: process_rewards_and_penalties
+  input: {state: 
+      6e2a5be5877d15d253fa89ebc2143d2ee6e5c0a4777aea8ea6a197e60185c7f9.ssz_snappy}
+- op: spec_call
+  method: process_registry_updates
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_slashings
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_eth1_data_reset
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_pending_deposits
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_pending_consolidations
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_effective_balance_updates
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_slashings_reset
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_randao_mixes_reset
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_historical_summaries_update
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_participation_flag_updates
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_sync_committee_updates
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_proposer_lookahead
+  input: {state: 
+      a7f798e2212b92d0e2185444ecb5785c60e2548f2e95cb4d5518ed26e8dc017f.ssz_snappy}
+- op: spec_call
+  method: process_builder_pending_payments
+  input: {state: 
+      216dcc57820542ed089ebc0a7cf482272af284e4b750f72c5b217154d4caf425.ssz_snappy}
+- op: spec_call
+  method: get_current_epoch
+  input: {state: 
+      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
+  assert_output: 2
+- op: spec_call
+  method: get_seed
+  input:
+    state: 
+      0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
+    epoch: 2
+    domain_type: '0x00000000'
+  assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
+- {op: assert_state, state_root: 
+    0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}