
Conversation


@IvanAnishchuk commented Dec 20, 2025

Testing things

Summary by CodeRabbit

  • New Features

    • Test runner and tracing framework now emit structured pydantic spec traces, expanded trace outputs, and SSZ artifacts with sanitized serialized inputs/outputs.
  • Documentation

    • Added README documenting the trace framework, trace file format, usage examples, and integration notes.
  • Tests

    • Added unit and example tests covering tracing, sanitization, artifact handling, and end-to-end trace generation.
  • Chores

    • Added runtime test dependency: wrapt.


pydantic models for the spec trace
core spec tracing logic
use wrapt to wrap the spec and intercept the calls
tracing decorator
some basic unit tests for the trace recorder
some converted test examples
use 0x prefix for hex bytes in trace
a README with a short explanation how tracing works
add "method" to StepModel for spec_call op
remove unneeded things
address some more requirements, format, etc.
new approach - decorator just generates data for dumper
add the auto-assert of state in the end of test trace
adjust assert/load tracing logic according to the issue
rename record_spec_trace -> spec_trace
test fixes
more simplicity
some cleanup
@gemini-code-assist

Summary of Changes

Hello @IvanAnishchuk, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request implements a novel tracing framework for Ethereum consensus specification tests, dubbed 'neotrace'. The primary goal is to enhance the testing process by providing a detailed, automated record of spec method interactions. It achieves this by dynamically proxying spec methods to capture arguments, return values, and state mutations, which are then serialized into human-readable YAML trace files alongside their corresponding SSZ artifacts. This framework aims to make tests more linear and simplify the dumping of complex data, offering a clearer insight into test execution flows.

Highlights

  • New Spec Tracing Framework: Introduces 'neotrace', a new framework for Ethereum consensus spec tests that automatically records method calls and state changes.
  • Transparent Method Proxying: Leverages the wrapt library to transparently wrap spec methods, allowing for interception and recording of calls without altering the original logic (see the sketch after this list).
  • Structured Trace File Generation: Generates structured YAML trace files that detail execution steps, including load_state, spec_call, and assert_state operations, along with inputs, outputs, and state roots.
  • SSZ Artifact Management: Automatically saves relevant SSZ artifacts (like states and blocks) as content-addressed files, avoiding duplication and simplifying data management.
  • Pydantic Model Integration: Uses Pydantic models to define the schema for the trace files, ensuring data validation and consistent serialization.
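
To make the proxying highlight concrete, here is a minimal sketch of the idea; it is illustrative only (the PR's actual RecordingSpec is more involved) and the class and attribute names are hypothetical:

```python
import wrapt


class TracingProxy(wrapt.ObjectProxy):
    """Transparent proxy over a spec module that records method calls."""

    def __init__(self, wrapped):
        super().__init__(wrapped)
        # wrapt keeps attributes prefixed with _self_ on the proxy itself
        # instead of forwarding them to the wrapped object
        self._self_calls = []

    def __getattr__(self, name):
        attr = super().__getattr__(name)
        # treat lowercase callables as spec methods worth tracing
        if name.islower() and callable(attr):
            def traced(*args, **kwargs):
                result = attr(*args, **kwargs)
                self._self_calls.append((name, args, kwargs, result))
                return result
            return traced
        return attr


# usage: the test is handed a proxy instead of the real spec
# spec = TracingProxy(real_spec)
# spec.process_slot(state)  # behaves as before; the call is recorded
```
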
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces an impressive new tracing framework for Ethereum consensus spec tests. The use of wrapt for transparently proxying the spec object and Pydantic for data modeling is well-executed. The implementation is clean and the addition of unit tests for the tracer itself is a great practice.

I've left a few minor comments, mostly related to documentation typos and small code style improvements. Overall, this is a solid contribution that should significantly simplify writing and maintaining spec tests.

Comment on lines 98 to 105
outputs += [
# dump trace data for yaml serialization
("trace", "data", data.model_dump(mode="json", exclude_none=True)),
] + [
(name, "ssz", value)
# ssz artifacts are already serialized and will be compressed by the dumper
for name, value in data._artifacts.items()
]


medium

The list concatenation [...] + [...] followed by outputs += can be slightly simplified by using outputs.extend() with a generator expression that chains the two parts. This avoids creating an intermediate list.

                outputs.extend([
                    # dump trace data for yaml serialization
                    ("trace", "data", data.model_dump(mode="json", exclude_none=True)),
                    *((name, "ssz", value)
                      # ssz artifacts are already serialized and will be compressed by the dumper
                      for name, value in data._artifacts.items()),
                ])

consensus spec tests, based on tracing spec method calls and recording them in a
structured trace file.

The basic idea is make tests more simple and linear and hide the minutae of


medium

Typo found: minutae should be minutiae.

Suggested change
The basic idea is make tests more simple and linear and hide the minutae of
The basic idea is make tests more simple and linear and hide the minutiae of

structured trace file.

The basic idea is make tests more simple and linear and hide the minutae of
dumping data into the test harness (`@spec_trace` decoratior) and automate


medium

Typo found: decoratior should be decorator.

Suggested change
dumping data into the test harness (`@spec_trace` decoratior) and automate
dumping data into the test harness (`@spec_trace` decorator) and automate

dumping data into the test harness (`@spec_trace` decoratior) and automate
everything that doesn't have to be manual.

Spec is being wrapped into a transparent proxy object and all method call are


medium

Typo found: all method call are should be all method calls are.

Suggested change
Spec is being wrapped into a transparent proxy object and all method call are
Spec is being wrapped into a transparent proxy object and all method calls are

In this example, `process_slot` does not return anything but we can see the
initial state and the final state being dumped automatically and they are
different. In the other more complex example test (omitted here for brewety) we


medium

Typo found: brewety should be brevity.

Suggested change
different. In the other more complex example test (omitted here for brewety) we
different. In the other more complex example test (omitted here for brevity) we

@@ -0,0 +1 @@
from tests.infra.trace.decorator import spec_trace as spec_trace


medium

The alias as spec_trace is redundant here. You can simplify the import.

Suggested change
from tests.infra.trace.decorator import spec_trace as spec_trace
from tests.infra.trace.decorator import spec_trace

Owner Author


@gemini-code-assist this is intentional for re-exporting


You are correct that this import in __init__.py is for re-exporting spec_trace from the decorator module. However, the as spec_trace part is still redundant. When you import spec_trace from tests.infra.trace.decorator, it is already named spec_trace in the current namespace. The as keyword is typically used to rename an imported object (e.g., import long_name as short_name).

Removing as spec_trace will not change the functionality or the re-export behavior, but it will make the line more concise and adhere to common Python practices for clarity.

Suggested change
from tests.infra.trace.decorator import spec_trace as spec_trace
from tests.infra.trace.decorator import spec_trace
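
For context on this exchange, the alias does have a recognized meaning for static type checkers: under mypy with implicit re-exports disabled, `import X as X` marks an explicit re-export. A sketch of the two equivalent spellings (assuming such strict settings):

```python
# Option 1: the alias form; strict type checkers (e.g. mypy with
# implicit_reexport disabled) treat "as spec_trace" as an explicit
# re-export marker rather than a redundant rename.
from tests.infra.trace.decorator import spec_trace as spec_trace

# Option 2: a plain import plus an __all__ declaration has the same effect.
# from tests.infra.trace.decorator import spec_trace
# __all__ = ["spec_trace"]
```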

Comment on lines 86 to 89
if isinstance(v, str):
return str(v)
if isinstance(v, int):
return int(v)


medium

The str(v) and int(v) calls are redundant since the isinstance checks already confirm the types. You can remove these explicit casts for cleaner code.

Suggested change
if isinstance(v, str):
return str(v)
if isinstance(v, int):
return int(v)
if isinstance(v, str):
return v
if isinstance(v, int):
return v

Owner Author


@gemini-code-assist this is intentional to catch and coerce subtypes


Thank you for the clarification, @IvanAnishchuk. Your explanation regarding the intentional coercion of subtypes makes perfect sense. In that context, the explicit str(v) and int(v) calls are indeed necessary and not redundant. I'll retract my previous suggestion.
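
A small, self-contained illustration of the subtype point (Uint64 here is a hypothetical stand-in, not the spec's actual type):

```python
class Uint64(int):
    """Stand-in for an SSZ uint64-like primitive that subclasses int."""


v = Uint64(7)
assert isinstance(v, int)      # the isinstance check also matches subtypes
assert type(v) is not int      # but v still carries its subclass
assert type(int(v)) is int     # int(v) coerces to the plain built-in, which
                               # YAML/JSON serializers handle predictably
```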

@@ -0,0 +1,64 @@
"""
FIXME: I put this file here to test reftest vector generation with the new approach - using existing runners and dumpers system, most likely this should not be merged like this


medium

This FIXME note suggests that this file is for temporary testing. Please ensure it's removed or moved to an appropriate location before merging this pull request.

payment = create_builder_pending_payment(spec, i, spec.MIN_ACTIVATION_BALANCE, 1)
state.builder_pending_payments[i] = payment

[x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")]


medium

This list comprehension is used just to exhaust the generator, and the result is discarded. A more idiomatic way to do this is to use list() and assign to _ to make it clear the result is intentionally unused.

Suggested change
[x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")]
list(run_epoch_processing_with(spec, state, "process_builder_pending_payments"))

@IvanAnishchuk changed the title from "featr(trace): neotrace - PR for testing and self-reviews" to "feat(trace): neotrace - PR for testing and self-reviews" on Dec 20, 2025

@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces neotrace, a new testing framework for tracing Ethereum consensus spec method calls. The implementation uses wrapt to proxy the spec object and pydantic to define and serialize the trace structure. The core logic for tracing, handling state mutations, and serializing artifacts seems well-designed and is accompanied by a good set of unit tests. The integration with the existing test runner is minimal and clean. I've included some suggestions for code clarity, style, and removal of temporary test files to help finalize this feature. Overall, this is a great addition.

@@ -0,0 +1,64 @@
"""
FIXME: I put this file here to test reftest vector generation with the new approach - using existing runners and dumpers system, most likely this should not be merged like this


critical

This file includes a FIXME comment indicating it is for testing purposes and should likely not be merged in its current state. Please remove this file before merging the pull request. If it is intended to be a permanent example, it should be moved to a more appropriate location and the FIXME comment should be removed.

Comment on lines 98 to 105
outputs += [
# dump trace data for yaml serialization
("trace", "data", data.model_dump(mode="json", exclude_none=True)),
] + [
(name, "ssz", value)
# ssz artifacts are already serialized and will be compressed by the dumper
for name, value in data._artifacts.items()
]


medium

The list concatenation and list comprehension can be slightly simplified by using list.extend for better readability and potentially better performance, as it avoids creating intermediate lists.

                outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
                # ssz artifacts are already serialized and will be compressed by the dumper
                outputs.extend(
                    (name, "ssz", value)
                    for name, value in data._artifacts.items()
                )

structured trace file.

The basic idea is make tests more simple and linear and hide the minutae of
dumping data into the test harness (`@spec_trace` decoratior) and automate


medium

There is a typo in the word "decorator".

Suggested change
dumping data into the test harness (`@spec_trace` decoratior) and automate
dumping data into the test harness (`@spec_trace` decorator) and automate

dumping data into the test harness (`@spec_trace` decoratior) and automate
everything that doesn't have to be manual.

Spec is being wrapped into a transparent proxy object and all method call are


medium

There is a grammatical error here. It should be "all method calls are" instead of "all method call are".

Suggested change
Spec is being wrapped into a transparent proxy object and all method call are
Spec is being wrapped into a transparent proxy object and all method calls are

In this example, `process_slot` does not return anything but we can see the
initial state and the final state being dumped automatically and they are
different. In the other more complex example test (omitted here for brewety) we


medium

There is a typo in the word "brevity".

Suggested change
different. In the other more complex example test (omitted here for brewety) we
different. In the other more complex example test (omitted here for brevity) we

@@ -0,0 +1 @@
from tests.infra.trace.decorator import spec_trace as spec_trace


medium

The alias as spec_trace is redundant in this import statement. It can be simplified.

Suggested change
from tests.infra.trace.decorator import spec_trace as spec_trace
from tests.infra.trace.decorator import spec_trace

"""
Load state step in the execution trace.
Used when a previously-unseen state is used in spec all.


medium

There is a typo in the docstring. "spec all" should be "spec call".

Suggested change
Used when a previously-unseen state is used in spec all.
Used when a previously-unseen state is used in a spec call.

"""

op: Literal["assert_state"] = Field(default="assert_state")
# literal type + default value looks duplicated but I think required


medium

This comment appears to be a developer note and should be removed before merging.

Comment on lines 80 to 97
def sanitize_data(cls, v: Any) -> Any:
# convert raw bytes to 0x-prefixed hex
if isinstance(v, bytes):
return f"0x{v.hex()}"
# coerce primitive types into their raw form
# (pre-processor just passes them through without coercing)
if isinstance(v, str):
return str(v)
if isinstance(v, int):
return int(v)
# recursively clean simple structures
if isinstance(v, tuple):
return tuple(cls.sanitize_data(x) for x in v)
if isinstance(v, list):
return [cls.sanitize_data(x) for x in v]
if isinstance(v, dict):
return {k: cls.sanitize_data(val) for k, val in v.items()}
return v


medium

Since the project uses Python 3.10+, the sanitize_data method can be refactored using a match statement for improved readability and conciseness.

Suggested change
def sanitize_data(cls, v: Any) -> Any:
# convert raw bytes to 0x-prefixed hex
if isinstance(v, bytes):
return f"0x{v.hex()}"
# coerce primitive types into their raw form
# (pre-processor just passes them through without coercing)
if isinstance(v, str):
return str(v)
if isinstance(v, int):
return int(v)
# recursively clean simple structures
if isinstance(v, tuple):
return tuple(cls.sanitize_data(x) for x in v)
if isinstance(v, list):
return [cls.sanitize_data(x) for x in v]
if isinstance(v, dict):
return {k: cls.sanitize_data(val) for k, val in v.items()}
return v
def sanitize_data(cls, v: Any) -> Any:
match v:
case bytes():
# convert raw bytes to 0x-prefixed hex
return f"0x{v.hex()}"
case str() | int():
# coerce primitive types into their raw form
# (pre-processor just passes them through without coercing)
return v
case tuple():
return tuple(cls.sanitize_data(x) for x in v)
case list():
return [cls.sanitize_data(x) for x in v]
case dict():
return {k: cls.sanitize_data(val) for k, val in v.items()}
case _:
return v

payment = create_builder_pending_payment(spec, i, spec.MIN_ACTIVATION_BALANCE, 1)
state.builder_pending_payments[i] = payment

[x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")]


medium

Using a list comprehension for its side effects (to exhaust a generator) is not idiomatic. A simple for loop would be more explicit and readable.

Suggested change
[x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")]
for _ in run_epoch_processing_with(spec, state, "process_builder_pending_payments"): pass

@IvanAnishchuk
Owner Author

@gemini-code-assist compare with requirements in ethereum#4603 and feedback given in ethereum#4755 - is everything addressed, are there noticeable gaps or differences, is anything implemented in a non-optimal or non-idiomatic way? Be thorough, don't skip any details.

@gemini-code-assist

Hello @IvanAnishchuk, thanks for reaching out! I've thoroughly reviewed your pull request, focusing on the new neotrace framework. I appreciate you asking for a detailed comparison against the referenced issues and feedback. Please note that I do not have direct access to external URLs, so my assessment regarding https://github.com/ethereum/consensus-specs/issues/4603 and https://github.com/ethereum/consensus-specs/pull/4755 is based on the context provided within this pull request's description and the new README.md.

Overall, the neotrace framework appears to be a well-conceived and robust solution for enhancing the testing process by providing detailed, automated tracing of spec method interactions. The use of wrapt for transparent proxying and pydantic for structured trace file generation are excellent choices that leverage established Python best practices.

Let's break down your questions:

Is everything addressed (in the spirit of the linked issues/PRs)?

Based on the README.md (tests/infra/trace/README.md), this PR directly addresses the need for a new testing framework for Ethereum consensus spec tests, specifically referencing #4603. The core idea of making tests more linear, automating data dumping, and tracking state mutations seems to be comprehensively covered by the proposed neotrace framework. The README also acknowledges inspiration from 4724, suggesting that prior discussions and prototypes have informed this design.

From the perspective of the stated goals in your README, the core functionality of transparent method proxying, structured trace file generation (YAML), and SSZ artifact management is well-implemented.

Are there noticeable gaps or differences?

The README.md itself provides a very transparent and helpful TODO section, which I will reiterate and elaborate on slightly:

  • Integration with Test Runner (Non-Generator Functions):

    "Integration with test runner is done by yielding pydantic model as a new data type and very simple change in the runner. It would be more natural to just return the value but that would require supporting non-generator functions as tests which would require much more code that seems reasonable here."

    This is a known design compromise. While yielding is functional, supporting non-generator functions would indeed simplify test authoring. The current approach is pragmatic given the scope, but it's a valid point for future enhancement.

  • Separation of Concerns / Boilerplate:

    "I tried my best to separate core logic from the boilerplate needed but it could be improved upon."

    This is a general statement, but it's always good to keep an eye on refactoring opportunities as the framework evolves. For a first iteration, the current structure is quite clear.

  • Cleanup and Polishing:

    "Some cleanup and polishing is still required."

    Similar to the above, this is a continuous process. No specific critical issues were identified in this area during my review.

  • Typing:

    "Typing could be improved."

    While the core logic has type hints, there might be areas where they could be more precise or comprehensive, especially in the _process_arg method or in the mock objects within test_traced_spec.py. For example, some Any types could potentially be narrowed down with more specific Union or TypeVar definitions if the complexity doesn't outweigh the benefit.

One minor observation, not necessarily a gap, is the reliance on _artifacts as a PrivateAttr in TraceConfig. While this is an appropriate use of Pydantic's PrivateAttr for internal state not meant for serialization, it does mean that any external code interacting with the TraceConfig object would need to be aware of this private attribute if it needed to access the raw artifacts directly. This is a design choice and not inherently problematic.

Is anything implemented in a non-optimal or non-idiomatic way?

I found the implementation to be largely optimal and idiomatic for Python, especially within the context of the eth2spec codebase and the chosen libraries:

  • wrapt.ObjectProxy: Using wrapt for transparent method interception is a highly idiomatic and robust way to implement proxying in Python without resorting to more intrusive monkey-patching. The dynamic wrapping in __getattr__ is efficient.

  • Pydantic Models: The use of Pydantic for defining the trace schema (models.py) is excellent. It provides strong type validation, clear data structure, and convenient serialization/deserialization. The @field_serializer methods for handling SSZ suffixes and byte-to-hex conversion are clean and idiomatic for Pydantic.

  • State Tracking Logic: The _capture_pre_state and _capture_post_state methods in traced_spec.py effectively manage state changes and automatically inject load_state and assert_state operations. This

@IvanAnishchuk
Owner Author

@gemini-code-assist Okay, here's a quote of the original issue with requirements (ethereum#4603); please thoroughly review this PR against it:

Spec test sequence trace, a general test vector format for the Ethereum consensus layer

Goals:

  • Get more spec tests and generated vectors from them, by reducing the barriers and simplifying the process to create them
  • Test developers should be able to use the spec normally, without having to know how to use yield statements to record stuff.
  • So many formats make it more difficult for people writing tests to know how to write them and to choose formats
  • So many formats added all the time require consumers to add support for them in their code
  • Formats are not flexible, leading us to add the same type of test many times and preventing us from creating tests with more varied and complex scenarios
  • Currently, if a big object is used many times in many tests, it is stored many times using disk space. This is easily fixable.

Test writer interface

The test writer just adds a decorator @spec_trace to the test. All the calls to the spec methods will be recorded automatically, because instead of a spec, they will receive a proxy to it.

Tests consumer interface

default_fork: "deneb"
trace:
  - op: "load_state"
    state_root: 7a......
  - op: "spec_call"
    method: "get_validator_activation_churn_limit"
    assert_output: 100
  - op: "spec_call"
    method: "process_slots"
    input:
      - slot: 15
  - op: "spec_call"
    method: "process_slot"
  - op: "spec_call"
    method: "process_epoch"
  - op: "assert_state"
    state_root: 98a.....

The test developer doesn't have to add load_state or assert_state manually. The spec proxy will automatically add them.

Implementation details and space optimisations

Spec proxy

The test receives, instead of the spec object, a proxy to the spec.

There are many Python libraries that help implement proxy objects; some of them are:

  • wrapt
  • lazy_object_proxy
  • MappingProxyType
  • proxy.py
  • proxyscrape

Spec calls recording

The proxy records all calls to the spec internally, with their inputs and outputs in sequence.

Objects storage

The inputs and outputs that are SSZ-serializable will be treated differently. The proxy will detect them automatically by their type and store them all in the same directory under the name {hash_root}.ssz_snappy. In the YAML of the input/output we will store the file name.

Because the name is the hash_root, an object that is used many times is only stored once.

State optimisation

Many spec methods take state as an input and modify it. It would be inefficient and unnecessary to store the whole state for each of these calls.

The proxy will:

  • The first time a state is used as input, the proxy will add a load_state entry to the trace.
  • It will record the state as the result of each call.
  • When running a method, if the input state matches the previous output state, nothing is recorded.
  • If the input state of a method is different from the modified state of the last method that modified the state, the proxy will add an assert_state entry with the last output state and a load_state entry with the new state. This records the cases where the test developer manually modifies the state.
  • At the end of the test, the proxy will add an assert_state entry to finish the test vector with the final state (a sketch of this root-comparison logic follows this list).
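
An illustrative sketch of that rule set (hypothetical helper names, not part of the quoted issue):

```python
def capture_pre_state(trace, state, last_root):
    """Emit load/assert steps by comparing hash tree roots; sketch only."""
    root = state.hash_tree_root().hex()
    if last_root is None:
        # first time a state is seen in this test
        trace.append({"op": "load_state", "state_root": root})
    elif root != last_root:
        # the state changed outside of a spec call: assert the old root,
        # then load the manually modified state
        trace.append({"op": "assert_state", "state_root": last_root})
        trace.append({"op": "load_state", "state_root": root})
    # if the roots match, nothing is recorded
    return root
```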

Trace capture and serialisation

The test trace is stored in a Pydantic object to help with validation and serialisation. Something like:

class BaseOperation(BaseModel):
    model_config = ConfigDict(extra="forbid")
    op: str

class LoadStateOp(BaseOperation):
    op: Literal["load_state"]
    state_root: str

class SpecCallOp(BaseOperation):
    op: Literal["spec_call"]
    method: str
    input: Optional[List[dict[str, Any]]] = None
    assert_output: Optional[Any] = None

class AssertStateOp(BaseOperation):
    op: Literal["assert_state"]
    state_root: str

class TraceConfig(BaseModel):
    default_fork: str
    trace: List[Union[LoadStateOp, SpecCallOp, AssertStateOp]] = Field(
        ..., discriminator="op"
    )

As the test vectors are currently in YAML it will be output in YAML, but additionally in JSON with a JSON schema. The JSON schema will help in creating consumers more automatically.
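
An illustrative sketch of producing both outputs from the models above (PyYAML assumed; the file names and state root are dummy values):

```python
import json

import yaml

config = TraceConfig(
    default_fork="deneb",
    trace=[LoadStateOp(op="load_state", state_root="7a" + "00" * 31)],
)

# the YAML trace, as consumed by the test tooling
with open("trace.yaml", "w") as f:
    yaml.safe_dump(config.model_dump(mode="json"), f)

# a JSON Schema so consumers can validate traces automatically
with open("trace.schema.json", "w") as f:
    json.dump(TraceConfig.model_json_schema(), f, indent=2)
```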

Example

For example purposes:

@spec_trace
def test_an_example_test(spec, state):
    churn_limit = spec.get_validator_activation_churn_limit(state)
    spec.process_slots(state, 15)
    state.validators[0].balance = 0
    spec.process_slot(state)
    spec.process_epoch(state)
Will output something like:

default_fork: "deneb"
trace:
  - op: "load_state"
    state_root: 7a......
  - op: "spec_call"
    method: "get_validator_activation_churn_limit"
    assert_output: 100
  - op: "spec_call"
    method: "process_slots"
    input:
      - slot: 15
  - op: "load_state"
    state_root: 8c......
  - op: "spec_call"
    method: "process_slot"
  - op: "spec_call"
    method: "process_epoch"
  - op: "assert_state"
    state_root: 98a.....

Let's note several things:

  • The test writer doesn't have to do anything special; they create a normal pytest test of the spec, without yields or anything beyond calling the functions.
  • The first time the state is used, a load for it is automatically generated
  • When the state is manually modified, a load for it is automatically generated
  • At the end of the test the state root is automatically asserted


@gemini-code-assist (bot) left a comment


Code Review

This PR introduces a new testing framework, neotrace, for creating spec traces. The implementation uses wrapt to proxy the spec object and record method calls, arguments, return values, and state mutations. The core logic is well-structured, using Pydantic models for the trace format and a decorator-based approach for easy integration into existing tests. The changes include the core tracer implementation, unit tests, example tests, documentation, and necessary dependency updates. The overall approach is solid. I've found a few minor issues, mostly related to documentation typos and code style, which I've detailed in the comments.

Comment on lines 8 to 117
dumping data into the test harness (`@spec_trace` decoratior) and automate
everything that doesn't have to be manual.

Spec is being wrapped into a transparent proxy object and all method call are
being tracked including any state mutations before and after. Final state is
recorded and all relevant artifacts including all states are being saved into
SSZ artifacts (hash-addressed to avoid duplication).

### Usage and example test

```python
from tests.infra.trace import spec_trace


@with_all_phases
@spec_state_test # keep these like before
@spec_trace # this is the thing that makes the magic happen
def test_linear_sanity_slots_222(
spec, state
): # spec and state can be positional but the name matters
# just use spec methods, they are traced automagically, and state is dumped
spec.process_slot(state)
```

this is for example purposes put into
`tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py` and can be run
with something like

```
make reftests fork=gloas runner=sanity k=linear_sanity_slots_222 verbose=true
```

that produces a trace in
`../consensus-spec-tests/tests/minimal/gloas/sanity/slots_2/pyspec_tests/linear_sanity_slots_222/trace.yaml`

### Spec trace file example

```yaml
default_fork: gloas
trace:
- {op: load_state, state_root:
95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- op: spec_call
method: process_slot
input: {state:
95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy}
- {op: assert_state, state_root:
41f562b491baaa9fdd981973c8aef64bb7c663c4b07f35141c16afc9e11184c1.ssz_snappy}
```
In this example, `process_slot` does not return anything but we can see the
initial state and the final state being dumped automatically and they are
different. In the other more complex example test (omitted here for brewety) we
can examine how complex inputs and outputs being dumped and how out-of-band
state mutations are being tracked with assert and load steps.

A non-normative example of a little more complex inputs and outputs:

```yaml
- op: spec_call
method: get_current_epoch
input: {state:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy}
assert_output: 2
- op: spec_call
method: get_seed
input:
state:
0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
epoch: 2
domain_type: '0x00000000'
assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df'
```

simple primitive types here (e.g. int or bytes, not containers) are serialized
directly into yaml (even in cases where they subclass SSZ `View` and can
technically be dumped as ssz artifacts), bytes are dumped as 0x-prefixed hex
strings which seems appropriate. SSZ artifacts are always referred to by root
hash-based filename so that there's no need to maintain any mappings or
filename-generating logic.

### Implementation details

wrapt is used to wrap spec methods and record their calls, parameters and
results. A decorator is used to set things up. Some simple pydantic models are
used for the trace file structure and some sanitation/formatting.

From a consumer standpoint (i.e. test runner) new tests using this decorator
behave differently and are being detected by a new data type (a pydantic model
instance). Some logic was added to `execute_test` in
`tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py` to catch that
new case and apply new serialization method.

The way `wrapt.ObjectProxy` operates is that it allows you to create a proxy
object for e.g. consensus spec module and override things on it without
affecting any of the underlying logic (unlike monkey-patching). In our case here
we override all lowercase methods in the spec object by wrapping them in a
`wrapt` decorator with tracer function. Whenever a state is detected in any of
the method calls it gets automatically tracked and then it's checked again after
each method call to check for mutations. Everything is saved in a pydantic model
object for further dumping using existing reftest tooling.

### TODO

This is still being cooked.

Integration with test runner is done by yielding pydantic model as a new data
type and very simple change in the runner. It would be more natural to just
return the value but that would require supporting non-generator functions as
tests which would require much more code that seems reasonable here.


medium

There are a few typos and grammatical errors in the README which could be corrected to improve clarity.

  • On line 8: decoratior should be decorator.
  • On line 11: method call are should be method calls are.
  • On line 60: brewety should be brevity.
  • On line 114: yielding pydantic model should be yielding a pydantic model.
  • On line 117: much more code that seems reasonable here should be much more code than seems reasonable here.

@@ -0,0 +1 @@
from tests.infra.trace.decorator import spec_trace as spec_trace


medium

The import alias as spec_trace is redundant because the imported name is the same. This can be simplified.

Suggested change
from tests.infra.trace.decorator import spec_trace as spec_trace
from tests.infra.trace.decorator import spec_trace

payment = create_builder_pending_payment(spec, i, spec.MIN_ACTIVATION_BALANCE, 1)
state.builder_pending_payments[i] = payment

[x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")]


medium

This list comprehension is used only for its side effect of exhausting the generator, but the resulting list is not used. This can be confusing as it looks like a value is being computed and stored. A for loop would be more explicit about the intent and is generally preferred for code that relies on side effects.

for _ in run_epoch_processing_with(spec, state, "process_builder_pending_payments"):
    pass

@IvanAnishchuk
Owner Author

@gemini-code-assist

@gemini-code-assist

Hello @IvanAnishchuk, thank you for providing the full context of ethereum/consensus-specs#4603. This detailed specification is incredibly helpful for a precise comparison. I will now re-evaluate the neotrace framework against these explicit requirements. My previous assessment was based on the PR description and README.md, so this new information allows for a much more thorough analysis.

Overall, the neotrace implementation aligns remarkably well with the goals and detailed requirements outlined in ethereum/consensus-specs#4603. The design choices made in the PR directly address almost all points raised in the issue.

Let's break down the comparison point by point:

Goals:

  • Reduce barriers, simplify process to create tests: The @spec_trace decorator and the transparent proxying mechanism directly achieve this. Test developers can write standard Python code interacting with the spec object, and the tracing happens automatically in the background. This is a significant reduction in boilerplate compared to manual data dumping.
  • Test developers should be able to use the spec normally, without having to know how to use yield statements to record stuff: This is largely met for the test logic itself. The test function (test_linear_sanity_slots_222 in test_example_slots_2.py) does not contain yield statements. However, the spec_trace decorator's wrapper function does yield the recorder._model at the end. The README.md acknowledges this as a known compromise:

    "Integration with test runner is done by yielding pydantic model as a new data type and very simple change in the runner. It would be more natural to just return the value but that would require supporting non-generator functions as tests which would require much more code that seems reasonable here."
    This is a pragmatic choice given the current test runner architecture and is clearly documented.

  • So many formats make it more difficult for people writing tests to know how to write them and to choose formats: The neotrace framework introduces a single, structured YAML format (backed by Pydantic models) for traces, which consolidates the output of spec method interactions. This directly addresses the goal of reducing format proliferation for this specific type of test.
  • So many formats added all the time requires for consumers to add them in their code: By providing a standardized Pydantic schema, consumers can more easily parse and validate the trace data, reducing the effort required to integrate new test vectors.
  • Formats are not flexible, leading us to add the same type of test many times and preventing us from creating tests with more varied and complex scenarios: The structured nature of the trace (sequence of load_state, spec_call, assert_state) and the ability to capture arbitrary inputs/outputs via SSZ artifacts provides a highly flexible format for representing complex scenarios. The example test_builder_333 in test_example_slots_2.py demonstrates this flexibility.
  • Currently, if a big object is used many times in many tests, it is stored many times using disk space. This is easily fixable: This is fully addressed by the content-addressed SSZ artifact storage ({hash_root}.ssz_snappy). The _artifacts dictionary in TraceConfig and the _process_arg method in RecordingSpec ensure that SSZ objects are stored once and referenced by their hash, preventing duplication.

Test writer interface:

  • The test writer just adds a decorator @spec_trace to the test. All the calls to the spec methods will be recorded automatically, because instead of a spec, they will receive a proxy to it.
    This is precisely how the PR is implemented. The decorator.py file defines @spec_trace, which injects a RecordingSpec proxy into the test function's spec argument. The example tests in test_example_slots_2.py clearly demonstrate this usage. A simplified sketch of this wiring follows below.
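
A simplified sketch of that wiring (illustrative: the real decorator binds arguments via inspect.signature; RecordingSpec, _finalize_trace, and _model are names from the framework under review):

```python
import functools

from tests.infra.trace.traced_spec import RecordingSpec  # assumed path


def spec_trace(fn):
    @functools.wraps(fn)
    def wrapper(*args, spec, **kwargs):
        # swap the real spec for the recording proxy before the test runs
        recorder = RecordingSpec(spec)
        fn(*args, spec=recorder, **kwargs)
        # append the trailing assert_state step for the final state
        recorder._finalize_trace()
        # yield so the existing generator-based runner can dump the model
        yield "trace", "pydantic", recorder._model

    return wrapper
```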

Tests consumer interface:

  • The example YAML structure provided in the issue (default_fork, trace with load_state, spec_call, assert_state operations):
    The models.py file defines TraceConfig, LoadStateOp, SpecCallOp, and AssertStateOp that perfectly match the proposed YAML structure. The example_trace.yaml file also showcases this format, including input and assert_output fields with SSZ references and primitive types.
  • The test developer doesn't have to add load_state or assert_state manually. The spec proxy will automatically add them.
    This is a core feature of the RecordingSpec in traced_spec.py. The _capture_pre_state and _record_auto_assert_step methods are responsible for automatically injecting LoadStateOp and AssertStateOp entries into the trace, as demonstrated by the unit tests in test_traced_spec.py (e.g., test_basic_function_call, test_state_mutation_and_deduplication).

Implementation details and space optimisations:

  • Spec proxy (wrapt, lazy_object_proxy, MappingProxyType, proxy.py, proxyscrape): The PR explicitly uses wrapt (pyproject.toml, uv.lock, traced_spec.py), which was one of the suggested libraries, confirming an idiomatic and robust choice for transparent proxying.
  • Spec calls recording (inputs and outputs in sequence):
    The _spec_call_hook in RecordingSpec captures method_name, serial_params (inputs), and result (outputs) for each spec_call, storing them sequentially in _model.trace.
  • Objects storage (SSZ serializable by {hash_root}.ssz_snappy, YAML stores filename, deduplication):
    The _process_arg method in RecordingSpec handles this. It checks whether an argument satisfies is_serializable (an SSZ View and not a primitive), calculates its hash_tree_root().hex(), stores the serialized bytes in _model._artifacts (ensuring deduplication), and returns the f"{ssz_hash}.ssz_snappy" string for inclusion in the YAML trace. This perfectly matches the requirement (a sketch of this content-addressing step follows after this list).
  • State optimisation (detailed logic for load_state and assert_state based on state changes):
    The _capture_pre_state and _capture_post_state methods in RecordingSpec implement this logic precisely:
    • First time state is used: _capture_pre_state detects _last_state_root is None and adds a LoadStateOp.
    • Record state as result of each call: The _capture_post_state method updates _last_state_root with the current state's root after each call if the state has changed.
    • Input state matches previous output state: _capture_pre_state checks old_root == new_root and does not add a LoadStateOp if they match.
    • Input state different from last modified state: _capture_pre_state detects old_root != new_root, adds an AssertStateOp for the old_root, and then a LoadStateOp for the new_root. This handles manual state modifications outside of spec calls.
    • End of test: _finalize_trace calls _record_auto_assert_step, which adds a final AssertStateOp for the last known state root.
      The unit tests in test_traced_spec.py (especially test_state_mutation_and_deduplication) confirm that this complex state tracking logic is correctly implemented.
  • Trace capture and serialisation (Pydantic, YAML, JSON with JSON-schema):
    The models.py defines the Pydantic schema. The gen_runner.py modification (elif kind == "pydantic") uses data.model_dump(mode="json", exclude_none=True) to serialize the TraceConfig model, which can then be outputted as YAML (or JSON). The field_serializer decorators in models.py handle the byte-to-hex conversion and SSZ suffixing for the JSON output mode, which is suitable for generating JSON-schema compatible output.
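
An illustrative sketch of the content-addressing step from the objects-storage bullet above (is_ssz_container is an assumed predicate; encode_bytes mirrors the usual SSZ View API):

```python
def process_arg(arg, artifacts):
    """Replace SSZ containers with hash-addressed file names; sketch only."""
    if is_ssz_container(arg):  # assumed: an SSZ View that isn't a primitive
        root = arg.hash_tree_root().hex()
        # store serialized bytes once per root; reuse deduplicates on disk
        artifacts.setdefault(root, arg.encode_bytes())
        return f"{root}.ssz_snappy"  # the YAML trace stores only the file name
    return arg  # primitives pass through and are serialized inline
```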

Example:

  • The example provided in the issue (@spec_trace def test_an_example_test(spec, state): ...) and its expected YAML output are fully consistent with the implementation in the PR. The README.md and test_example_slots_2.py provide similar examples that produce the described trace structure.

Noticeable Gaps or Differences (revisiting previous points with new context):

  • Integration with Test Runner (Non-Generator Functions): As noted in the issue's goals and the PR's README.md, the current implementation requires the test runner to handle a yielded Pydantic model. While functional, supporting non-generator functions would simplify test authoring further. This is a known, documented design compromise, and not a gap in addressing the spirit of the issue, but rather a pragmatic choice for the initial implementation.
  • Separation of Concerns / Boilerplate: The PR's structure (separate decorator.py, traced_spec.py, models.py) demonstrates a good effort to separate concerns. The README.md's TODO about improving boilerplate is a general statement for continuous improvement, not a specific gap against the issue's requirements.
  • Cleanup and Polishing: Again, a general TODO for future work, not a specific unaddressed requirement from the issue.
  • Typing: The README.md mentions typing could be improved. While the core logic has type hints, there are indeed some Any types, particularly in models.py (input: dict[str, Any | str | None], assert_output: Any | str | None) and traced_spec.py (_process_arg(self, arg: View | Any)). While Pydantic handles validation at runtime, more precise type hints could enhance static analysis and developer experience. This is a minor point and doesn't detract from the functionality.

Is anything implemented in a non-optimal or non-idiomatic way?

My assessment remains that the implementation is largely optimal and highly idiomatic for Python within this ecosystem:

  • wrapt.ObjectProxy: This is an excellent, Pythonic choice for transparent proxying, avoiding more brittle monkey-patching. It directly addresses the issue's suggestion of using a proxy library.
  • Pydantic Models: The use of Pydantic for schema definition, validation, and serialization is a modern and robust approach, perfectly aligning with the issue's requirement for structured trace data and JSON-schema compatibility.
  • State Tracking Logic: The detailed logic for _capture_pre_state, _capture_post_state, and _record_auto_assert_step is well-designed to handle state mutations and deduplication efficiently and correctly, directly implementing the complex state optimization requirements from the issue.
  • SSZ Artifact Management: The content-addressed storage of SSZ artifacts is a clever and efficient solution for disk space optimization and deduplication, as explicitly requested in the issue.

In conclusion, the neotrace framework, as implemented in this pull request, is an extremely faithful and well-executed solution to the requirements laid out in ethereum/consensus-specs#4603. The design choices are sound, idiomatic, and directly address the stated goals and implementation details. The minor points about yield in the decorator and potential typing improvements are acknowledged by the author and do not represent critical deviations from the core requirements.

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces a powerful new tracing framework for Ethereum consensus spec tests. The implementation is well-thought-out, leveraging wrapt for transparent proxying and pydantic for structured data modeling. The inclusion of comprehensive unit tests for the tracing mechanism itself is a great practice. My review includes several suggestions to improve code clarity, maintainability, and encapsulation by refactoring duplicated code, improving error messages, and adhering more strictly to object-oriented principles. These changes should make the new framework even more robust and easier for other developers to understand and extend.

] + [
(name, "ssz", value)
# ssz artifacts are already serialized and will be compressed by the dumper
for name, value in data._artifacts.items()


medium

The code directly accesses the private _artifacts attribute of the data object (TraceConfig instance). To improve encapsulation and adhere to good object-oriented practices, it's better to expose these artifacts through a public property on the TraceConfig model.

In tests/infra/trace/models.py, please add this property to TraceConfig:

@property
def artifacts(self) -> dict[str, bytes]:
    return self._artifacts

Then, you can use data.artifacts here.

Suggested change
for name, value in data._artifacts.items()
for name, value in data.artifacts.items()

Comment on lines 24 to 25
except TypeError as e:
raise RuntimeError("non-test invocation detected") from e


medium

The RuntimeError message "non-test invocation detected" is misleading. A TypeError from inspect.signature().bind() typically occurs due to a mismatch between the test function's signature and the arguments provided, which is a test definition issue. A more specific error message would be more helpful for debugging.

Suggested change
except TypeError as e:
raise RuntimeError("non-test invocation detected") from e
except TypeError as e:
raise RuntimeError(f"Failed to bind arguments for test function '{fn.__name__}': {e}") from e

recorder._finalize_trace()

# yield data so that runner can pick it up and dump
yield "trace", "pydantic", recorder._model


medium

The decorator directly accesses the private _model attribute of the RecordingSpec instance. To improve encapsulation, you should expose the model via a public property.

In tests/infra/trace/traced_spec.py, please add this property to the RecordingSpec class:

@property
def model(self) -> TraceConfig:
    return self._model

Then, you can use recorder.model here.

Suggested change
yield "trace", "pydantic", recorder._model
yield "trace", "pydantic", recorder.model

Comment on lines 24 to 58
class LoadStateOp(TraceStepModel):
"""
Load state step in the execution trace.
Used when a previously-unseen state is used in a spec call.
State root is recorded as 'state_root'.
"""

op: Literal["load_state"] = Field(default="load_state")
state_root: str = Field(pattern=r"^[0-9a-f]{64}$")

@field_serializer("state_root", mode="plain", when_used="always")
@classmethod
def sanitize_data(cls, v: str) -> str:
# add ssz_snappy suffix (dumper handles the actual compression)
return f"{v}.ssz_snappy"


class AssertStateOp(TraceStepModel):
"""
Assert state step in the execution trace.
Auto-added at the end of the trace with the last known state root.
State root is recorded as 'state_root'.
"""

op: Literal["assert_state"] = Field(default="assert_state")
state_root: str = Field(pattern=r"^[0-9a-f]{64}$")

@field_serializer("state_root", mode="plain", when_used="always")
@classmethod
def sanitize_data(cls, v: str) -> str:
# add ssz_snappy suffix (dumper handles the actual compression)
return f"{v}.ssz_snappy"


medium

The LoadStateOp and AssertStateOp models share the same state_root field and sanitize_data serializer. This duplication can be avoided by extracting the common logic into a base class. This will make the code more maintainable and less error-prone.

class StateOp(TraceStepModel):
    """
    Abstract base class for operations involving a state root.
    """

    state_root: str = Field(pattern=r"^[0-9a-f]{64}$")

    @field_serializer("state_root", mode="plain", when_used="always")
    @classmethod
    def sanitize_data(cls, v: str) -> str:
        # add ssz_snappy suffix (dumper handles the actual compression)
        return f"{v}.ssz_snappy"


class LoadStateOp(StateOp):
    """
    Load state step in the execution trace.

    Used when a previously-unseen state is used in a spec call.
    State root is recorded as 'state_root'.
    """

    op: Literal["load_state"] = Field(default="load_state")


class AssertStateOp(StateOp):
    """
    Assert state step in the execution trace.

    Auto-added at the end of the trace with the last known state root.
    State root is recorded as 'state_root'.
    """

    op: Literal["assert_state"] = Field(default="assert_state")

Comment on lines 85 to 88
if isinstance(v, str):
return str(v)
if isinstance(v, int):
return int(v)


medium

The sanitize_data method contains redundant type casting for str and int. Since isinstance already confirms the type, calling str(v) on a string or int(v) on an integer is unnecessary. Removing these calls will make the code cleaner.

Suggested change
if isinstance(v, str):
return str(v)
if isinstance(v, int):
return int(v)
if isinstance(v, str):
return v
if isinstance(v, int):
return v

Comment on lines 24 to 25
# We must re-assign the `state` variable, as `process_slot`
# is a pure function that returns a new, modified state.


medium

This comment is misleading. It states that process_slot is a pure function that returns a new state, but it actually mutates the state object in-place. This should be corrected to avoid confusion for developers using this as an example.

Suggested change
# We must re-assign the `state` variable, as `process_slot`
# is a pure function that returns a new, modified state.
# The `process_slot` function mutates the state object in-place,
# so there is no need to re-assign the `state` variable.

Draft a naive approach to trace data dumping.
Small fixes and lots of polish.
Test improvements.
Add shared base class for Assert/Load state ops.
Stricter and more concise typing.
Simplify serialization.
@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist (bot) left a comment


Code Review

This PR introduces a new tracing framework for spec tests, which is a great addition for improving test simplicity and debugging. The implementation using wrapt and Pydantic models is well-structured. I've found a few minor areas for improvement in the core tracer logic regarding redundant code and unused parameters. I've also suggested some minor wording improvements in the README file to enhance clarity. Overall, this is a solid contribution.

Comment on lines 3 to 14
This is an implementation of #4603 a new testing framework for the Ethereum
consensus spec tests, based on tracing spec method calls and recording them in a
structured trace file.

The basic idea is make tests more simple and linear and hide the minutiae of
dumping data into the test harness (`@spec_trace` decorator) and automate
everything that doesn't have to be manual.

Spec is being wrapped into a transparent proxy object and all method calls are
being tracked including any state mutations before and after. Final state is
recorded and all relevant artifacts including all states are being saved into
SSZ artifacts (hash-addressed to avoid duplication).

medium

This README is very informative! I have a few minor suggestions to improve grammar and clarity for even better readability:

These are just suggestions, feel free to apply them as you see fit.

Comment on lines 27 to 29
raise ValueError(
    f"Failed to bind arguments for test function '{fn.__name__}': {e}"
) from e

medium

It's good practice to raise a more specific error when arguments fail to bind. A TypeError is often raised in this scenario, but wrapping it in a ValueError provides more context about the failure's significance in this specific application. Consider re-raising as a TypeError for better semantic accuracy, as the issue is with the type or number of arguments passed to the function, not their values.

Suggested change
raise ValueError(
    f"Failed to bind arguments for test function '{fn.__name__}': {e}"
) from e
raise TypeError(
    f"Failed to bind arguments for test function '{fn.__name__}': {e}"
) from e


@wrapt.decorator
def _spec_call_hook(
    self, wrapped: Callable, instance: "RecordingSpec", args: tuple, kwargs: dict

medium

The instance parameter is unused and has an incorrect type hint. According to wrapt documentation, instance is the object to which the wrapped function was bound (the original spec object), not RecordingSpec. Since it's unused, you can rename it to _ and type it as Any to signify it's intentionally ignored.

Suggested change
self, wrapped: Callable, instance: "RecordingSpec", args: tuple, kwargs: dict
self, wrapped: Callable, _: Any, args: tuple, kwargs: dict

Comment on lines 156 to 158
if state is None:
    # unnecessary safeguard/type hint
    return

medium

This if state is None: check is redundant. The isinstance(state, View) check on line 154 already handles the case where state is None, as isinstance(None, View) evaluates to False. You can remove these lines. A similar redundant check exists in _capture_post_state at line 190.
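
For illustration, a minimal sketch of why the isinstance check already covers None; View here is a stand-in class, not the real SSZ base type:

```python
class View:  # stand-in for the SSZ View base class
    pass

def capture(state):
    # isinstance(None, View) is False, so no separate `state is None` guard is needed
    if not isinstance(state, View):
        return None
    return "captured"

assert capture(None) is None       # None is already filtered by the isinstance check
assert capture(View()) == "captured"
```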

Comment on lines 168 to 174
def _record_step(
    self,
    op: str,
    method: str,
    params: SERIALIZED_KWARGS,
    result: SERIALIZED_ARGS,
) -> None:

medium

The op parameter is unused in _record_step since SpecCallOp sets the op field with a default value. You can remove it from the function signature. Consequently, you can also remove the op_name variable at line 113 and update the call to _record_step at line 132 in _spec_call_hook to self._record_step(method_name, serial_params, serial_result).

    def _record_step(
        self,
        method: str,
        params: SERIALIZED_KWARGS,
        result: SERIALIZED_ARGS,
    ) -> None:

Comment on lines 190 to 192
if state is None:
    # unnecessary safeguard/type hint
    return

medium

This if state is None: check is redundant. The isinstance(state, View) check on line 186 already handles the case where state is None, as isinstance(None, View) evaluates to False. You can remove these lines.

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an excellent new tracing framework, neotrace, for Ethereum consensus spec tests. The use of wrapt for transparent proxying and Pydantic for data modeling is well-executed. The overall implementation is strong, providing a much simpler way to write tests. I've found one critical bug that would cause a runtime error and have also included a few suggestions to improve code readability and documentation clarity. Once the critical issue is addressed, this will be a great addition.

serial_result: SERIALIZED_ARGS = self._process_arg(result)

# D. Record the successful step
self._record_spec_op_step(method_name, serial_params, serial_result)

critical

There's a typo in the method name. You are calling _record_spec_op_step, but the method is defined as _record_spec_call_step on line 164. This will raise an AttributeError at runtime. Please rename the method call to match the definition.

Suggested change
self._record_spec_op_step(method_name, serial_params, serial_result)
self._record_spec_call_step(method_name, serial_params, serial_result)

Comment on lines 97 to 105
# new data type for spec traces
outputs += [
    # dump trace data for yaml serialization
    ("trace", "data", data.model_dump(mode="json", exclude_none=True)),
] + [
    (name, "ssz", value)
    # ssz artifacts are already serialized and will be compressed by the dumper
    for name, value in data.artifacts.items()
]

medium

The list concatenation [...] + [...] with += creates an unnecessary intermediate list. For better readability and a minor performance improvement, you could use list.append and list.extend in separate steps.

                # new data type for spec traces
                outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
                outputs.extend(
                    (name, "ssz", value)
                    # ssz artifacts are already serialized and will be compressed by the dumper
                    for name, value in data.artifacts.items()
                )

@@ -0,0 +1,133 @@
## Spec trace framework

This is an implementation of ethereum/consensus-specs#4603 a new testing

medium

Adding a comma here improves readability.

Suggested change
This is an implementation of ethereum/consensus-specs#4603 a new testing
This is an implementation of ethereum/consensus-specs#4603, a new testing

spec.process_slot(state)
```

this is for example purposes put into

medium

It's standard to capitalize the first word of a sentence.

Suggested change
this is for example purposes put into
This is for example purposes put into

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an impressive new tracing framework, neotrace, for testing the Ethereum consensus specs. The framework cleverly uses wrapt to create a proxy around the spec object, intercepting and recording method calls, arguments, and state mutations. Pydantic models are used to define a structured trace format, which is a great choice for ensuring data integrity. The implementation is well-structured, with clear separation of concerns between the decorator, the proxy object, data models, and tests. The addition of unit tests for the tracer itself is also a very good practice.

My review includes a few suggestions to further improve the code. I've pointed out an opportunity to enhance readability and performance in the test runner integration. I also found a minor path inconsistency in the new README.md file. Finally, I've suggested improvements to the type hints in typing.py to more accurately reflect the recursive nature of the data structures, which will enhance maintainability and type safety.

Comment on lines 98 to 105
outputs += [
    # dump trace data for yaml serialization
    ("trace", "data", data.model_dump(mode="json", exclude_none=True)),
] + [
    (name, "ssz", value)
    # ssz artifacts are already serialized and will be compressed by the dumper
    for name, value in data.artifacts.items()
]

medium

The list concatenation outputs += [...] + [...] creates two intermediate lists before extending the outputs list. This can be made more readable and slightly more performant by using list.append and list.extend directly.

                # dump trace data for yaml serialization
                outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
                # ssz artifacts are already serialized and will be compressed by the dumper
                outputs.extend(
                    (name, "ssz", value)
                    for name, value in data.artifacts.items()
                )

Comment on lines 32 to 34
This is for example purposes put into
`tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py` and can be run
with something like

medium

The README mentions that the example test is in tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py, but in this pull request, the example test test_linear_sanity_slots_222 is located in the newly added file tests/infra/trace/test_example_slots_2.py. Please update the path in the documentation to reflect the correct location of the example test file to avoid confusion.

Comment on lines 11 to 18
# two recursion levels max
SERIALIZED_ARGS: TypeAlias = SERIALIZED | list[SERIALIZED] | list[list[SERIALIZED] | SERIALIZED]
SERIALIZED_KWARGS: TypeAlias = dict[str, SERIALIZED_ARGS]

# typing aliases for non-serialized values and arguments
RAW: TypeAlias = View | None # allowed simple argument types (View is wide!)
RAW_ARGS: TypeAlias = RAW | Sequence[RAW] | Sequence[Sequence[RAW] | RAW]
RAW_KWARGS: TypeAlias = dict[str, RAW_ARGS]

medium

The type aliases SERIALIZED_ARGS and RAW_ARGS are defined to support a limited nesting depth, and the comment for SERIALIZED_ARGS explicitly states "two recursion levels max". However, the functions that process these types are recursive and can handle arbitrary nesting. To improve type safety and accurately reflect the implementation, these aliases should be defined recursively.

Suggested change
# two recursion levels max
SERIALIZED_ARGS: TypeAlias = SERIALIZED | list[SERIALIZED] | list[list[SERIALIZED] | SERIALIZED]
SERIALIZED_KWARGS: TypeAlias = dict[str, SERIALIZED_ARGS]
# typing aliases for non-serialized values and arguments
RAW: TypeAlias = View | None # allowed simple argument types (View is wide!)
RAW_ARGS: TypeAlias = RAW | Sequence[RAW] | Sequence[Sequence[RAW] | RAW]
RAW_KWARGS: TypeAlias = dict[str, RAW_ARGS]
# Allows for arbitrarily nested lists of serialized data
NestedSerializable: TypeAlias = SERIALIZED | list["NestedSerializable"]
SERIALIZED_ARGS: TypeAlias = NestedSerializable
SERIALIZED_KWARGS: TypeAlias = dict[str, SERIALIZED_ARGS]
# typing aliases for non-serialized values and arguments
RAW: TypeAlias = View | None # allowed simple argument types (View is wide!)
NestedRaw: TypeAlias = RAW | Sequence["NestedRaw"]
RAW_ARGS: TypeAlias = NestedRaw
RAW_KWARGS: TypeAlias = dict[str, RAW_ARGS]

Owner Author

@gemini-code-assist No, not in this case: implicit recursive types are not compatible with something in the reftest framework, and we also can't use py3.12+ type syntax for compatibility reasons. And the actual arguments we have are never nested beyond two levels - but feel free to suggest how to type-annotate given these constraints.

Thank you for the clarification. I understand now that the actual arguments are never nested beyond two levels, and that implicit recursive types or Python 3.12+ syntax are not compatible with your framework.

Given these constraints, the current type definitions for SERIALIZED_ARGS and RAW_ARGS are indeed correct and accurately reflect the maximum nesting depth of two levels. My previous suggestion was based on a misunderstanding of the actual data constraints and compatibility requirements.

Therefore, no changes are needed to these type aliases.

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a significant and well-executed new "spec trace" framework for testing. It leverages wrapt to proxy the spec object, recording method calls, I/O, and state changes into a structured trace file. The implementation is robust, using Pydantic for data modeling, try...finally for guaranteed cleanup, and a clever state tracking mechanism. The new functionality is well-structured across multiple files, including models, the decorator, the core proxy logic, and comprehensive unit tests. My review includes a couple of minor suggestions to improve documentation clarity and correct a type hint in a test mock. Overall, this is an excellent addition to the testing infrastructure.

spec.process_slot(state)
```

This is for example purposes can be put into e.g.

medium

This sentence is phrased a bit awkwardly. For improved clarity, consider rephrasing it.

Suggested change
This is for example purposes can be put into e.g.
This can, for example, be put into

        # Does not modify state
        pass

    def iterate_something(self, state: BeaconState, arg_list1: tuple, arg_list2: list) -> None:

medium

The type hint for the return value is None, but the function actually returns a list. This should be corrected to -> list to match the implementation.

Suggested change
def iterate_something(self, state: BeaconState, arg_list1: tuple, arg_list2: list) -> None:
def iterate_something(self, state: BeaconState, arg_list1: tuple, arg_list2: list) -> list:

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an impressive new tracing framework, neotrace, for the consensus-spec tests. The implementation is well-structured, leveraging wrapt for non-invasive call tracing and pydantic for clear data modeling. The inclusion of comprehensive documentation and dedicated unit tests for the framework itself is commendable and speaks to the quality of the contribution. My review includes a couple of minor suggestions to enhance the readability and maintainability of the core tracer logic. Overall, this is an excellent and well-executed feature.

Comment on lines 93 to 95
if name in ["finalize_trace", "model"]:
# avoid wrapping our own methods
return real_attr

medium

This if block appears to be redundant and can be safely removed. The conditions it checks for are already handled by the preceding if statement on line 90:

  • The model property is not callable, so it's filtered out by not callable(real_attr).
  • The finalize_trace method is actually named _finalize_trace, which is filtered out by name.startswith("_").

Removing this block will simplify the __getattr__ method.

Owner Author

@gemini-code-assist valid suggestion, but that particular method should be public and thus does need a special condition here

You are absolutely right, IvanAnishchuk! My apologies for the oversight. I incorrectly assumed finalize_trace was a private method (_finalize_trace). Since it is a public method and intended to be unwrapped, the explicit if name in ["finalize_trace", "model"]: condition is indeed necessary to prevent it from being wrapped by the _spec_call_hook decorator. Thank you for clarifying this important detail. The block should remain as is.

Comment on lines 155 to 162
if (old_root := self._last_state_root) != (new_root := state.hash_tree_root().hex()):
    # Assert last output state (was serialized in capture_post_state)
    if old_root:
        self._model.trace.append(AssertStateOp(state_root=old_root))
    # Handle out-of-band mutation / add LoadState:
    # note: this is always serialized before so no process_arg
    self._model.trace.append(LoadStateOp(state_root=new_root))
    self._last_state_root = new_root

medium

The if condition on line 155 is quite dense due to the use of two walrus operators, which can harm readability. Refactoring this block to separate the assignments from the comparison makes the logic clearer and easier to maintain.

Suggested change
if (old_root := self._last_state_root) != (new_root := state.hash_tree_root().hex()):
    # Assert last output state (was serialized in capture_post_state)
    if old_root:
        self._model.trace.append(AssertStateOp(state_root=old_root))
    # Handle out-of-band mutation / add LoadState:
    # note: this is always serialized before so no process_arg
    self._model.trace.append(LoadStateOp(state_root=new_root))
    self._last_state_root = new_root
new_root = state.hash_tree_root().hex()
if self._last_state_root != new_root:
    # Assert last output state (was serialized in capture_post_state)
    if self._last_state_root:
        self._model.trace.append(AssertStateOp(state_root=self._last_state_root))
    # Handle out-of-band mutation / add LoadState:
    # note: this is always serialized before so no process_arg
    self._model.trace.append(LoadStateOp(state_root=new_root))
    self._last_state_root = new_root

@IvanAnishchuk
Owner Author

/gemini review

@IvanAnishchuk
Owner Author

@gemini-code-assist Can you provide any examples where handling of such dictionaries would be necessary? Based on actual spec code, please.

@gemini-code-assist

Hello @IvanAnishchuk, thank you for this excellent follow-up question regarding the handling of dictionaries within the neotrace framework. It allows us to delve deeper into the serialization logic.

You're asking for examples where handling of dictionaries would be necessary, particularly in the context of actual spec code. Let's clarify how dictionaries are currently processed and identify a scenario where the current implementation might fall short.

Current Dictionary Handling

  1. Top-level input dictionary in SpecCallOp: The SpecCallOp model's input field is defined as SERIALIZED_KWARGS, which is dict[str, SERIALIZED_ARGS]. The sanitize_kwargs field_serializer for this input field correctly iterates over the dictionary's values and applies simple_sanitize_data to each of them. This means that if a spec function is called with keyword arguments, and those arguments are primitives, lists, or SSZ View objects, they will be correctly processed and serialized (e.g., bytes to hex strings, View objects to SSZ artifact references).

    For example, in the get_seed call from example_trace.yaml:

    input:
      state: 0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy
      epoch: 2
      domain_type: '0x00000000'

    Here, state, epoch, and domain_type are the keys of the input dictionary, and their values are correctly sanitized.

  2. RecordingSpec._process_arg function: This function is responsible for preparing individual arguments (both positional and keyword) before they are added to the SpecCallOp.input dictionary. It recursively handles Sequence types (like lists and tuples) by iterating over their elements and calling _process_arg on each element. It also handles View objects by saving them as SSZ artifacts.

    However, the current implementation of _process_arg in traced_spec.py has a limitation when it encounters a dictionary as an argument (or as an element within a list/tuple argument) that needs its values to be processed:

    # traced_spec.py
    def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
        # ...
        # recursively handle lists and tuples
        if isinstance(arg, Sequence) and not isinstance(arg, str | bytes):
            return [self._process_arg(elem) for elem in arg]
        # ...

    If arg is a dict, isinstance(arg, Sequence) evaluates to False (a dict is a Mapping, not a Sequence), so the dictionary bypasses this branch entirely. It then also fails the is_serializable check and is returned as-is, meaning its values (which might contain bytes or View objects needing serialization) end up embedded raw and incorrectly represented in the trace.

  3. simple_sanitize_data function: Similar to _process_arg, the simple_sanitize_data function in models.py (used by sanitize_kwargs) explicitly handles bytes and list recursively, but it does not have logic to recurse into nested dictionaries. If a dictionary is passed to simple_sanitize_data, it will be returned as-is without its internal values being sanitized.

Example of a Scenario Requiring Dictionary Value Handling

While core spec functions often use Container objects for structured data, helper functions or configuration parameters might sometimes be passed as dictionaries. Consider a hypothetical spec helper function that takes a dictionary where some values are bytes that need to be hex-encoded for the trace:

# Hypothetical spec helper function (not directly from the diff, but plausible)
def process_metadata(self, state: BeaconState, metadata: Dict[str, bytes]) -> None:
    """Processes a dictionary of metadata, where values are byte arrays."""
    for key, value in metadata.items():
        # Imagine some processing of the byte values here
        pass
    # ...

If a test calls this function:

# In a test function decorated with @spec_trace
metadata_payload = {
    'block_hash': b'\x11' * 32,  # A byte string
    'state_root': b'\x22' * 32,  # Another byte string
    'version': b'\x01'           # A short byte string
}
spec.process_metadata(state, metadata=metadata_payload)

Here's how the current neotrace implementation would handle metadata_payload:

  1. When spec.process_metadata is called, RecordingSpec._spec_call_hook will invoke _process_arg on the metadata_payload dictionary.
  2. Inside _process_arg(metadata_payload):
    • isinstance(metadata_payload, Sequence) is False, because a dict is a Mapping, not a Sequence, so the recursive list/tuple branch is skipped.
    • is_serializable(metadata_payload) is also False, since a dict is not a View, so _process_arg returns the dictionary unchanged.
    • None of the byte values inside it are hex-encoded or saved as SSZ artifacts along the way.
    • Consequently, serial_params['metadata'] (which would eventually become the input for SpecCallOp) would carry raw, unserialized bytes into the trace instead of sanitized hex strings.

This example demonstrates that if a spec function takes a dictionary as an argument, and the values within that dictionary need to be serialized (e.g., bytes to hex, View objects to SSZ references), the current _process_arg logic would not correctly capture them.

Recommendation

To correctly handle dictionaries where their values (or nested values) require serialization, I recommend modifying RecordingSpec._process_arg to explicitly handle dictionaries by iterating over their items() or values() and recursively processing them. Additionally, the SERIALIZED_ARGS type alias in typing.py might need to be updated to explicitly allow for nested dictionaries if this is an intended use case.

For instance, _process_arg could be extended:

# Proposed change to traced_spec.py
def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
    # ... (existing code for Sequence, View, etc.)

    # Explicitly handle dictionaries by processing their values
    if isinstance(arg, dict):
        return {k: self._process_arg(v) for k, v in arg.items()}

    # ... (existing code for is_serializable and default return)

And similarly, simple_sanitize_data in models.py could be updated if it needs to handle nested dictionaries beyond the top-level input:

# Proposed change to models.py
def simple_sanitize_data(value: SERIALIZED_ARGS) -> SERIALIZED_ARGS:
    # ... (existing code for bytes, list)

    # Recursively clean dictionaries
    if isinstance(value, dict):
        return {k: simple_sanitize_data(v) for k, v in value.items()}

    return value

This would ensure that the contents of dictionaries passed as arguments are properly serialized and included in the trace, maintaining their structure.

@IvanAnishchuk
Owner Author

@gemini-code-assist process_metadata doesn't exist in the spec. Can you show a factual example of a function in the spec that does it?

@gemini-code-assist

Hello @IvanAnishchuk, thank you for pushing for a more concrete example from the actual spec code. You're right to challenge a hypothetical example. While core spec functions in eth2spec typically operate on well-defined SSZ Container objects or primitive types, there are scenarios, particularly within helper functions or utility methods that RecordingSpec would also intercept, where dictionaries might be used in a way that exposes the _process_arg limitation.

Since I don't have direct access to the entire ethereum/consensus-specs codebase to pinpoint an exact line in a spec file, I will provide a plausible, spec-idiomatic example of a helper function that demonstrates the issue. This example aligns with the kind of data structures and operations found within the eth2spec ecosystem.

Context: Helper Functions and Dynamic Data

Many spec modules contain helper functions that perform utility operations, data transformations, or handle configuration. These functions might accept more flexible Python data structures like dictionaries, especially when dealing with dynamic or auxiliary data that doesn't warrant a full SSZ Container definition.

Consider a hypothetical helper function within a spec module, perhaps for processing some form of metadata or configuration that includes byte strings (e.g., hashes, IDs) that should be hex-encoded in the trace:

# Hypothetical helper function within a spec module (e.g., in eth2spec/specs/deneb/helpers.py)
from typing import Dict
from eth2spec.utils.ssz.ssz_typing import Bytes32

def process_dynamic_config(state: BeaconState, config_params: Dict[str, Bytes32 | int]) -> None:
    """Processes dynamic configuration parameters, some of which are byte arrays."""
    for key, value in config_params.items():
        if isinstance(value, Bytes32):
            # Imagine some operation that uses the byte value
            print(f"Processing {key}: {value.hex()}")
        else:
            print(f"Processing {key}: {value}")
    # ... further processing

# A test calling this helper function, assuming it's part of the 'spec' object
# (e.g., if it's a method of a class that 'spec' is an instance of, or dynamically added)

# In a test function decorated with @spec_trace
# from eth2spec.utils.ssz.ssz_typing import Bytes32

# config_data = {
#     'genesis_hash': Bytes32(b'\x01' * 32),  # A Bytes32 object
#     'chain_id': 1,
#     'validator_root': Bytes32(b'\x02' * 32) # Another Bytes32 object
# }
# spec.process_dynamic_config(state, config_params=config_data)

How neotrace currently handles this (and where it falls short):

If spec.process_dynamic_config(state, config_params=config_data) were called:

  1. RecordingSpec._spec_call_hook would intercept the call.
  2. _bind_args would correctly bind state and config_params.
  3. The serial_params dictionary would be constructed by calling self._process_arg on each value of bound_args.arguments. So, _process_arg would be called with config_params as its arg parameter.
  4. Inside RecordingSpec._process_arg (lines 209-210):
    # traced_spec.py
    def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
        # ...
        # recursively handle lists and tuples
        if isinstance(arg, Sequence) and not isinstance(arg, str | bytes):
            return [self._process_arg(elem) for elem in arg]
        # ...
    • isinstance(config_params, Sequence) would evaluate to False, because a Python dict is a Mapping, not a Sequence.
    • The dictionary would therefore skip the recursive list/tuple branch, and is_serializable(config_params) would also be False, so _process_arg would return the dictionary unchanged.
    • The Bytes32 values inside it (which are View objects and should be serialized as SSZ artifacts) would be completely missed by the serialization logic within _process_arg. The resulting serial_params['config_params'] would embed raw, unserialized objects in the trace, failing to convert the Bytes32 objects into SSZ artifact references.

Why this is a problem:

This scenario highlights a gap: the RAW_ARGS type alias does not admit dictionaries at all, yet nothing in _process_arg rejects them explicitly, so a dict passed as an argument is silently returned unprocessed. If its values contain View objects or bytes that need to be converted to hex strings or SSZ artifact references, the current logic will fail to process them correctly.

Proposed Solution (reiterated):

To address this, RecordingSpec._process_arg needs to explicitly handle dictionaries by iterating over their items() or values() and recursively processing them, similar to how lists are handled. This ensures that the contents of dictionaries are properly serialized and included in the trace, maintaining their structure.

# Proposed change to traced_spec.py
def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
    # ... (existing code for Sequence, View, etc.)

    # Explicitly handle dictionaries by processing their values
    if isinstance(arg, dict):
        return {k: self._process_arg(v) for k, v in arg.items()}

    # ... (existing code for is_serializable and default return)

This modification would ensure that the neotrace framework is robust enough to handle such plausible scenarios within the spec's helper functions, even if they are not prevalent in the core state transition functions.

@IvanAnishchuk
Owner Author

@gemini-code-assist it's not 100% but as far as I know spec methods never use dicts (sometimes they use lists though), so handling this is unnecessary complexity and less precise type annotations to accommodate an edge case that is not there. Let's not do this, at least for now. If you see any good ways to make sure everything fails early and nicely if an unpredicted situation actually happens - feel free to suggest. Same for documentation improvements - non-obvious decisions like this should be properly documented in comments.

@gemini-code-assist

Hello @IvanAnishchuk,

Thank you for the clarification and for pushing for a more precise understanding of dictionary usage within the spec. Your insight that "spec methods never use dicts (sometimes they use lists though)" for arguments requiring complex serialization is crucial.

You are absolutely correct that if such dictionaries are not used as arguments where their values need special serialization (like converting Bytes32 to SSZ artifact references or bytes to hex strings), then implementing recursive processing for dictionary values would indeed add unnecessary complexity.

However, it's important to highlight the current behavior of RecordingSpec._process_arg when it encounters a dictionary, and how we can ensure "fail early and nicely" if an unexpected situation arises.

Current Behavior and Potential Issue

In RecordingSpec._process_arg (lines 209-210 of traced_spec.py):

        if isinstance(arg, Sequence) and not isinstance(arg, str | bytes):
            return [self._process_arg(elem) for elem in arg]

A Python dict is a Mapping, not a Sequence, so if arg were a dictionary it would bypass this branch entirely; it would then also fail the is_serializable check and be returned as-is, with none of its values processed. This would lead to a loss of information and incorrect tracing if the dictionary's values contained View objects or bytes that were intended to be serialized.

For example, if spec.some_method(config={'hash': Bytes32(b'...')}) were called, the trace would record the raw dictionary unchanged, without turning the Bytes32 value into an SSZ artifact reference.
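
A quick standard-library check of the typing behavior described above:

```python
from collections.abc import Mapping, Sequence

# a dict is a Mapping, not a Sequence; list and tuple are Sequences
assert not isinstance({}, Sequence)
assert isinstance({}, Mapping)
assert isinstance([], Sequence)
assert isinstance((), Sequence)
```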

Recommendation for Early Failure and Documentation

Given your confirmation that dictionaries are not expected to be used in this manner, the most robust way to ensure that any deviation from this assumption leads to an immediate and clear error (rather than silent data loss) is to explicitly disallow dictionaries as arguments to _process_arg for now. This also addresses your request for documentation.

  1. Add a clarifying comment to _process_arg: This comment will document the current design decision and the expectation regarding dictionary arguments.

  2. Raise a TypeError for dictionary arguments: By explicitly raising an error if a dictionary is passed to _process_arg, we prevent it from being silently passed through unprocessed. This ensures that if the spec's usage patterns change, or if a developer inadvertently passes a dictionary where its values should be traced, the system will fail loudly and informatively.

Here's how traced_spec.py could be modified:

# Proposed change to traced_spec.py

    def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
        """
        Process a potential container or primitive object.

        Returns the root hash of container or the original primitive.
        The idea is that after passing this we either have a hash or something
        that we can assume is serializable.

        Note: Dictionaries are currently not expected to be passed as arguments
        to spec methods in a way that requires their internal values to be
        recursively processed for serialization (e.g., for nested View objects
        or bytes needing hex encoding). If a dictionary is encountered,
        a TypeError will be raised to prevent silent data loss or incorrect tracing.
        """
        # Explicitly disallow dictionaries as arguments to _process_arg for now,
        # as their recursive value processing is not implemented and not expected.
        if isinstance(arg, dict):
            raise TypeError(
                f"Dictionary argument found for tracing: {arg}. "
                "Recursive processing of dictionary values is not currently supported "
                "and not expected in spec method arguments. "
                "If this is an intended use case, `_process_arg` needs to be updated."
            )

        # recursively handle lists and tuples
        if isinstance(arg, Sequence) and not isinstance(arg, str | bytes):
            return [self._process_arg(elem) for elem in arg]

        if is_serializable(arg):
            return self._save_artifact(arg)
        else:
            return arg

This approach directly addresses your concerns by:

  • Avoiding unnecessary complexity: It does not implement recursive dictionary processing, as it's not currently needed.
  • Failing early and nicely: Any unexpected dictionary argument will immediately raise a TypeError with a clear message.
  • Improving documentation: The added docstring explains the rationale behind this decision.

Let me know if this refined approach aligns with your expectations.

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR introduces a powerful new tracing framework, neotrace, for testing consensus-specs. The implementation is well-designed, leveraging wrapt for non-invasive proxying and Pydantic for robust data modeling and serialization. The separation of concerns into the decorator, the recorder proxy, and the data models is clean. The unit tests for the recorder are thorough and cover important edge cases like state mutation and argument sanitization. The integration with the existing test runner is also handled correctly. I have one minor suggestion to improve a comment for better maintainability. Overall, this is an excellent addition to the testing infrastructure.

coderabbitai[bot]
coderabbitai bot previously requested changes Dec 21, 2025

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c0f0fc4 and 540cf9a.

📒 Files selected for processing (2)
  • tests/infra/trace/models.py (1 hunks)
  • tests/infra/trace/traced_spec.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tests/infra/trace/models.py

20-25: Avoid specifying long messages outside the exception class

(TRY003)

tests/infra/trace/traced_spec.py

70-70: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


70-70: Dynamically typed expressions (typing.Any) are disallowed in wrapped_spec

(ANN401)


77-77: Dynamically typed expressions (typing.Any) are disallowed in __getattr__

(ANN401)


98-98: Unused method argument: instance

(ARG002)


98-98: Dynamically typed expressions (typing.Any) are disallowed in instance

(ANN401)


218-223: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (15)
tests/infra/trace/models.py (6)

17-33: Looks good: Intentional dictionary rejection with clear error messaging.

The explicit TypeError for dictionaries is a deliberate design choice (as clarified in past discussions) to fail fast when unexpected dictionary arguments slip through. The error message is descriptive and actionable.

The static analysis hint (TRY003) about long exception messages is a stylistic preference; here, the verbose message aids debugging and documents the constraint, which is appropriate for this internal tracing infrastructure.


36-42: LGTM!

Clean abstract base with extra="forbid" to catch typos in YAML/JSON inputs.


44-50: LGTM!

Good extraction of the shared state_root field into a base class. The pattern correctly validates the 64-character hex root, and the field serializer appends the .ssz_snappy suffix on dump.


52-72: LGTM!

Both LoadStateOp and AssertStateOp correctly inherit from StateOp and use Literal types for the discriminator field. The docstrings are clear and accurate.


74-99: LGTM!

The SpecCallOp model is well-structured with appropriate field serializers. Using when_used="json" ensures sanitization occurs during both JSON serialization and model_dump(mode="json"), which is typically used for YAML output as well.
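
For reference, a minimal sketch of the when_used="json" behavior (the field and model here are illustrative, not the PR's actual SpecCallOp):

```python
from pydantic import BaseModel, field_serializer

class Step(BaseModel):
    data: bytes

    @field_serializer("data", when_used="json")
    def hex_encode(self, v: bytes) -> str:
        # only runs for JSON-mode dumps/serialization
        return "0x" + v.hex()

step = Step(data=b"\x01\x02")
assert step.model_dump()["data"] == b"\x01\x02"          # python mode: raw bytes
assert step.model_dump(mode="json")["data"] == "0x0102"  # json mode: sanitized
```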


101-120: LGTM!

The TraceConfig root model is well-designed:

  • extra="forbid" catches unexpected fields (addressed from prior feedback)
  • Discriminated union on "op" enables proper polymorphic deserialization (see the sketch below)
  • PrivateAttr correctly excludes _artifacts from serialization

Note: The artifacts property returns the internal dict directly, allowing external mutation. This appears intentional for the tracing workflow where RecordingSpec registers artifacts.
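
For reference, a minimal sketch of the discriminated-union pattern noted above (simplified models; the real ones add pattern validation and serializers):

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field

class LoadStateOp(BaseModel):
    op: Literal["load_state"] = "load_state"
    state_root: str

class AssertStateOp(BaseModel):
    op: Literal["assert_state"] = "assert_state"
    state_root: str

# the "op" field selects which model to instantiate during validation
Step = Annotated[Union[LoadStateOp, AssertStateOp], Field(discriminator="op")]

class Trace(BaseModel):
    trace: list[Step]

t = Trace.model_validate({"trace": [{"op": "load_state", "state_root": "ab" * 32}]})
assert isinstance(t.trace[0], LoadStateOp)
```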

tests/infra/trace/traced_spec.py (9)

30-50: LGTM!

The is_serializable helper clearly defines what qualifies for SSZ serialization: View instances excluding primitives (int, bool) and None.


77-94: LGTM!

The attribute interception logic is sound:

  • Public methods defined on RecordingSpec (like finalize_trace) are resolved directly, bypassing __getattr__
  • Only public lowercase callables from the wrapped spec are decorated
  • The type: ignore[misc] is appropriate for the wrapt.ObjectProxy parent call

96-132: LGTM!

The hook implementation follows a clear flow (A→E) with proper argument binding, state capture, execution, and recording.

The unused instance parameter is required by wrapt.decorator's signature, so leaving it as-is is appropriate. Optionally, you could rename it to _instance to signal it's intentionally unused.
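
For reference, wrapt supplies instance unconditionally; a minimal sketch of the hook signature:

```python
import wrapt

@wrapt.decorator
def hook(wrapped, instance, args, kwargs):
    # wrapt always passes `instance` (None for plain functions),
    # so the parameter must exist even when unused
    return wrapped(*args, **kwargs)

@hook
def add(a, b):
    return a + b

assert add(1, 2) == 3
```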


134-144: LGTM!

Clean helper that normalizes positional and keyword arguments into a consistent mapping, with apply_defaults() ensuring all parameters are captured.


146-161: LGTM!

The pre-state capture logic correctly handles:

  • State mutations detected via hash comparison
  • Auto-insertion of AssertStateOp for the previous state before loading new state
  • Clean early return for non-View arguments

The past feedback about walrus operator usage has been addressed.


163-176: LGTM!

Simple step recording that delegates sanitization to SpecCallOp's field serializers.


191-197: LGTM!

Content-addressed artifact storage with clear separation: serialization happens here, compression deferred to the dumper.
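
A minimal sketch of the content-addressing idea, with sha256 standing in for the SSZ hash_tree_root key used by the real code:

```python
import hashlib

artifacts: dict[str, bytes] = {}

def save_artifact(data: bytes) -> str:
    # the key is derived from the content, so identical
    # artifacts land on the same entry and dedupe for free
    key = hashlib.sha256(data).hexdigest()
    artifacts[key] = data
    return key

a = save_artifact(b"state-bytes")
b = save_artifact(b"state-bytes")
assert a == b and len(artifacts) == 1
```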


199-232: LGTM!

The argument processing logic is robust:

  • Explicit TypeError for unexpected dictionaries (intentional per design discussions)
  • Recursive sequence handling
  • Clean serialization of View objects via _save_artifact

The verbose error message aids debugging for this internal infrastructure.


234-244: LGTM!

Clean finalization logic. The finalize_trace method provides a clear public API for closing out the trace.

Minor: Consider adding -> None return type annotations for consistency with other methods.

@IvanAnishchuk
Owner Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an impressive and well-designed tracing framework (neotrace) for pyspec tests. The use of wrapt for proxying, Pydantic for data modeling, and the overall structure are clean and robust. The accompanying documentation and unit tests are also thorough and helpful. However, I've identified a critical bug in the @spec_trace decorator that prevents the tracing mechanism from working as intended. The fix is straightforward, and once addressed, this will be an excellent addition to the testing infrastructure.

@IvanAnishchuk
Owner Author

/gemini review

@IvanAnishchuk
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Dec 21, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces an impressive new tracing framework, neotrace, for Ethereum consensus spec tests. The implementation is well-designed, leveraging wrapt for transparent proxying of spec calls, Pydantic for robust data modeling of traces, and includes comprehensive documentation and unit tests. The integration with the existing test runner is also handled cleanly. My review identified a couple of minor areas for improvement related to confusing comments in the core logic, which could enhance future maintainability. Overall, this is a solid and valuable addition to the testing infrastructure.

Comment on lines 216 to 217
# A dict is a Mapping, not a Sequence, so this check is necessary
# to catch it before the Sequence check below.

medium

This comment is slightly misleading. A dict is not a Sequence, so it wouldn't be caught by the isinstance(arg, Sequence) check anyway. The check for dict is necessary because dictionaries are explicitly not supported for recursive processing at this stage. The TypeError message explains this well, so the comment could be removed to avoid confusion.

@IvanAnishchuk
Owner Author

/gemini review

@IvanAnishchuk
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Dec 21, 2025

✅ Actions performed

Review triggered.



@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This PR introduces a powerful new tracing framework, neotrace, for creating spec tests. The implementation is well-designed, using wrapt for transparent proxying and Pydantic for structured trace generation, which is a great approach. The code is clean, well-documented in the new README.md, and includes a comprehensive set of unit tests for the core tracing logic. The integration into the existing test runner is also clean and minimal. My feedback includes a couple of minor suggestions to improve clarity and remove redundancy. Overall, this is an excellent contribution.

Comment on lines +18 to +25
if isinstance(value, dict):
    raise TypeError(
        f"Dictionary argument found for tracing: {value}. "
        "Recursive processing of dictionary values is not currently supported "
        "and not expected in spec method arguments. "
        "If this is an intended use case, tracing models and logic need to be updated."
    )

medium

This isinstance(value, dict) check appears to be unreachable given the type hints. The SERIALIZED_ARGS type for value doesn't include dict. The _process_arg function, which prepares the data before it reaches this sanitization step, already has a check to prevent dictionaries. Removing this redundant check will make the code cleaner and easier to understand.

Owner Author

@gemini-code-assist it's necessary because we don't want to support nested dicts, yet we don't want surprising behavior if anything changes in the future

Thank you for clarifying your intent, IvanAnishchuk. I understand that the isinstance(value, dict) check serves as a valuable defensive mechanism against potential future changes that might introduce dictionaries into simple_sanitize_data, even if current type hints and upstream logic prevent them. Keeping it as an explicit assertion for future robustness is a reasonable approach.

Simple type checking.
The current rule is to serialize any View that
is not a primitive (int or bool).

medium

The docstring here is slightly inaccurate. It says primitives are int or bool, but the PRIMITIVES type alias also includes str and bytes. It would be clearer to update the docstring to match the code.

Suggested change
is not a primitive (int or bool).
is not a primitive (int, bool, str, or bytes).

Owner Author

@gemini-code-assist it's not exhaustive

You're right, explicitly listing all primitive types can be hard to maintain and might not be exhaustive in a broader sense. To make the docstring more robust and align directly with the implementation, how about we refer to the PRIMITIVES type alias?

This way, the docstring will always be accurate without needing updates if the definition of PRIMITIVES changes in the future.

Suggested change
is not a primitive (int or bool).
is not a primitive (as defined by `PRIMITIVES`).

Typing annotations, comments, edge cases.
Fix extra deps, fix styling in the README.
@IvanAnishchuk
Owner Author

@gemini-code-assist okay, let's thoroughly analyse this pull request and decide if it's ready for final review and testing

@IvanAnishchuk
Owner Author

/gemini review

@IvanAnishchuk
Owner Author

@coderabbitai review

@coderabbitai

coderabbitai bot commented Dec 21, 2025

✅ Actions performed

Review triggered.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
tests/infra/trace/test_traced_spec.py (1)

328-341: Edge case test for empty trace is appropriate.

The comment at line 333 acknowledges this intentionally tests internal behavior. This is a valid unit test approach to ensure _capture_pre_state and _capture_post_state handle None gracefully without side effects.

tests/infra/trace/traced_spec.py (1)

30-49: is_serializable function has a minor docstring inaccuracy.

The docstring states primitives are "int or bool", but the PRIMITIVES type alias (from typing.py) likely includes str and bytes as well, based on the context that these should not be SSZ-serialized as artifacts.

🔎 Suggested docstring fix
 def is_serializable(value: RAW) -> bool:
     """
     Simple type checking.
 
     The current rule is to serialize any View that
-    is not a primitive (int or bool).
+    is not a primitive (int, bool, str, or bytes).
     """
📜 Review details

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 65b222a and 4158ee7.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
  • pyproject.toml (1 hunks)
  • tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py (1 hunks)
  • tests/infra/trace/README.md (1 hunks)
  • tests/infra/trace/decorator.py (1 hunks)
  • tests/infra/trace/models.py (1 hunks)
  • tests/infra/trace/test_example_slots_2.py (1 hunks)
  • tests/infra/trace/test_traced_spec.py (1 hunks)
  • tests/infra/trace/traced_spec.py (1 hunks)
  • tests/infra/trace/typing.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py (1)
tests/infra/trace/models.py (1)
  • artifacts (117-119)
tests/infra/trace/decorator.py (1)
tests/infra/trace/traced_spec.py (3)
  • RecordingSpec (52-242)
  • finalize_trace (240-242)
  • model (66-68)
tests/infra/trace/test_traced_spec.py (3)
tests/core/pyspec/eth2spec/utils/ssz/ssz_impl.py (2)
  • serialize (12-13)
  • ssz_serialize (8-9)
tests/infra/trace/traced_spec.py (4)
  • RecordingSpec (52-242)
  • finalize_trace (240-242)
  • _capture_pre_state (146-161)
  • _capture_post_state (178-189)
pysetup/spec_builders/base.py (1)
  • fork (7-8)
🪛 Ruff (0.14.8)
tests/infra/trace/decorator.py

21-21: Dynamically typed expressions (typing.Any) are disallowed in *args

(ANN401)


21-21: Dynamically typed expressions (typing.Any) are disallowed in **kwargs

(ANN401)


27-29: Avoid specifying long messages outside the exception class

(TRY003)


32-34: Avoid specifying long messages outside the exception class

(TRY003)

tests/infra/trace/test_traced_spec.py

25-25: Missing return type annotation for special method __new__

(ANN204)


25-25: Unused static method argument: root

(ARG004)


30-30: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


49-49: Missing return type annotation for special method __new__

(ANN204)


49-49: Unused static method argument: root

(ARG004)


52-52: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


65-65: Missing return type annotation for special method __new__

(ANN204)


65-65: Unused static method argument: root

(ARG004)


68-68: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


85-85: Unused method argument: slot

(ARG002)


95-95: Unused method argument: state

(ARG002)


99-99: Unused method argument: state

(ARG002)

tests/infra/trace/models.py

19-24: Avoid specifying long messages outside the exception class

(TRY003)

tests/infra/trace/typing.py

1-1: Unused noqa directive (non-enabled: F401)

Remove unused noqa directive

(RUF100)

tests/infra/trace/traced_spec.py

70-70: Dynamically typed expressions (typing.Any) are disallowed in wrapped_spec

(ANN401)


77-77: Dynamically typed expressions (typing.Any) are disallowed in __getattr__

(ANN401)


98-98: Unused method argument: instance

(ARG002)


98-98: Dynamically typed expressions (typing.Any) are disallowed in instance

(ANN401)


216-221: Avoid specifying long messages outside the exception class

(TRY003)

🔇 Additional comments (24)
pyproject.toml (1)

36-36: LGTM!

The wrapt dependency is correctly placed in the test optional dependencies, which aligns with its usage solely in the test infrastructure (tests/infra/trace/). Version pinning ensures reproducible builds.

tests/infra/trace/README.md (1)

1-134: LGTM!

The documentation is comprehensive and well-structured, covering the framework's purpose, usage examples, trace file format, and implementation details. The code examples clearly demonstrate how to use the @spec_trace decorator, and the TODO section appropriately tracks remaining work.

tests/infra/trace/typing.py (1)

8-23: LGTM!

The type aliases are well-structured for the tracing system. Using TypeAliasType for recursive definitions (SERIALIZED_ARGS, RAW_ARGS) ensures pydantic schema compatibility. The STATE TypeVar bound to View provides proper type safety for state representations.

tests/infra/trace/decorator.py (1)

9-51: LGTM!

The decorator implementation is well-structured:

  • Proper argument binding and validation with clear error messages
  • Correct injection of the RecordingSpec proxy via bound_args.arguments
  • The bound_args.args and bound_args.kwargs properties are computed from bound_args.arguments, so the modified spec is correctly passed to the test function
  • Finalization happens after test execution, and trace data is yielded for the runner

The static analysis hints about Any in *args/**kwargs and long exception messages are acceptable in this decorator context where flexible argument handling and clear debugging information are priorities.
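Regarding the `bound_args` point above, a self-contained sketch of why mutating `bound_args.arguments` is sufficient (names here are illustrative, not taken from the PR):

```python
import inspect

def demo_test(spec, state, extra=None):
    return spec, state, extra

bound = inspect.signature(demo_test).bind("real_spec", "state")
bound.apply_defaults()
bound.arguments["spec"] = "proxy_spec"  # inject the proxy, as the decorator does

# .args and .kwargs are recomputed from .arguments on access,
# so the injected value is what the test function receives
assert demo_test(*bound.args, **bound.kwargs) == ("proxy_spec", "state", None)
```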

tests/infra/trace/test_example_slots_2.py (3)

15-28: LGTM!

Clean example test demonstrating the simplest tracing use case. The decorator ordering (@with_all_phases, @spec_state_test, @spec_trace) is correct, and the comment accurately describes that process_slot mutates state in-place.
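For reference, the shape of such a test (the body here is a paraphrase, not the exact code under review):

```python
@with_all_phases
@spec_state_test
@spec_trace
def test_process_slot(spec, state):
    # process_slot mutates `state` in place; the RecordingSpec proxy
    # records the call and the resulting state root in the trace
    spec.process_slot(state)
```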


30-43: LGTM!

Well-structured helper function with sensible defaults. The nested BuilderPendingWithdrawal construction is clean and readable.


46-66: LGTM!

Good integration test demonstrating more complex tracing scenarios including epoch advancement, state mutations via builder_pending_payments, epoch processing, and diverse input/output types. The comments clarify the intentional design choices.

tests/infra/trace/test_traced_spec.py (5)

20-42: Mock classes are well-designed for testing.

The mock classes correctly override hash_tree_root() to return controlled test values while inheriting from Container to satisfy type checks. The __new__ override to intercept root before Container.__new__ is a valid pattern for these test mocks.

The static analysis warnings about unused root parameter and missing return type annotations are false positives in this test context—root is used to set self._root in __init__, and strict typing on test mocks is unnecessary.
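A stripped-down illustration of the interception pattern, using a plain base class in place of the SSZ Container (the real mocks differ in detail):

```python
class Base:  # stand-in for the SSZ Container base class
    pass

class MockState(Base):
    def __new__(cls, *args, root: bytes = b"\x00" * 32, **kwargs):
        # Intercept `root` so the base __new__ never sees it
        return super().__new__(cls)

    def __init__(self, *args, root: bytes = b"\x00" * 32, **kwargs):
        self._root = root

    def hash_tree_root(self) -> bytes:
        return self._root  # controlled test value instead of a real Merkle root

assert MockState(root=b"\x01" * 32).hash_tree_root() == b"\x01" * 32
```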


80-107: MockSpec provides good test coverage for different spec method patterns.

The mock methods cover key scenarios: state-mutating operations (tick), no-op operations (no_op), operations returning primitives (get_current_epoch), bytes (get_root), and SSZ objects (get_block_root). The unused argument warnings from static analysis are expected for mock implementations.


125-166: Thorough test for basic function call recording.

This test verifies the complete flow: implicit load_state injection, spec_call recording with correct method/output, and auto-injected assert_state on finalize. The assertions on both raw model attributes and model_dump(mode="json") output ensure both internal state and serialization work correctly.


169-227: Sanitization tests correctly verify dual representation.

The tests appropriately verify that raw values (bytes, tuples) are stored as-is in the model, while model_dump(mode="json") produces the sanitized output (hex strings, lists). This validates the field serializer behavior in SpecCallOp.


230-294: Excellent coverage of state tracking edge cases.

The test correctly verifies:

  1. Initial state registration with load_state
  2. No redundant load_state when state is unchanged
  3. Detection of out-of-band mutations triggering assert_state + load_state

This is critical for ensuring the trace accurately reflects state transitions.

tests/infra/trace/models.py (4)

17-32: Sanitization function is well-designed with clear constraints.

The explicit TypeError for dictionaries is a good defensive measure, as past discussions confirm dictionaries are not expected in spec method arguments. The recursive list handling and bytes-to-hex conversion cover the necessary cases.

The static analysis hint about long exception messages (TRY003) is a style preference—the detailed message aids debugging and is acceptable here.
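A sketch of what such a sanitizer looks like (the PR's implementation may differ in edge cases such as tuple handling):

```python
def simple_sanitize_data(value):
    """Recursively convert a traced value into a YAML/JSON-friendly form."""
    if isinstance(value, dict):
        # Fail fast: dicts are not expected in spec method arguments
        raise TypeError(f"Dictionary argument found for tracing: {value}")
    if isinstance(value, bytes):
        return "0x" + value.hex()  # hex string for byte values
    if isinstance(value, (list, tuple)):
        return [simple_sanitize_data(v) for v in value]
    return value  # ints, strs, bools pass through unchanged
```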


43-70: StateOp base class and subclasses are well-structured.

The extraction of common state_root field into StateOp base class (as suggested in past review) reduces duplication. The regex pattern ^[0-9a-f]{64}\.ssz_snappy$ correctly validates the full artifact filename format, ensuring consistency with how state_root is now set in traced_spec.py.
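In outline, the hierarchy looks roughly like this (class and field names follow the review's description; details may differ from models.py):

```python
from typing import Literal
from pydantic import BaseModel, Field

class StateOp(BaseModel):
    # Shared by load_state/assert_state: the full content-addressed
    # artifact filename, validated against the pattern discussed above
    state_root: str = Field(pattern=r"^[0-9a-f]{64}\.ssz_snappy$")

class LoadStateOp(StateOp):
    op: Literal["load_state"] = "load_state"

class AssertStateOp(StateOp):
    op: Literal["assert_state"] = "assert_state"
```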


73-97: SpecCallOp field serializers correctly handle the dual representation.

The when_used="json" parameter ensures raw values (bytes, etc.) are preserved in the model for internal use, while JSON serialization applies simple_sanitize_data. The sanitize_kwargs method correctly handles the top-level input dict without recursive dict processing—consistent with the design decision that dicts are only expected at the top level (kwargs).
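A sketch of the serializer wiring, reusing the `simple_sanitize_data` sketch above (the `inputs` field name is an assumption):

```python
from typing import Any, Literal
from pydantic import BaseModel, field_serializer

class SpecCallOp(BaseModel):
    op: Literal["spec_call"] = "spec_call"
    method: str
    inputs: dict[str, Any] = {}

    @field_serializer("inputs", when_used="json")
    def sanitize_kwargs(self, inputs: dict[str, Any]) -> dict[str, Any]:
        # Sanitize only the values: a dict is expected solely here, at the
        # top level (the call's kwargs mapping). model_dump() without
        # mode="json" returns the raw values untouched.
        return {k: simple_sanitize_data(v) for k, v in inputs.items()}
```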


100-118: TraceConfig is well-designed as the root schema.

The discriminated union via Annotated[..., Field(discriminator="op")] enables proper deserialization of different trace step types. The PrivateAttr for _artifacts correctly keeps binary artifact data out of the YAML/JSON serialization while exposing it via the artifacts property for the dumper to write SSZ files.
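Putting the sketched pieces together, the root schema might be wired like so (a sketch, not the exact models.py code):

```python
from typing import Annotated, Union
from pydantic import BaseModel, Field, PrivateAttr

TraceStep = Annotated[
    Union[LoadStateOp, SpecCallOp, AssertStateOp],
    Field(discriminator="op"),  # the `op` literal selects the model on load
]

class TraceConfig(BaseModel):
    steps: list[TraceStep] = []

    # Binary SSZ blobs keyed by filename; PrivateAttr keeps them out of
    # the YAML/JSON dump while the property exposes them to the dumper
    _artifacts: dict[str, bytes] = PrivateAttr(default_factory=dict)

    @property
    def artifacts(self) -> dict[str, bytes]:
        return self._artifacts
```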

tests/infra/trace/traced_spec.py (6)

65-75: Initialization is clean and appropriate.

The use of Any for wrapped_spec is justified given the dynamically-generated nature of spec modules. The _last_state_root initialization to None correctly represents "no state tracked yet".


77-94: Attribute interception logic is well-designed.

The filtering criteria (public, lowercase, callable) correctly targets spec methods like process_slots, get_current_epoch while excluding types (BeaconState), constants, and internal methods. The lazy wrapping approach is efficient.
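The filter can be pictured like this (a simplified closure-based sketch; the actual proxy wraps with wrapt for full transparency):

```python
class RecordingSpecSketch:
    def __init__(self, wrapped_spec):
        self._spec = wrapped_spec
        self.calls: list[str] = []

    def __getattr__(self, name: str):
        attr = getattr(self._spec, name)
        # Wrap only spec functions: public, lowercase, callable.
        # Types (BeaconState), CONSTANTS, and _helpers pass through as-is.
        if not name.startswith("_") and name.islower() and callable(attr):
            def recording(*args, **kwargs):
                self.calls.append(name)  # stand-in for full trace recording
                return attr(*args, **kwargs)
            return recording
        return attr
```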


96-132: The spec call hook implements the correct interception flow.

The execution order is important and correct:

  1. Bind and process arguments (registers input artifacts)
  2. Capture pre-state (handles state loading)
  3. Execute the actual function
  4. Process result (registers output artifacts)
  5. Record the step
  6. Capture post-state (handles state mutations)

The unused instance parameter is required by the wrapt.decorator signature and cannot be removed.
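In wrapt terms, the hook follows this outline (a sketch; the real hook lives on RecordingSpec and records into the pydantic model):

```python
import wrapt

@wrapt.decorator
def spec_call_hook(wrapped, instance, args, kwargs):
    # `instance` is unused but required by the wrapt.decorator protocol
    # 1. bind and process arguments (registers input artifacts)
    # 2. capture pre-state (loads/asserts state as needed)
    result = wrapped(*args, **kwargs)  # 3. execute the actual spec function
    # 4. process the result (registers output artifacts)
    # 5. record the spec_call step
    # 6. capture post-state (detects in-place state mutations)
    return result
```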


146-161: Pre-state capture correctly handles state transitions.

The logic properly:

  1. Skips non-View states
  2. Asserts the previous state if one was tracked (for out-of-band mutation detection)
  3. Loads the new state
  4. Updates the tracker

The f"{new_root}.ssz_snappy" format matches the StateOp.state_root pattern validation.


199-230: Argument processing correctly implements the serialization rules.

The method:

  1. Explicitly rejects dicts (per design decision validated in past reviews)
  2. Recursively processes sequences (excluding str/bytes which are primitives)
  3. Serializes View objects as artifacts
  4. Passes through primitives unchanged

The explicit TypeError for dicts is a good fail-fast approach to catch unexpected usage patterns.
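Schematically (import paths follow the repo layout shown in the code graph; the function name is illustrative):

```python
from collections.abc import Sequence

from eth2spec.utils.ssz.ssz_impl import serialize
from remerkleable.core import View

def process_arg(value, artifacts: dict[str, bytes]):
    if isinstance(value, dict):
        raise TypeError("dict arguments are not supported in spec calls")
    if isinstance(value, (str, bytes)):
        return value  # primitives, even though they are technically sequences
    if isinstance(value, Sequence):
        return [process_arg(v, artifacts) for v in value]
    if isinstance(value, View):
        name = f"{value.hash_tree_root().hex()}.ssz_snappy"
        artifacts[name] = serialize(value)  # content-addressed: duplicates collapse
        return name
    return value  # ints, enums, etc. pass through unchanged
```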


232-242: Finalization correctly handles the final state assertion.

The _record_auto_assert_step method correctly guards against the case where no state was ever tracked (_last_state_root is None). The public finalize_trace method provides a clean API for test teardown.
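A final sketch tying it off (building on the StateTracker sketch above):

```python
def finalize_trace(tracker: StateTracker) -> list:
    # Auto-assert the last tracked state; skip if no state was ever seen
    if tracker._last_state_root is not None:
        tracker.steps.append(
            AssertStateOp(state_root=f"{tracker._last_state_root}.ssz_snappy")
        )
    return tracker.steps
```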

tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py (2)

101-102: The comment and code are correct; no changes needed.

The review's concern that "artifacts can contain unserialized SSZ objects" is unfounded. The type annotation in models.py explicitly defines _artifacts: dict[str, bytes], and all code paths that populate artifacts use ssz_serialize(), which returns bytes. The dumper.dump_ssz() method correctly expects bytes and directly compresses them. The existing comment accurately describes the actual implementation.


96-103: The "pydantic" branch uses dump_data(), not dump_trace()—correct the review comment.

The new "pydantic" branch at lines 96-103 appends output with kind="data" (line 98), which causes the code to call dumper.dump_data() at line 116, not dumper.dump_trace(). The Dumper class has methods dump_meta(), dump_cfg(), dump_data(), dump_ssz(), and dump_manifest(), but no dump_trace() method.

However, the type safety concern remains valid: data is typed as Any (line 93), so calls to data.model_dump() and data.artifacts lack runtime validation and will fail with AttributeError if the object lacks the expected Pydantic interface.

Likely an incorrect or invalid review comment.

@@ -0,0 +1,23 @@
from collections.abc import Sequence # noqa: F401

@coderabbitai coderabbitai bot Dec 21, 2025


🧹 Nitpick | 🔵 Trivial

Remove unnecessary noqa directive.

The Sequence import is actually used (in the TypeAliasType string on line 20, which is evaluated at runtime). Since Ruff indicates F401 is not enabled, the noqa directive is unnecessary.

🔎 Suggested fix
-from collections.abc import Sequence  # noqa: F401
+from collections.abc import Sequence

🤖 Prompt for AI Agents
In tests/infra/trace/typing.py around line 1, the import "from collections.abc
import Sequence  # noqa: F401" includes an unnecessary "noqa: F401" because
Sequence is actually used at runtime; remove the trailing "# noqa: F401" so the
line becomes a normal import without the noqa directive.

Owner Author


@coderabbitai no, I don't think so, there was a tool that warned about this - I'd rather keep the noqa


@IvanAnishchuk Understood, I'll leave it as-is. If you prefer to keep the noqa directive to prevent potential warnings from other tools or configurations, that's perfectly reasonable.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces an impressive new tracing framework, neotrace, for generating Ethereum consensus spec tests. The implementation is well-engineered, leveraging wrapt for transparent proxying and pydantic for structured trace modeling. The code is clean, well-documented, and thoroughly tested with both unit tests for the core logic and example tests demonstrating its usage. The changes to the existing test runner are minimal and cleanly integrated. I have one minor suggestion to improve the wording in the new README file.


wrapt is used to wrap spec methods and record their calls, parameters and
results. A decorator is used to set things up. Some simple pydantic models are
used for the trace file structure and some sanitation/formatting.


medium

A minor wording suggestion: 'sanitization' is the more common term in software engineering for cleaning or filtering data, whereas 'sanitation' typically refers to public health. Using 'sanitization' would be more precise here.

Suggested change
used for the trace file structure and some sanitation/formatting.
used for the trace file structure and some sanitization/formatting.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a new tracing framework for tests, which is a significant and well-implemented feature. The use of wrapt for proxying and pydantic for data modeling is robust. The code is well-structured, documented, and comes with a comprehensive set of tests. I have one suggestion to improve maintainability by removing a piece of duplicated code. Overall, this is a high-quality contribution.

Comment on lines +18 to +24
if isinstance(value, dict):
raise TypeError(
f"Dictionary argument found for tracing: {value}. "
"Recursive processing of dictionary values is not currently supported "
"and not expected in spec method arguments. "
"If this is an intended use case, tracing models and logic need to be updated."
)


medium

This dictionary check is redundant. The _process_arg function in traced_spec.py already performs this validation before the data is passed to the pydantic model. Removing this duplicated logic will improve code maintainability.
