feat(trace): neotrace - PR for testing and self-reviews #3
base: master
Conversation
- pydantic models for the spec trace
- core spec tracing logic
- use wrapt to wrap the spec and intercept the calls
- tracing decorator
- some basic unit tests for the trace recorder
- some converted test examples
- use 0x prefix for hex bytes in trace
- a README with a short explanation how tracing works
- add "method" to StepModel for spec_call op
- remove unneeded things
- address some more requirements, format, etc.
- new approach - decorator just generates data for dumper
- add the auto-assert of state in the end of test trace
- adjust assert/load tracing logic according to the issue
- rename record_spec_trace -> spec_trace
- test fixes
- more simplicity
- some cleanup
Summary of Changes

Hello @IvanAnishchuk, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request implements a novel tracing framework for Ethereum consensus specification tests, dubbed 'neotrace'. The primary goal is to enhance the testing process by providing a detailed, automated record of spec method interactions. It achieves this by dynamically proxying spec methods to capture arguments, return values, and state mutations, which are then serialized into human-readable YAML trace files alongside their corresponding SSZ artifacts. This framework aims to make tests more linear and simplify the dumping of complex data, offering a clearer insight into test execution flows.

Highlights
/gemini review
Code Review
This pull request introduces an impressive new tracing framework for Ethereum consensus spec tests. The use of wrapt for transparently proxying the spec object and Pydantic for data modeling is well-executed. The implementation is clean and the addition of unit tests for the tracer itself is a great practice.
I've left a few minor comments, mostly related to documentation typos and small code style improvements. Overall, this is a solid contribution that should significantly simplify writing and maintaining spec tests.
| outputs += [ | ||
| # dump trace data for yaml serialization | ||
| ("trace", "data", data.model_dump(mode="json", exclude_none=True)), | ||
| ] + [ | ||
| (name, "ssz", value) | ||
| # ssz artifacts are already serialized and will be compressed by the dumper | ||
| for name, value in data._artifacts.items() | ||
| ] |
The list concatenation [...] + [...] followed by outputs += can be slightly simplified by using outputs.extend() with a generator expression that chains the two parts. This avoids creating an intermediate list.
outputs.extend([
    # dump trace data for yaml serialization
    ("trace", "data", data.model_dump(mode="json", exclude_none=True)),
    *(
        (name, "ssz", value)
        # ssz artifacts are already serialized and will be compressed by the dumper
        for name, value in data._artifacts.items()
    ),
])
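A hedged alternative spelling of the same idea, using itertools.chain so the two parts are chained lazily without building an intermediate list (a sketch only, reusing the names from the snippet above):

from itertools import chain

outputs.extend(chain(
    # dump trace data for yaml serialization
    [("trace", "data", data.model_dump(mode="json", exclude_none=True))],
    # ssz artifacts are already serialized and will be compressed by the dumper
    ((name, "ssz", value) for name, value in data._artifacts.items()),
))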
tests/infra/trace/README.md
Outdated
| consensus spec tests, based on tracing spec method calls and recording them in a | ||
| structured trace file. | ||
|
|
||
| The basic idea is make tests more simple and linear and hide the minutae of |
tests/infra/trace/README.md
Outdated
| structured trace file. | ||
|
|
||
| The basic idea is make tests more simple and linear and hide the minutae of | ||
| dumping data into the test harness (`@spec_trace` decoratior) and automate |
tests/infra/trace/README.md
Outdated
| dumping data into the test harness (`@spec_trace` decoratior) and automate | ||
| everything that doesn't have to be manual. | ||
|
|
||
| Spec is being wrapped into a transparent proxy object and all method call are |
tests/infra/trace/README.md
Outdated
| In this example, `process_slot` does not return anything but we can see the | ||
| initial state and the final state being dumped automatically and they are | ||
| different. In the other more complex example test (omitted here for brewety) we |
tests/infra/trace/__init__.py
Outdated
| @@ -0,0 +1 @@ | |||
| from tests.infra.trace.decorator import spec_trace as spec_trace | |||
@gemini-code-assist this is intentional for re-exporting
You are correct that this import in __init__.py is for re-exporting spec_trace from the decorator module. However, the as spec_trace part is still redundant. When you import spec_trace from tests.infra.trace.decorator, it is already named spec_trace in the current namespace. The as keyword is typically used to rename an imported object (e.g., import long_name as short_name).
Removing as spec_trace will not change the functionality or the re-export behavior, but it will make the line more concise and adhere to common Python practices for clarity.
| from tests.infra.trace.decorator import spec_trace as spec_trace | |
| from tests.infra.trace.decorator import spec_trace |
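For background on the exchange above: the seemingly redundant `import ... as ...` spelling is the form strict type checkers (e.g. mypy with `--no-implicit-reexport`) recognize as an explicit re-export, and an `__all__` declaration is a common alternative. A minimal sketch of that alternative, assuming the package only needs to expose `spec_trace`:

# Hypothetical alternative for tests/infra/trace/__init__.py:
# a plain import plus an explicit __all__ keeps the re-export explicit
# without the `as` alias.
from tests.infra.trace.decorator import spec_trace

__all__ = ["spec_trace"]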
tests/infra/trace/models.py
Outdated
| if isinstance(v, str): | ||
| return str(v) | ||
| if isinstance(v, int): | ||
| return int(v) |
The str(v) and int(v) calls are redundant since the isinstance checks already confirm the types. You can remove these explicit casts for cleaner code.
| if isinstance(v, str): | |
| return str(v) | |
| if isinstance(v, int): | |
| return int(v) | |
| if isinstance(v, str): | |
| return v | |
| if isinstance(v, int): | |
| return v |
@gemini-code-assist this is intentional to catch and coerce subtypes
Thank you for the clarification, @IvanAnishchuk. Your explanation regarding the intentional coercion of subtypes makes perfect sense. In that context, the explicit str(v) and int(v) calls are indeed necessary and not redundant. I'll retract my previous suggestion.
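A minimal illustration of the subtype coercion being discussed; the `uint64` class below is a hypothetical stand-in for an SSZ-style integer subtype, not the spec's actual type:

class uint64(int):  # hypothetical stand-in for an SSZ integer subtype
    pass


v = uint64(7)
# int()/str() on subclass instances return the plain base type,
# which is what the YAML dump should contain
assert type(int(v)) is int
assert type(v) is uint64  # the original value keeps its subtype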
| @@ -0,0 +1,64 @@ | |||
| """ | |||
| FIXME: I put this file here to test reftest vector generation with the new approach - using existing runners and dumpers system, most likely this should not be merged like this | |||
| payment = create_builder_pending_payment(spec, i, spec.MIN_ACTIVATION_BALANCE, 1) | ||
| state.builder_pending_payments[i] = payment | ||
|
|
||
| [x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")] |
This list comprehension is used just to exhaust the generator, and the result is discarded. A more idiomatic way to do this is to use list() and assign to _ to make it clear the result is intentionally unused.
| [x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")] | |
| list(run_epoch_processing_with(spec, state, "process_builder_pending_payments")) |
Code Review
This pull request introduces neotrace, a new testing framework for tracing Ethereum consensus spec method calls. The implementation uses wrapt to proxy the spec object and pydantic to define and serialize the trace structure. The core logic for tracing, handling state mutations, and serializing artifacts seems well-designed and is accompanied by a good set of unit tests. The integration with the existing test runner is minimal and clean. I've included some suggestions for code clarity, style, and removal of temporary test files to help finalize this feature. Overall, this is a great addition.
| @@ -0,0 +1,64 @@ | |||
| """ | |||
| FIXME: I put this file here to test reftest vector generation with the new approach - using existing runners and dumpers system, most likely this should not be merged like this | |||
This file includes a FIXME comment indicating it is for testing purposes and should likely not be merged in its current state. Please remove this file before merging the pull request. If it is intended to be a permanent example, it should be moved to a more appropriate location and the FIXME comment should be removed.
| outputs += [ | ||
| # dump trace data for yaml serialization | ||
| ("trace", "data", data.model_dump(mode="json", exclude_none=True)), | ||
| ] + [ | ||
| (name, "ssz", value) | ||
| # ssz artifacts are already serialized and will be compressed by the dumper | ||
| for name, value in data._artifacts.items() | ||
| ] |
The list concatenation and list comprehension can be slightly simplified by using list.extend for better readability and potentially better performance, as it avoids creating intermediate lists.
outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
# ssz artifacts are already serialized and will be compressed by the dumper
outputs.extend(
    (name, "ssz", value)
    for name, value in data._artifacts.items()
)
tests/infra/trace/README.md
Outdated
| structured trace file. | ||
|
|
||
| The basic idea is make tests more simple and linear and hide the minutae of | ||
| dumping data into the test harness (`@spec_trace` decoratior) and automate |
tests/infra/trace/README.md
Outdated
| dumping data into the test harness (`@spec_trace` decoratior) and automate | ||
| everything that doesn't have to be manual. | ||
|
|
||
| Spec is being wrapped into a transparent proxy object and all method call are |
tests/infra/trace/README.md
Outdated
| In this example, `process_slot` does not return anything but we can see the | ||
| initial state and the final state being dumped automatically and they are | ||
| different. In the other more complex example test (omitted here for brewety) we |
tests/infra/trace/__init__.py
Outdated
| @@ -0,0 +1 @@ | |||
| from tests.infra.trace.decorator import spec_trace as spec_trace | |||
tests/infra/trace/models.py
Outdated
| """ | ||
| Load state step in the execution trace. | ||
| Used when a previously-unseen state is used in spec all. |
tests/infra/trace/models.py
Outdated
| """ | ||
|
|
||
| op: Literal["assert_state"] = Field(default="assert_state") | ||
| # literal type + default value looks duplicated but I think required |
tests/infra/trace/models.py
Outdated
| def sanitize_data(cls, v: Any) -> Any: | ||
| # convert raw bytes to 0x-prefixed hex | ||
| if isinstance(v, bytes): | ||
| return f"0x{v.hex()}" | ||
| # coerce primitive types into their raw form | ||
| # (pre-processor just passes them through without coercing) | ||
| if isinstance(v, str): | ||
| return str(v) | ||
| if isinstance(v, int): | ||
| return int(v) | ||
| # recursively clean simple structures | ||
| if isinstance(v, tuple): | ||
| return tuple(cls.sanitize_data(x) for x in v) | ||
| if isinstance(v, list): | ||
| return [cls.sanitize_data(x) for x in v] | ||
| if isinstance(v, dict): | ||
| return {k: cls.sanitize_data(val) for k, val in v.items()} | ||
| return v |
Since the project uses Python 3.10+, the sanitize_data method can be refactored using a match statement for improved readability and conciseness.
| def sanitize_data(cls, v: Any) -> Any: | |
| # convert raw bytes to 0x-prefixed hex | |
| if isinstance(v, bytes): | |
| return f"0x{v.hex()}" | |
| # coerce primitive types into their raw form | |
| # (pre-processor just passes them through without coercing) | |
| if isinstance(v, str): | |
| return str(v) | |
| if isinstance(v, int): | |
| return int(v) | |
| # recursively clean simple structures | |
| if isinstance(v, tuple): | |
| return tuple(cls.sanitize_data(x) for x in v) | |
| if isinstance(v, list): | |
| return [cls.sanitize_data(x) for x in v] | |
| if isinstance(v, dict): | |
| return {k: cls.sanitize_data(val) for k, val in v.items()} | |
| return v | |
| def sanitize_data(cls, v: Any) -> Any: | |
| match v: | |
| case bytes(): | |
| # convert raw bytes to 0x-prefixed hex | |
| return f"0x{v.hex()}" | |
| case str() | int(): | |
| # coerce primitive types into their raw form | |
| # (pre-processor just passes them through without coercing) | |
| return v | |
| case tuple(): | |
| return tuple(cls.sanitize_data(x) for x in v) | |
| case list(): | |
| return [cls.sanitize_data(x) for x in v] | |
| case dict(): | |
| return {k: cls.sanitize_data(val) for k, val in v.items()} | |
| case _: | |
| return v |
| payment = create_builder_pending_payment(spec, i, spec.MIN_ACTIVATION_BALANCE, 1) | ||
| state.builder_pending_payments[i] = payment | ||
|
|
||
| [x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")] |
Using a list comprehension for its side effects (to exhaust a generator) is not idiomatic. A simple for loop would be more explicit and readable.
| [x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")] | |
| for _ in run_epoch_processing_with(spec, state, "process_builder_pending_payments"): pass |
@gemini-code-assist compare with requirements in ethereum#4603 and feedback given in ethereum#4755 - is everything addressed, are there noticeable gaps or differences, is anything implemented in a non-optimal or non-idiomatic way? Be thorough, don't skip any details.
Hello @IvanAnishchuk, thanks for reaching out! I've thoroughly reviewed your pull request, focusing on the new Overall, the Let's break down your questions:

Is everything addressed (in the spirit of the linked issues/PRs)?

Based on the From the perspective of the stated goals in your

Are there noticeable gaps or differences?

The

One minor observation, not necessarily a gap, is the reliance on

Is anything implemented in a non-optimal or non-idiomatic way?

I found the implementation to be largely optimal and idiomatic for Python, especially within the context of the
@gemini-code-assist Okay, here's a quote of the original issue with requirements (ethereum#4603). Please thoroughly review this PR against it:
Code Review
This PR introduces a new testing framework, neotrace, for creating spec traces. The implementation uses wrapt to proxy the spec object and record method calls, arguments, return values, and state mutations. The core logic is well-structured, using Pydantic models for the trace format and a decorator-based approach for easy integration into existing tests. The changes include the core tracer implementation, unit tests, example tests, documentation, and necessary dependency updates. The overall approach is solid. I've found a few minor issues, mostly related to documentation typos and code style, which I've detailed in the comments.
tests/infra/trace/README.md
Outdated
| dumping data into the test harness (`@spec_trace` decoratior) and automate | ||
| everything that doesn't have to be manual. | ||
|
|
||
| Spec is being wrapped into a transparent proxy object and all method call are | ||
| being tracked including any state mutations before and after. Final state is | ||
| recorded and all relevant artifacts including all states are being saved into | ||
| SSZ artifacts (hash-addressed to avoid duplication). | ||
|
|
||
| ### Usage and example test | ||
|
|
||
| ```python | ||
| from tests.infra.trace import spec_trace | ||
|
|
||
|
|
||
| @with_all_phases | ||
| @spec_state_test # keep these like before | ||
| @spec_trace # this is the thing that makes the magic happen | ||
| def test_linear_sanity_slots_222( | ||
| spec, state | ||
| ): # spec and state can be positional but the name matters | ||
| # just use spec methods, they are traced automagically, and state is dumped | ||
| spec.process_slot(state) | ||
| ``` | ||
|
|
||
| this is for example purposes put into | ||
| `tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py` and can be run | ||
| with something like | ||
|
|
||
| ``` | ||
| make reftests fork=gloas runner=sanity k=linear_sanity_slots_222 verbose=true | ||
| ``` | ||
|
|
||
| that produces a trace in | ||
| `../consensus-spec-tests/tests/minimal/gloas/sanity/slots_2/pyspec_tests/linear_sanity_slots_222/trace.yaml` | ||
|
|
||
| ### Spec trace file example | ||
|
|
||
| ```yaml | ||
| default_fork: gloas | ||
| trace: | ||
| - {op: load_state, state_root: | ||
| 95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy} | ||
| - op: spec_call | ||
| method: process_slot | ||
| input: {state: | ||
| 95d19311d30804985b06c40cc437bdfbb126209ad9ea8253ba33e0ff0af74c40.ssz_snappy} | ||
| - {op: assert_state, state_root: | ||
| 41f562b491baaa9fdd981973c8aef64bb7c663c4b07f35141c16afc9e11184c1.ssz_snappy} | ||
| ``` | ||
| In this example, `process_slot` does not return anything but we can see the | ||
| initial state and the final state being dumped automatically and they are | ||
| different. In the other more complex example test (omitted here for brewety) we | ||
| can examine how complex inputs and outputs being dumped and how out-of-band | ||
| state mutations are being tracked with assert and load steps. | ||
|
|
||
| A non-normative example of a little more complex inputs and outputs: | ||
|
|
||
| ```yaml | ||
| - op: spec_call | ||
| method: get_current_epoch | ||
| input: {state: | ||
| 0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy} | ||
| assert_output: 2 | ||
| - op: spec_call | ||
| method: get_seed | ||
| input: | ||
| state: | ||
| 0740b3ecc6fb1bdc20c4f2d792da51dc7aaaa506e445ee7ba7ef1dd7ed900443.ssz_snappy | ||
| epoch: 2 | ||
| domain_type: '0x00000000' | ||
| assert_output: '0x79edc6cbb9ffac34477afe87d8569b7afab320241ce69d70c6d0c0a1839379df' | ||
| ``` | ||
|
|
||
| simple primitive types here (e.g. int or bytes, not containers) are serialized | ||
| directly into yaml (even in cases where they subclass SSZ `View` and can | ||
| technically be dumped as ssz artifacts), bytes are dumped as 0x-prefixed hex | ||
| strings which seems appropriate. SSZ artifacts are always referred to by root | ||
| hash-based filename so that there's no need to maintain any mappings or | ||
| filename-generating logic. | ||
|
|
||
| ### Implementation details | ||
|
|
||
| wrapt is used to wrap spec methods and record their calls, parameters and | ||
| results. A decorator is used to set things up. Some simple pydantic models are | ||
| used for the trace file structure and some sanitation/formatting. | ||
|
|
||
| From a consumer standpoint (i.e. test runner) new tests using this decorator | ||
| behave differently and are being detected by a new data type (a pydantic model | ||
| instance). Some logic was added to `execute_test` in | ||
| `tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py` to catch that | ||
| new case and apply new serialization method. | ||
|
|
||
| The way `wrapt.ObjectProxy` operates is that it allows you to create a proxy | ||
| object for e.g. consensus spec module and override things on it without | ||
| affecting any of the underlying logic (unlike monkey-patching). In our case here | ||
| we override all lowercase methods in the spec object by wrapping them in a | ||
| `wrapt` decorator with tracer function. Whenever a state is detected in any of | ||
| the method calls it gets automatically tracked and then it's checked again after | ||
| each method call to check for mutations. Everything is saved in a pydantic model | ||
| object for further dumping using existing reftest tooling. | ||
|
|
||
| ### TODO | ||
|
|
||
| This is still being cooked. | ||
|
|
||
| Integration with test runner is done by yielding pydantic model as a new data | ||
| type and very simple change in the runner. It would be more natural to just | ||
| return the value but that would require supporting non-generator functions as | ||
| tests which would require much more code that seems reasonable here. |
There are a few typos and grammatical errors in the README which could be corrected to improve clarity.
- On line 8: `decoratior` should be `decorator`.
- On line 11: `method call are` should be `method calls are`.
- On line 60: `brewety` should be `brevity`.
- On line 114: `yielding pydantic model` should be `yielding a pydantic model`.
- On line 117: `much more code that seems reasonable here` should be `much more code than seems reasonable here`.
tests/infra/trace/__init__.py
Outdated
| @@ -0,0 +1 @@ | |||
| from tests.infra.trace.decorator import spec_trace as spec_trace | |||
| payment = create_builder_pending_payment(spec, i, spec.MIN_ACTIVATION_BALANCE, 1) | ||
| state.builder_pending_payments[i] = payment | ||
|
|
||
| [x for x in run_epoch_processing_with(spec, state, "process_builder_pending_payments")] |
This list comprehension is used only for its side effect of exhausting the generator, but the resulting list is not used. This can be confusing as it looks like a value is being computed and stored. A for loop would be more explicit about the intent and is generally preferred for code that relies on side effects.
for _ in run_epoch_processing_with(spec, state, "process_builder_pending_payments"):
    pass
Hello @IvanAnishchuk, thank you for providing the full context of Overall, the Let's break down the comparison point by point:

Goals:
Test writer interface:
Tests consumer interface:
Implementation details and space optimisations:
Example:
Noticeable Gaps or Differences (revisiting previous points with new context):
Is anything implemented in a non-optimal or non-idiomatic way?

My assessment remains that the implementation is largely optimal and highly idiomatic for Python within this ecosystem:

In conclusion, the
/gemini review
Code Review
This pull request introduces a powerful new tracing framework for Ethereum consensus spec tests. The implementation is well-thought-out, leveraging wrapt for transparent proxying and pydantic for structured data modeling. The inclusion of comprehensive unit tests for the tracing mechanism itself is a great practice. My review includes several suggestions to improve code clarity, maintainability, and encapsulation by refactoring duplicated code, improving error messages, and adhering more strictly to object-oriented principles. These changes should make the new framework even more robust and easier for other developers to understand and extend.
| ] + [ | ||
| (name, "ssz", value) | ||
| # ssz artifacts are already serialized and will be compressed by the dumper | ||
| for name, value in data._artifacts.items() |
The code directly accesses the private _artifacts attribute of the data object (TraceConfig instance). To improve encapsulation and adhere to good object-oriented practices, it's better to expose these artifacts through a public property on the TraceConfig model.
In tests/infra/trace/models.py, please add this property to TraceConfig:
@property
def artifacts(self) -> dict[str, bytes]:
    return self._artifacts

Then, you can use data.artifacts here.
| for name, value in data._artifacts.items() | |
| for name, value in data.artifacts.items() |
tests/infra/trace/decorator.py
Outdated
| except TypeError as e: | ||
| raise RuntimeError("non-test invocation detected") from e |
The RuntimeError message "non-test invocation detected" is misleading. A TypeError from inspect.signature().bind() typically occurs due to a mismatch between the test function's signature and the arguments provided, which is a test definition issue. A more specific error message would be more helpful for debugging.
| except TypeError as e: | |
| raise RuntimeError("non-test invocation detected") from e | |
| except TypeError as e: | |
| raise RuntimeError(f"Failed to bind arguments for test function '{fn.__name__}': {e}") from e |
tests/infra/trace/decorator.py
Outdated
| recorder._finalize_trace() | ||
|
|
||
| # yield data so that runner can pick it up and dump | ||
| yield "trace", "pydantic", recorder._model |
The decorator directly accesses the private _model attribute of the RecordingSpec instance. To improve encapsulation, you should expose the model via a public property.
In tests/infra/trace/traced_spec.py, please add this property to the RecordingSpec class:
@property
def model(self) -> TraceConfig:
    return self._model

Then, you can use recorder.model here.
| yield "trace", "pydantic", recorder._model | |
| yield "trace", "pydantic", recorder.model |
tests/infra/trace/models.py
Outdated
| class LoadStateOp(TraceStepModel): | ||
| """ | ||
| Load state step in the execution trace. | ||
| Used when a previously-unseen state is used in a spec call. | ||
| State root is recorded as 'state_root'. | ||
| """ | ||
|
|
||
| op: Literal["load_state"] = Field(default="load_state") | ||
| state_root: str = Field(pattern=r"^[0-9a-f]{64}$") | ||
|
|
||
| @field_serializer("state_root", mode="plain", when_used="always") | ||
| @classmethod | ||
| def sanitize_data(cls, v: str) -> str: | ||
| # add ssz_snappy suffix (dumper handles the actual compression) | ||
| return f"{v}.ssz_snappy" | ||
|
|
||
|
|
||
| class AssertStateOp(TraceStepModel): | ||
| """ | ||
| Assert state step in the execution trace. | ||
| Auto-added at the end of the trace with the last known state root. | ||
| State root is recorded as 'state_root'. | ||
| """ | ||
|
|
||
| op: Literal["assert_state"] = Field(default="assert_state") | ||
| state_root: str = Field(pattern=r"^[0-9a-f]{64}$") | ||
|
|
||
| @field_serializer("state_root", mode="plain", when_used="always") | ||
| @classmethod | ||
| def sanitize_data(cls, v: str) -> str: | ||
| # add ssz_snappy suffix (dumper handles the actual compression) | ||
| return f"{v}.ssz_snappy" | ||
|
|
The LoadStateOp and AssertStateOp models share the same state_root field and sanitize_data serializer. This duplication can be avoided by extracting the common logic into a base class. This will make the code more maintainable and less error-prone.
class StateOp(TraceStepModel):
    """
    Abstract base class for operations involving a state root.
    """

    state_root: str = Field(pattern=r"^[0-9a-f]{64}$")

    @field_serializer("state_root", mode="plain", when_used="always")
    @classmethod
    def sanitize_data(cls, v: str) -> str:
        # add ssz_snappy suffix (dumper handles the actual compression)
        return f"{v}.ssz_snappy"


class LoadStateOp(StateOp):
    """
    Load state step in the execution trace.
    Used when a previously-unseen state is used in a spec call.
    State root is recorded as 'state_root'.
    """

    op: Literal["load_state"] = Field(default="load_state")


class AssertStateOp(StateOp):
    """
    Assert state step in the execution trace.
    Auto-added at the end of the trace with the last known state root.
    State root is recorded as 'state_root'.
    """

    op: Literal["assert_state"] = Field(default="assert_state")
tests/infra/trace/models.py
Outdated
| if isinstance(v, str): | ||
| return str(v) | ||
| if isinstance(v, int): | ||
| return int(v) |
The sanitize_data method contains redundant type casting for str and int. Since isinstance already confirms the type, calling str(v) on a string or int(v) on an integer is unnecessary. Removing these calls will make the code cleaner.
| if isinstance(v, str): | |
| return str(v) | |
| if isinstance(v, int): | |
| return int(v) | |
| if isinstance(v, str): | |
| return v | |
| if isinstance(v, int): | |
| return v |
| # We must re-assign the `state` variable, as `process_slot` | ||
| # is a pure function that returns a new, modified state. |
This comment is misleading. It states that process_slot is a pure function that returns a new state, but it actually mutates the state object in-place. This should be corrected to avoid confusion for developers using this as an example.
| # We must re-assign the `state` variable, as `process_slot` | |
| # is a pure function that returns a new, modified state. | |
| # The `process_slot` function mutates the state object in-place, | |
| # so there is no need to re-assign the `state` variable. |
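A small hedged check of the in-place behaviour described above (assuming the usual `spec` and `state` fixtures from these tests):

pre_root = state.hash_tree_root()
spec.process_slot(state)  # mutates `state` in place; nothing is returned
assert state.hash_tree_root() != pre_root  # same object, new root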
Draft a naive approach to trace data dumping. Small fixes and lots of polish. Test improvements.
Add shared base class for Assert/Load state ops. Stricter and more concise typing. Simplify serialization.
b024d8e to b153b26
/gemini review
Code Review
This PR introduces a new tracing framework for spec tests, which is a great addition for improving test simplicity and debugging. The implementation using wrapt and Pydantic models is well-structured. I've found a few minor areas for improvement in the core tracer logic regarding redundant code and unused parameters. I've also suggested some minor wording improvements in the README file to enhance clarity. Overall, this is a solid contribution.
tests/infra/trace/README.md
Outdated
| This is an implementation of #4603 a new testing framework for the Ethereum | ||
| consensus spec tests, based on tracing spec method calls and recording them in a | ||
| structured trace file. | ||
|
|
||
| The basic idea is make tests more simple and linear and hide the minutiae of | ||
| dumping data into the test harness (`@spec_trace` decorator) and automate | ||
| everything that doesn't have to be manual. | ||
|
|
||
| Spec is being wrapped into a transparent proxy object and all method calls are | ||
| being tracked including any state mutations before and after. Final state is | ||
| recorded and all relevant artifacts including all states are being saved into | ||
| SSZ artifacts (hash-addressed to avoid duplication). |
This README is very informative! I have a few minor suggestions to improve grammar and clarity for even better readability:
- On line 3: '...Spec test sequence trace, a general test vector format for the Ethereum consensus layer ethereum/consensus-specs#4603 a new testing framework...' could be '...Spec test sequence trace, a general test vector format for the Ethereum consensus layer ethereum/consensus-specs#4603, a new testing framework...'
- On line 7: '...is make tests more simple and linear...' could be '...is to make tests simpler and more linear...'
- On line 11: 'Spec is being wrapped...' could be 'The spec is wrapped...'
- On line 13: 'all relevant artifacts including all states are being saved into SSZ artifacts' -> 'all relevant artifacts, including all states, are saved as SSZ artifacts'
These are just suggestions, feel free to apply them as you see fit.
tests/infra/trace/decorator.py
Outdated
| raise ValueError( | ||
| f"Failed to bind arguments for test function '{fn.__name__}': {e}" | ||
| ) from e |
It's good practice to raise a more specific error when arguments fail to bind. A TypeError is often raised in this scenario, but wrapping it in a ValueError provides more context about the failure's significance in this specific application. Consider re-raising as a TypeError for better semantic accuracy, as the issue is with the type or number of arguments passed to the function, not their values.
| raise ValueError( | |
| f"Failed to bind arguments for test function '{fn.__name__}': {e}" | |
| ) from e | |
| raise TypeError( | |
| f"Failed to bind arguments for test function '{fn.__name__}': {e}" | |
| ) from e |
tests/infra/trace/traced_spec.py
Outdated
|
|
||
| @wrapt.decorator | ||
| def _spec_call_hook( | ||
| self, wrapped: Callable, instance: "RecordingSpec", args: tuple, kwargs: dict |
The instance parameter is unused and has an incorrect type hint. According to wrapt documentation, instance is the object to which the wrapped function was bound (the original spec object), not RecordingSpec. Since it's unused, you can rename it to _ and type it as Any to signify it's intentionally ignored.
| self, wrapped: Callable, instance: "RecordingSpec", args: tuple, kwargs: dict | |
| self, wrapped: Callable, _: Any, args: tuple, kwargs: dict |
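For readers unfamiliar with `wrapt`, a small self-contained example of the `(wrapped, instance, args, kwargs)` signature that `wrapt.decorator` passes to a hook; this is generic `wrapt` usage, not the PR's actual hook:

import wrapt


@wrapt.decorator
def tracer(wrapped, instance, args, kwargs):
    # `instance` is the object the wrapped callable is bound to
    # (None for plain functions); only the call itself is needed here
    result = wrapped(*args, **kwargs)
    return result


@tracer
def add(a, b):
    return a + b


assert add(2, 3) == 5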
tests/infra/trace/traced_spec.py
Outdated
| if state is None: | ||
| # unnecessary safeguard/type hint | ||
| return |
tests/infra/trace/traced_spec.py
Outdated
| def _record_step( | ||
| self, | ||
| op: str, | ||
| method: str, | ||
| params: SERIALIZED_KWARGS, | ||
| result: SERIALIZED_ARGS, | ||
| ) -> None: |
The op parameter is unused in _record_step since SpecCallOp sets the op field with a default value. You can remove it from the function signature. Consequently, you can also remove the op_name variable at line 113 and update the call to _record_step at line 132 in _spec_call_hook to self._record_step(method_name, serial_params, serial_result).
def _record_step(
    self,
    method: str,
    params: SERIALIZED_KWARGS,
    result: SERIALIZED_ARGS,
) -> None:
tests/infra/trace/traced_spec.py
Outdated
| if state is None: | ||
| # unnecessary safeguard/type hint | ||
| return |
/gemini review
Code Review
This pull request introduces an excellent new tracing framework, neotrace, for Ethereum consensus spec tests. The use of wrapt for transparent proxying and Pydantic for data modeling is well-executed. The overall implementation is strong, providing a much simpler way to write tests. I've found one critical bug that would cause a runtime error and have also included a few suggestions to improve code readability and documentation clarity. Once the critical issue is addressed, this will be a great addition.
tests/infra/trace/traced_spec.py
Outdated
| serial_result: SERIALIZED_ARGS = self._process_arg(result) | ||
|
|
||
| # D. Record the successful step | ||
| self._record_spec_op_step(method_name, serial_params, serial_result) |
There's a typo in the method name. You are calling _record_spec_op_step, but the method is defined as _record_spec_call_step on line 164. This will raise an AttributeError at runtime. Please rename the method call to match the definition.
| self._record_spec_op_step(method_name, serial_params, serial_result) | |
| self._record_spec_call_step(method_name, serial_params, serial_result) |
| # new data type for spec traces | ||
| outputs += [ | ||
| # dump trace data for yaml serialization | ||
| ("trace", "data", data.model_dump(mode="json", exclude_none=True)), | ||
| ] + [ | ||
| (name, "ssz", value) | ||
| # ssz artifacts are already serialized and will be compressed by the dumper | ||
| for name, value in data.artifacts.items() | ||
| ] |
The list concatenation [...] + [...] with += creates an unnecessary intermediate list. For better readability and a minor performance improvement, you could use list.append and list.extend in separate steps.
# new data type for spec traces
outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
outputs.extend(
    (name, "ssz", value)
    # ssz artifacts are already serialized and will be compressed by the dumper
    for name, value in data.artifacts.items()
)
tests/infra/trace/README.md
Outdated
| @@ -0,0 +1,133 @@ | |||
| ## Spec trace framework | |||
|
|
|||
| This is an implementation of ethereum/consensus-specs#4603 a new testing | |||
tests/infra/trace/README.md
Outdated
| spec.process_slot(state) | ||
| ``` | ||
|
|
||
| this is for example purposes put into |
5d2bee5 to 37998a6
/gemini review
Code Review
This pull request introduces an impressive new tracing framework, neotrace, for testing the Ethereum consensus specs. The framework cleverly uses wrapt to create a proxy around the spec object, intercepting and recording method calls, arguments, and state mutations. Pydantic models are used to define a structured trace format, which is a great choice for ensuring data integrity. The implementation is well-structured, with clear separation of concerns between the decorator, the proxy object, data models, and tests. The addition of unit tests for the tracer itself is also a very good practice.
My review includes a few suggestions to further improve the code. I've pointed out an opportunity to enhance readability and performance in the test runner integration. I also found a minor path inconsistency in the new README.md file. Finally, I've suggested improvements to the type hints in typing.py to more accurately reflect the recursive nature of the data structures, which will enhance maintainability and type safety.
| outputs += [ | ||
| # dump trace data for yaml serialization | ||
| ("trace", "data", data.model_dump(mode="json", exclude_none=True)), | ||
| ] + [ | ||
| (name, "ssz", value) | ||
| # ssz artifacts are already serialized and will be compressed by the dumper | ||
| for name, value in data.artifacts.items() | ||
| ] |
The list concatenation outputs += [...] + [...] creates two intermediate lists before extending the outputs list. This can be made more readable and slightly more performant by using list.append and list.extend directly.
# dump trace data for yaml serialization
outputs.append(("trace", "data", data.model_dump(mode="json", exclude_none=True)))
# ssz artifacts are already serialized and will be compressed by the dumper
outputs.extend(
    (name, "ssz", value)
    for name, value in data.artifacts.items()
)
tests/infra/trace/README.md
Outdated
| This is for example purposes put into | ||
| `tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py` and can be run | ||
| with something like |
The README mentions that the example test is in tests/core/pyspec/eth2spec/test/gloas/sanity/test_slots_2.py, but in this pull request, the example test test_linear_sanity_slots_222 is located in the newly added file tests/infra/trace/test_example_slots_2.py. Please update the path in the documentation to reflect the correct location of the example test file to avoid confusion.
tests/infra/trace/typing.py
Outdated
| # two recursion levels max | ||
| SERIALIZED_ARGS: TypeAlias = SERIALIZED | list[SERIALIZED] | list[list[SERIALIZED] | SERIALIZED] | ||
| SERIALIZED_KWARGS: TypeAlias = dict[str, SERIALIZED_ARGS] | ||
|
|
||
| # typing aliases for non-serialized values and arguments | ||
| RAW: TypeAlias = View | None # allowed simple argument types (View is wide!) | ||
| RAW_ARGS: TypeAlias = RAW | Sequence[RAW] | Sequence[Sequence[RAW] | RAW] | ||
| RAW_KWARGS: TypeAlias = dict[str, RAW_ARGS] |
The type aliases SERIALIZED_ARGS and RAW_ARGS are defined to support a limited nesting depth, and the comment for SERIALIZED_ARGS explicitly states "two recursion levels max". However, the functions that process these types are recursive and can handle arbitrary nesting. To improve type safety and accurately reflect the implementation, these aliases should be defined recursively.
| # two recursion levels max | |
| SERIALIZED_ARGS: TypeAlias = SERIALIZED | list[SERIALIZED] | list[list[SERIALIZED] | SERIALIZED] | |
| SERIALIZED_KWARGS: TypeAlias = dict[str, SERIALIZED_ARGS] | |
| # typing aliases for non-serialized values and arguments | |
| RAW: TypeAlias = View | None # allowed simple argument types (View is wide!) | |
| RAW_ARGS: TypeAlias = RAW | Sequence[RAW] | Sequence[Sequence[RAW] | RAW] | |
| RAW_KWARGS: TypeAlias = dict[str, RAW_ARGS] | |
| # Allows for arbitrarily nested lists of serialized data | |
| NestedSerializable: TypeAlias = SERIALIZED | list["NestedSerializable"] | |
| SERIALIZED_ARGS: TypeAlias = NestedSerializable | |
| SERIALIZED_KWARGS: TypeAlias = dict[str, SERIALIZED_ARGS] | |
| # typing aliases for non-serialized values and arguments | |
| RAW: TypeAlias = View | None # allowed simple argument types (View is wide!) | |
| NestedRaw: TypeAlias = RAW | Sequence["NestedRaw"] | |
| RAW_ARGS: TypeAlias = NestedRaw | |
| RAW_KWARGS: TypeAlias = dict[str, RAW_ARGS] |
@gemini-code-assist No, not in this case: implicit recursive types are not compatible with something in the reftest framework, and we also can't use py3.12+ type syntax for compatibility reasons. And the actual arguments we have are never nested beyond two levels - but feel free to suggest how to type-annotate given these constraints.
Thank you for the clarification. I understand now that the actual arguments are never nested beyond two levels, and that implicit recursive types or Python 3.12+ syntax are not compatible with your framework.
Given these constraints, the current type definitions for SERIALIZED_ARGS and RAW_ARGS are indeed correct and accurately reflect the maximum nesting depth of two levels. My previous suggestion was based on a misunderstanding of the actual data constraints and compatibility requirements.
Therefore, no changes are needed to these type aliases.
/gemini review
Code Review
This pull request introduces a significant and well-executed new "spec trace" framework for testing. It leverages wrapt to proxy the spec object, recording method calls, I/O, and state changes into a structured trace file. The implementation is robust, using Pydantic for data modeling, try...finally for guaranteed cleanup, and a clever state tracking mechanism. The new functionality is well-structured across multiple files, including models, the decorator, the core proxy logic, and comprehensive unit tests. My review includes a couple of minor suggestions to improve documentation clarity and correct a type hint in a test mock. Overall, this is an excellent addition to the testing infrastructure.
tests/infra/trace/README.md
Outdated
| spec.process_slot(state) | ||
| ``` | ||
|
|
||
| This is for example purposes can be put into e.g. |
| # Does not modify state | ||
| pass | ||
|
|
||
| def iterate_something(self, state: BeaconState, arg_list1: tuple, arg_list2: list) -> None: |
The type hint for the return value is None, but the function actually returns a list. This should be corrected to -> list to match the implementation.
| def iterate_something(self, state: BeaconState, arg_list1: tuple, arg_list2: list) -> None: | |
| def iterate_something(self, state: BeaconState, arg_list1: tuple, arg_list2: list) -> list: |
/gemini review
Code Review
This pull request introduces an impressive new tracing framework, neotrace, for the consensus-spec tests. The implementation is well-structured, leveraging wrapt for non-invasive call tracing and pydantic for clear data modeling. The inclusion of comprehensive documentation and dedicated unit tests for the framework itself is commendable and speaks to the quality of the contribution. My review includes a couple of minor suggestions to enhance the readability and maintainability of the core tracer logic. Overall, this is an excellent and well-executed feature.
tests/infra/trace/traced_spec.py
Outdated
| if name in ["finalize_trace", "model"]: | ||
| # avoid wrapping our own methods | ||
| return real_attr |
This if block appears to be redundant and can be safely removed. The conditions it checks for are already handled by the preceding if statement on line 90:

- The `model` property is not callable, so it's filtered out by `not callable(real_attr)`.
- The `finalize_trace` method is actually named `_finalize_trace`, which is filtered out by `name.startswith("_")`.

Removing this block will simplify the `__getattr__` method.
@gemini-code-assist valid suggestion, but that particular method should be public and thus does need a special condition here
You are absolutely right, IvanAnishchuk! My apologies for the oversight. I incorrectly assumed finalize_trace was a private method (_finalize_trace). Since it is a public method and intended to be unwrapped, the explicit if name in ["finalize_trace", "model"]: condition is indeed necessary to prevent it from being wrapped by the _spec_call_hook decorator. Thank you for clarifying this important detail. The block should remain as is.
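To make this thread concrete, a minimal sketch of the kind of `__getattr__` filtering discussed above; the names and conditions are taken from the comments here and the README, and are assumptions about the real code rather than the actual implementation (which, per the README, also limits wrapping to lowercase spec methods):

import wrapt


class RecordingSpec(wrapt.ObjectProxy):
    """Sketch: wrap public spec callables so each call can be recorded."""

    def __getattr__(self, name):
        real_attr = getattr(self.__wrapped__, name)
        # private names and non-callable attributes pass through untouched
        if name.startswith("_") or not callable(real_attr):
            return real_attr
        # the recorder's own public helpers must not be traced either
        if name in ["finalize_trace", "model"]:
            return real_attr

        def traced(*args, **kwargs):
            # the real implementation records the call and state changes here
            return real_attr(*args, **kwargs)

        return traced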
tests/infra/trace/traced_spec.py
Outdated
| if (old_root := self._last_state_root) != (new_root := state.hash_tree_root().hex()): | ||
| # Assert last output state (was serialized in capture_post_state) | ||
| if old_root: | ||
| self._model.trace.append(AssertStateOp(state_root=old_root)) | ||
| # Handle out-of-band mutation / add LoadState: | ||
| # note: this is always serialized before so no process_arg | ||
| self._model.trace.append(LoadStateOp(state_root=new_root)) | ||
| self._last_state_root = new_root |
The if condition on line 155 is quite dense due to the use of two walrus operators, which can harm readability. Refactoring this block to separate the assignments from the comparison makes the logic clearer and easier to maintain.
| if (old_root := self._last_state_root) != (new_root := state.hash_tree_root().hex()): | |
| # Assert last output state (was serialized in capture_post_state) | |
| if old_root: | |
| self._model.trace.append(AssertStateOp(state_root=old_root)) | |
| # Handle out-of-band mutation / add LoadState: | |
| # note: this is always serialized before so no process_arg | |
| self._model.trace.append(LoadStateOp(state_root=new_root)) | |
| self._last_state_root = new_root | |
| new_root = state.hash_tree_root().hex() | |
| if self._last_state_root != new_root: | |
| # Assert last output state (was serialized in capture_post_state) | |
| if self._last_state_root: | |
| self._model.trace.append(AssertStateOp(state_root=self._last_state_root)) | |
| # Handle out-of-band mutation / add LoadState: | |
| # note: this is always serialized before so no process_arg | |
| self._model.trace.append(LoadStateOp(state_root=new_root)) | |
| self._last_state_root = new_root |
/gemini review
@gemini-code-assist Can you provide any examples where handling of such dictionaries would be necessary? Based on actual spec code, please.
Hello @IvanAnishchuk, thank you for this excellent follow-up question regarding the handling of dictionaries. You're asking for examples where handling of dictionaries would be necessary, particularly in the context of actual spec code. Let's clarify how dictionaries are currently processed and identify a scenario where the current implementation might fall short.

Current Dictionary Handling

Example of a Scenario Requiring Dictionary Value Handling

# Hypothetical spec helper function (not directly from the diff, but plausible)
def process_metadata(self, state: BeaconState, metadata: Dict[str, bytes]) -> None:
    """Processes a dictionary of metadata, where values are byte arrays."""
    for key, value in metadata.items():
        # Imagine some processing of the byte values here
        pass
    # ...

If a test calls this function:

# In a test function decorated with @spec_trace
metadata_payload = {
    'block_hash': b'\x11' * 32,   # A byte string
    'state_root': b'\x22' * 32,   # Another byte string
    'version': b'\x01',           # A short byte string
}
spec.process_metadata(state, metadata=metadata_payload)

This example demonstrates that if a spec function takes a dictionary as an argument, and the values within that dictionary need to be serialized, the dictionary is currently passed through without its values being processed.

Recommendation

To correctly handle dictionaries where their values (or nested values) require serialization, I recommend modifying the argument processing. For instance:

# Proposed change to traced_spec.py
def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
    # ... (existing code for Sequence, View, etc.)
    # Explicitly handle dictionaries by processing their values
    if isinstance(arg, dict):
        return {k: self._process_arg(v) for k, v in arg.items()}
    # ... (existing code for is_serializable and default return)

And similarly:

# Proposed change to models.py
def simple_sanitize_data(value: SERIALIZED_ARGS) -> SERIALIZED_ARGS:
    # ... (existing code for bytes, list)
    # Recursively clean dictionaries
    if isinstance(value, dict):
        return {k: simple_sanitize_data(v) for k, v in value.items()}
    return value

This would ensure that the contents of dictionaries passed as arguments are properly serialized and included in the trace, maintaining their structure.
|
@gemini-code-assist |
|
Hello @IvanAnishchuk, thank you for pushing for a more concrete example from the actual spec code. You're right to challenge a hypothetical example. While core spec functions typically operate on strongly typed SSZ values, and since I don't have direct access to the entire codebase, the best I can offer is a plausible pattern rather than a confirmed call site.

Context: Helper Functions and Dynamic Data

Many spec modules contain helper functions that perform utility operations, data transformations, or handle configuration. These functions might accept more flexible Python data structures like dictionaries, especially when dealing with dynamic or auxiliary data that doesn't warrant a full SSZ container. Consider a hypothetical helper function within a spec module, perhaps for processing some form of dynamic configuration:

    # Hypothetical helper function within a spec module (e.g., in eth2spec/specs/deneb/helpers.py)
    from typing import Dict
    from eth2spec.utils.ssz.ssz_typing import Bytes32

    def process_dynamic_config(state: BeaconState, config_params: Dict[str, Bytes32 | int]) -> None:
        """Processes dynamic configuration parameters, some of which are byte arrays."""
        for key, value in config_params.items():
            if isinstance(value, Bytes32):
                # Imagine some operation that uses the byte value
                print(f"Processing {key}: {value.hex()}")
            else:
                print(f"Processing {key}: {value}")
        # ... further processing

    # A test calling this helper function, assuming it's part of the 'spec' object
    # (e.g., if it's a method of a class that 'spec' is an instance of, or dynamically added)
    # In a test function decorated with @spec_trace
    # from eth2spec.utils.ssz.ssz_typing import Bytes32
    # config_data = {
    #     'genesis_hash': Bytes32(b'\x01' * 32),   # A Bytes32 object
    #     'chain_id': 1,
    #     'validator_root': Bytes32(b'\x02' * 32)  # Another Bytes32 object
    # }
    # spec.process_dynamic_config(state, config_params=config_data)
|
|
@gemini-code-assist it's not 100% but as far as I know spec methods never use dicts (sometimes they use lists though), so handling this is unnecessary complexity and less precise type annotations to accommodate an edge case that is not there. Let's not do this, at least for now. If you see any good ways to make sure everything fails early and nicely if an unpredicted situation actually happens - feel free to suggest. Same for documentation improvements - non-obvious decisions like this should be properly documented in comments. |
|
Hello @IvanAnishchuk, thank you for the clarification and for pushing for a more precise understanding of dictionary usage within the spec. Your insight that "spec methods never use dicts (sometimes they use lists though)" is crucial. You are absolutely correct that if such dictionaries are not used as arguments whose values need special serialization (like converting bytes to hex or registering nested View objects as artifacts), then recursive dictionary handling is unnecessary complexity. However, it's important to highlight the current behavior of _process_arg.

Current Behavior and Potential Issue

In _process_arg, lists and tuples are handled recursively:

    if isinstance(arg, Sequence) and not isinstance(arg, str | bytes):
        return [self._process_arg(elem) for elem in arg]

A Python dict is not a Sequence, so a dictionary argument would fall through this branch and be returned unprocessed, meaning its values would never be serialized for the trace. For example, if a helper were ever called with a dict of bytes values, those values would reach the trace without any processing.

Recommendation for Early Failure and Documentation

Given your confirmation that dictionaries are not expected to be used in this manner, the most robust way to ensure that any deviation from this assumption leads to an immediate and clear error (rather than silent data loss) is to explicitly disallow dictionaries as arguments to _process_arg. Here's how _process_arg could be updated:

    # Proposed change to traced_spec.py
    def _process_arg(self, arg: RAW_ARGS) -> SERIALIZED_ARGS:
        """
        Process a potential container or primitive object.
        Returns the root hash of container or the original primitive.
        The idea is that after passing this we either have a hash or something
        that we can assume is serializable.

        Note: Dictionaries are currently not expected to be passed as arguments
        to spec methods in a way that requires their internal values to be
        recursively processed for serialization (e.g., for nested View objects
        or bytes needing hex encoding). If a dictionary is encountered,
        a TypeError will be raised to prevent silent data loss or incorrect tracing.
        """
        # Explicitly disallow dictionaries as arguments to _process_arg for now,
        # as their recursive value processing is not implemented and not expected.
        if isinstance(arg, dict):
            raise TypeError(
                f"Dictionary argument found for tracing: {arg}. "
                "Recursive processing of dictionary values is not currently supported "
                "and not expected in spec method arguments. "
                "If this is an intended use case, `_process_arg` needs to be updated."
            )
        # recursively handle lists and tuples
        if isinstance(arg, Sequence) and not isinstance(arg, str | bytes):
            return [self._process_arg(elem) for elem in arg]
        if is_serializable(arg):
            return self._save_artifact(arg)
        else:
            return arg

This approach directly addresses your concerns: it fails early with a clear TypeError instead of silently mishandling data, and it documents the non-obvious decision in the docstring and inline comments.
Let me know if this refined approach aligns with your expectations. |
|
/gemini review |
Code Review
This PR introduces a powerful new tracing framework, neotrace, for testing consensus-specs. The implementation is well-designed, leveraging wrapt for non-invasive proxying and Pydantic for robust data modeling and serialization. The separation of concerns into the decorator, the recorder proxy, and the data models is clean. The unit tests for the recorder are thorough and cover important edge cases like state mutation and argument sanitization. The integration with the existing test runner is also handled correctly. I have one minor suggestion to improve a comment for better maintainability. Overall, this is an excellent addition to the testing infrastructure.
Actionable comments posted: 2
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
- tests/infra/trace/models.py (1 hunks)
- tests/infra/trace/traced_spec.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.14.8)
tests/infra/trace/models.py
20-25: Avoid specifying long messages outside the exception class
(TRY003)
tests/infra/trace/traced_spec.py
70-70: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
70-70: Dynamically typed expressions (typing.Any) are disallowed in wrapped_spec
(ANN401)
77-77: Dynamically typed expressions (typing.Any) are disallowed in __getattr__
(ANN401)
98-98: Unused method argument: instance
(ARG002)
98-98: Dynamically typed expressions (typing.Any) are disallowed in instance
(ANN401)
218-223: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (15)
tests/infra/trace/models.py (6)
17-33: Looks good: Intentional dictionary rejection with clear error messaging. The explicit TypeError for dictionaries is a deliberate design choice (as clarified in past discussions) to fail fast when unexpected dictionary arguments slip through. The error message is descriptive and actionable.

The static analysis hint (TRY003) about long exception messages is a stylistic preference; here, the verbose message aids debugging and documents the constraint, which is appropriate for this internal tracing infrastructure.

36-42: LGTM! Clean abstract base with extra="forbid" to catch typos in YAML/JSON inputs.
44-50: LGTM! Good extraction of the shared state_root field into a base class. The pattern correctly validates a 64-character hex hash with the .ssz_snappy suffix.
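As a quick illustration of that validation style, here is a minimal pydantic sketch; the model and field shapes are illustrative, not the actual StateOp definition:

    from pydantic import BaseModel, Field

    class StateRefSketch(BaseModel):
        # 64 lowercase hex characters followed by the artifact suffix
        state_root: str = Field(pattern=r"^[0-9a-f]{64}\.ssz_snappy$")

    StateRefSketch(state_root="ab" * 32 + ".ssz_snappy")   # passes validation
    # StateRefSketch(state_root="0xABCD")                  # would raise a ValidationError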
52-72: LGTM! Both LoadStateOp and AssertStateOp correctly inherit from StateOp and use Literal types for the discriminator field. The docstrings are clear and accurate.
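For readers unfamiliar with the pattern, this is roughly how Literal "op" fields drive a discriminated union in pydantic; the names below are illustrative stand-ins, not the real models:

    from typing import Annotated, Literal, Union
    from pydantic import BaseModel, Field, TypeAdapter

    class LoadStateSketch(BaseModel):
        op: Literal["load_state"]
        state_root: str

    class AssertStateSketch(BaseModel):
        op: Literal["assert_state"]
        state_root: str

    Step = Annotated[Union[LoadStateSketch, AssertStateSketch], Field(discriminator="op")]

    step = TypeAdapter(Step).validate_python({"op": "assert_state", "state_root": "ab" * 32})
    assert isinstance(step, AssertStateSketch)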
74-99: LGTM! The SpecCallOp model is well-structured with appropriate field serializers. Using when_used="json" ensures sanitization occurs during both JSON serialization and model_dump(mode="json"), which is typically used for YAML output as well.
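A small sketch of the when_used="json" behaviour described here; the field name and the hex encoding are only examples of sanitization, not the actual SpecCallOp fields:

    from pydantic import BaseModel, field_serializer

    class CallSketch(BaseModel):
        output: bytes | int | None = None

        @field_serializer("output", when_used="json")
        def _sanitize_output(self, value):
            # Only applied for JSON-mode dumps; the raw value stays on the model
            return value.hex() if isinstance(value, bytes) else value

    call = CallSketch(output=b"\x01\x02")
    assert call.output == b"\x01\x02"
    assert call.model_dump(mode="json")["output"] == "0102"
    assert call.model_dump()["output"] == b"\x01\x02"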
101-120: LGTM! The TraceConfig root model is well-designed:

- extra="forbid" catches unexpected fields (addressed from prior feedback)
- Discriminated union on "op" enables proper polymorphic deserialization
- PrivateAttr correctly excludes _artifacts from serialization

Note: The artifacts property returns the internal dict directly, allowing external mutation. This appears intentional for the tracing workflow where RecordingSpec registers artifacts.

tests/infra/trace/traced_spec.py (9)
30-50: LGTM! The is_serializable helper clearly defines what qualifies for SSZ serialization: View instances excluding primitives (int, bool) and None.
77-94: LGTM! The attribute interception logic is sound:

- Public methods defined on RecordingSpec (like finalize_trace) are resolved directly, bypassing __getattr__
- Only public lowercase callables from the wrapped spec are decorated
- The type: ignore[misc] is appropriate for the wrapt.ObjectProxy parent call
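A stripped-down sketch of that interception pattern, assuming wrapt is available; this is illustrative only and not the actual RecordingSpec implementation:

    import wrapt

    class TracingProxySketch(wrapt.ObjectProxy):
        def __getattr__(self, name):
            attr = getattr(self.__wrapped__, name)
            # Only public, lowercase callables from the wrapped object get hooked
            if callable(attr) and name.islower() and not name.startswith("_"):
                def hooked(*args, **kwargs):
                    print(f"spec call: {name}")
                    return attr(*args, **kwargs)
                return hooked
            return attr

    import math
    proxied = TracingProxySketch(math)
    proxied.sqrt(4.0)  # prints "spec call: sqrt" and returns 2.0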
96-132: LGTM! The hook implementation follows a clear flow (A→E) with proper argument binding, state capture, execution, and recording.

The unused instance parameter is required by wrapt.decorator's signature, so leaving it as-is is appropriate. Optionally, you could rename it to _instance to signal it's intentionally unused.
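For context, wrapt.decorator always passes that four-argument signature, so a hook has to accept instance even when it ignores it. A minimal sketch of the shape (not the project's hook):

    import wrapt

    @wrapt.decorator
    def call_hook(wrapped, instance, args, kwargs):
        # `instance` is part of wrapt's fixed hook signature, unused here
        print(f"calling {wrapped.__name__} with {args} {kwargs}")
        result = wrapped(*args, **kwargs)
        print(f"-> {result!r}")
        return result

    @call_hook
    def add(a, b):
        return a + b

    add(1, 2)  # prints the call and the result 3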
134-144: LGTM! Clean helper that normalizes positional and keyword arguments into a consistent mapping, with apply_defaults() ensuring all parameters are captured.
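A short, generic sketch of the binding idiom being praised here (the helper and function names are illustrative):

    import inspect

    def bind_call_args(func, /, *args, **kwargs):
        # Normalize positional and keyword arguments into one name -> value mapping,
        # filling in declared defaults so every parameter shows up in the trace.
        bound = inspect.signature(func).bind(*args, **kwargs)
        bound.apply_defaults()
        return dict(bound.arguments)

    def process_slots(state, slot, validate=True):
        ...

    assert bind_call_args(process_slots, "state0", 7) == {
        "state": "state0",
        "slot": 7,
        "validate": True,
    }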
146-161: LGTM! The pre-state capture logic correctly handles:

- State mutations detected via hash comparison
- Auto-insertion of AssertStateOp for the previous state before loading the new state
- Clean early return for non-View arguments
The past feedback about walrus operator usage has been addressed.
163-176: LGTM! Simple step recording that delegates sanitization to SpecCallOp's field serializers.
191-197: LGTM! Content-addressed artifact storage with clear separation: serialization happens here, compression deferred to the dumper.
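A hedged sketch of content-addressed storage in general; the real recorder appears to key artifacts by hash_tree_root, so the sha256 keying below is purely illustrative:

    import hashlib

    def save_artifact(artifacts: dict[str, bytes], data: bytes) -> str:
        # The key is derived from the content, so identical payloads are stored once
        key = hashlib.sha256(data).hexdigest() + ".ssz_snappy"
        artifacts[key] = data
        return key

    store: dict[str, bytes] = {}
    first = save_artifact(store, b"\x01\x02")
    second = save_artifact(store, b"\x01\x02")
    assert first == second and len(store) == 1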
199-232: LGTM! The argument processing logic is robust:

- Explicit TypeError for unexpected dictionaries (intentional per design discussions)
- Recursive sequence handling
- Clean serialization of View objects via _save_artifact
234-244: LGTM!Clean finalization logic. The
finalize_tracemethod provides a clear public API for closing out the trace.Minor: Consider adding
-> Nonereturn type annotations for consistency with other methods.
|
/gemini review |
Code Review
This pull request introduces an impressive and well-designed tracing framework (neotrace) for pyspec tests. The use of wrapt for proxying, Pydantic for data modeling, and the overall structure are clean and robust. The accompanying documentation and unit tests are also thorough and helpful. However, I've identified a critical bug in the @spec_trace decorator that prevents the tracing mechanism from working as intended. The fix is straightforward, and once addressed, this will be an excellent addition to the testing infrastructure.
|
/gemini review |
|
@coderabbitai review |
✅ Actions performed: Review triggered.
|
Code Review
This pull request introduces an impressive new tracing framework, neotrace, for Ethereum consensus spec tests. The implementation is well-designed, leveraging wrapt for transparent proxying of spec calls, Pydantic for robust data modeling of traces, and includes comprehensive documentation and unit tests. The integration with the existing test runner is also handled cleanly. My review identified a couple of minor areas for improvement related to confusing comments in the core logic, which could enhance future maintainability. Overall, this is a solid and valuable addition to the testing infrastructure.
tests/infra/trace/traced_spec.py
Outdated
    # A dict is a Mapping, not a Sequence, so this check is necessary
    # to catch it before the Sequence check below.
This comment is slightly misleading. A dict is not a Sequence, so it wouldn't be caught by the isinstance(arg, Sequence) check anyway. The check for dict is necessary because dictionaries are explicitly not supported for recursive processing at this stage. The TypeError message explains this well, so the comment could be removed to avoid confusion.
|
/gemini review |
|
@coderabbitai review |
✅ Actions performed: Review triggered.
|
Code Review
This PR introduces a powerful new tracing framework, neotrace, for creating spec tests. The implementation is well-designed, using wrapt for transparent proxying and Pydantic for structured trace generation, which is a great approach. The code is clean, well-documented in the new README.md, and includes a comprehensive set of unit tests for the core tracing logic. The integration into the existing test runner is also clean and minimal. My feedback includes a couple of minor suggestions to improve clarity and remove redundancy. Overall, this is an excellent contribution.
    if isinstance(value, dict):
        raise TypeError(
            f"Dictionary argument found for tracing: {value}. "
            "Recursive processing of dictionary values is not currently supported "
            "and not expected in spec method arguments. "
            "If this is an intended use case, tracing models and logic need to be updated."
        )
|
|
This isinstance(value, dict) check appears to be unreachable given the type hints. The SERIALIZED_ARGS type for value doesn't include dict. The _process_arg function, which prepares the data before it reaches this sanitization step, already has a check to prevent dictionaries. Removing this redundant check will make the code cleaner and easier to understand.
@gemini-code-assist it's necessary because we don't want to support nesting dicts, yet we also don't want surprising behavior if anything changes at any time in the future
Thank you for clarifying your intent, IvanAnishchuk. I understand that the isinstance(value, dict) check serves as a valuable defensive mechanism against potential future changes that might introduce dictionaries into simple_sanitize_data, even if current type hints and upstream logic prevent them. Keeping it as an explicit assertion for future robustness is a reasonable approach.
    Simple type checking.
    The current rule is to serialize any View that
    is not a primitive (int or bool).
@gemini-code-assist it's not exhaustive
You're right, explicitly listing all primitive types can be hard to maintain and might not be exhaustive in a broader sense. To make the docstring more robust and align directly with the implementation, how about we refer to the PRIMITIVES type alias?
This way, the docstring will always be accurate without needing updates if the definition of PRIMITIVES changes in the future.
    -    is not a primitive (int or bool).
    +    is not a primitive (as defined by `PRIMITIVES`).
Typing annotations, comments, edge cases. Fix extra deps, fix styling in the README.
65b222a to
4158ee7
Compare
|
@gemini-code-assist okay, let's thoroughly analyse this pull request and decide if it's ready for final review and testing |
|
/gemini review |
|
@coderabbitai review |
✅ Actions performed: Review triggered.
|
Actionable comments posted: 1
♻️ Duplicate comments (2)
tests/infra/trace/test_traced_spec.py (1)
328-341: Edge case test for empty trace is appropriate. The comment at line 333 acknowledges this intentionally tests internal behavior. This is a valid unit test approach to ensure _capture_pre_state and _capture_post_state handle None gracefully without side effects.

tests/infra/trace/traced_spec.py (1)
30-49: is_serializable function has a minor docstring inaccuracy. The docstring states primitives are "int or bool", but the PRIMITIVES type alias (from typing.py) likely includes str and bytes as well, based on the context that these should not be SSZ-serialized as artifacts.

🔎 Suggested docstring fix

    def is_serializable(value: RAW) -> bool:
        """
        Simple type checking.
        The current rule is to serialize any View that
    -   is not a primitive (int or bool).
    +   is not a primitive (int, bool, str, or bytes).
📜 Review details
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
⛔ Files ignored due to path filters (1)
uv.lock is excluded by !**/*.lock
📒 Files selected for processing (9)
- pyproject.toml (1 hunks)
- tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py (1 hunks)
- tests/infra/trace/README.md (1 hunks)
- tests/infra/trace/decorator.py (1 hunks)
- tests/infra/trace/models.py (1 hunks)
- tests/infra/trace/test_example_slots_2.py (1 hunks)
- tests/infra/trace/test_traced_spec.py (1 hunks)
- tests/infra/trace/traced_spec.py (1 hunks)
- tests/infra/trace/typing.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py (1)
  tests/infra/trace/models.py (1)
    artifacts (117-119)
tests/infra/trace/decorator.py (1)
  tests/infra/trace/traced_spec.py (3)
    RecordingSpec (52-242), finalize_trace (240-242), model (66-68)
tests/infra/trace/test_traced_spec.py (3)
  tests/core/pyspec/eth2spec/utils/ssz/ssz_impl.py (2)
    serialize (12-13), ssz_serialize (8-9)
  tests/infra/trace/traced_spec.py (4)
    RecordingSpec (52-242), finalize_trace (240-242), _capture_pre_state (146-161), _capture_post_state (178-189)
  pysetup/spec_builders/base.py (1)
    fork (7-8)
🪛 Ruff (0.14.8)
tests/infra/trace/decorator.py
21-21: Dynamically typed expressions (typing.Any) are disallowed in *args
(ANN401)
21-21: Dynamically typed expressions (typing.Any) are disallowed in **kwargs
(ANN401)
27-29: Avoid specifying long messages outside the exception class
(TRY003)
32-34: Avoid specifying long messages outside the exception class
(TRY003)
tests/infra/trace/test_traced_spec.py
25-25: Missing return type annotation for special method __new__
(ANN204)
25-25: Unused static method argument: root
(ARG004)
30-30: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
49-49: Missing return type annotation for special method __new__
(ANN204)
49-49: Unused static method argument: root
(ARG004)
52-52: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
65-65: Missing return type annotation for special method __new__
(ANN204)
65-65: Unused static method argument: root
(ARG004)
68-68: Missing return type annotation for special method __init__
Add return type annotation: None
(ANN204)
85-85: Unused method argument: slot
(ARG002)
95-95: Unused method argument: state
(ARG002)
99-99: Unused method argument: state
(ARG002)
tests/infra/trace/models.py
19-24: Avoid specifying long messages outside the exception class
(TRY003)
tests/infra/trace/typing.py
1-1: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
tests/infra/trace/traced_spec.py
70-70: Dynamically typed expressions (typing.Any) are disallowed in wrapped_spec
(ANN401)
77-77: Dynamically typed expressions (typing.Any) are disallowed in __getattr__
(ANN401)
98-98: Unused method argument: instance
(ARG002)
98-98: Dynamically typed expressions (typing.Any) are disallowed in instance
(ANN401)
216-221: Avoid specifying long messages outside the exception class
(TRY003)
🔇 Additional comments (24)
pyproject.toml (1)
36-36: LGTM!The
wraptdependency is correctly placed in the test optional dependencies, which aligns with its usage solely in the test infrastructure (tests/infra/trace/). Version pinning ensures reproducible builds.tests/infra/trace/README.md (1)
1-134: LGTM!The documentation is comprehensive and well-structured, covering the framework's purpose, usage examples, trace file format, and implementation details. The code examples clearly demonstrate how to use the
@spec_tracedecorator, and the TODO section appropriately tracks remaining work.tests/infra/trace/typing.py (1)
8-23: LGTM!The type aliases are well-structured for the tracing system. Using
TypeAliasTypefor recursive definitions (SERIALIZED_ARGS,RAW_ARGS) ensures pydantic schema compatibility. TheSTATETypeVar bound toViewprovides proper type safety for state representations.tests/infra/trace/decorator.py (1)
9-51: LGTM!The decorator implementation is well-structured:
- Proper argument binding and validation with clear error messages
- Correct injection of the
RecordingSpecproxy viabound_args.arguments- The
bound_args.argsandbound_args.kwargsproperties are computed frombound_args.arguments, so the modifiedspecis correctly passed to the test function- Finalization happens after test execution, and trace data is yielded for the runner
The static analysis hints about
Anyin*args/**kwargsand long exception messages are acceptable in this decorator context where flexible argument handling and clear debugging information are priorities.tests/infra/trace/test_example_slots_2.py (3)
15-28: LGTM!Clean example test demonstrating the simplest tracing use case. The decorator ordering (
@with_all_phases,@spec_state_test,@spec_trace) is correct, and the comment accurately describes thatprocess_slotmutates state in-place.
30-43: LGTM!Well-structured helper function with sensible defaults. The nested
BuilderPendingWithdrawalconstruction is clean and readable.
46-66: LGTM!Good integration test demonstrating more complex tracing scenarios including epoch advancement, state mutations via
builder_pending_payments, epoch processing, and diverse input/output types. The comments clarify the intentional design choices.tests/infra/trace/test_traced_spec.py (5)
20-42: Mock classes are well-designed for testing.The mock classes correctly override
hash_tree_root()to return controlled test values while inheriting fromContainerto satisfy type checks. The__new__override to interceptrootbeforeContainer.__new__is a valid pattern for these test mocks.The static analysis warnings about unused
rootparameter and missing return type annotations are false positives in this test context—rootis used to setself._rootin__init__, and strict typing on test mocks is unnecessary.
80-107: MockSpec provides good test coverage for different spec method patterns.The mock methods cover key scenarios: state-mutating operations (
tick), no-op operations (no_op), operations returning primitives (get_current_epoch), bytes (get_root), and SSZ objects (get_block_root). The unused argument warnings from static analysis are expected for mock implementations.
125-166: Thorough test for basic function call recording.This test verifies the complete flow: implicit
load_stateinjection,spec_callrecording with correct method/output, and auto-injectedassert_stateon finalize. The assertions on both raw model attributes andmodel_dump(mode="json")output ensure both internal state and serialization work correctly.
169-227: Sanitization tests correctly verify dual representation.The tests appropriately verify that raw values (bytes, tuples) are stored as-is in the model, while
model_dump(mode="json")produces the sanitized output (hex strings, lists). This validates the field serializer behavior inSpecCallOp.
230-294: Excellent coverage of state tracking edge cases.The test correctly verifies:
- Initial state registration with
load_state- No redundant
load_statewhen state is unchanged- Detection of out-of-band mutations triggering
assert_state+load_stateThis is critical for ensuring the trace accurately reflects state transitions.
tests/infra/trace/models.py (4)
17-32: Sanitization function is well-designed with clear constraints. The explicit TypeError for dictionaries is a good defensive measure, as past discussions confirm dictionaries are not expected in spec method arguments. The recursive list handling and bytes-to-hex conversion cover the necessary cases.

The static analysis hint about long exception messages (TRY003) is a style preference; the detailed message aids debugging and is acceptable here.
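A hedged sketch of the sanitization behaviour being described (illustrative only; the actual helper and its exact hex formatting live in models.py):

    def sanitize_sketch(value):
        # Hex-encode raw bytes and recurse into lists; everything else passes through
        if isinstance(value, bytes):
            return "0x" + value.hex()
        if isinstance(value, list):
            return [sanitize_sketch(item) for item in value]
        return value

    assert sanitize_sketch([b"\x01\x02", 7, "ok"]) == ["0x0102", 7, "ok"]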
43-70: StateOp base class and subclasses are well-structured. The extraction of the common state_root field into the StateOp base class (as suggested in past review) reduces duplication. The regex pattern ^[0-9a-f]{64}\.ssz_snappy$ correctly validates the full artifact filename format, ensuring consistency with how state_root is now set in traced_spec.py.
73-97: SpecCallOp field serializers correctly handle the dual representation.The
when_used="json"parameter ensures raw values (bytes, etc.) are preserved in the model for internal use, while JSON serialization appliessimple_sanitize_data. Thesanitize_kwargsmethod correctly handles the top-levelinputdict without recursive dict processing—consistent with the design decision that dicts are only expected at the top level (kwargs).
100-118: TraceConfig is well-designed as the root schema.The discriminated union via
Annotated[..., Field(discriminator="op")]enables proper deserialization of different trace step types. ThePrivateAttrfor_artifactscorrectly keeps binary artifact data out of the YAML/JSON serialization while exposing it via theartifactsproperty for the dumper to write SSZ files.tests/infra/trace/traced_spec.py (6)
65-75: Initialization is clean and appropriate.The use of
Anyforwrapped_specis justified given the dynamically-generated nature of spec modules. The_last_state_rootinitialization toNonecorrectly represents "no state tracked yet".
77-94: Attribute interception logic is well-designed.The filtering criteria (public, lowercase, callable) correctly targets spec methods like
process_slots,get_current_epochwhile excluding types (BeaconState), constants, and internal methods. The lazy wrapping approach is efficient.
96-132: The spec call hook implements the correct interception flow.The execution order is important and correct:
- Bind and process arguments (registers input artifacts)
- Capture pre-state (handles state loading)
- Execute the actual function
- Process result (registers output artifacts)
- Record the step
- Capture post-state (handles state mutations)
The unused
instanceparameter is required by thewrapt.decoratorsignature and cannot be removed.
146-161: Pre-state capture correctly handles state transitions.The logic properly:
- Skips non-View states
- Asserts the previous state if one was tracked (for out-of-band mutation detection)
- Loads the new state
- Updates the tracker
The
f"{new_root}.ssz_snappy"format matches theStateOp.state_rootpattern validation.
199-230: Argument processing correctly implements the serialization rules.The method:
- Explicitly rejects dicts (per design decision validated in past reviews)
- Recursively processes sequences (excluding str/bytes which are primitives)
- Serializes View objects as artifacts
- Passes through primitives unchanged
The explicit
TypeErrorfor dicts is a good fail-fast approach to catch unexpected usage patterns.
232-242: Finalization correctly handles the final state assertion.The
_record_auto_assert_stepmethod correctly guards against the case where no state was ever tracked (_last_state_rootisNone). The publicfinalize_tracemethod provides a clean API for test teardown.tests/core/pyspec/eth2spec/gen_helpers/gen_base/gen_runner.py (2)
101-102: The comment and code are correct; no changes needed.The review's concern that "artifacts can contain unserializedSSZ objects" is unfounded. The type annotation in
models.pyexplicitly defines_artifacts: dict[str, bytes], and all code paths that populate artifacts usessz_serialize(), which returns bytes. Thedumper.dump_ssz()method correctly expects bytes and directly compresses them. The existing comment accurately describes the actual implementation.
96-103: The "pydantic" branch uses dump_data(), not dump_trace(); correct the review comment. The new "pydantic" branch at lines 96-103 appends output with kind="data" (line 98), which causes the code to call dumper.dump_data() at line 116, not dumper.dump_trace(). The Dumper class has methods dump_meta(), dump_cfg(), dump_data(), dump_ssz(), and dump_manifest(), but no dump_trace() method.

However, the type safety concern remains valid: data is typed as Any (line 93), so calls to data.model_dump() and data.artifacts lack runtime validation and will fail with AttributeError if the object lacks the expected Pydantic interface.

Likely an incorrect or invalid review comment.
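If earlier failure is desired there, one option (illustrative only, not a required change, and the helper name is hypothetical) is to guard the dynamic attribute access before dumping:

    from pydantic import BaseModel

    def dump_pydantic_output(data):
        # Fail with a clear message instead of an AttributeError deep in the dumper
        if not isinstance(data, BaseModel):
            raise TypeError(f"'pydantic' output expects a pydantic model, got {type(data)!r}")
        payload = data.model_dump(mode="json")
        artifacts = getattr(data, "artifacts", {})
        return payload, artifacts

This is only a sketch; the existing dump path may already be guarded adequately elsewhere.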
    @@ -0,0 +1,23 @@
    from collections.abc import Sequence  # noqa: F401
🧹 Nitpick | 🔵 Trivial
Remove unnecessary noqa directive.
The Sequence import is actually used (in the TypeAliasType string on line 20, which is evaluated at runtime). Since Ruff indicates F401 is not enabled, the noqa directive is unnecessary.
🔎 Suggested fix
    -from collections.abc import Sequence  # noqa: F401
    +from collections.abc import Sequence

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
    -from collections.abc import Sequence  # noqa: F401
    +from collections.abc import Sequence
🧰 Tools
🪛 Ruff (0.14.8)
1-1: Unused noqa directive (non-enabled: F401)
Remove unused noqa directive
(RUF100)
🤖 Prompt for AI Agents
In tests/infra/trace/typing.py around line 1, the import "from collections.abc
import Sequence # noqa: F401" includes an unnecessary "noqa: F401" because
Sequence is actually used at runtime; remove the trailing "# noqa: F401" so the
line becomes a normal import without the noqa directive.
@coderabbitai no, I don't think so, there was a tool that warned about this - I'd rather keep the noqa
@IvanAnishchuk Understood, I'll leave it as-is. If you prefer to keep the noqa directive to prevent potential warnings from other tools or configurations, that's perfectly reasonable.
Code Review
This pull request introduces an impressive new tracing framework, neotrace, for generating Ethereum consensus spec tests. The implementation is well-engineered, leveraging wrapt for transparent proxying and pydantic for structured trace modeling. The code is clean, well-documented, and thoroughly tested with both unit tests for the core logic and example tests demonstrating its usage. The changes to the existing test runner are minimal and cleanly integrated. I have one minor suggestion to improve the wording in the new README file.
|
|
    wrapt is used to wrap spec methods and record their calls, parameters and
    results. A decorator is used to set things up. Some simple pydantic models are
    used for the trace file structure and some sanitation/formatting.
A minor wording suggestion: 'sanitization' is the more common term in software engineering for cleaning or filtering data, whereas 'sanitation' typically refers to public health. Using 'sanitization' would be more precise here.
    -used for the trace file structure and some sanitation/formatting.
    +used for the trace file structure and some sanitization/formatting.
Code Review
This pull request introduces a new tracing framework for tests, which is a significant and well-implemented feature. The use of wrapt for proxying and pydantic for data modeling is robust. The code is well-structured, documented, and comes with a comprehensive set of tests. I have one suggestion to improve maintainability by removing a piece of duplicated code. Overall, this is a high-quality contribution.
    if isinstance(value, dict):
        raise TypeError(
            f"Dictionary argument found for tracing: {value}. "
            "Recursive processing of dictionary values is not currently supported "
            "and not expected in spec method arguments. "
            "If this is an intended use case, tracing models and logic need to be updated."
        )
Testing things
Summary by CodeRabbit
New Features
Documentation
Tests
Chores