feat(tests): Initial implementation for generating client test vectors#76
feat(tests): Initial implementation for generating client test vectors#76tcoratger merged 18 commits intoleanEthereum:mainfrom
Conversation
0bc0f7f to
c1f3942
Compare
tcoratger
left a comment
There was a problem hiding this comment.
Maybe we should put in the CLAUDE.md the description of the PR you made or something similar so that our Claude agents know how to deal with the current way of doing tests?
packages/tests/src/lean_spec_tests/test_fixtures/fork_choice.py
Outdated
Show resolved
Hide resolved
packages/tests/src/lean_spec_tests/test_fixtures/fork_choice.py
Outdated
Show resolved
Hide resolved
| @@ -0,0 +1,85 @@ | |||
| """Fork choice head selection tests for the devnet fork.""" | |||
There was a problem hiding this comment.
I love this way of doing tests here, this is much simpler than before and we will be able in the future to drastically simplify the way tests are handled in the repo by just having this way of doing everywhere.
c1f3942 to
92134ce
Compare
|
Looks really good to me! Just one question for now. I see that the I think ideally we probably want to full data structure as the
|
Thanks for the feedback. Yeah this is not by any means some final draft. I'd love for client teams to chime in on the design as I took a very general approach here. How this currently works is we define in the test things for that test that we expect to be true. For example, here, I only defined that for this particular test I want to make absolutely sure that this is in the post state. What this then does is it validates the output from the spec, makes sure the spec has these expected values, and then we pass these expected values of interest to the vector. Of course, this test is not really caring about anything important and is just to serve as an example. What we could do is validate the defined values and still always pass through the entire We should definitely define what's best for implementers / testing as well as consumers here 👍🏼 |
And yes I agree this perhaps is not ideal currently because this looks into the |
476ed0e to
b2f113f
Compare
Mhhhh I would say that the approach you currently have is better no because if we have a lot of tests and the full post state for each of them it can become very cumbersome right? What we want here is to test only a specific portion of the state for each test and if we want we can have also a couple of full integration tests where we test everything for specific case but this seems to be overkill for most of small unit tests dedicated to testing only one thing right? |
b2f113f to
d3a9abf
Compare
|
rebased to fix conflicts 👀 |
|
We are in the middle of merging the |
90e7ed9 to
25d3c52
Compare
@tcoratger yes this is how it is set up now and we can define any number of things we care about for the test and it will be in the vector. The other approach is that what we define in the test we verify while filling (so we verify the spec is returning the expected values) but we can proceed to fill the post-state with everything the spec returns and so this post state is always there in full (some hybrid version of what exists here). I think we should decide together what is best here and this can always be tweaked and changed after this PR as well. Please take a look at this commit in particular as I tweaked a few things with consensus specs as inspiration just so I could actually drive the chain and fill multiple blocks for the test examples as it wasn't progressing without this. If we want to revert these changes or tweak these while still in devnet or not I am not certain as this is beyond my scope 😅. I believe these are the only spec changes I made here so worth giving that particular attention. The other changes in the specs side were serialization-only changes so we can serialize appropriately in the vectors so please take a look as well. |
|
@fselmo In my opinion, the current way in which this PR treats things is perfect for what we want right now.
|
|
@fselmo We just merged another big PR hence the merge conflict. Could you help rebase? |
51dedef to
65d52ed
Compare
Done. Cleaned up the implementation a bit too. |
|
I'm getting this error when running uv sync: |
|
@unnawut I renamed some things so your old ^ from |
Thanks for the suggestion! Got the same error after removing |
fwiw I know what happened here. If anyone tested the changes before any renaming, you likely had a |
- forkchoice was not advancing chain with blocks, made some minor changes as compared to consensus-specs that may not be correct but help show how to write forkchoice tests for vector generation.
- Rename the package to ``lean-ethereum-testing``
- Rename `uv` workspace package to ``packages/testing``
- Refactor the package into a generalized framework component and
``consensus_testing`` module that focuses on testing the consensus
protocol. This opens up the possibility of adding execution-specific
testing in the future.
- Use `tests/consensus/{fork}` as consensus tests that generate vectors.
- Uses a ``--layer`` flag for ``fill`` that defaults to ``consensus`` scope
we don't even have to think about this separation for now.
b41e983 to
3ad914f
Compare
|
Rebased again. I also added a CI run that fills all tests with python 3.14. Right now it's just the example tests I added to illustrate the basic flow but as we tweak this and add actual tests, this will likely be a good sanity check that nothing is breaking. |
- Reconcile differences with ``main`` after rebase. - Clean up the implementation of the test fixtures + remove clunky builder for now.
3ad914f to
81090ef
Compare
syjn99
left a comment
There was a problem hiding this comment.
Looked over the weekend, thanks for your hard work!
jihoonsong
left a comment
There was a problem hiding this comment.
Generally looks good. I have minor comments and questions.
packages/testing/src/consensus_testing/test_types/step_types.py
Outdated
Show resolved
Hide resolved
- Remove comment that snuck into spec docstring
- Remove comment that snuck into spec docstring - Remove ``Union`` for ``BlockSpec | Block`` for fixtures. Instead, use ``@field_serializer`` to serialize the filled block as the ``Block`` in the fixture. - Add a TODO to configure appropriate default values for validator number and pubkeys for fixtures.
a839dff to
457a1da
Compare
| # Initialize Store from anchor | ||
| store = Store.get_forkchoice_store( | ||
| state=self.anchor_state, | ||
| anchor_block=self.anchor_block, |
There was a problem hiding this comment.
actually we don't need block, because the state here one to one corresponds with the block unlike beacon where state could have been rolled to the checkpoint because of missing slots towards the end
g11tech
left a comment
There was a problem hiding this comment.
post merge review, looks amazing ❤️

🗒️ Description
I have ported over some ideas from
execution-spec-testsfor vector filling. It may not be perfectly cleaned up yet but I wanted to put this out so we can start discussing the format for the vectors from implementers' sides and think about how you want to consume these.If you're not familiar with
execution-spec-tests, I will try to summarize some of the high-level ideas behind the approach here. Coincidentally,execution-spec-testsis in the middle of merging intoexecution-specsas a monorepo this week. I've been waiting a bit to see the final state of the monorepo organization so we can mimic this here as I think it's a good idea to try to keep things as familiar as possible for developers.Workspace with Test Package
The approach I took here is the same as we did for execution repos. I have set up
leanSpecas auvworkspace with apackages/directory where thetestsframework lives. Inside thispackages/, for now, only exists this testing framework and this is considered a separate package altogether, though they share the same workspace. In here you'll find it has its own setup withpyproject.toml, etc, though the main linting, spellcheck, and all these things are configured at the root level of the repository (leanSpec/pyproject.toml). All thetoxcommands have also been updated to run these checks across the entirety of the workspace, which makes things quite maintainable.Filling Test Vectors
I defined a basic set of fixtures for now. I think it would be quite nice to find the fewest entry points that we can test the consensus layer from, as the public API entry points, and write all tests from these as test fixtures / formats. What I mean is, we currently have
blockchain_test,state_test,benchmark_testin execution specs. If we could somehow get some small subset of test fixtures for consensus, this could be ideal from an integration standpoint. If this is not possible, however, we can create as many fixture types as we need to but this is a chance to redesign the approach we are taking so we should think on this. If a minimal subset can be used, an added benefit is that we can have a more "integration" type of test where we're not only checking logic in one place but from the entry point and across all relevant logic.The test fixtures I defined for now with some influence from
consensus-specsvector designs:state_transition_test: Test via the STF as the entry pointfork_choice_test: Tests that use fork choice as the test logicWe have a new command:
fill: This is the command to fill test vectors from pytest tests that are written inside oftests/spec_tests--clean: This flag forces you to specify that you want to write over the currentfixtures/so that you don't accidentally remove already-filled tests you want to keep--fork: Specifies the relevant fork to fill for, right now onlydevnetmakes sense--output: Can be specified to fill fixtures to an output directory that isn'tfixtures/, so you can compare two different sets if you want or fill them somewhere else entirely.Please use this as a discussion as this is not by any means attempting to finalize the design for the test vectors.
Note: This commit (6c892e4) makes some assumptions that change behavior in the specs but I leaned into some changes that allowed the fork choice to move the chain so I could properly test some things as the
fork_choice_testwas not working without these.✅ Checklist
toxchecks to avoid unnecessary CI fails:uvx --with=tox-uv tox