Allow creating separate execution-only and storage proofs #28

jsign · 2025-10-27T20:47:39Z

Note: I doubt we'll merge this PR. It should be seen mostly as an experiment for running some benchmarks, without merging it into our fork until we fully process everything I discovered here.

Currently, Reth has a stateless_validation function, which we use as the STF to generate L1 proofs. This method proves:

The provided witness with state data is correct, via an MPT proof.
The block executes correctly when the sparse Merkle trie rebuilt from the witness is used as the database implementation.

Said differently, the proof proves both block execution and storage.

This PR adds the capability of proving these two things independently:

It adds a stateless_validation_with_flatdb function which, instead of the usual ExecutionWitness, receives a FlatExecutionWitness. Instead of a list of RLP-encoded MPT nodes used to rebuild a sparse MPT, a FlatExecutionWitness contains a “flat database”, i.e., roughly a hashmap of account and storage values directly (no MPT nodes). This flat database is used only for block execution.
It adds a stateless_validation_flatdb_storage_check function which receives:
1. The usual ExecutionWitness, which has the sparse MPT nodes
2. The flat DB used for the (separate) execution proof
3. The post-state resulting from the (separate) execution proof

The idea of stateless_validation_with_flatdb is that proving this STF involves no MPT deserialization and none of the keccak hashing needed to verify it. It goes directly to the block validation logic using a flat database (so accessing state should be faster than going through the underlying sparse MPT).

The idea of stateless_validation_flatdb_storage_check is that it checks multiple things:

The provided ExecutionWitness is correct, i.e., it is our cryptographically verified source of truth for state, since we verify the pre-state “as usual”.
The provided flat database entries match the verified sparse MPT, i.e., the database used for the execution-only proof indeed contained the correct state.
Applying the provided post-state to the sparse MPT yields the new state root the block claims. So, if the execution-only run with the flat-DB pre-state produced the provided post-state, then the block is valid.
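To make the three checks concrete, here is a minimal sketch of the storage-only side, assuming a toy state model. Everything in it is hypothetical: `flatdb_storage_check` is not the PR's signature, state keys are `(address, optional slot)` pairs, and `placeholder_root` is a plain `DefaultHasher` digest standing in for a real MPT root. The first check (verifying the ExecutionWitness) is assumed to have already produced `verified_pre`.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, HashMap};
use std::hash::{Hash, Hasher};

// Hypothetical simplified state: key = (address, None for the account / Some(slot)).
type Key = ([u8; 20], Option<u64>);
type Value = u64;

/// Placeholder commitment: NOT an MPT root, only a deterministic hash of the
/// sorted state so the sketch has no dependencies.
fn placeholder_root(state: &BTreeMap<Key, Value>) -> u64 {
    let mut hasher = DefaultHasher::new();
    state.hash(&mut hasher);
    hasher.finish()
}

/// `verified_pre` is assumed to be already authenticated against the parent
/// state root (the "as usual" sparse-MPT step); zero values are never stored.
fn flatdb_storage_check(
    verified_pre: &BTreeMap<Key, Value>,
    flat_db: &HashMap<Key, Value>,
    post_state: &HashMap<Key, Value>,
    claimed_post_root: u64,
) -> Result<(), String> {
    // Every flat-DB entry used by the execution-only proof must match verified state.
    for (key, value) in flat_db {
        let expected = verified_pre.get(key).copied().unwrap_or(0);
        if *value != expected {
            return Err(format!("flat DB entry {key:?} disagrees with verified pre-state"));
        }
    }
    // Apply the execution-only post-state on top of the verified pre-state.
    let mut post = verified_pre.clone();
    for (key, value) in post_state {
        if *value == 0 {
            post.remove(key); // a zero value clears the entry
        } else {
            post.insert(*key, *value);
        }
    }
    // The recomputed root must equal the root the block claims.
    if placeholder_root(&post) != claimed_post_root {
        return Err("post-state root mismatch".to_string());
    }
    Ok(())
}
```

A tampered flat-DB entry or a post-state that doesn't reproduce the claimed root both make the check fail, which is what ties the execution-only proof back to verified state.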

Today, EEST fixtures are run in stateful and stateless mode (usual execution + storage). I also added verification that the two new styles of runs pass the tests: i) execution-only, and ii) storage-only. In other words, if we run the execution-only STF and feed its pre-state plus the generated post-state into the storage-only STF, then both the pre-state validation against the sparse MPT and the post-state root check must pass. (Basically, it's a whole consistency check; I'll point to it in a PR comment where it can be seen more clearly.) All EEST tests pass (this wasn't easy; the failing ones actually helped me discover many rabbit holes).

Also, a new RPC is created, analogous to debug_executionWitness, to get the new FlatExecutionWitness type. For now, this RPC only returns the witness for the execution-only proof, but we'll probably need to make it return both witnesses, since both will be needed for the “storage proof”.

This is the TL;DR of the PR, but there are other details worth mentioning, which I'll dive into a bit deeper in PR comments.

Signed-off-by: Ignacio Hagopian <[email protected]>

jsign · 2025-10-27T22:28:34Z

crates/chain-state/src/in_memory.rs

            Ok(Vec::default())
        }
+
+        fn flat_witness(&self, _record: FlatWitnessRecord) -> ProviderResult<FlatPreState> {


flat_witness is a new method added to the StateProofProvider trait, analogous to the existing witness method. I'll skip commenting on the mock/test-utils implementations of StateProofProvider and only comment on the relevant ones.
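The relationship between the two methods can be sketched as follows. This is a hypothetical, heavily simplified trait: the real `witness` method takes different inputs, `ProviderResult` is aliased to `Result<T, String>`, and `FlatPreState` only carries balances. The point is the shape: `witness` returns proof nodes, while `flat_witness` returns plain pre-state values for whatever the record says was accessed.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, heavily simplified stand-ins; the real Reth trait and types
// have different inputs, fields, and error types.
type Address = [u8; 20];
type ProviderResult<T> = Result<T, String>;

#[derive(Debug, Default)]
struct WitnessRecord {
    accessed: HashSet<Address>,
}

#[derive(Debug, Default)]
struct FlatWitnessRecord {
    accessed: HashSet<Address>,
}

#[derive(Debug, Default, PartialEq)]
struct FlatPreState {
    balances: HashMap<Address, u64>,
}

trait StateProofProvider {
    /// Existing style: proof nodes for the accessed state (opaque bytes here).
    fn witness(&self, record: WitnessRecord) -> ProviderResult<Vec<Vec<u8>>>;
    /// New style: plain pre-state values for the accessed state.
    fn flat_witness(&self, record: FlatWitnessRecord) -> ProviderResult<FlatPreState>;
}

struct InMemoryProvider {
    balances: HashMap<Address, u64>,
}

impl StateProofProvider for InMemoryProvider {
    fn witness(&self, _record: WitnessRecord) -> ProviderResult<Vec<Vec<u8>>> {
        Ok(Vec::default()) // no trie in this toy provider
    }

    fn flat_witness(&self, record: FlatWitnessRecord) -> ProviderResult<FlatPreState> {
        let mut pre = FlatPreState::default();
        for addr in record.accessed {
            // Accounts missing from the store are reported with a zero balance.
            let balance = self.balances.get(&addr).copied().unwrap_or(0);
            pre.balances.insert(addr, balance);
        }
        Ok(pre)
    }
}
```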

jsign · 2025-10-27T22:36:00Z

crates/evm/execution-types/src/witness.rs

+/// Records pre-state data for witness generation.
+#[derive(Debug, Clone, Default)]
+pub struct FlatPreState {
+    /// Accounts accessed during execution.
+    pub accounts: HashMap<Address, DbAccount>,
+    /// Bytecode accessed during execution.
+    pub contracts: HashMap<B256, Bytecode>,
+    /// The set of addresses that have been self-destructed in the execution.
+    pub destructed_addresses: HashSet<Address>,
+}


This is basically the "state witness" used for the execution-only STF. It is the most important field of the new FlatExecutionWitness.

The need for destructed_addresses is a rabbit hole that I'll explain later; for now, feel free to ignore it so it doesn't get in the way of understanding.
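To illustrate why a flat pre-state makes execution-only proving cheap, here is a sketch of the read path an EVM would use against it. The types are hypothetical simplifications: `DbAccount` is not revm's type, `U256` is narrowed to `u128`, bytecode is a `Vec<u8>`, and `destructed_addresses` is left out, as suggested above. Each read is a hashmap lookup, with no trie nodes to decode or hash.

```rust
use std::collections::HashMap;

type Address = [u8; 20];
type B256 = [u8; 32];
type U256 = u128; // narrower stand-in for a 256-bit word

/// Hypothetical simplified account entry (revm's real `DbAccount` differs).
#[derive(Debug, Clone, Default)]
struct DbAccount {
    balance: U256,
    nonce: u64,
    code_hash: B256,
    storage: HashMap<U256, U256>,
}

/// Simplified `FlatPreState` (the `destructed_addresses` field is omitted here).
#[derive(Debug, Clone, Default)]
struct FlatPreState {
    accounts: HashMap<Address, DbAccount>,
    contracts: HashMap<B256, Vec<u8>>,
}

impl FlatPreState {
    /// One hashmap lookup: no MPT nodes to decode and no keccak to verify.
    fn basic(&self, addr: &Address) -> Option<&DbAccount> {
        self.accounts.get(addr)
    }

    /// Slots absent from the witness read as zero; this assumes the witness
    /// contains every non-zero slot the block reads.
    fn storage(&self, addr: &Address, slot: U256) -> U256 {
        self.accounts
            .get(addr)
            .and_then(|acc| acc.storage.get(&slot).copied())
            .unwrap_or(0)
    }

    fn code_by_hash(&self, hash: &B256) -> Option<&[u8]> {
        self.contracts.get(hash).map(|code| code.as_slice())
    }
}
```

The zero default is why witness completeness matters: a missing non-zero slot would silently read as zero during execution, which is exactly what the separate storage check must catch.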

jsign · 2025-10-27T22:38:24Z

crates/evm/execution-types/src/witness.rs

+
+/// Records pre-state accesses that occurred during execution.
+#[derive(Debug, Clone, Default)]
+pub struct FlatWitnessRecord {


The FlatWitnessRecord is analogous to WitnessRecord.
You can refresh how this worked for WitnessRecord in the blockchain_test.rs file, but the TL;DR is that when we execute a block, we can pass a closure that receives the resulting State<DB>, which we can use to capture which state was accessed.

I'll create a comment in that file later in the review.
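The closure-based recording pattern can be sketched like this. All names are hypothetical: `StateCache` stands in for the cache inside revm's `State<DB>`, and `execute_block_with_state_hook` is a toy executor, not Reth's API. The record keeps only which keys were touched, since the cache holds post-state values and the pre-state has to be fetched from a provider afterwards.

```rust
use std::collections::{HashMap, HashSet};

type Address = [u8; 20];

/// Hypothetical stand-in for the post-execution cache inside revm's `State<DB>`.
#[derive(Default)]
struct StateCache {
    storage: HashMap<Address, HashMap<u64, u64>>,
}

/// Records which keys were touched; values are not kept because the cache
/// holds post-state values, and the pre-state is fetched from a provider later.
#[derive(Debug, Default)]
struct FlatWitnessRecord {
    accessed: HashMap<Address, HashSet<u64>>,
}

impl FlatWitnessRecord {
    fn record_executed_state(&mut self, cache: &StateCache) {
        for (addr, slots) in &cache.storage {
            self.accessed.entry(*addr).or_default().extend(slots.keys().copied());
        }
    }
}

/// Toy "block execution": apply storage writes, then pass the final cache to the hook.
fn execute_block_with_state_hook<F: FnOnce(&StateCache)>(writes: &[(Address, u64, u64)], hook: F) {
    let mut cache = StateCache::default();
    for &(addr, slot, value) in writes {
        cache.storage.entry(addr).or_default().insert(slot, value);
    }
    hook(&cache);
}
```

Usage: `execute_block_with_state_hook(&writes, |cache| record.record_executed_state(cache))` leaves the accessed keys in `record`, ready to be turned into a pre-state.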

jsign · 2025-10-27T22:51:53Z

crates/evm/execution-types/src/witness.rs

+pub enum AccessedAccount {
+    /// Indicates if the account was destroyed during execution.
+    Destroyed,
+    /// Storage keys accessed during execution.
+    StorageKeys(HashSet<U256>),
+}


This is a good time to explain a quirk about how things work today in Reth regarding witness generation.

After block execution, the State<DB> is passed to the closure. State<DB> basically holds the in-memory cache of all the state accessed/written during execution.

The way this is used for the usual ExecutionWitness is the same as for FlatExecutionWitness, but there's a catch.

If an account was destroyed during execution, State<DB> sets the account to None. A priori that's a fine signal, but unfortunately it also means that all the storage slots captured during execution are wiped out.

This means that for destroyed accounts, we lose the information about which storage slots were accessed before the account was SELFDESTRUCTed.

This has somewhat nasty implications for witness generation, since we can no longer know which storage slots of this contract must go into the witness pre-state.
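The information loss can be reproduced with a toy cache. This is a hypothetical model, not revm's cache: an account maps to `Option<storage>`, and destroying it replaces the entry with `None`. `AccessedAccount` mirrors the enum from the diff, with `u64` in place of `U256`.

```rust
use std::collections::{HashMap, HashSet};

type Address = [u8; 20];

/// Mirrors the shape of the PR's `AccessedAccount` enum (with u64 instead of U256).
#[derive(Debug, PartialEq)]
enum AccessedAccount {
    Destroyed,
    StorageKeys(HashSet<u64>),
}

/// Hypothetical cache: `None` means the account was destroyed during execution.
/// Destroying it discards the storage map, and with it the accessed slot keys.
struct Cache {
    accounts: HashMap<Address, Option<HashMap<u64, u64>>>,
}

impl Cache {
    fn sstore(&mut self, addr: Address, slot: u64, value: u64) {
        // Writes to an account already destroyed in this toy model are ignored.
        if let Some(storage) = self.accounts.entry(addr).or_insert_with(|| Some(HashMap::new())) {
            storage.insert(slot, value);
        }
    }

    fn selfdestruct(&mut self, addr: Address) {
        self.accounts.insert(addr, None); // previously accessed slot keys are gone
    }

    fn accessed(&self) -> HashMap<Address, AccessedAccount> {
        self.accounts
            .iter()
            .map(|(addr, acc)| {
                let entry = match acc {
                    None => AccessedAccount::Destroyed,
                    Some(storage) => AccessedAccount::StorageKeys(storage.keys().copied().collect()),
                };
                (*addr, entry)
            })
            .collect()
    }
}
```

After `selfdestruct`, the recorder can only report `Destroyed`: nothing in the cache says which slots were touched beforehand, so there is nothing to narrow the witness with.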

When I realized this through a failing EEST test, I was surprised, since it wasn't obvious to me how to solve it. But in theory this should already be solved for the ExecutionWitness, so... how is it solved today?

The answer is that Reth includes all storage slots of destroyed accounts. If that feels surprising, see here.

In that link, we see the get_proof_targets() used when building the MPT proof for the usual ExecutionWitness. For wiped (i.e., destroyed) storage tries, it includes all storage slots, not only the ones used during execution, which means the witness is probably bloated.
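The behavior described above can be sketched as a small function. This is a hypothetical simplification of the selection logic as I read it from the description, not the real `get_proof_targets()` code: `wiped` marks a destroyed storage trie, and `all_slots_in_db` stands in for walking the account's full storage in the database.

```rust
use std::collections::HashSet;

/// Hypothetical simplification of how storage proof targets are chosen for one
/// account. `all_slots_in_db` stands in for walking the account's full storage.
fn storage_proof_targets(
    accessed: &HashSet<u64>,
    wiped: bool,
    all_slots_in_db: impl Fn() -> Vec<u64>,
) -> HashSet<u64> {
    if wiped {
        // Wiped (destroyed) storage: every stored slot becomes a target,
        // on top of whatever accessed slots were still known.
        accessed.iter().copied().chain(all_slots_in_db()).collect()
    } else {
        // Normal case: only the slots touched during execution.
        accessed.clone()
    }
}
```

With one accessed slot and four stored slots, a wiped account yields four proof targets instead of one, which is the bloat in question.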

Now, here's an important practical detail explaining why this isn't a problem today:

Pre-Cancun, SELFDESTRUCT does delete the storage trie of a contract, so this situation can arise (i.e., witness bloating due to including all storage slots).

Post-Cancun, EIP-6780 changed SELFDESTRUCT so that, for an account that already exists (i.e., was not created in the same transaction), the storage slots are no longer destroyed.

This means that for post-Cancun blocks, the wiped flag won't be set, since the storage trie isn't really destroyed, so the bloating won't happen.
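The fork rule can be written as a tiny predicate. It is a sketch under stated assumptions: `Fork` and both function names are hypothetical, and it ignores the rare case of creating a contract at an address that already has storage. Post-Cancun, an account is only removed when it was created in the same transaction, so it has no pre-block storage that a witness would need to cover.

```rust
#[derive(Clone, Copy)]
enum Fork {
    PreCancun,
    CancunOrLater,
}

/// Does SELFDESTRUCT remove the account (and its storage)?
fn selfdestruct_removes_account(fork: Fork, created_in_same_tx: bool) -> bool {
    match fork {
        Fork::PreCancun => true,
        // EIP-6780: only removed when created earlier in the same transaction.
        Fork::CancunOrLater => created_in_same_tx,
    }
}

/// Can the "include all storage slots" bloat apply? Only if a removed account
/// could have had storage before the block, i.e. it wasn't created in this tx.
fn wiped_prestate_storage_possible(fork: Fork, created_in_same_tx: bool) -> bool {
    selfdestruct_removes_account(fork, created_in_same_tx) && !created_in_same_tx
}
```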

In short, this "include all storage slots" bloating can only happen for pre-Cancun blocks. While it doesn't sound like an urgent problem, I think it will become one whenever we want to standardize witness generation (if the standard also covers older forks), since including more storage slots than needed clearly isn't required.

Sorry for the length, but I wanted to explain this correctly. Having to add all the storage slots of SELFDESTRUCTed accounts adds a decent amount of complexity to the Provider trait implementations, which we'll see later; once you understand the problem, the "why" behind them should be clear.

I've been wondering whether there's a reason they thought it was necessary to include all the slots, but the only one I can think of is to work around the problem that State<DB> loses the accessed storage slots when the account is destroyed. To destroy a storage trie, we don't really need its values, since the account will be removed from the account trie anyway. At some point it would be nice to ask Roman or someone else about this.

To be honest, for correct witness generation, I think we ideally want State<DB> to register the pre-state cleanly, instead of only registering the addresses and storage slots (since State contains post-state values) and then having to use Providers afterwards to fetch the pre-state.

For this PR, I didn't want to change revm, etc., but I think proper pre-state capturing in revm is probably the best approach in the long run. Before doing that, though, it's probably worth discussing with them.
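One possible shape for "proper pre-state capturing" is a database wrapper that remembers the first value read for every key. This is only a sketch of the idea, not a proposal for revm's actual interface: the `Database` trait below is a hypothetical one-method stand-in, and `PreStateRecorder` and `MapDb` are illustrative names. Because the recorder sits below the cache, later writes or SELFDESTRUCTs in the cache cannot erase what it captured.

```rust
use std::collections::HashMap;

type Address = [u8; 20];

/// Hypothetical minimal read interface; revm's real `Database` trait differs.
trait Database {
    fn storage(&mut self, addr: Address, slot: u64) -> u64;
}

/// Wraps a database and keeps the first value read for every key. That first
/// read is the pre-state, so later writes or SELFDESTRUCTs cannot erase it.
struct PreStateRecorder<DB> {
    inner: DB,
    pre_state: HashMap<(Address, u64), u64>,
}

impl<DB: Database> Database for PreStateRecorder<DB> {
    fn storage(&mut self, addr: Address, slot: u64) -> u64 {
        let inner = &mut self.inner;
        *self
            .pre_state
            .entry((addr, slot))
            .or_insert_with(|| inner.storage(addr, slot))
    }
}

/// Toy backing store used for the example.
struct MapDb(HashMap<(Address, u64), u64>);

impl Database for MapDb {
    fn storage(&mut self, addr: Address, slot: u64) -> u64 {
        self.0.get(&(addr, slot)).copied().unwrap_or(0)
    }
}
```

The design choice here is that the pre-state comes out of execution directly, so no provider round-trip (and no "all slots" fallback for destroyed accounts) is needed afterwards.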

jsign · 2025-10-27T22:57:37Z

crates/evm/execution-types/src/witness.rs

+    pub fn record_executed_state<DB>(&mut self, statedb: &State<DB>) {
+        self.contracts = statedb
+            .cache
+            .contracts


Here's another thing that I believe is a problem today (both for FlatExecutionWitness and for the existing ExecutionWitness).

As mentioned before, statedb.cache contains all the state that was read/written during block execution. That's useful, since it tells us which accounts and storage slots were touched, so we can fetch their pre-state.

But I think that statedb.cache.contracts also has newly created contracts, which we shouldn't include in the witness.

If that sounds surprising, see here (this is master code). Honestly, I don't know why new contracts need to be in the witness: new bytecode will be generated during re-execution, so there's no need to include it.

In any case, even if we remove that .chain(...), I checked that new contracts are still in statedb.cache.contracts (revm only goes to the database if the bytecode isn't already cached, and the cache also includes new contracts, which makes sense).

So... unless I'm mistaken, this also means there is probably extra bytecode bloat in today's execution witness. I'm thinking of creating an EEST test to triple-check this, but I feel confident about it (plus, I think the comment in that last link is wrong).
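A possible way to drop newly deployed bytecode is to keep only code whose hash appears among the accessed accounts' pre-state code hashes. This is a hedged sketch, not what the PR or master does: `witness_bytecodes` and the `pre_state_code_hashes` input (code hashes read from the parent state, e.g. via a provider) are hypothetical, and the byte arrays stand in for real hash and address types.

```rust
use std::collections::{HashMap, HashSet};

type Address = [u8; 20];
type B256 = [u8; 32];

/// Keeps only bytecode whose hash belongs to an accessed account's pre-state
/// code hash; code deployed during the block is regenerated on re-execution.
fn witness_bytecodes(
    cache_contracts: &HashMap<B256, Vec<u8>>,
    pre_state_code_hashes: &HashMap<Address, B256>,
) -> HashMap<B256, Vec<u8>> {
    let needed: HashSet<&B256> = pre_state_code_hashes.values().collect();
    cache_contracts
        .iter()
        .filter(|(hash, _)| needed.contains(hash))
        .map(|(hash, code)| (*hash, code.clone()))
        .collect()
}
```

If a newly deployed contract happens to share its bytecode with a pre-existing one, the shared hash is kept, which is fine because the pre-existing account needs that code anyway.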

jsign · 2025-10-28T00:04:13Z