refactor: crash tolerant PXE #19293

mverzilli · 2026-01-02T20:12:21Z

Overview

Refactor PXE's stores so they work with "staged writes": every write to a store is now kept in memory, segmented by a jobId, and is not written to the underlying KV store until a coordinated commit.

Relevant stores implement a new StagedStore interface, which defines the following methods:

commit(jobId): when called, moves all the in-memory data corresponding to jobId to the persistent KV store.
discardStaged(jobId): clears any in-memory data structures associated with jobId without persisting them.

Read operations can optionally receive a jobId, which affects behavior as follows:

If not provided (or undefined): read from the KV store (aka "read committed").
If provided: read committed data plus the staged data associated with the jobId (how the two sources are unified is store-dependent).
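To make the write and read semantics above concrete, here is a minimal sketch of a single staged key-value store. It assumes the persistent KV store can be modeled as a plain in-memory `Map` with string values, and `StagedMap` is a hypothetical name; this is not the PR's implementation. On reads, a job's staged value shadows the committed one, which is only one possible unification rule (the PR leaves this store-dependent).

```typescript
// Sketch only: `StagedMap` is hypothetical, and the "KV store" is modeled as a Map.
interface StagedStore {
  /** Moves all data staged under `jobId` into the persistent store. */
  commit(jobId: string): Promise<void>;
  /** Drops all data staged under `jobId` without persisting it. */
  discardStaged(jobId: string): Promise<void>;
}

class StagedMap implements StagedStore {
  // Stand-in for the persistent KV store ("committed" data).
  readonly committed = new Map<string, string>();
  // Per-job write buffers: jobId -> (key -> value).
  private readonly staged = new Map<string, Map<string, string>>();

  /** Writes always go to the job's in-memory buffer, never to `committed`. */
  set(key: string, value: string, jobId: string): void {
    let buffer = this.staged.get(jobId);
    if (!buffer) {
      buffer = new Map<string, string>();
      this.staged.set(jobId, buffer);
    }
    buffer.set(key, value);
  }

  /**
   * Without a jobId: read committed data only.
   * With a jobId: the job's staged value (if any) shadows the committed one.
   */
  get(key: string, jobId?: string): string | undefined {
    if (jobId !== undefined) {
      const stagedValue = this.staged.get(jobId)?.get(key);
      if (stagedValue !== undefined) {
        return stagedValue;
      }
    }
    return this.committed.get(key);
  }

  async commit(jobId: string): Promise<void> {
    this.staged.get(jobId)?.forEach((value, key) => {
      this.committed.set(key, value);
    });
    this.staged.delete(jobId);
  }

  async discardStaged(jobId: string): Promise<void> {
    this.staged.delete(jobId);
  }
}
```

Because nothing touches `committed` until `commit` runs, a crash at any earlier point leaves the persistent store exactly as it was before the job started.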

A new JobCoordinator class exposes the following methods for PXE's convenience:

registerStores(stagedStores: StagedStore[]): makes a collection of stores known to the JobCoordinator.
beginJob(): string: called by PXE when a job starts, returns a jobId that then gets threaded through the job's phases.
commitJob(jobId): iterates over all registered stores calling commit(jobId), with the whole loop wrapped in a single transactionAsync call to guarantee that all writes happen in the same KV transaction.
abortJob(jobId): same as commitJob, but calling discardStaged(jobId).

As a result, any data operations done before PXE calls commitJob are discarded if PXE fails, the process is killed, etc.
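A `JobCoordinator` along these lines can be sketched as follows. The `TransactionalKV` interface, its `transactionAsync` signature, and the counter-based job IDs are assumptions made for the sketch, not the PR's code. Stores are committed one at a time inside the transaction rather than with `Promise.all`, which keeps the order of writes predictable.

```typescript
// Sketch only: `TransactionalKV` and counter-based job IDs are assumptions.
interface StagedStore {
  commit(jobId: string): Promise<void>;
  discardStaged(jobId: string): Promise<void>;
}

interface TransactionalKV {
  /** Runs `fn` so that all writes inside it land in a single KV transaction. */
  transactionAsync<T>(fn: () => Promise<T>): Promise<T>;
}

class JobCoordinator {
  private readonly stores: StagedStore[] = [];
  private nextId = 0;
  private readonly kv: TransactionalKV;

  constructor(kv: TransactionalKV) {
    this.kv = kv;
  }

  registerStores(stagedStores: StagedStore[]): void {
    this.stores.push(...stagedStores);
  }

  /** Returns a fresh jobId to be threaded through the job's phases. */
  beginJob(): string {
    this.nextId += 1;
    return `job-${this.nextId}`;
  }

  /** Commits every store's staged data for `jobId` inside one KV transaction. */
  async commitJob(jobId: string): Promise<void> {
    await this.kv.transactionAsync(async () => {
      for (const store of this.stores) {
        await store.commit(jobId);
      }
    });
  }

  /** Drops every store's staged data for `jobId`; nothing reaches the KV store. */
  async abortJob(jobId: string): Promise<void> {
    for (const store of this.stores) {
      await store.discardStaged(jobId);
    }
  }
}
```

This sketch does not open a transaction in `abortJob`, since discarding only touches in-memory state; the PR text describes `abortJob` as mirroring `commitJob`, so the real implementation may differ here.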

"Waiter, there's a jobId in my signature!"

Perhaps a not-so-nice consequence of this change is that many methods now expect a jobId, which makes some tests a bit more cumbersome to write (particularly the CapsuleStore suite). I nevertheless chose to make jobId mandatory because I prefer not to open the door to inadvertent misuse (imagine forgetting to pass a parameter and, as a result, having writes leak to other jobs when you thought you were cozied up in an isolated transactional context).
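The trade-off can be illustrated with a hypothetical store that exposes both API shapes side by side (the class and method names are invented for this sketch). With an optional jobId that falls back to writing straight to the KV store, a forgotten argument compiles and silently escapes the job; with a required jobId, the same mistake is a compile-time error.

```typescript
// Hypothetical store used only to contrast the two API shapes.
class ExampleStore {
  readonly committed = new Map<string, string>();
  readonly staged = new Map<string, Map<string, string>>();

  // Rejected shape: `jobId` is optional and `undefined` means "write through".
  // A forgotten argument compiles fine and bypasses the job's isolation.
  setLoose(key: string, value: string, jobId?: string): void {
    if (jobId === undefined) {
      this.committed.set(key, value);
      return;
    }
    this.stage(key, value, jobId);
  }

  // Chosen shape: `jobId` is required, so forgetting it fails to compile
  // (TS2554: "Expected 3 arguments, but got 2.").
  setStrict(key: string, value: string, jobId: string): void {
    this.stage(key, value, jobId);
  }

  private stage(key: string, value: string, jobId: string): void {
    const buffer = this.staged.get(jobId) ?? new Map<string, string>();
    buffer.set(key, value);
    this.staged.set(jobId, buffer);
  }
}
```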

(Partially) Free riding volatile arrays

In F-136 we want to introduce volatile arrays. Incidentally, this PR keeps all writes in memory by default, including capsules. That is not exactly F-136, but if all capsules written during a job have been consumed by the time it ends, there will be nothing to save on commit, so the result looks an awful lot like what a volatile array would do.
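The sketch below illustrates why this resembles a volatile array. It assumes a hypothetical capsule array store with a copy-on-write staged view per job and key, FIFO consumption, and a `kvWrites` counter standing in for real KV writes; none of these details come from the PR.

```typescript
// Sketch only: class name, FIFO consumption and `kvWrites` are assumptions.
class StagedCapsuleArrays {
  private readonly committed = new Map<string, string[]>();
  private readonly staged = new Map<string, Map<string, string[]>>();
  kvWrites = 0;

  // Returns the job's private copy of the array, creating it on first access.
  private view(key: string, jobId: string): string[] {
    let jobArrays = this.staged.get(jobId);
    if (!jobArrays) {
      jobArrays = new Map<string, string[]>();
      this.staged.set(jobId, jobArrays);
    }
    let arr = jobArrays.get(key);
    if (!arr) {
      // Copy-on-write: start from the committed contents, never mutate them.
      arr = [...(this.committed.get(key) ?? [])];
      jobArrays.set(key, arr);
    }
    return arr;
  }

  append(key: string, items: string[], jobId: string): void {
    this.view(key, jobId).push(...items);
  }

  /** Consumes the oldest capsule, or returns undefined if the array is empty. */
  consume(key: string, jobId: string): string | undefined {
    return this.view(key, jobId).shift();
  }

  commit(jobId: string): void {
    this.staged.get(jobId)?.forEach((arr, key) => {
      const before = this.committed.get(key);
      if (arr.length === 0 && before === undefined) {
        return; // Written and fully consumed within the job: nothing to persist.
      }
      if (arr.length === 0) {
        this.committed.delete(key);
      } else {
        this.committed.set(key, arr);
      }
      this.kvWrites += 1;
    });
    this.staged.delete(jobId);
  }

  read(key: string): string[] | undefined {
    return this.committed.get(key);
  }
}
```

Only capsules still left unconsumed at the end of a job ever reach the KV store, which is the behavior a volatile array would give for free.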

Lukewarm refactor of NoteStore

Why lukewarm? I didn't want to rethink the indexes, because this PR is about making what we already had crash-tolerant. At the same time, making NoteStore mix and match DB data with in-memory data was a bit of a headache given the state the code was in, so I did need to move some code around for my sanity. Still, it's not clear whether we really need so many indexes in this store, and I'm explicitly leaving that as future work.


I decided to fragment #19293 into a smaller, more digestible (both for reviewers and for myself) series of PRs. The end goal is to refactor PXE's stores so they work with "staged writes": every write to a store is kept in memory, segmented by a `jobId`, and is not written to the underlying KV store until a coordinated commit.

Relevant stores will (in subsequent PRs) implement a new `StagedStore` interface, which defines the following methods:

- `commit(jobId)`: when called, moves all the in-memory data corresponding to `jobId` to the persistent KV store.
- `discardStaged(jobId)`: clears any in-memory data structures associated with `jobId` without persisting them.

Read operations can optionally receive a `jobId`, which affects behavior as follows:

- If not provided (or `undefined`): read from the KV store ("read committed").
- If provided: read committed data plus the staged data associated with the `jobId` (how the two sources are unified is store-dependent).

A new `JobCoordinator` class exposes the following methods for PXE's convenience:

- `registerStores(stagedStores: StagedStore[])`: makes a collection of stores known to the `JobCoordinator`.
- `beginJob(): string`: called by PXE when a job starts; returns a `jobId` that then gets threaded through the job's phases.
- `commitJob(jobId)`: iterates over all registered stores calling `commit(jobId)`, with the whole loop wrapped in a single `transactionAsync` call to guarantee that all writes happen in the same KV transaction.
- `abortJob(jobId)`: same as `commitJob`, but calling `discardStaged(jobId)`.

As a result, any data operations done before PXE calls `commitJob` are discarded if PXE fails, the process is killed, etc.

This specific PR introduces the `JobCoordinator` class, makes PXE jobs use it, and threads `jobId`s through `ContractFunctionSimulator` and the oracles, from where they will be passed as parameters to store operations.
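The overall job lifecycle can be sketched as below: PXE begins a job, the `jobId` is handed to the oracle that the simulation uses, the oracle forwards it to store calls, and the job is committed on success or aborted on failure. `Coordinator`, `NoteStoreLike`, `ExampleOracle`, and `runJob` are hypothetical names used only for illustration; in the PR, the real wiring goes through `ContractFunctionSimulator` and PXE's oracles.

```typescript
// Sketch only: all names below are hypothetical stand-ins for PXE's components.
interface Coordinator {
  beginJob(): string;
  commitJob(jobId: string): Promise<void>;
  abortJob(jobId: string): Promise<void>;
}

interface NoteStoreLike {
  addNote(note: string, jobId: string): Promise<void>;
}

class ExampleOracle {
  private readonly jobId: string;
  private readonly notes: NoteStoreLike;

  constructor(jobId: string, notes: NoteStoreLike) {
    this.jobId = jobId;
    this.notes = notes;
  }

  // Oracles never choose a jobId themselves; they forward the one they were built with.
  async deliverNote(note: string): Promise<void> {
    await this.notes.addNote(note, this.jobId);
  }
}

async function runJob<T>(
  coordinator: Coordinator,
  notes: NoteStoreLike,
  body: (oracle: ExampleOracle) => Promise<T>,
): Promise<T> {
  const jobId = coordinator.beginJob();
  try {
    const result = await body(new ExampleOracle(jobId, notes));
    await coordinator.commitJob(jobId);
    return result;
  } catch (err) {
    // Any failure (including a failed commit) drops the job's staged writes.
    await coordinator.abortJob(jobId);
    throw err;
  }
}
```

If `commitJob` itself throws, this sketch also calls `abortJob` to drop the staged data; the PR does not say whether PXE handles that case the same way.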

mverzilli added 30 commits December 16, 2025 15:48

extract getContractInstance from PXEOracleInterface

21db667

Merge branch 'next' into martin/refactor-pxe-oracle-interface-away

077c625

extract getFunctionArtifact from PXEOracleInterface

511fb18

extract getDebugFunctionName from PXEOracleInterface

3510ad5

extract getNotes from PXEOracleInterface

cec0289

merge

cef673b

extract getKeyValidationRequest from PXEOracleInterface

2ebef27

extract getCompleteAddress from PXEOracleInterface

61dec41

calculateDirectionalAppTaggingSecret

d1beaca

getSharedSecret

13118f9

getL1ToL2MembershipWitness

cad9544

fix regression

37ea1da

getMembershipWitness

aabc3a0

getLowNullifierMembershipWitness

8e79091

fix some regressions

6b9cd95

fix regression

6844a44

getBlock

31b5cea

getNulliferMembershipWitness and getNullifierMembershipWitnessAtLatestBlock

getPublicDataWitness

7514360

getPublicStorageAt

19c6fd7

fix regressions

95918da

assertCompatibleOracleVersion

245af9a

remove getSenders

cd0313d

storeCapsule

3bf3ac2

rest of capsule delegates

a022b02

getStats

a94eb8e

almost the rest of the frigging owl

ee24653

getNullifierIndex

c99cdf6

remove aztec node getter from PXEOracleInterface

86db9f5

note and event validation

ba6c7ed

mverzilli added 28 commits January 7, 2026 12:30

remove JobContext, use simple jobId strings

166264b

make sure note store uses staged writes

b9eafe1

thread jobIds through event store calls

a85cf6e

make sure we use jobIds with tagging store ops

b319c1f

make jobid mandatory

fad3030

make job_id mandatory for oracles

4bd9334

remove comment

d510ef0

polish job coordinator

010f373

use constant

db043a6

better comments for anchor_block_store

eaf4139

reduce verbosity

f7bff61

significantly refactor capsule_store_test

84f3ed6

fix potential deadlock on capsule store commit

0c799f7

significant refactor of getNotes

e21468d

significant refactor of note store

f9e17c0

leverage promise.all where low-hanging

7221d37

remove spurious comment

5d0e5c9

remove redundant comments

05a69ba

remove unnecessary comments

80fbc72

update comments

fdd3b13

update comments

b9600ab

remove old comment

ae2fa9d

fix bug in nullifier found during e2e testing

0e4852b

rollback bad fix

2504963

basic logging for debugging

7306b2c

Merge branch 'next' into martin/pxe-db-integrity

c58f1b7

more touches to private_event_store tests

3a9e138

fix nullifier commit bugs and sync tagging race

53b3753

mverzilli mentioned this pull request Jan 8, 2026

refactor!: Introduce PXE JobCoordinator #19445

Merged
