refactor: crash tolerant PXE #19293
Open
mverzilli wants to merge 153 commits into next from martin/pxe-db-integrity
github-merge-queue bot pushed a commit that referenced this pull request on Jan 9, 2026
I decided to fragment #19293 into a smaller, more digestible (both for reviewers and for myself) series of PRs. The end goal is to refactor PXE's stores so they work with "staged writes": every write to a store is kept in memory, segmented by a `jobId`, and is not written to the underlying KV store until a coordinated commit.

Relevant stores will (in subsequent PRs) implement a new `StagedStore` interface, which defines the following methods:

- `commit(jobId)`: when called, moves all the in-memory data corresponding to `jobId` to the persistent KV store.
- `discardStaged(jobId)`: clears any in-memory data structures associated with `jobId` without persisting them.

Read operations can optionally receive a `jobId`, which affects behavior as follows:

- If not provided (or undefined): read from the KV store ("read committed").
- If provided: read committed data plus the staged data associated with the `jobId` (how both sources of data are unified is store-dependent).

A new `JobCoordinator` class exposes the following methods for PXE's convenience:

- `registerStores(stagedStores: StagedStore[])`: makes a collection of stores known to the `JobCoordinator`.
- `beginJob(): string`: called by PXE when a job starts; returns a `jobId` that then gets threaded through the job's phases.
- `commitJob(jobId)`: iterates over all registered stores, calling `commit(jobId)` on each, wrapped in a `transactionAsync` call to guarantee that all writes happen in the same KV transaction.
- `abortJob(jobId)`: same as `commitJob`, but calling `discardStaged(jobId)`.

As a result, any data operations performed before PXE calls `commitJob` are discarded if PXE fails, the process is killed, etc.

This specific PR introduces the `JobCoordinator` class, makes PXE jobs use it, and threads `jobId`s through `ContractFunctionSimulator` and the oracles, where they will be used as params to store operations.
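To make the coordination described above concrete, here is a minimal sketch of how a `JobCoordinator` could drive several `StagedStore`s. It is not the code from this PR: `InMemoryKvStore` and `StagedKeyValueStore` are hypothetical stand-ins, the counter-based `jobId` is an assumption, and only the method names `commit`, `discardStaged`, `registerStores`, `beginJob`, `commitJob`, `abortJob` and `transactionAsync` come from the description.

```typescript
// Hypothetical stand-in for the persistent KV store (assumption, not PXE's real store).
class InMemoryKvStore {
  private committed = new Map<string, string>();
  private working: Map<string, string> | undefined;

  // Runs `fn` against a working copy; the copy replaces the committed data only if `fn` resolves.
  async transactionAsync(fn: () => Promise<void>): Promise<void> {
    const working = new Map(this.committed);
    this.working = working;
    try {
      await fn();
      this.committed = working;
    } finally {
      this.working = undefined;
    }
  }

  set(key: string, value: string): void {
    if (!this.working) throw new Error("writes must happen inside a transaction");
    this.working.set(key, value);
  }

  get(key: string): string | undefined {
    return this.committed.get(key);
  }
}

interface StagedStore {
  commit(jobId: string): Promise<void>;
  discardStaged(jobId: string): Promise<void>;
}

// Hypothetical store that keeps writes in memory per jobId until commit.
class StagedKeyValueStore implements StagedStore {
  private staged = new Map<string, Map<string, string>>();
  private readonly kv: InMemoryKvStore;
  private readonly prefix: string;

  constructor(kv: InMemoryKvStore, prefix: string) {
    this.kv = kv;
    this.prefix = prefix;
  }

  set(key: string, value: string, jobId: string): void {
    let writes = this.staged.get(jobId);
    if (!writes) {
      writes = new Map();
      this.staged.set(jobId, writes);
    }
    writes.set(key, value);
  }

  // Moves everything staged for jobId into the KV store.
  async commit(jobId: string): Promise<void> {
    this.staged.get(jobId)?.forEach((value, key) => this.kv.set(`${this.prefix}:${key}`, value));
    this.staged.delete(jobId);
  }

  // Forgets everything staged for jobId without persisting it.
  async discardStaged(jobId: string): Promise<void> {
    this.staged.delete(jobId);
  }
}

class JobCoordinator {
  private stores: StagedStore[] = [];
  private nextId = 0;
  private readonly kv: InMemoryKvStore;

  constructor(kv: InMemoryKvStore) {
    this.kv = kv;
  }

  registerStores(stagedStores: StagedStore[]): void {
    this.stores.push(...stagedStores);
  }

  beginJob(): string {
    return `job-${this.nextId++}`; // assumption: any unique string would do
  }

  // All stores commit inside one KV transaction, so the writes land together or not at all.
  async commitJob(jobId: string): Promise<void> {
    await this.kv.transactionAsync(async () => {
      for (const store of this.stores) await store.commit(jobId);
    });
  }

  // Assumption: discarding touches only memory, so this sketch skips the transaction.
  async abortJob(jobId: string): Promise<void> {
    for (const store of this.stores) await store.discardStaged(jobId);
  }
}
```

In this sketch a crash before `commitJob` simply loses the in-memory maps, which is the crash tolerance the description is after: the KV store never sees a half-finished job.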
Overview
Refactor PXE's stores so they work with "staged writes": every write to a store is now kept in memory, segmented by a `jobId`, and is not written to the underlying KV store until the job is committed.

Relevant stores implement a new `StagedStore` interface, which defines the following methods:

- `commit(jobId)`: when called, moves all the in-memory data corresponding to `jobId` to the persistent KV store.
- `discardStaged(jobId)`: clears any in-memory data structures associated with `jobId` without persisting them.

Read operations can optionally receive a `jobId`, which affects behavior as follows:

- If not provided (or undefined): read from the KV store ("read committed").
- If provided: read committed data plus the staged data associated with the `jobId` (how both sources of data are unified is store-dependent).

A new `JobCoordinator` class exposes the following methods for PXE's convenience:

- `registerStores(stagedStores: StagedStore[])`: makes a collection of stores known to the `JobCoordinator`.
- `beginJob(): string`: called by PXE when a job starts; returns a `jobId` that then gets threaded through the job's phases.
- `commitJob(jobId)`: iterates over all registered stores, calling `commit(jobId)` on each, wrapped in a `transactionAsync` call to guarantee that all writes happen in the same KV transaction.
- `abortJob(jobId)`: same as `commitJob`, but calling `discardStaged(jobId)`.

As a result, any data operations performed before PXE calls `commitJob` are discarded if PXE fails, the process is killed, etc.

"Waiter, there's a jobId in my signature!"
Perhaps a not so nice consequence of this change is that many methods now expect a `jobId`, which makes some tests a bit more cumbersome to write (particularly the `CapsuleStore` suite). I nevertheless chose to make `jobId` mandatory because I prefer not to open the door to inadvertent misuse (imagine forgetting to pass a param and, as a result, having writes leak to other jobs when you thought you were cozied up in an isolated transactional context).

(Partially) Free riding volatile arrays
In F-136 we want to introduce volatile arrays. Incidentally, this PR makes all writes happen in memory by default, including capsules. That is not exactly F-136, but if we reach the end of the job having consumed all capsules written during it, there will be nothing to save upon `commit`, so it will look an awful lot like what a volatile array would do.

Lukewarm refactor of NoteStore
Why lukewarm? I didn't want to rethink the indexes, because this PR is about making what we already had crash-tolerant. At the same time, making `NoteStore` mix and match DB data with in-memory data was a bit of a headache given the state the code was in, so I did need to move some code around for my sanity. It's not clear whether we really need so many indexes in this store, and I'm explicitly leaving that as future work.
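As an illustration of the read semantics and the volatile-array observation in this description, here is a minimal sketch of one way a store could overlay staged data on committed data. `StagedCapsuleStore`, its tombstone scheme and `pendingWrites` are hypothetical and are not the actual `CapsuleStore`; the description itself notes that how the two sources are unified is store-dependent.

```typescript
// Hypothetical capsule-like store; `committed` stands in for the persistent KV store.
class StagedCapsuleStore {
  private committed = new Map<string, string>();
  // jobId -> key -> staged value, where `null` marks a staged deletion (tombstone).
  private staged = new Map<string, Map<string, string | null>>();

  private stagedFor(jobId: string): Map<string, string | null> {
    let writes = this.staged.get(jobId);
    if (!writes) {
      writes = new Map();
      this.staged.set(jobId, writes);
    }
    return writes;
  }

  // Writes require a jobId, so they cannot silently bypass staging.
  set(key: string, value: string, jobId: string): void {
    this.stagedFor(jobId).set(key, value);
  }

  delete(key: string, jobId: string): void {
    const writes = this.stagedFor(jobId);
    if (this.committed.has(key)) {
      writes.set(key, null); // must remember to remove the persisted value on commit
    } else {
      writes.delete(key); // never persisted, so there is nothing left to track
    }
  }

  // Without a jobId: read committed. With a jobId: that job's staged data wins over committed.
  get(key: string, jobId?: string): string | undefined {
    if (jobId !== undefined) {
      const pending = this.staged.get(jobId)?.get(key);
      if (pending !== undefined) return pending === null ? undefined : pending;
    }
    return this.committed.get(key);
  }

  // Number of staged entries that commit(jobId) would apply.
  pendingWrites(jobId: string): number {
    return this.staged.get(jobId)?.size ?? 0;
  }

  commit(jobId: string): void {
    this.staged.get(jobId)?.forEach((value, key) => {
      if (value === null) this.committed.delete(key);
      else this.committed.set(key, value);
    });
    this.staged.delete(jobId);
  }

  discardStaged(jobId: string): void {
    this.staged.delete(jobId);
  }
}
```

Under these assumptions, a capsule that is written and then consumed within the same job leaves `pendingWrites(jobId)` at zero, so `commit` has nothing to save, which is the volatile-array-like behavior mentioned above.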