@mverzilli mverzilli commented Jan 2, 2026

Overview

Refactor PXE's stores so they work with "staged writes": every write to a store is now kept in memory segmented by a jobId, and is not written to the underlying KV store until a coordinated commit.

Relevant stores implement a new StagedStore interface, which defines the following methods:

  • commit(jobId): when called, moves all the in-memory data corresponding to jobId to the persistent KV store.
  • discardStaged(jobId): clears any in-memory data structures associated with jobId without persisting.

Read operations can optionally receive a jobId, which affects behavior as follows:

  • If not provided (or undefined): read from KV store (aka "read committed")
  • If provided: read committed + staged data associated to the jobId (how both sources of data are unified is store-dependent).
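
To make the staged write and read semantics concrete, here is a minimal sketch of a store that follows the rules above. Only StagedStore, commit and discardStaged come from this description; ToyStagedMap, its set/get methods and the plain Map standing in for the KV store are assumptions made for illustration. The way it merges sources (staged value wins over committed value) is just one possible choice, since unification is store-dependent.

```typescript
// Hypothetical sketch. Only StagedStore, commit and discardStaged come from
// the description above; every other name is an assumption.
interface StagedStore {
  commit(jobId: string): Promise<void>;
  discardStaged(jobId: string): Promise<void>;
}

class ToyStagedMap implements StagedStore {
  // Stand-in for the persistent KV store.
  private committed = new Map<string, string>();
  // Staged writes, segmented by jobId.
  private staged = new Map<string, Map<string, string>>();

  // Writes never touch the committed data directly.
  set(key: string, value: string, jobId: string): void {
    let writes = this.staged.get(jobId);
    if (!writes) {
      writes = new Map<string, string>();
      this.staged.set(jobId, writes);
    }
    writes.set(key, value);
  }

  // Without a jobId: read committed. With a jobId: that job's staged
  // value takes precedence, falling back to committed data.
  get(key: string, jobId?: string): string | undefined {
    if (jobId !== undefined) {
      const writes = this.staged.get(jobId);
      if (writes && writes.has(key)) {
        return writes.get(key);
      }
    }
    return this.committed.get(key);
  }

  // Moves the job's staged writes into the committed data.
  async commit(jobId: string): Promise<void> {
    const writes = this.staged.get(jobId);
    if (writes) {
      writes.forEach((value, key) => this.committed.set(key, value));
    }
    this.staged.delete(jobId);
  }

  // Drops the job's staged writes without persisting anything.
  async discardStaged(jobId: string): Promise<void> {
    this.staged.delete(jobId);
  }
}
```

With this sketch, a value written under one jobId is invisible to reads without a jobId and to reads under a different jobId until commit is called.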

A new JobCoordinator class exposes the following methods for PXE's convenience:

  • registerStores(stagedStores: StagedStore[]): makes a collection of stores known to the JobCoordinator.
  • beginJob(): string: called by PXE when a job starts, returns a jobId that then gets threaded through the job's phases.
  • commitJob(jobId): iterates over all registered stores calling commit(jobId), wrapped in a transactionAsync call to guarantee that all writes happen in the same KV transaction.
  • abortJob(jobId): same as commitJob, but calling discardStaged(jobId).

As a result, any data operations done before PXE decides to commitJob are discarded if PXE fails, the process is killed, etc.
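
The coordinator described above could look roughly like the following sketch. The method names mirror the list above; the TransactionalKv interface, the counter-based jobId generation, the active-job check and the runJob helper are assumptions for illustration, not the actual implementation.

```typescript
interface StagedStore {
  commit(jobId: string): Promise<void>;
  discardStaged(jobId: string): Promise<void>;
}

// Assumed shape of the KV store's transaction helper.
interface TransactionalKv {
  transactionAsync<T>(callback: () => Promise<T>): Promise<T>;
}

class JobCoordinatorSketch {
  private kv: TransactionalKv;
  private stores: StagedStore[] = [];
  private activeJobs = new Set<string>();
  private nextId = 0;

  constructor(kv: TransactionalKv) {
    this.kv = kv;
  }

  registerStores(stagedStores: StagedStore[]): void {
    this.stores.push(...stagedStores);
  }

  // Hypothetical id scheme: a simple counter.
  beginJob(): string {
    const jobId = `job-${this.nextId++}`;
    this.activeJobs.add(jobId);
    return jobId;
  }

  // All stores commit inside one KV transaction.
  async commitJob(jobId: string): Promise<void> {
    this.assertActive(jobId);
    await this.kv.transactionAsync(async () => {
      for (const store of this.stores) {
        await store.commit(jobId);
      }
    });
    this.activeJobs.delete(jobId);
  }

  // Same traversal as commitJob, but staged data is dropped.
  async abortJob(jobId: string): Promise<void> {
    this.assertActive(jobId);
    await this.kv.transactionAsync(async () => {
      for (const store of this.stores) {
        await store.discardStaged(jobId);
      }
    });
    this.activeJobs.delete(jobId);
  }

  private assertActive(jobId: string): void {
    if (!this.activeJobs.has(jobId)) {
      throw new Error(`Unknown or finished job: ${jobId}`);
    }
  }
}

// Hypothetical helper showing how a caller might thread the jobId through
// its work and commit only on success.
async function runJob(
  coordinator: JobCoordinatorSketch,
  work: (jobId: string) => Promise<void>,
): Promise<void> {
  const jobId = coordinator.beginJob();
  try {
    await work(jobId);
    await coordinator.commitJob(jobId);
  } catch (err) {
    await coordinator.abortJob(jobId);
    throw err;
  }
}
```

If the process dies before commitJob runs, nothing needs to be rolled back in this sketch, because the staged data only ever lived in memory.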

"Waiter, there's a jobId in my signature!"

A perhaps not-so-nice consequence of this change is that many methods now expect a jobId, which makes some tests a bit more cumbersome to write (particularly the CapsuleStore suite). I nevertheless chose to make jobId mandatory because I prefer not to open the door to inadvertent misuse (imagine forgetting to pass a param and, as a result, having writes leak to other jobs when you thought you were cozied up in an isolated transactional context).

(Partially) Free riding volatile arrays

In F-136 we want to introduce volatile arrays. Incidentally, this PR makes all writes go to memory by default, including capsules. That is not exactly F-136, but if we reach the end of the job having consumed all capsules written during it, there will be nothing to save upon commit, so it will look an awful lot like what a volatile array would do.
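
A toy illustration of that effect, assuming a capsule-like store where consuming a capsule that only exists in staging simply drops the staged entry. ToyCapsuleStore and its store/load/consume methods are hypothetical names, not the real CapsuleStore API, and commit here returns the number of KV operations only so the effect can be observed.

```typescript
class ToyCapsuleStore {
  // Stand-in for the persistent KV store.
  readonly persisted = new Map<string, string>();
  // jobId -> slot -> staged value, or null as a tombstone for a staged delete.
  private staged = new Map<string, Map<string, string | null>>();

  private stageFor(jobId: string): Map<string, string | null> {
    let stage = this.staged.get(jobId);
    if (!stage) {
      stage = new Map<string, string | null>();
      this.staged.set(jobId, stage);
    }
    return stage;
  }

  store(slot: string, value: string, jobId: string): void {
    this.stageFor(jobId).set(slot, value);
  }

  load(slot: string, jobId: string): string | undefined {
    const stage = this.staged.get(jobId);
    if (stage && stage.has(slot)) {
      const value = stage.get(slot);
      return value === null || value === undefined ? undefined : value;
    }
    return this.persisted.get(slot);
  }

  consume(slot: string, jobId: string): void {
    const stage = this.stageFor(jobId);
    if (this.persisted.has(slot)) {
      // The capsule exists on disk: remember to delete it at commit time.
      stage.set(slot, null);
    } else {
      // The capsule only ever lived in this job's staging area.
      stage.delete(slot);
    }
  }

  // Applies staged changes and returns how many KV operations were needed.
  commit(jobId: string): number {
    const stage = this.staged.get(jobId);
    let operations = 0;
    if (stage) {
      stage.forEach((value, slot) => {
        if (value === null) {
          this.persisted.delete(slot);
        } else {
          this.persisted.set(slot, value);
        }
        operations++;
      });
    }
    this.staged.delete(jobId);
    return operations;
  }
}

const capsules = new ToyCapsuleStore();
const scratchJob = "job-1";
capsules.store("slot-0", "scratch data", scratchJob);
capsules.consume("slot-0", scratchJob);
const operations = capsules.commit(scratchJob);
console.log(operations, capsules.persisted.size); // prints "0 0"
```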

Lukewarm refactor of NoteStore

Why lukewarm? I didn't want to rethink the indexes because this PR is about making what we had crash-tolerant. At the same time, making NoteStore mix and match DB data with in-memory data was a bit of a headache given the state the code was in, so I did need to move some code around for my sanity. It's not clear whether we really need so many indexes in this store, and I'm explicitly leaving that as future work.

github-merge-queue bot pushed a commit that referenced this pull request Jan 9, 2026
I decided to fragment #19293 into a smaller, more digestible (both for
reviewers and for myself) series of PRs.

The end goal is to refactor PXE's stores so they work with "staged
writes": every write to a store is now kept in memory segmented by a
`jobId`, and is not written to the underlying KV store until a
coordinated commit.

Relevant stores will (in subsequent PRs) implement a new `StagedStore`
interface, which defines the following methods:
- `commit(jobId)`: when called, moves all the in-memory data
corresponding to `jobId` to the persistent KV store.
- `discardStaged(jobId)`: clears any in-memory data structures
associated with `jobId` without persisting.

Read operations can optionally receive a `jobId`, which affects behavior
as follows:
- If not provided (or undefined): read from KV store ("read committed")
- If provided: read committed + staged data associated to the `jobId`
(how both sources of data are unified is store-dependent).

A new `JobCoordinator` class exposes the following methods for PXE's
convenience:
- `registerStores(stagedStores: StagedStore[])`: makes a collection of
stores known to the `JobCoordinator`.
- `beginJob(): string`: called by PXE when a job starts, returns a
`jobId` that then gets threaded through the job's phases.
- `commitJob(jobId)`: iterates over all registered stores calling
`commit(jobId)`, wrapped in a `transactionAsync` call to guarantee
that all writes happen in the same KV transaction.
- `abortJob(jobId)`: same as `commitJob`, but calling `discardStaged(jobId)`.

As a result, any data operations done before PXE decides to `commitJob`
are discarded if PXE fails, the process is killed, etc.

This specific PR introduces the `JobCoordinator` class, makes PXE jobs
use it, and threads `jobId`s through `ContractFunctionSimulator` and the
oracles, from where they will be used as params to store operations.
Labels

ci-no-fail-fast Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure
