Skip to content

apollo_storage: Deduplicate reader traits and unify StorageTxn with RwLock#12406

Open
dean-starkware wants to merge 3 commits intodean/storage-batching-unsafe-code-removalfrom
dean/storage-reader-trait-dedup-and-rwlock
Open

apollo_storage: Deduplicate reader traits and unify StorageTxn with RwLock#12406
dean-starkware wants to merge 3 commits intodean/storage-batching-unsafe-code-removalfrom
dean/storage-reader-trait-dedup-and-rwlock

Conversation

@dean-starkware
Copy link
Contributor

@dean-starkware dean-starkware commented Feb 9, 2026

This commit makes RO exposed to uncommitted data.

Performance trade-off: RO and RW now contend on RwLock (readers block writer, writer blocks all readers), whereas previously they were fully independent with zero contention.


Note

High Risk
Changes core storage transaction semantics and concurrency (RO now sees uncommitted data and contends with writers via RwLock), which can affect correctness, performance, and deadlock/panic behavior across the node.

Overview
Refactors apollo_storage transaction handling to unify RO and RW into a single StorageTxn type backed by a shared persistent RW MDBX transaction guarded by an RwLock, so RO reads can observe uncommitted batched writes and writers coordinate via the same shared state.

Deduplicates storage reader/writer trait impls by removing mode generics (TransactionKind, RO/RW) from many APIs (RPC, p2p sync, execution utils, state sync/metrics, and storage modules), updates writer traits to implement on StorageTxn, and adds guardrails: commit() now errors on RO transactions and Drop only enforces uncommitted-write panics for write guards. Also routes DB stats/metrics queries through the shared writer-backed state and extends DbWriter with read-style stats helpers.

Written by Cursor Bugbot for commit 121ac1d. This will update automatically on new commits. Configure here.

dean-starkware and others added 2 commits February 5, 2026 11:32
Simplify storage batching by eliminating the TxnStorage enum entirely.
Both RO and RW transactions now borrow from the same persistent RW
transaction stored in SharedBatchingState.

Key changes:
- Removed TxnStorage enum (Owned/Borrowed variants)
- StorageTxn always uses borrowed reference (&DbTransaction)
- StorageReader and StorageWriter share batching_state via Arc<Mutex>
- StorageReader has DbWriter to create persistent RW transaction
- RO transactions can see uncommitted writes from RW transactions
- Simplified constructors: single new() instead of new_ro/new_borrowed
- Removed batch_config parameter from StorageWriter::new()
- Made DbWriter cloneable for sharing between reader and writer

This allows RO transactions to read from the ongoing RW transaction,
enabling them to see uncommitted data as requested.

Co-authored-by: Cursor <cursoragent@cursor.com>
This commit eliminates all unsafe code from lib.rs by introducing a new
design where StorageTxnRW holds a MutexGuard for the entire transaction
lifetime, rather than locking/unlocking on every operation.

Key changes:
- Introduced StorageTxnRW that owns MutexGuard<SharedState> for RW transactions
- StorageTxn now owns its DbTransaction for RO transactions (no shared state)
- begin_rw_txn() locks mutex once and moves guard into StorageTxnRW
- begin_ro_txn() creates fresh independent RO transaction (no unsafe cast)
- Updated all writer trait implementations to use StorageTxnRW
- Added reader trait implementations for StorageTxnRW (for read-after-write)
- Made update_marker_metrics generic over reader traits

Result: Zero unsafe code in lib.rs while maintaining batching functionality.
Co-authored-by: Cursor <cursoragent@cursor.com>
@reviewable-StarkWare
Copy link

This change is Reviewable

@github-actions
Copy link

github-actions bot commented Feb 9, 2026

@dean-starkware dean-starkware changed the title Deduplicate reader traits and unify StorageTxn with RwLock apollo_storage: Deduplicate reader traits and unify StorageTxn with RwLock Feb 9, 2026

impl<'env> StateReader<'env, RW> {
/// Creates a new state reader from a StorageTxnRW.
fn new_from_rw(txn_rw: &'env StorageTxnRW<'env>) -> StorageResult<Self> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Identical dead methods new and new_from_rw remain

Low Severity

StateReader::new and StateReader::new_from_rw now have identical signatures (both take &StorageTxn) and identical implementations after the type unification. Neither method is called anywhere in the codebase — get_state_reader constructs StateReader directly via struct literal. Both are dead code that can be removed.

Fix in Cursor Fix in Web

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

@dean-starkware dean-starkware force-pushed the dean/storage-reader-trait-dedup-and-rwlock branch from 63cf4f7 to 1499a27 Compare February 9, 2026 14:30
Copy link
Collaborator

@matanl-starkware matanl-starkware left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@matanl-starkware reviewed 28 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @dean-starkware).


crates/apollo_storage/src/lib.rs line 268 at r2 (raw file):

        shared_state: shared_state.clone(),
        scope: storage_config.scope,
        file_handlers: file_writers.clone(),

writers?
Is it a typo or a must? (If actually shared between Reader and Writer, why not placed in the shared_state from the first place?)
I don't see file_readers being used.

Code quote:

file_writers.clone()

crates/apollo_storage/src/state/mod.rs line 227 at r2 (raw file):

/// A single coherent state at a single point in time,
pub struct StateReader<'env> {
    txn: &'env DbTransaction<'env, RW>,

So this is how you avoid casting (and unsafe...).
I'm not sure that was the intention of the author.

Code quote:

txn: &'env DbTransaction<'env, RW>,

crates/apollo_storage/src/body/events.rs line 170 at r2 (raw file):

/// This iterator goes over the events in the order of the events table key.
/// That is, the events iterated first by the contract address and then by the event index.
pub struct EventIterByContractAddress<'env, 'txn> {

Should we parameterize the cursors by TX type (RO/RW)?

Code quote:

EventIterByContractAddress

crates/apollo_storage/src/storage_metrics.rs line 20 at r2 (raw file):

pub fn update_storage_metrics(reader: &StorageReader) -> StorageResult<()> {
    debug!("updating storage metrics");
    let state = reader.shared_state.lock().unwrap();

I don't like that.
The reader should be responsible for protecting itself, and cannot rely on the user to lock the state before any access.

Code quote:

reader.shared_state


impl<'env> StateReader<'env, RW> {
/// Creates a new state reader from a StorageTxnRW.
fn new_from_rw(txn_rw: &'env StorageTxnRW<'env>) -> StorageResult<Self> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree

This commit addresses two major improvements:

1. **Reader Trait Deduplication**: Introduced StorageTransaction trait to eliminate massive duplication across StorageTxn and StorageTxnRW implementations. Previously, every reader trait was implemented twice with identical code. Now, reader traits are implemented once generically over StorageTransaction.

2. **RwLock Unification**: Replaced Mutex with RwLock and unified StorageTxnRW into StorageTxn by removing the Mode type parameter. Both begin_rw_txn() and begin_ro_txn() now return StorageTxn<'_>, distinguished by TxnGuard enum (Read vs Write). This allows RO transactions to see uncommitted writes from the persistent RW transaction.

Key changes:
- Removed StorageTxnRW struct entirely
- Removed Mode type parameter from StorageTxn, StateReader, StateStorageReader
- Changed SharedState from Arc<Mutex<>> to Arc<RwLock<>>
- Added TxnGuard enum to hold either RwLockReadGuard or RwLockWriteGuard
- Updated all downstream crates to remove Mode parameter
- RO transactions now access persistent RW transaction via read lock

Performance trade-off: RO and RW now contend on RwLock (readers block writer, writer blocks all readers), whereas previously they were fully independent with zero contention.

Co-authored-by: Cursor <cursoragent@cursor.com>
@dean-starkware dean-starkware force-pushed the dean/storage-reader-trait-dedup-and-rwlock branch from 1499a27 to 121ac1d Compare February 9, 2026 15:36
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

scope: self.scope,
_metric_updater: MetricsHandler::new(self.open_readers_metric),
})
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concurrent readers share non-thread-safe MDBX transaction

High Severity

Multiple concurrent begin_ro_txn() calls (from different threads via the Clone-able StorageReader) each acquire an RwLockReadGuard, which permits concurrent access. All resulting StorageTxn instances call txn() which dereferences the same shared active_txn MDBX transaction. However, MDBX transactions are not thread-safe (as explicitly acknowledged by the TODO on OwnedDbWriteTransaction). Previously, each reader got an independent RO transaction from db_reader.begin_ro_txn(), ensuring thread isolation. Now, concurrent readers open tables and create cursors on the same underlying MDBX transaction, risking data corruption or crashes.

Additional Locations (1)

Fix in Cursor Fix in Web

@dean-starkware dean-starkware force-pushed the dean/storage-batching-unsafe-code-removal branch 11 times, most recently from f5fb91c to 937a76a Compare February 22, 2026 11:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants