apollo_storage: Deduplicate reader traits and unify StorageTxn with RwLock#12406
Conversation
Simplify storage batching by eliminating the TxnStorage enum entirely. Both RO and RW transactions now borrow from the same persistent RW transaction stored in SharedBatchingState. Key changes: - Removed TxnStorage enum (Owned/Borrowed variants) - StorageTxn always uses borrowed reference (&DbTransaction) - StorageReader and StorageWriter share batching_state via Arc<Mutex> - StorageReader has DbWriter to create persistent RW transaction - RO transactions can see uncommitted writes from RW transactions - Simplified constructors: single new() instead of new_ro/new_borrowed - Removed batch_config parameter from StorageWriter::new() - Made DbWriter cloneable for sharing between reader and writer This allows RO transactions to read from the ongoing RW transaction, enabling them to see uncommitted data as requested. Co-authored-by: Cursor <cursoragent@cursor.com>
This commit eliminates all unsafe code from lib.rs by introducing a new design where StorageTxnRW holds a MutexGuard for the entire transaction lifetime, rather than locking/unlocking on every operation. Key changes: - Introduced StorageTxnRW that owns MutexGuard<SharedState> for RW transactions - StorageTxn now owns its DbTransaction for RO transactions (no shared state) - begin_rw_txn() locks mutex once and moves guard into StorageTxnRW - begin_ro_txn() creates fresh independent RO transaction (no unsafe cast) - Updated all writer trait implementations to use StorageTxnRW - Added reader trait implementations for StorageTxnRW (for read-after-write) - Made update_marker_metrics generic over reader traits Result: Zero unsafe code in lib.rs while maintaining batching functionality. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Artifacts upload workflows: |
|
|
||
| impl<'env> StateReader<'env, RW> { | ||
| /// Creates a new state reader from a StorageTxnRW. | ||
| fn new_from_rw(txn_rw: &'env StorageTxnRW<'env>) -> StorageResult<Self> { |
There was a problem hiding this comment.
Identical dead methods new and new_from_rw remain
Low Severity
StateReader::new and StateReader::new_from_rw now have identical signatures (both take &StorageTxn) and identical implementations after the type unification. Neither method is called anywhere in the codebase — get_state_reader constructs StateReader directly via struct literal. Both are dead code that can be removed.
63cf4f7 to
1499a27
Compare
matanl-starkware
left a comment
There was a problem hiding this comment.
@matanl-starkware reviewed 28 files and all commit messages, and made 5 comments.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @dean-starkware).
crates/apollo_storage/src/lib.rs line 268 at r2 (raw file):
shared_state: shared_state.clone(), scope: storage_config.scope, file_handlers: file_writers.clone(),
writers?
Is it a typo or a must? (If actually shared between Reader and Writer, why not placed in the shared_state from the first place?)
I don't see file_readers being used.
Code quote:
file_writers.clone()crates/apollo_storage/src/state/mod.rs line 227 at r2 (raw file):
/// A single coherent state at a single point in time, pub struct StateReader<'env> { txn: &'env DbTransaction<'env, RW>,
So this is how you avoid casting (and unsafe...).
I'm not sure that was the intention of the author.
Code quote:
txn: &'env DbTransaction<'env, RW>,crates/apollo_storage/src/body/events.rs line 170 at r2 (raw file):
/// This iterator goes over the events in the order of the events table key. /// That is, the events iterated first by the contract address and then by the event index. pub struct EventIterByContractAddress<'env, 'txn> {
Should we parameterize the cursors by TX type (RO/RW)?
Code quote:
EventIterByContractAddresscrates/apollo_storage/src/storage_metrics.rs line 20 at r2 (raw file):
pub fn update_storage_metrics(reader: &StorageReader) -> StorageResult<()> { debug!("updating storage metrics"); let state = reader.shared_state.lock().unwrap();
I don't like that.
The reader should be responsible for protecting itself, and cannot rely on the user to lock the state before any access.
Code quote:
reader.shared_state|
|
||
| impl<'env> StateReader<'env, RW> { | ||
| /// Creates a new state reader from a StorageTxnRW. | ||
| fn new_from_rw(txn_rw: &'env StorageTxnRW<'env>) -> StorageResult<Self> { |
This commit addresses two major improvements: 1. **Reader Trait Deduplication**: Introduced StorageTransaction trait to eliminate massive duplication across StorageTxn and StorageTxnRW implementations. Previously, every reader trait was implemented twice with identical code. Now, reader traits are implemented once generically over StorageTransaction. 2. **RwLock Unification**: Replaced Mutex with RwLock and unified StorageTxnRW into StorageTxn by removing the Mode type parameter. Both begin_rw_txn() and begin_ro_txn() now return StorageTxn<'_>, distinguished by TxnGuard enum (Read vs Write). This allows RO transactions to see uncommitted writes from the persistent RW transaction. Key changes: - Removed StorageTxnRW struct entirely - Removed Mode type parameter from StorageTxn, StateReader, StateStorageReader - Changed SharedState from Arc<Mutex<>> to Arc<RwLock<>> - Added TxnGuard enum to hold either RwLockReadGuard or RwLockWriteGuard - Updated all downstream crates to remove Mode parameter - RO transactions now access persistent RW transaction via read lock Performance trade-off: RO and RW now contend on RwLock (readers block writer, writer blocks all readers), whereas previously they were fully independent with zero contention. Co-authored-by: Cursor <cursoragent@cursor.com>
1499a27 to
121ac1d
Compare
| scope: self.scope, | ||
| _metric_updater: MetricsHandler::new(self.open_readers_metric), | ||
| }) | ||
| } |
There was a problem hiding this comment.
Concurrent readers share non-thread-safe MDBX transaction
High Severity
Multiple concurrent begin_ro_txn() calls (from different threads via the Clone-able StorageReader) each acquire an RwLockReadGuard, which permits concurrent access. All resulting StorageTxn instances call txn() which dereferences the same shared active_txn MDBX transaction. However, MDBX transactions are not thread-safe (as explicitly acknowledged by the TODO on OwnedDbWriteTransaction). Previously, each reader got an independent RO transaction from db_reader.begin_ro_txn(), ensuring thread isolation. Now, concurrent readers open tables and create cursors on the same underlying MDBX transaction, risking data corruption or crashes.
Additional Locations (1)
f5fb91c to
937a76a
Compare


This commit makes RO exposed to uncommitted data.
Performance trade-off: RO and RW now contend on RwLock (readers block writer, writer blocks all readers), whereas previously they were fully independent with zero contention.
Note
High Risk
Changes core storage transaction semantics and concurrency (RO now sees uncommitted data and contends with writers via
RwLock), which can affect correctness, performance, and deadlock/panic behavior across the node.Overview
Refactors
apollo_storagetransaction handling to unify RO and RW into a singleStorageTxntype backed by a shared persistent RW MDBX transaction guarded by anRwLock, so RO reads can observe uncommitted batched writes and writers coordinate via the same shared state.Deduplicates storage reader/writer trait impls by removing mode generics (
TransactionKind,RO/RW) from many APIs (RPC, p2p sync, execution utils, state sync/metrics, and storage modules), updates writer traits to implement onStorageTxn, and adds guardrails:commit()now errors on RO transactions andDroponly enforces uncommitted-write panics for write guards. Also routes DB stats/metrics queries through the shared writer-backed state and extendsDbWriterwith read-style stats helpers.Written by Cursor Bugbot for commit 121ac1d. This will update automatically on new commits. Configure here.