fix(db): reuse ReadOptions in ColumnDbSnapshot to reduce GC pressure#10894

Open
smartprogrammer93 wants to merge 5 commits into master from fix/reuse-readoptions-in-column-db-snapshot

Conversation

@smartprogrammer93 (Contributor) commented Mar 20, 2026

Changes

  • Share 2 ReadOptions instances (normal + cache-miss) across all 7 column RocksDbReader instances in ColumnDbSnapshot, instead of creating 14 separate ones
  • Explicitly destroy native ReadOptions handles via rocksdb_readoptions_destroy + GC.SuppressFinalize in ColumnDbSnapshot.Dispose(), with double-dispose guard
  • Replace Dictionary<T, IReadOnlyKeyValueStore> with flat array indexed by enum ordinal via Convert.ToInt32 (safe for all enum underlying types)
  • Share a single Func<ReadOptions> delegate across all column readers instead of 7 separate closures
  • Cache column keys and max ordinal on parent ColumnsDb<T> with volatile fields for thread safety
  • Add a new RocksDbReader constructor that accepts pre-created ReadOptions for shared use

Root cause: ReadOptions in RocksDbSharp has a finalizer (~ReadOptions()) but does not implement IDisposable. Each ColumnDbSnapshot created 14 finalizable ReadOptions objects (2 per column x 7 columns), plus a Dictionary, 7 RocksDbReader instances, and 7 closure delegates. In FlatState block processing, thousands of snapshots per second produce tens of thousands of finalizable objects that survive Gen0, get promoted to Gen1/Gen2, and trigger expensive stop-the-world GC collections.
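The sharing and explicit-destroy pattern described above can be sketched without RocksDB. This is only an illustration under stated assumptions, not the PR's code: `FakeReadOptions`, `FakeReader`, and `FakeSnapshot` are hypothetical stand-ins, and `DestroyHandle` stands in for the native `rocksdb_readoptions_destroy` call.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical stand-in for a native-handle wrapper with a finalizer but no IDisposable.
public sealed class FakeReadOptions
{
    public static int Created;    // number of wrappers allocated
    public static int Destroyed;  // number of simulated native handles released
    private int _destroyed;

    public FakeReadOptions() => Interlocked.Increment(ref Created);

    // Stands in for the native destroy call; releases at most once.
    public void DestroyHandle()
    {
        if (Interlocked.Exchange(ref _destroyed, 1) == 0)
            Interlocked.Increment(ref Destroyed);
    }

    // Safety net only: an object that reaches its finalizer has already survived a collection.
    ~FakeReadOptions() => DestroyHandle();
}

// Hypothetical reader that borrows options it does not own.
public sealed class FakeReader
{
    public FakeReadOptions Options { get; }
    public FakeReadOptions CacheMissOptions { get; }

    public FakeReader(FakeReadOptions options, FakeReadOptions cacheMissOptions)
    {
        Options = options;
        CacheMissOptions = cacheMissOptions;
    }
}

// Hypothetical snapshot that owns two shared option objects for all of its readers.
public sealed class FakeSnapshot : IDisposable
{
    private readonly FakeReadOptions _shared = new();
    private readonly FakeReadOptions _sharedCacheMiss = new();
    private int _disposed;

    public FakeReader[] Readers { get; }

    public FakeSnapshot(int columnCount)
    {
        Readers = new FakeReader[columnCount];
        for (int i = 0; i < columnCount; i++)
            Readers[i] = new FakeReader(_shared, _sharedCacheMiss);
    }

    public void Dispose()
    {
        // Only the first call releases; repeated calls return early.
        if (Interlocked.Exchange(ref _disposed, 1) != 0) return;
        Release(_shared);
        Release(_sharedCacheMiss);
    }

    private static void Release(FakeReadOptions options)
    {
        options.DestroyHandle();
        GC.SuppressFinalize(options); // keeps the object off the finalizer queue
    }
}
```

With seven columns this allocates two option objects instead of fourteen, and a disposed snapshot leaves none of them waiting for finalization.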

Profiling data (ContractCall_200 FlatState, 2000 block processing calls per round, 20 rounds):

| Metric | FlatState (before) | FlatState (after) | Trie |
|---|---|---|---|
| Gen0 collections/round | 38 | 35 | 30 |
| Gen1 collections/round | 9 | 6.8 | 0 |
| Gen2 collections/round | 1.3 | 0.8 | 0 |
| Total Gen1 (20 rounds) | 174 | 135 | 2 |
| Total Gen2 (20 rounds) | 26 | 16 | 0 |
| Alloc/round | 580MB | 520MB | 479MB |
| FinalizationPending | 4,402 | 20 | 90 |
| Gen2 heap size | 20MB | 17MB | 7MB |

Remaining Gen1/Gen2 pressure is from per-scope FlatState infrastructure objects (FlatWorldStateScope, SnapshotBundle, ReadOnlySnapshotBundle, StateTree, etc.) that require deeper architectural changes to pool.

Types of changes

  • Optimization
  • Bugfix (a non-breaking change that fixes an issue)

Testing

Requires testing

  • Yes

If yes, did you write tests?

  • Yes
  • No

Notes on testing

Validated via ProfileRunner benchmark with per-round GC tracking. Existing tests should pass since the behavior is identical - same ReadOptions configuration, just shared instead of duplicated. The array-indexed lookup via Convert.ToInt32 is functionally equivalent to the dictionary lookup and safe for all enum underlying types.

Documentation

Requires documentation update

  • No

Requires explanation in Release Notes

  • Yes
  • No

…zer pressure

ColumnDbSnapshot previously created 14 ReadOptions per snapshot (2 per column
× 7 columns). ReadOptions in RocksDbSharp has a finalizer but no IDisposable,
so these objects survive Gen0, get promoted to Gen1/Gen2, and cause expensive
GC pauses. In FlatState block processing, 2000+ snapshots per BDN round
produce 28,000+ finalizable objects, causing Gen1 (9/round) and Gen2 (1-2/round)
collections that create timing spikes.

Fix: Create 2 shared ReadOptions (normal + cache-miss) per ColumnDbSnapshot
and pass them to all RocksDbReader instances. Explicitly destroy the native
handles via rocksdb_readoptions_destroy + GC.SuppressFinalize in Dispose().

This reduces finalizable objects from 14 to 0 per snapshot scope.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…egate

ColumnDbSnapshot used Dictionary<T, IReadOnlyKeyValueStore> to map columns
to readers, plus 7 separate Func<ReadOptions> closure allocations. Replace
with a flat array indexed by enum ordinal and a single shared delegate.

This eliminates per-snapshot: 1 Dictionary + backing array + 6 extra delegate
allocations (from 7 down to 1 shared).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@smartprogrammer93
Contributor Author

@claude review

claude bot commented Mar 20, 2026

Claude finished @smartprogrammer93's task in 2m 5s


Code Review

Good optimization overall — the profiling data clearly justifies the changes, and the approach of eagerly destroying native handles + suppressing finalizers is sound. Below are the issues I found.


Critical

1. Unsafe.As<T, int> is unsafe for non-int-backed enums [ColumnsDb.cs:209, 220, 233]

int ord = Unsafe.As<T, int>(ref k);

where T : struct, Enum does not constrain the underlying type to int. If a caller uses a byte-, short-, long-, or uint-backed enum, Unsafe.As<T, int> performs a 4-byte read from a potentially 1- or 2-byte value on the stack — a buffer overread that silently produces garbage ordinals. Currently FlatState columns are int-backed, but this is a landmine for future callers.

Use Convert.ToInt32(k) for correctness across all underlying types, or leverage the already-present FastEnumUtility dependency which handles this safely.
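To make the suggestion concrete, here is a small sketch of ordinal-indexed lookup that stays correct for a byte-backed enum. `DemoColumn`, `OrdinalLookup`, and `BuildTable` are hypothetical names introduced for illustration; the columns in this PR are int-backed.

```csharp
#nullable enable
using System;

// Hypothetical byte-backed enum with a gap in its values.
public enum DemoColumn : byte { Accounts = 0, Storage = 1, Code = 5 }

public static class OrdinalLookup
{
    // Resolves to Convert.ToInt32(object), which converts via IConvertible,
    // so narrow underlying types are widened rather than reinterpreted.
    // Values outside the int range throw OverflowException instead of wrapping.
    public static int EnumToInt<T>(T value) where T : struct, Enum => Convert.ToInt32(value);

    // Builds a flat array sized by the largest ordinal; unused slots stay null.
    // Assumes non-negative enum values, since they are used as array indices.
    public static string[] BuildTable<T>(T[] keys) where T : struct, Enum
    {
        int max = 0;
        foreach (T key in keys)
            max = Math.Max(max, EnumToInt(key));

        string[] table = new string[max + 1];
        foreach (T key in keys)
            table[EnumToInt(key)] = key.ToString();
        return table;
    }
}
```

One trade-off to be aware of: the `object` overload boxes the enum value on each call, so this favors correctness over allocation-free lookup.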


2. No double-dispose guard in ColumnDbSnapshot.Dispose() [ColumnsDb.cs:237-247]

public void Dispose()
{
    Native.Instance.rocksdb_readoptions_destroy(_sharedReadOptions.Handle);
    GC.SuppressFinalize(_sharedReadOptions);
    Native.Instance.rocksdb_readoptions_destroy(_sharedCacheMissReadOptions.Handle);
    GC.SuppressFinalize(_sharedCacheMissReadOptions);
    _snapshot.Dispose();
}

If Dispose() is called more than once (which is valid per IDisposable contract), rocksdb_readoptions_destroy runs on an already-freed native handle — a use-after-free. Add a _disposed flag or use Interlocked.Exchange.


Moderate

3. Data race on _cachedColumnKeys and _cachedMaxOrdinal [ColumnsDb.cs:202-213]

T[] keys = columnsDb._cachedColumnKeys ??= columnsDb._columnDbs.Keys.ToArray();
if (columnsDb._cachedMaxOrdinal < 0)
{
    ...
    columnsDb._cachedMaxOrdinal = max;
}

ColumnsDb<T> is long-lived and CreateSnapshot() can be called from multiple threads. The ??= and if < 0 / assign pattern are not atomic — both threads can race through them. The race is "benign" (both compute identical results), but without volatile or Interlocked, the JIT/CPU is permitted to cache the stale value. Use LazyInitializer.EnsureInitialized for the array, and a volatile field or Interlocked.CompareExchange for the int.
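A minimal sketch of the two suggested primitives, assuming a cache shaped roughly like the one described. `KeyCache<T>` and its members are hypothetical and not taken from `ColumnsDb<T>`.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical long-lived cache that may be read from several threads.
public sealed class KeyCache<T> where T : struct, Enum
{
    private readonly Dictionary<T, string> _columns;
    private T[]? _cachedKeys;
    private int _cachedMaxOrdinal = -1; // -1 means "not computed yet"

    public KeyCache(Dictionary<T, string> columns) => _columns = columns;

    // EnsureInitialized publishes exactly one array; a racing thread may still
    // run the factory, but its result is discarded and the winner is returned.
    public T[] Keys => LazyInitializer.EnsureInitialized(ref _cachedKeys, CopyKeys);

    public int MaxOrdinal
    {
        get
        {
            int cached = Volatile.Read(ref _cachedMaxOrdinal);
            if (cached >= 0) return cached;

            int max = 0;
            foreach (T key in Keys)
                max = Math.Max(max, Convert.ToInt32(key));

            // Publish only if the field is still unset; every racer computes the same value.
            Interlocked.CompareExchange(ref _cachedMaxOrdinal, max, -1);
            return max;
        }
    }

    private T[] CopyKeys()
    {
        T[] keys = new T[_columns.Count];
        _columns.Keys.CopyTo(keys, 0);
        return keys;
    }
}
```

Marking both fields `volatile` and accepting a duplicate computation is a simpler alternative that also gives cross-thread visibility.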


4. Missing regression test [AGENTS.md]

Per project guidelines: "When fixing a bug, always add a regression test." The PR notes no tests were written. Even a simple unit test verifying that ColumnDbSnapshot.Dispose() runs without throwing (idempotency) and that shared ReadOptions are not double-freed would cover the fix.


Minor

5. readOptionsFactory closure comment is slightly misleading [ColumnsDb.cs:193-199]

// Single shared delegate for GetViewBetween — avoids per-reader closure allocation
Func<ReadOptions> readOptionsFactory = () => { ... };

The delegate is one allocation (good), but each call to GetViewBetween still invokes readOptionsFactory() which creates a new ReadOptions with a finalizer. These per-call ReadOptions are not explicitly destroyed. This is pre-existing behavior, but worth noting since the PR's motivation is exactly finalizer pressure — GetViewBetween-created ReadOptions remain unfixed.

6. LINQ in constructor [ColumnsDb.cs:202]

columnsDb._columnDbs.Keys.ToArray()

The coding style says "No LINQ when a simple for/foreach works." This is a constructor path (not hot), but a simple manual loop into a pre-sized array is consistent with the codebase style.


Summary

The optimization is well-motivated and the profiling data is compelling. The two critical issues (unsafe enum cast, missing double-dispose guard) should be addressed before merge. The thread-safety concern on the parent-level cache is also worth fixing since ColumnsDb is a long-lived shared object.

- Replace Unsafe.As<T, int> with Convert.ToInt32 for safe enum-to-int
  conversion regardless of underlying type (critical: buffer overread)
- Add _disposed guard to Dispose() to prevent double-free on native
  ReadOptions handles (critical: use-after-free)
- Use volatile fields for _cachedColumnKeys/_cachedMaxOrdinal to ensure
  cross-thread visibility (moderate: data race)
- Replace LINQ ToArray() with manual loop (minor: style)
- Clarify comment on readOptionsFactory noting GetViewBetween still
  creates per-call ReadOptions (minor: documentation)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@smartprogrammer93
Contributor Author

@claude rereview

claude bot commented Mar 20, 2026

Claude finished @smartprogrammer93's task in 1m 38s


Re-review

Both critical issues from the previous review have been addressed. Here's the full updated status.


Previously Critical — Now Fixed

1. ✅ Unsafe.As<T, int> replaced with Convert.ToInt32 [ColumnsDb.cs:222, 231, 242]

All three call sites now use Convert.ToInt32(...) — correct for all enum underlying types. No remaining Unsafe.As usage.

2. ✅ Double-dispose guard added [ColumnsDb.cs:247-248]

if (_disposed) return;
_disposed = true;

Clean and correct. Note: per coding-style, ObjectDisposedException.ThrowIf is the convention for methods called on a disposed object (e.g. GetColumn) — not for Dispose() itself, where silent idempotency is the correct contract. The current approach is fine.


Previously Moderate — Addressed with Acknowledged Trade-off

3. ⚠️ Benign data race on _cachedColumnKeys / _cachedMaxOrdinal [ColumnsDb.cs:207-224]

Both fields are now volatile. The in-code comment explicitly documents the race and the reasoning:

"The race is benign (both threads compute identical results) and volatile ensures visibility across cores."

This is acceptable. volatile guarantees that once any thread writes the value, subsequent reads by other threads will observe it. Two threads racing through the null / < 0 check will both compute identical results and write the same value — there is no inconsistency possible. LazyInitializer.EnsureInitialized would be more idiomatic, but the current approach is correct.


Previously Minor — Fixed

4. ✅ LINQ removed from constructor [ColumnsDb.cs:211-214]

Replaced with a manual foreach loop — consistent with the no-LINQ-when-foreach-works style rule.

5. ✅ Clarifying comment on readOptionsFactory [ColumnsDb.cs:193-196]

The comment now explicitly calls out that per-call ReadOptions in GetViewBetween is pre-existing behavior not addressed by this PR. Good.


Still Open

6. Missing regression test

Per project guidelines (AGENTS.md): "When fixing a bug, always add a regression test." A simple test verifying ColumnDbSnapshot.Dispose() is idempotent (double-dispose does not throw or crash) would satisfy this requirement and directly cover the double-free fix.


New Observation

7. GetColumn does not guard against post-dispose use [ColumnsDb.cs:240-243]

public IReadOnlyKeyValueStore GetColumn(T key)
{
    return _readers[Convert.ToInt32(key)];
}

After Dispose(), the native ReadOptions handles are freed, but GetColumn still returns a live RocksDbReader whose _options point to destroyed handles. Calling Get/GetSpan on the returned reader would be a use-after-free at the native layer. The coding style convention is ObjectDisposedException.ThrowIf(_disposed, this) at the top of GetColumn. This is a pre-existing gap (the original code had no disposal at all), but since this PR introduces _disposed, it's a natural place to add the guard.
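The guard could look roughly like the sketch below. `GuardedSnapshot` is a hypothetical type with string placeholders instead of real readers, and it assumes .NET 7 or later, where `ObjectDisposedException.ThrowIf` is available.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical snapshot-like type; the strings stand in for per-column readers.
public sealed class GuardedSnapshot : IDisposable
{
    private readonly string[] _readers = { "reader0", "reader1" };
    private int _disposed; // 0 = live, 1 = disposed

    public string GetColumn(int ordinal)
    {
        // Fail fast in managed code rather than hand out a reader backed by freed handles.
        ObjectDisposedException.ThrowIf(Volatile.Read(ref _disposed) != 0, this);
        return _readers[ordinal];
    }

    // Dispose stays silent and idempotent; only member access throws after disposal.
    public void Dispose() => Interlocked.Exchange(ref _disposed, 1);
}
```

This only protects callers that ask for a column after disposal. A reader obtained earlier and used afterwards would still reach the freed handles, so the guard narrows the gap without closing it.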


Summary

The two blockers from the first review are resolved. The remaining items are:

  • Missing test (moderate — project requirement per AGENTS.md)
  • Post-dispose GetColumn guard (minor — pre-existing gap, natural to fix since _disposed is now in place)

The optimization itself is solid and the profiling data clearly justifies it. Good to merge once the test is added.

Member

@LukaszRozmej LukaszRozmej left a comment

Use helper methods to provide better context: naming what you are doing will make the code more self-documenting.

Comment on lines +48 to +53
public RocksDbReader(DbOnTheRocks mainDb,
    ReadOptions options,
    ReadOptions hintCacheMissOptions,
    Func<ReadOptions> readOptionsFactory,
    DbOnTheRocks.IteratorManager? iteratorManager = null,
    ColumnFamilyHandle? columnFamily = null)

Can this constructor be called by the other constructor with : this()? I think so

public void Dispose()
{
    snapshot.Dispose();
    if (_disposed) return;

This doesn't need to be threadsafe?

We could do this, although it would need to be a byte, not a bool:

Suggested change
- if (_disposed) return;
+ bool disposed = Interlocked.Exchange(ref _disposed, true);
+ if (disposed) return;

  public IReadOnlyKeyValueStore GetColumn(T key)
  {
-     return _columnDbs[key];
+     return _readers[Convert.ToInt32(key)];

can be single line with =>

Comment on lines +184 to +186
_sharedReadOptions = new ReadOptions();
_sharedReadOptions.SetVerifyChecksums(columnsDb.VerifyChecksum);
_sharedReadOptions.SetSnapshot(snapshot);

Helper method

Comment on lines +188 to +191
_sharedCacheMissReadOptions = new ReadOptions();
_sharedCacheMissReadOptions.SetVerifyChecksums(columnsDb.VerifyChecksum);
_sharedCacheMissReadOptions.SetSnapshot(snapshot);
_sharedCacheMissReadOptions.SetFillCache(false);

Helper method

Comment on lines +207 to +216
T[]? keys = columnsDb._cachedColumnKeys;
if (keys is null)
{
    IDictionary<T, ColumnDb> columnDbs = columnsDb._columnDbs;
    keys = new T[columnDbs.Count];
    int idx = 0;
    foreach (T key in columnDbs.Keys)
        keys[idx++] = key;
    columnsDb._cachedColumnKeys = keys;
}

helper method

Comment on lines +218 to +224
if (columnsDb._cachedMaxOrdinal < 0)
{
    int max = 0;
    for (int i = 0; i < keys.Length; i++)
        max = Math.Max(max, Convert.ToInt32(keys[i]));
    columnsDb._cachedMaxOrdinal = max;
}

helper method

Comment on lines +227 to +237
_readers = new RocksDbReader[columnsDb._cachedMaxOrdinal + 1];
for (int i = 0; i < keys.Length; i++)
{
    T k = keys[i];
    _readers[Convert.ToInt32(k)] = new RocksDbReader(
        columnsDb,
        _sharedReadOptions,
        _sharedCacheMissReadOptions,
        readOptionsFactory,
        columnFamily: columnsDb._columnDbs[k]._columnFamily);
}

Helper method

@smartprogrammer93
Contributor Author

@claude review

claude bot commented Mar 20, 2026

Claude finished @smartprogrammer93's task in 2m 2s


Re-review (Final)

All previous critical and moderate issues have been resolved. LukaszRozmej's review suggestions are also addressed. Here is the full updated status.


Previously Critical — Fixed ✅

  1. Unsafe.As<T, int> replaced — now uses EnumToInt(T value) => Convert.ToInt32(value) helper. Correct for all enum underlying types.
  2. Double-dispose guard: Interlocked.Exchange(ref _disposed, 1) != 0 with private int _disposed. Clean and correct.

LukaszRozmej's Suggestions — All Addressed ✅

  • Constructor chaining in RocksDbReader — the factory constructor already chains to the full constructor via : this(mainDb, readOptionsFactory(), readOptionsFactory(), ...), then mutates _hintCacheMissOptions.SetFillCache(false) in the body. This is the correct pattern since the readonly field is set by the chain and the mutation follows.
  • Interlocked.Exchange dispose guard — implemented with int _disposed as required (bool not supported by Interlocked.Exchange; int is idiomatic).
  • Single-line GetColumn: public IReadOnlyKeyValueStore GetColumn(T key) => _readers[EnumToInt(key)];
  • Helper methods: DestroyReadOptions, EnumToInt, and the static local functions CreateReadOptions, CreateKeyCache, GetCachedMaxOrdinal all extracted. ✅

Minor Finding

Missing private on fields in RocksDbReader.cs:25-26

readonly ReadOptions _options;
readonly ReadOptions _hintCacheMissOptions;

All other fields in the class (lines 20-23) have explicit private readonly. These two are missing the private modifier — they're implicitly private in C#, but the inconsistency with the rest of the class is worth fixing for style consistency.


Still Open

GetColumn post-dispose guard — After Dispose(), native ReadOptions handles are freed but GetColumn still returns readers whose _options/_hintCacheMissOptions point to destroyed handles. Per coding style, ObjectDisposedException.ThrowIf(_disposed != 0, this) should be added. This is a pre-existing gap now surfaced by the introduction of _disposed.

Missing regression test — Per AGENTS.md: "When fixing a bug, always add a regression test." A simple test verifying ColumnDbSnapshot.Dispose() is idempotent (no throw on double-dispose) would directly cover the double-free fix and satisfy the project guideline.


Summary

The optimization is well-implemented and the profiling data clearly justifies it. The PR has been approved by LukaszRozmej. The two open items (GetColumn ODE guard and regression test) are minor but worth addressing before merge given the project's guidelines. The private modifier omission on two fields in RocksDbReader.cs is a trivial style inconsistency.
