
Conversation

@georgeee (Member)

This PR implements a benchmark for DB usage.

It independently measures read and write performance of every DB implementation. This allows making informed decisions for the various flows of working with data, keeping the measurement of pure DB performance separate from the performance of other subsystems (including serialization).

See db_benchmark/README.md for more details about the benchmark.

Explain how you tested your changes:

  • Executed the benchmark successfully

Checklist:

  • Dependency versions are unchanged
    • Notify Velocity team if dependencies must change in CI
  • Modified the current draft of release notes with details on what is completed or incomplete within this project
  • Document code purpose, how to use it
    • Mention expected invariants, implicit constraints
  • Tests were added for the new behavior
    • Document test purpose, significance of failures
    • Test names should reflect their purpose
  • All tests pass (CI will check this if you didn't)
  • Serialized types are in stable-versioned modules
  • Does this close issues? None

@georgeee georgeee added the oom label Nov 15, 2025

georgeee commented Nov 15, 2025

Result of the run with default parameters

| Name | Time/Run | mWd/Run | mjWd/Run | Prom/Run | Percentage |
|------|----------|---------|----------|----------|------------|
| rocksdb_write | 705_955.63us | 158_922.00w | 482.99w | 482.99w | 26.18% |
| rocksdb_read | 125.97us | 1_935.00w | 16_399.56w | 13.56w | |
| lmdb_write | 2_696_045.27us | 10_047.00w | 12.96w | 12.96w | 100.00% |
| lmdb_read | 96.56us | 1_217.00w | 16_387.19w | 1.19w | |
| single_file_write | 70_465.27us | 19_047.38w | 324.69w | 324.69w | 2.61% |
| single_file_read | 362.82us | 1_273.00w | 73_738.82w | 2.82w | 0.01% |
| multi_file_write | 83_925.40us | 559.00w | 2_048_384.01w | 382.01w | 3.11% |
| multi_file_read | 184.54us | 1_269.00w | 32_772.38w | 0.38w | |

📊 Benchmark Analysis: Write vs Read Performance

Test Configuration:

  • 📦 Keys per block: 125
  • 💾 Value size: 131,072 bytes (128 KB)
  • 🔢 Blocks in DB: 800
  • Total data: ~100,000 keys, ~12.8 GB total

✍️ Write Performance Comparison

Speed (Time/Run - lower is better):

  1. 🥇 single_file_write: ~70ms - fastest option
  2. 🥈 multi_file_write: ~84ms - very close second
  3. ⚠️ rocksdb_write: ~706ms - 10x slower than single file
  4. 🔴 lmdb_write: ~2,696ms - significantly slower (38x slower than single file)

Memory Allocation:

  • 💚 multi_file_write: 559w - minimal minor heap allocation
  • ⚠️ rocksdb_write: 158,922w - high memory allocation
  • 🔴 multi_file_write (mjWd): 2,048,384w - very high major heap pressure from file operations

📖 Read Performance Comparison

Speed (Time/Run - all very fast):

  1. 🥇 lmdb_read: ~97μs - fastest
  2. 🥈 rocksdb_read: ~126μs - a close second
  3. multi_file_read: ~185μs - still excellent
  4. single_file_read: ~363μs - slowest but still sub-millisecond

All read operations are extremely fast (microsecond range vs millisecond writes).

🎯 Key Takeaways

  • LMDB: terrible write performance (~2.7s per operation) but excellent read speed
  • Simple file I/O: best write performance by far - ideal for large-value storage
  • RocksDB: a balanced middle ground, but high memory usage on writes
  • Large values (128 KB): simple file approaches dominate for write throughput

💡 Recommendation: For large-value workloads like this (128 KB per value):

  • Write-heavy → single_file_write is the clear winner
  • Read-heavy → LMDB or RocksDB provide faster lookups

Update after optimization of multi-file writing

📊 multi_file_write Benchmark Results

| Name | Time/Run | mWd/Run | mjWd/Run | Prom/Run | Percentage |
|------|----------|---------|----------|----------|------------|
| multi_file_write | 45.50ms | 553.00w | 24.03w | 24.03w | 100.00% |

📈 Before vs After Comparison

Performance Gains:

  • ⏱️ Time: 83,925μs → 45,500μs (45.5ms)
  • 📉 Speedup: 1.84x faster 🎉
  • 💾 Memory (mWd): 559w → 553w (essentially unchanged)
  • 🔄 Major heap (mjWd): 2,048,384w → 24.03w
  • Major heap reduction: ~99.999%! 🔥

🏆 Updated Write Performance Rankings

  1. 🥇 multi_file_write (new): ~45.5ms - NEW CHAMPION
  2. 🥈 single_file_write: ~70ms (1.54x slower)
  3. ⚠️ rocksdb_write: ~706ms (15.5x slower)
  4. 🔴 lmdb_write: ~2,696ms (59x slower)

💡 What Changed?

The massive mjWd reduction (from 2M+ to 24w) suggests you eliminated file system churn or excessive allocations. This is a textbook example of optimization - you kept the speed advantage while making it vastly more GC-friendly.
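The thread doesn't show the diff, but a plausible shape for such an optimization is streaming each value straight to the channel instead of concatenating them into one block-sized string first; the large intermediate string is exactly the kind of allocation that lands on the major heap. A minimal stdlib sketch (function names are illustrative, not the PR's):

```ocaml
(* Allocation-heavy variant: builds one block-sized string in memory
   before writing, pressuring the major heap for large blocks. *)
let write_block_concat path values =
  let data = String.concat "" values in
  let oc = open_out_bin path in
  output_string oc data ;
  close_out oc

(* Streaming variant: per-value writes only, no concatenation. *)
let write_block_streaming path values =
  let oc = open_out_bin path in
  List.iter (output_string oc) values ;
  close_out oc
```

Both produce byte-identical files; only the allocation behavior differs.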

New recommendation: For large-value (128 KB) write workloads, multi_file_write is now the clear winner - fastest write speed AND minimal heap pressure. 🎯

@georgeee (Member, Author)

🚀 Smaller Values, Different Story

📊 Full Benchmark Results (New Parameters)

Test Configuration:

  • 📦 Keys per block: 32
  • 💾 Value size: 9,000 bytes (8.8 KB)
  • 🔢 Warmup blocks: 1,000
  • Total warmup: 32,000 keys

| Name | Time/Run | mWd/Run | mjWd/Run | Prom/Run | Percentage |
|------|----------|---------|----------|----------|------------|
| rocksdb_write | 4,033.82us | 40,708.00w | 20.57w | 20.57w | 0.42% |
| rocksdb_read | 40.49us | 1,924.00w | 1,140.63w | 13.63w | - |
| lmdb_write | 957,662.23us | 2,596.00w | 0.65w | 0.65w | 100.00% |
| lmdb_read | 35.66us | 1,206.00w | 1,127.31w | 0.31w | - |
| single_file_write | 1,957.77us | 4,900.00w | 38.59w | 38.59w | 0.20% |
| single_file_read | 70.29us | 1,262.00w | 9,321.10w | 0.10w | - |
| multi_file_write | 501.43us | 269.00w | 36,014.88w | 12.88w | 0.05% |
| multi_file_read | 55.97us | 1,258.00w | 2,254.03w | - | - |

Before vs After (multi_file_write, original vs optimized):

  • ⏱️ Time: 501.43us → 839.77us (1.67x slower) ⚠️
  • 💾 Memory (mWd): 269w → 274w (essentially unchanged)
  • 🔄 Major heap (mjWd): 36,014.88w → 6.49w
  • Major heap reduction: ~99.98%! 🔥

🏆 Write Performance Rankings (8.8 KB values)

  1. 🥇 multi_file_write (original): ~501us - fastest
  2. 🥈 multi_file_write (optimized): ~840us - better GC behavior
  3. 🥉 single_file_write: ~1,958us
  4. ⚠️ rocksdb_write: ~4,034us
  5. 🔴 lmdb_write: ~957,662us - still very slow

📖 Read Performance Rankings

  1. 🥇 lmdb_read: ~36us - fastest
  2. 🥈 rocksdb_read: ~40us
  3. multi_file_read: ~56us
  4. single_file_read: ~70us

💡 Key Observations

Compared to 128 KB value test:

  • 📉 All operations are significantly faster with smaller values (8.8 KB vs 128 KB)
  • 🔄 Trade-off emerged: Optimization reduced mjWd by 99.98% but slowed writes by 1.67x
  • 🎯 RocksDB becomes competitive at smaller value sizes (~4ms vs 706ms previously)
  • LMDB still struggles with writes but dominates reads

Optimization trade-off: The optimized version trades some speed for much better GC behavior. Depending on workload (GC pressure vs raw throughput), either version could be preferable.

@georgeee georgeee mentioned this pull request Nov 17, 2025
Sys.getenv "WARMUP_BLOCKS" |> Option.value_map ~default:800 ~f:int_of_string

(* Fixed seed for reproducibility *)
let random_seed = 42

It would be nice to make this random when not user-provided, and have the test print the seed it's using on every run.
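A minimal sketch of that suggestion, using the OCaml stdlib rather than Core so it is self-contained; the `RANDOM_SEED` variable name is an assumption, not an existing knob of the benchmark:

```ocaml
(* Use the user-provided seed if present, otherwise pick one at random,
   and always print it so a run can be reproduced.
   RANDOM_SEED is a hypothetical env var name. *)
let random_seed =
  match Sys.getenv_opt "RANDOM_SEED" with
  | Some s -> int_of_string s
  | None ->
      Random.self_init () ;
      Random.int 1_000_000

let () = Printf.printf "random seed: %d\n%!" random_seed
```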

let cached_value = lazy (generate_value ())

(* Get the cached value *)
let get_value () = Lazy.force cached_value

We're using the same value for all blocks. I wonder whether these backends have optimizations that improve performance in that case; it would be better to use distinct values for distinct keys.
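One way to address this while keeping runs reproducible is to derive each value from the fixed seed and the key. A sketch, where `generate_value_for` and `value_size` are hypothetical names, not the benchmark's API:

```ocaml
(* Hypothetical stand-in for the benchmark's configured value size. *)
let value_size = 9_000

(* Deterministically derive a distinct value per key: seeding a fresh
   generator from (seed, key) gives a reproducible but key-specific
   byte stream, so backends cannot benefit from a single repeated value. *)
let generate_value_for ~seed ~key =
  let st = Random.State.make [| seed; key |] in
  String.init value_size (fun _ -> Char.chr (Random.State.int st 256))
```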

min_key + Random.State.int random_state (max_key - min_key + 1)

(* Database interface that all implementations must satisfy *)
module type Database = sig

It might be worth replacing this string-storing implementation with one that stores bytes, to avoid any wrapping done at the bindings end, so that we don't waste time on serialization/deserialization.
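A sketch of what a bytes-based interface could look like; the module type below is illustrative and intentionally smaller than the PR's actual `Database` signature, with a trivial in-memory instance only to show the shape:

```ocaml
(* Illustrative interface storing bytes instead of string, so bindings
   can hand buffers through without conversion. *)
module type Database = sig
  type t

  val create : unit -> t

  val set : t -> key:int -> value:bytes -> unit

  val get : t -> key:int -> bytes option
end

(* Minimal in-memory instance of the signature. *)
module Mem : Database = struct
  type t = (int, bytes) Hashtbl.t

  let create () = Hashtbl.create 16

  let set t ~key ~value = Hashtbl.replace t key value

  let get t ~key = Hashtbl.find_opt t key
end
```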

let start_key = block_num * Common.keys_per_block in
List.iteri values ~f:(fun i value ->
    let key = start_key + i in
    Rw.set ~env:t.env t.db key value )

We're paying the cost of a commit on each key in the block. Is this intended?
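The fix the comment asks for is one transaction per block rather than a commit per key. A hedged sketch of that shape, where `with_txn` and `set_in_txn` are hypothetical stand-ins for the real binding's API and the counter only makes the commit count observable:

```ocaml
(* Observable commit counter, for illustration only. *)
let commits = ref 0

(* Run f inside a single transaction; commit once at the end. *)
let with_txn f =
  let result = f () in
  incr commits ;
  result

(* Hypothetical per-key write inside an open transaction. *)
let set_in_txn (_key : int) (_value : string) = ()

(* All keys of a block go through one transaction, so the whole block
   costs one commit instead of keys_per_block commits. *)
let set_block ~block_num ~keys_per_block values =
  let start_key = block_num * keys_per_block in
  with_txn (fun () ->
      List.iteri (fun i value -> set_in_txn (start_key + i) value) values)
```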

let path = block_path t block_num in
(* Write all values directly to file without in-memory concatenation *)
Out_channel.with_file path ~binary:true ~f:(fun oc ->
    List.iter values ~f:(Out_channel.output_string oc) )

This is not a fair comparison: the files are written through the same handle on the same channel, without close/open between values.

But with LMDB, say, we're creating a txn per write.

let set_block db ~block_num values =
  let start_key = block_num * Common.keys_per_block in
  List.iteri values ~f:(fun i value ->
      let key = start_key + i in

This is also not a batch operation.

@glyh glyh (Member) left a comment:

Please consider using batch operations on the DB, at the very least.

