121 changes: 121 additions & 0 deletions src/test/db_benchmark/README.md
# Database Benchmark Suite

A comprehensive benchmark suite comparing four different database/storage implementations for key-value storage with fixed-size values.

## Benchmark description

The benchmark runs two test scenarios per implementation:

1. Write benchmark: delete the oldest block and insert a new block (steady state)
2. Read benchmark: repeated reads of random keys

The write benchmark measures pure write performance, with no read operations; the read benchmark measures pure read performance, with no write operations.

## Implementations Tested

1. **RocksDB** - LSM tree-based embedded database
2. **LMDB** - Memory-mapped B+ tree database
3. **Single-file** - One file per key (filesystem-based)
4. **Multi-file** - One file per block, 125 keys per file

## Default Test Configuration

- **Keys per block**: 125
- **Value size**: 128 KB (131,072 bytes)
- **Warmup phase**: 800 blocks (100,000 keys, ~12.5 GB data)
- **Write benchmark**: Delete oldest block + Insert new block (steady state)
- **Read benchmark**: Repeated read from random keys
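
The key layout implied by this configuration mirrors `key_of_block_offset` in `common.ml`: key IDs are assigned sequentially within each block, so block `b` owns keys `b*125` through `b*125 + 124`, and 800 warmup blocks cover keys 0–99,999.

```ocaml
(* Mirrors key_of_block_offset in common.ml with the default configuration:
   block b owns the contiguous key range [b*125, b*125 + 124]. *)
let keys_per_block = 125

let key_of_block_offset block_num offset = (block_num * keys_per_block) + offset
```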

## Usage

### Manual Build

```bash
# Build only
$HOME/work/shell dune build src/test/db_benchmark/db_benchmark.exe

# Run with custom options
./_build/default/src/test/db_benchmark/db_benchmark.exe -ascii -quota 30s
```

### Core_bench Options

The benchmark uses Core_bench, which supports various options:

- `-ascii`: Plain text output
- `-quota <time>`: How long to run each benchmark (e.g., `10s`, `1m`)
- `-v`: Verbose output
- `-help`: Show all available options

## Output

### Report

Output is printed to stdout in plain text. It contains:

1. **System Information**: CPU, memory, OS details
2. **Test Configuration**: Parameters and setup details
3. **Benchmark Results**: Timing and allocation data
4. **Interpretation Guide**: How to read the results

### Metrics Explained

- **Time/Run**: Average time per operation (lower is better)
- **mWd/Run**: Minor words allocated (GC pressure)
- **mjWd/Run**: Major words allocated
- **Prom/Run**: Promoted words
- **Percentage**: Relative performance vs baseline

## Implementation Details

### RocksDB (`rocksdb_impl.ml`)
- Uses `Rocksdb.Serializable.Make` with integer keys and string values
- LSM tree architecture, optimized for write-heavy workloads
- Automatic background compaction
- Batch writes: Iterates through key-value pairs in a block

### LMDB (`lmdb_impl.ml`)
- Uses `Lmdb_storage.Generic.Read_write` functor
- Memory-mapped files with B+ tree structure
- Initial map size: 256 MB (grows automatically as needed)
- Good for read-heavy workloads
- Batch writes: Iterates through key-value pairs in a block

### Single-file (`single_file_impl.ml`)
- Each key stored in separate file: `<key_id>.val`
- Simple filesystem operations
- High file descriptor usage
- Best for: Small datasets, simple requirements
- Batch writes: Creates one file per key in the block

### Multi-file (`multi_file_impl.ml`)
- One file per block: `<block_id>.block`
- Each file contains 125 keys (128KB × 125 = 16MB per file)
- **Efficient batch writes**: Concatenates all values in memory, writes once with `write_all`
- Reduces file count from 100,000 to 800
- Single write operation per block (no seeking needed)
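
The concatenate-then-write pattern can be sketched as follows. This is a minimal illustration using OCaml's Stdlib I/O; the helper name `write_block_file` is hypothetical, and the real code lives in `multi_file_impl.ml`.

```ocaml
(* Sketch of the multi-file batch write: build the whole block in memory,
   then write it to <block_id>.block with a single call. *)
let write_block_file ~dir ~block_num (values : string list) =
  let path = Filename.concat dir (string_of_int block_num ^ ".block") in
  (* Concatenate all values for the block into one buffer... *)
  let data = String.concat "" values in
  let oc = open_out_bin path in
  (* ...then write it out in one operation; no per-key files or seeks *)
  output_string oc data ;
  close_out oc
```

Reading key `k` back then amounts to opening block `k / 125` and seeking to offset `(k mod 125) * value_size`.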

## Customization

To modify test parameters, set the following environment variables:

- `KEYS_PER_BLOCK`: Number of keys per block
- `VALUE_SIZE`: Size of each value in bytes
- `WARMUP_BLOCKS`: Number of blocks to warmup with
- `BENCHMARKS`: Comma-separated list of benchmarks to run
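
For example, a hypothetical invocation combining these variables (the executable path assumes the default dune build layout; the benchmark names match those defined in `db_benchmark.ml`):

```bash
# Run only the LMDB benchmarks with 64 KB values and a shorter warmup
VALUE_SIZE=65536 WARMUP_BLOCKS=100 \
  BENCHMARKS=lmdb_write,lmdb_read \
  ./_build/default/src/test/db_benchmark/db_benchmark.exe -ascii -quota 10s
```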

## Troubleshooting

### Build Failures

```bash
# Ensure you're using the special shell
$HOME/work/shell dune clean
$HOME/work/shell dune build src/test/db_benchmark
```

### Out of Disk Space

The benchmark writes ~14 GB per run, with two runs per implementation (up to 120 GB total).
Ensure there is enough disk space before running the benchmark.
117 changes: 117 additions & 0 deletions src/test/db_benchmark/common.ml
open Core

(* Configuration constants - configurable via environment variables *)
let keys_per_block =
Sys.getenv "KEYS_PER_BLOCK" |> Option.value_map ~default:125 ~f:int_of_string

let value_size =
Sys.getenv "VALUE_SIZE"
|> Option.value_map ~default:(128 * 1024) ~f:int_of_string
(* 128 KB *)

let warmup_blocks =
Sys.getenv "WARMUP_BLOCKS" |> Option.value_map ~default:800 ~f:int_of_string

(* Fixed seed for reproducibility *)
let random_seed = 42
[Review comment] Would be nice to make this random if not user-provided, and have the test print the seed it's using every time.

(* Get key ID from block number and offset within block *)
let key_of_block_offset block_num offset = (block_num * keys_per_block) + offset

(* Get all key IDs for a given block *)
let keys_of_block block_num =
List.init keys_per_block ~f:(fun offset ->
key_of_block_offset block_num offset )

(* Generate random data for a value *)
let generate_value () =
let random_state = Random.State.make [| random_seed |] in
String.init value_size ~f:(fun _ ->
Char.of_int_exn (Random.State.int random_state 256) )

(* Cache a single value to avoid regenerating it every time *)
let cached_value = lazy (generate_value ())

(* Get the cached value *)
let get_value () = Lazy.force cached_value
[Review comment] We're using the same value for all blocks. These backends may have optimizations that make performance look better than it would with distinct data; it's better to use distinct values for distinct keys.


(* Shared random state for read-test key selection. Creating a fresh state
   seeded with [random_seed] on every call would make this function return
   the same key each time. *)
let read_random_state = Random.State.make [| random_seed |]

(* Generate random key for read test *)
let random_key_in_range ~min_key ~max_key =
  min_key + Random.State.int read_random_state (max_key - min_key + 1)

(* Database interface that all implementations must satisfy *)
module type Database = sig
[Review comment] It might be worth replacing this string-storing implementation with one storing bytes, to avoid any wrapping done at the bindings end, so that we're not wasting time on serialization/deserialization.

type t

(* Initialize the database at the given path *)
val create : string -> t

(* Close/cleanup the database *)
val close : t -> unit

(* Write a block of values (sequential keys starting at block_num * keys_per_block) *)
val set_block : t -> block_num:int -> string list -> unit

(* Read a single key *)
val get : t -> key:int -> string option

(* Delete a block *)
val remove_block : t -> block_num:int -> unit

(* Get implementation name for reporting *)
val name : string
end

(* Operations for benchmarking *)
module Ops = struct
(* Write a full block of keys *)
let write_block (type db) (module Db : Database with type t = db) (db : db)
block_num =
let value = get_value () in
(* Create list of identical values for all keys in the block *)
let values = List.init keys_per_block ~f:(fun _ -> value) in
Db.set_block db ~block_num values

(* Delete a full block of keys *)
let delete_block (type db) (module Db : Database with type t = db) (db : db)
block_num =
Db.remove_block db ~block_num

(* Read a single key *)
let read_key (type db) (module Db : Database with type t = db) (db : db) key =
ignore (Db.get db ~key : string option)

(* Warmup: write initial blocks *)
let warmup (type db) (module Db : Database with type t = db) (db : db) =
for block_num = 0 to warmup_blocks - 1 do
write_block (module Db) db block_num
done

(* Steady state operation: remove oldest block and add new one *)
let steady_state_op (type db) (module Db : Database with type t = db)
(db : db) ~oldest_block ~new_block =
delete_block (module Db) db oldest_block ;
write_block (module Db) db new_block

(* Random read operation *)
let random_read (type db) (module Db : Database with type t = db) (db : db)
~min_key ~max_key =
let key = random_key_in_range ~min_key ~max_key in
read_key (module Db) db key
end

(* Temporary directory management *)
let make_temp_dir prefix =
let pid = Unix.getpid () |> Pid.to_int in
let dir_name = Printf.sprintf "%s_%d" prefix pid in
Unix.mkdir_p dir_name ; dir_name

let cleanup_temp_dir dir =
match Sys.file_exists dir with
| `Yes ->
ignore
( Core_unix.system (Printf.sprintf "rm -rf %s" (Filename.quote dir))
: Core_unix.Exit_or_signal.t )
| _ ->
()
88 changes: 88 additions & 0 deletions src/test/db_benchmark/db_benchmark.ml
open Core
open Core_bench

(* Instantiate database implementations *)
module Rocksdb_db = Rocksdb_impl.Make ()

module Lmdb_db = Lmdb_impl.Make ()

module Single_file_db = Single_file_impl.Make ()

module Multi_file_db = Multi_file_impl.Make ()

let init_db (type db) (module Db : Common.Database with type t = db) name =
(* Initialization: create DB and warmup *)
let dir = Common.make_temp_dir (Printf.sprintf "db_bench_%s" name) in
let db = Db.create dir in
Common.Ops.warmup (module Db) db ;
eprintf "Warmup complete for %s\n" name ;
db

(* Write benchmark - warmup happens once, then test runs repeatedly *)
let make_write_bench (type db) (module Db : Common.Database with type t = db)
(db : db) =
let oldest_block_ref = ref 0 in
fun () ->
let oldest_block = !oldest_block_ref in
let new_block = oldest_block + Common.warmup_blocks in
Common.Ops.steady_state_op (module Db) db ~oldest_block ~new_block ;
incr oldest_block_ref

(* Read benchmark - warmup happens once, then test runs repeatedly *)
let make_read_bench (type db) (module Db : Common.Database with type t = db)
(db : db) =
let min_key = 0 in
let max_key = (Common.warmup_blocks * Common.keys_per_block) - 1 in
fun () -> Common.Ops.random_read (module Db) db ~min_key ~max_key

let test ~name (type db) (module Db : Common.Database with type t = db)
(f : (module Common.Database with type t = db) -> db -> unit -> unit) =
Bench.Test.create_with_initialization ~name (fun `init ->
init_db (module Db) name |> f (module Db) )

(* Create all benchmarks *)
let all_benchmarks () =
[ test ~name:"rocksdb_write" (module Rocksdb_db) make_write_bench
; test ~name:"rocksdb_read" (module Rocksdb_db) make_read_bench
; test ~name:"lmdb_write" (module Lmdb_db) make_write_bench
; test ~name:"lmdb_read" (module Lmdb_db) make_read_bench
; test ~name:"single_file_write" (module Single_file_db) make_write_bench
; test ~name:"single_file_read" (module Single_file_db) make_read_bench
; test ~name:"multi_file_write" (module Multi_file_db) make_write_bench
; test ~name:"multi_file_read" (module Multi_file_db) make_read_bench
]

(* Filter benchmarks based on BENCHMARKS environment variable *)
let filter_benchmarks benchmarks =
match Sys.getenv "BENCHMARKS" with
| None ->
benchmarks
| Some names_str ->
let requested_names =
String.split names_str ~on:','
|> List.map ~f:String.strip |> String.Set.of_list
in
let filtered =
List.filter benchmarks ~f:(fun bench ->
String.Set.mem requested_names (Bench.Test.name bench) )
in
Printf.printf "Filtering benchmarks: running %d of %d\n"
(List.length filtered) (List.length benchmarks) ;
Printf.printf " Requested: %s\n" names_str ;
filtered

(* Main entry point *)
let () =
(* Print configuration *)
Printf.printf "Database Benchmark Configuration:\n" ;
Printf.printf " Keys per block: %d\n" Common.keys_per_block ;
Printf.printf " Value size: %d bytes (%.1f KB)\n" Common.value_size
(Float.of_int Common.value_size /. 1024.) ;
Printf.printf " Warmup blocks: %d\n" Common.warmup_blocks ;
Printf.printf " Warmup keys: %d\n"
(Common.warmup_blocks * Common.keys_per_block) ;
Printf.printf "\n" ;

(* Run benchmarks *)
let benchmarks = all_benchmarks () |> filter_benchmarks in
Command.run (Bench.make_command benchmarks)
13 changes: 13 additions & 0 deletions src/test/db_benchmark/dune
(executable
(name db_benchmark)
(libraries
core
core_kernel
core_unix
core_bench
rocksdb
lmdb_storage
ppx_inline_test.runtime-lib)
(preprocess
(pps ppx_version ppx_jane ppx_deriving.std))
(modes native))
42 changes: 42 additions & 0 deletions src/test/db_benchmark/lmdb_impl.ml
open Core
open Lmdb_storage.Generic

module Make () : Common.Database = struct
module F (Db : Db) = struct
type holder = (int, string) Db.t

let mk_maps { Db.create } =
create Lmdb_storage.Conv.uint32_be Lmdb.Conv.string

(* Start with 256 MB, LMDB will grow automatically as needed *)
let config = { default_config with initial_mmap_size = 256 lsl 20 }
end

module Rw = Read_write (F)

type t = { env : Rw.t; db : Rw.holder }

let name = "lmdb"

let create directory =
Unix.mkdir_p directory ;
let env, db = Rw.create directory in
{ env; db }

let close t = Rw.close t.env

let set_block t ~block_num values =
let start_key = block_num * Common.keys_per_block in
List.iteri values ~f:(fun i value ->
let key = start_key + i in
Rw.set ~env:t.env t.db key value )
[Review comment] We're committing on each key in the block, which wastes time. Is this intended?


let get t ~key = Rw.get ~env:t.env t.db key

let remove_block t ~block_num =
let start_key = block_num * Common.keys_per_block in
for i = 0 to Common.keys_per_block - 1 do
let key = start_key + i in
Rw.remove ~env:t.env t.db key
done
end