
Benchmarking Guide

This guide explains how to run, interpret, and contribute benchmarks for Ruvector.

Table of Contents

  1. Running Benchmarks
  2. Benchmark Suite
  3. Interpreting Results
  4. Performance Targets
  5. Comparison Methodology
  6. Contributing Benchmarks

Running Benchmarks

Quick Start

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench distance_metrics
cargo bench hnsw_search
cargo bench batch_operations

# With flamegraph profiling
cargo flamegraph --bench hnsw_search

# With criterion reports
cargo bench -- --save-baseline main
git checkout feature-branch
cargo bench -- --baseline main

Benchmark Crates

# Core benchmarks
cd crates/ruvector-bench
cargo bench

# Comparison benchmarks
cargo run --release --bin comparison_benchmark

# Memory benchmarks
cargo run --release --bin memory_benchmark

# Latency benchmarks
cargo run --release --bin latency_benchmark

SIMD Optimization

Enable SIMD for maximum performance:

RUSTFLAGS="-C target-cpu=native" cargo bench

# Or specific features
RUSTFLAGS="-C target-feature=+avx2,+fma" cargo bench

Benchmark Suite

1. Distance Metrics Benchmark

File: crates/ruvector-core/benches/distance_metrics.rs

What it measures: Raw distance calculation performance

Metrics:

  • Euclidean (L2) distance
  • Cosine similarity
  • Dot product
  • Manhattan (L1) distance
  • SIMD vs scalar implementations

Run:

cargo bench distance_metrics

Expected results:

euclidean_128d/simd       time:   [45.234 ns 45.456 ns 45.678 ns]
euclidean_128d/scalar     time:   [312.45 ns 315.23 ns 318.91 ns]
                                  ↑ 7x slower
cosine_128d/simd          time:   [52.123 ns 52.345 ns 52.567 ns]
dotproduct_128d/simd      time:   [38.901 ns 39.123 ns 39.345 ns]
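For orientation, the scalar baseline in these numbers is essentially a plain loop over the dimensions. A minimal std-only sketch (illustrative, not the crate's actual implementation):

```rust
// Scalar L2 distance: the kind of baseline that the SIMD variants in
// distance_metrics.rs are measured against (simplified sketch).
fn euclidean_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b.iter())
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    let a = vec![0.0_f32; 128];
    let b = vec![1.0_f32; 128];
    // sqrt(128) ≈ 11.31 for these inputs
    println!("{}", euclidean_scalar(&a, &b));
}
```

The SIMD variants compute the same sum with vectorized instructions (e.g. AVX2/FMA), which is where the ~7x gap above comes from.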

2. HNSW Search Benchmark

File: crates/ruvector-core/benches/hnsw_search.rs

What it measures: End-to-end search performance

Metrics:

  • Search latency (p50, p95, p99)
  • Queries per second (QPS)
  • Recall accuracy
  • Different dataset sizes (1K, 10K, 100K, 1M vectors)
  • Different ef_search values (50, 100, 200, 500)

Run:

cargo bench hnsw_search

Expected results:

search_1M_vectors_k10_ef100
                        time:   [845.23 µs 856.78 µs 868.45 µs]
                        thrpt:  [1,151 queries/s]
                        recall: [95.2%]

search_1M_vectors_k10_ef200
                        time:   [1.678 ms 1.689 ms 1.701 ms]
                        thrpt:  [587 queries/s]
                        recall: [98.7%]

3. Batch Operations Benchmark

File: crates/ruvector-core/benches/batch_operations.rs

What it measures: Throughput for bulk operations

Metrics:

  • Batch insert throughput
  • Parallel vs sequential inserts
  • Different batch sizes (100, 1K, 10K)

Run:

cargo bench batch_operations

Expected results:

batch_insert_1000_parallel
                        time:   [45.234 ms 46.123 ms 47.012 ms]
                        thrpt:  [21,271 vectors/s]

batch_insert_1000_sequential
                        time:   [234.56 ms 238.91 ms 243.27 ms]
                        thrpt:  [4,111 vectors/s]
                        ↑ 5x slower
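The parallel speedup comes from splitting the batch across worker threads. A std-only sketch of that pattern, with a hypothetical `process` function standing in for per-vector insert work (the crate itself may use a thread pool such as rayon):

```rust
use std::thread;

// Hypothetical per-vector work, standing in for index insertion.
fn process(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum()
}

// Split a batch into chunks and process each chunk on its own thread,
// preserving input order in the output.
fn process_batch_parallel(batch: Vec<Vec<f32>>, threads: usize) -> Vec<f32> {
    let chunk = ((batch.len() + threads - 1) / threads).max(1);
    let mut handles = Vec::new();
    for part in batch.chunks(chunk) {
        let part = part.to_vec();
        handles.push(thread::spawn(move || {
            part.iter().map(|v| process(v)).collect::<Vec<f32>>()
        }));
    }
    handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
}

fn main() {
    let batch: Vec<Vec<f32>> = (0..1000).map(|i| vec![i as f32; 8]).collect();
    let out = process_batch_parallel(batch, 4);
    println!("processed {} vectors", out.len()); // prints "processed 1000 vectors"
}
```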

4. Quantization Benchmark

File: crates/ruvector-core/benches/quantization_bench.rs

What it measures: Quantization performance and accuracy

Metrics:

  • Quantization time
  • Dequantization time
  • Distance calculation with quantized vectors
  • Recall impact

Run:

cargo bench quantization

Expected results:

scalar_quantize_128d      time:   [234.56 ns 236.78 ns 239.01 ns]
product_quantize_128d     time:   [1.234 µs 1.245 µs 1.256 µs]

search_with_scalar_quant  time:   [678.90 µs 685.12 µs 691.34 µs]
                          recall: [97.3%]

search_with_product_quant time:   [523.45 µs 528.67 µs 533.89 µs]
                          recall: [92.8%]
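To make the speed/recall trade-off concrete, here is a minimal sketch of min/max scalar quantization to one byte per dimension; the crate's actual scheme may differ:

```rust
// Scalar quantization sketch: map each f32 to a u8 bucket using the
// vector's min/max range (illustrative only).
fn scalar_quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|x| ((x - min) * scale).round() as u8).collect();
    (codes, min, max)
}

// Reconstruct approximate f32 values from the u8 codes.
fn dequantize(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = if max > min { (max - min) / 255.0 } else { 0.0 };
    codes.iter().map(|&c| min + c as f32 * step).collect()
}

fn main() {
    let v = vec![-1.0_f32, 0.0, 0.5, 1.0];
    let (codes, min, max) = scalar_quantize(&v);
    let back = dequantize(&codes, min, max);
    println!("{:?} -> {:?}", codes, back);
}
```

Each value is recovered to within one quantization step, which is why recall drops only a few points while memory shrinks 4x.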

5. Comprehensive Benchmark

File: crates/ruvector-core/benches/comprehensive_bench.rs

What it measures: End-to-end system performance

Run:

cargo bench comprehensive

Interpreting Results

Criterion Output

test_name               time:   [lower_bound mean upper_bound]
                        thrpt:  [throughput]
                        change: [% change from baseline]

Example:

search_100K_vectors     time:   [234.56 µs 238.91 µs 243.27 µs]
                        thrpt:  [4,111 queries/s]
                        change: [-5.2% -3.8% -2.1%] (faster)

Interpretation:

  • Mean: 238.91 µs
  • 95% confidence interval: [234.56 µs, 243.27 µs]
  • Throughput: ~4,111 queries/second
  • 3.8% faster than baseline

Latency Percentiles

cargo run --release --bin latency_benchmark

Output:

Latency percentiles (100K queries):
  p50:  0.85 ms
  p90:  1.23 ms
  p95:  1.67 ms
  p99:  3.45 ms
  p999: 8.91 ms

Interpretation:

  • 50% of queries complete in < 0.85ms
  • 95% of queries complete in < 1.67ms
  • 99% of queries complete in < 3.45ms
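These percentiles can be reproduced from raw latencies with a nearest-rank computation over the sorted samples; a sketch (the binary's exact method may differ):

```rust
// Nearest-rank percentile over a sorted slice of latencies (ms).
fn percentile(sorted_ms: &[f64], p: f64) -> f64 {
    assert!(!sorted_ms.is_empty());
    let rank = ((p * sorted_ms.len() as f64) / 100.0).ceil() as usize;
    sorted_ms[rank.saturating_sub(1).min(sorted_ms.len() - 1)]
}

fn main() {
    // Synthetic latencies: 0.01 ms .. 10.00 ms
    let mut latencies: Vec<f64> = (1..=1000).map(|i| i as f64 / 100.0).collect();
    latencies.sort_by(|a, b| a.partial_cmp(b).unwrap());
    for p in [50.0, 95.0, 99.0] {
        println!("p{}: {:.2} ms", p, percentile(&latencies, p));
    }
}
```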

Memory Usage

cargo run --release --bin memory_benchmark

Output:

Memory usage (1M vectors, 128D):
  Vectors (full):        512.0 MB
  Vectors (scalar):      128.0 MB (4x compression)
  HNSW graph:           640.0 MB
  Metadata:              50.0 MB
  ──────────────────────────────
  Total (scalar quant):  818.0 MB
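The vector-storage figures follow from simple arithmetic: count × dimensions × bytes per dimension, with MB taken as 10^6 bytes to match the table:

```rust
// Back-of-envelope memory arithmetic behind the table above.
fn vector_storage_mb(n: usize, dims: usize, bytes_per_dim: usize) -> f64 {
    // MB = 10^6 bytes, matching the benchmark's reporting
    (n * dims * bytes_per_dim) as f64 / 1_000_000.0
}

fn main() {
    let n = 1_000_000;
    println!("full:   {:.1} MB", vector_storage_mb(n, 128, 4)); // f32 per dim -> 512.0 MB
    println!("scalar: {:.1} MB", vector_storage_mb(n, 128, 1)); // u8 per dim  -> 128.0 MB
}
```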

Performance Targets

Search Latency

Dataset        Target p50   Target p95   Target QPS
───────────────────────────────────────────────────
10K vectors    < 100 µs     < 200 µs     10,000+
100K vectors   < 500 µs     < 1 ms       2,000+
1M vectors     < 1 ms       < 2 ms       1,000+
10M vectors    < 2 ms       < 5 ms       500+

Insert Throughput

Operation            Target
─────────────────────────────────────────
Single insert        1,000+ ops/sec
Batch insert (1K)    10,000+ vectors/sec
Batch insert (10K)   50,000+ vectors/sec

Memory Efficiency

Configuration    Target Memory per Vector
─────────────────────────────────────────────────
Full precision   512 bytes (128D)
Scalar quant     128 bytes (4x compression)
Product quant    16-32 bytes (16-32x compression)

Recall Accuracy

Configuration    Target Recall
──────────────────────────────
ef_search=50     85%+
ef_search=100    90%+
ef_search=200    95%+
ef_search=500    99%+
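Recall can be checked with a straightforward set intersection between the ANN results and exact ground truth; a sketch of recall@k, assuming integer vector IDs:

```rust
use std::collections::HashSet;

// Recall@k: the fraction of true nearest neighbors that the
// approximate search actually returned.
fn recall_at_k(returned: &[u64], ground_truth: &[u64]) -> f64 {
    let truth: HashSet<&u64> = ground_truth.iter().collect();
    let hits = returned.iter().filter(|id| truth.contains(id)).count();
    hits as f64 / ground_truth.len() as f64
}

fn main() {
    let returned: [u64; 10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 99];
    let truth: [u64; 10] = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    // 9 of the 10 true neighbors were found
    println!("recall@10 = {:.1}%", 100.0 * recall_at_k(&returned, &truth));
}
```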

Comparison Methodology

Against FAISS

cargo run --release --bin comparison_benchmark -- --system faiss

Metrics compared:

  • Search latency (same dataset, same k)
  • Memory usage
  • Build time
  • Recall@10

Example output:

Benchmark: 1M vectors, 128D, k=10

                  Ruvector    FAISS       Speedup
────────────────────────────────────────────────
Build time        245s        312s        1.27x
Search (p50)      0.85ms      2.34ms      2.75x
Search (p95)      1.67ms      4.56ms      2.73x
Memory            818MB       1,245MB     1.52x
Recall@10         95.2%       95.8%       ~same

Versioned Benchmarks

Track performance over time:

# Save baseline
git checkout v0.1.0
cargo bench -- --save-baseline v0.1.0

# Compare to new version
git checkout v0.2.0
cargo bench -- --baseline v0.1.0

Contributing Benchmarks

Adding a New Benchmark

  1. Create benchmark file:
// crates/ruvector-core/benches/my_benchmark.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use ruvector_core::*;

fn my_benchmark(c: &mut Criterion) {
    // Replace these with your own setup and input data
    let db = setup_test_db();
    let input = make_test_input();

    c.bench_function("my_operation", |b| {
        b.iter(|| {
            // Operation to benchmark
            db.my_operation(black_box(&input))
        })
    });
}

criterion_group!(benches, my_benchmark);
criterion_main!(benches);
  2. Register in Cargo.toml:
[[bench]]
name = "my_benchmark"
harness = false
  3. Run and verify:
cargo bench my_benchmark

Benchmark Best Practices

  1. Use black_box: Prevent compiler optimizations

    b.iter(|| db.search(black_box(&query)))
  2. Measure what matters: Focus on user-facing operations

  3. Realistic workloads: Use representative data sizes

  4. Multiple iterations: Criterion handles this automatically

  5. Isolate variables: Benchmark one thing at a time

  6. Document context: Explain what's being measured

  7. CI integration: Run benchmarks in CI to catch regressions

Profiling

# Flamegraph
cargo flamegraph --bench hnsw_search

# perf (Linux)
perf record -g cargo bench hnsw_search
perf report

# Cachegrind (memory profiling)
valgrind --tool=cachegrind cargo bench hnsw_search

CI/CD Integration

GitHub Actions

- name: Run benchmarks
  run: |
    cargo bench --bench distance_metrics -- --save-baseline main

- name: Compare to baseline
  run: |
    cargo bench --bench distance_metrics -- --baseline main

Performance Regression Detection

Fail CI if performance regresses > 5%:

// In benchmark code. load_baseline and measure_current are placeholder
// helpers for illustration, not part of criterion's API.
let previous_mean = load_baseline("main");
let current_mean = measure_current();
let regression = (current_mean - previous_mean) / previous_mean;

assert!(regression < 0.05, "Performance regression > 5%");

Questions?

Open an issue: https://github.com/ruvnet/ruvector/issues