This guide explains how to run, interpret, and contribute benchmarks for Ruvector.
- Running Benchmarks
- Benchmark Suite
- Interpreting Results
- Performance Targets
- Comparison Methodology
- Contributing Benchmarks
## Running Benchmarks

```bash
# Run all benchmarks
cargo bench

# Run specific benchmarks
cargo bench distance_metrics
cargo bench hnsw_search
cargo bench batch_operations

# With flamegraph profiling
cargo flamegraph --bench hnsw_search

# With criterion baseline comparison
cargo bench -- --save-baseline main
git checkout feature-branch
cargo bench -- --baseline main
```

```bash
# Core benchmarks
cd crates/ruvector-bench
cargo bench

# Comparison benchmarks
cargo run --release --bin comparison_benchmark

# Memory benchmarks
cargo run --release --bin memory_benchmark

# Latency benchmarks
cargo run --release --bin latency_benchmark
```

Enable SIMD for maximum performance:

```bash
RUSTFLAGS="-C target-cpu=native" cargo bench

# Or enable specific CPU features
RUSTFLAGS="-C target-feature=+avx2,+fma" cargo bench
```

## Benchmark Suite

File: `crates/ruvector-core/benches/distance_metrics.rs`
What it measures: Raw distance calculation performance
Metrics:
- Euclidean (L2) distance
- Cosine similarity
- Dot product
- Manhattan (L1) distance
- SIMD vs scalar implementations
Run:

```bash
cargo bench distance_metrics
```

Expected results:

```
euclidean_128d/simd     time: [45.234 ns 45.456 ns 45.678 ns]
euclidean_128d/scalar   time: [312.45 ns 315.23 ns 318.91 ns]
                        ↑ ~7x slower than SIMD
cosine_128d/simd        time: [52.123 ns 52.345 ns 52.567 ns]
dotproduct_128d/simd    time: [38.901 ns 39.123 ns 39.345 ns]
```
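For intuition about what the `scalar` rows time, a scalar Euclidean distance can be sketched as below. This is a minimal illustration, not Ruvector's actual kernel; the `euclidean_scalar` name is only for this sketch. The `simd` rows run the same arithmetic through explicit vector instructions.

```rust
/// Scalar (non-SIMD) Euclidean distance: one subtract, multiply, and
/// add per dimension, followed by a single square root.
fn euclidean_scalar(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len());
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y) * (x - y))
        .sum::<f32>()
        .sqrt()
}

fn main() {
    let a = vec![0.0_f32; 128];
    let b = vec![1.0_f32; 128];
    // Distance between all-zeros and all-ones in 128D is sqrt(128).
    assert!((euclidean_scalar(&a, &b) - 128.0_f32.sqrt()).abs() < 1e-4);
}
```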
File: `crates/ruvector-core/benches/hnsw_search.rs`
What it measures: End-to-end search performance
Metrics:
- Search latency (p50, p95, p99)
- Queries per second (QPS)
- Recall accuracy
- Different dataset sizes (1K, 10K, 100K, 1M vectors)
- Different ef_search values (50, 100, 200, 500)
Run:

```bash
cargo bench hnsw_search
```

Expected results:

```
search_1M_vectors_k10_ef100
    time:   [845.23 µs 856.78 µs 868.45 µs]
    thrpt:  [1,151 queries/s]
    recall: [95.2%]

search_1M_vectors_k10_ef200
    time:   [1.678 ms 1.689 ms 1.701 ms]
    thrpt:  [587 queries/s]
    recall: [98.7%]
```
File: `crates/ruvector-core/benches/batch_operations.rs`
What it measures: Throughput for bulk operations
Metrics:
- Batch insert throughput
- Parallel vs sequential inserts
- Different batch sizes (100, 1K, 10K)
Run:

```bash
cargo bench batch_operations
```

Expected results:

```
batch_insert_1000_parallel
    time:   [45.234 ms 46.123 ms 47.012 ms]
    thrpt:  [21,271 vectors/s]

batch_insert_1000_sequential
    time:   [234.56 ms 238.91 ms 243.27 ms]
    thrpt:  [4,111 vectors/s]
    ↑ ~5x slower than parallel
```
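The parallel-vs-sequential gap comes from spreading a batch across worker threads. A stdlib-only sketch of the chunking pattern is below; `Index` here is a stand-in guarded by one mutex, not the Ruvector API, and the real HNSW insert path uses finer-grained locking, so achievable speedup depends on contention.

```rust
use std::sync::Mutex;
use std::thread;

// Stand-in index: a mutex-guarded vector store (illustrative only).
struct Index {
    vectors: Mutex<Vec<Vec<f32>>>,
}

impl Index {
    fn insert(&self, v: Vec<f32>) {
        self.vectors.lock().unwrap().push(v);
    }
}

/// Split the batch into roughly equal chunks and insert each chunk
/// from its own scoped thread.
fn batch_insert_parallel(index: &Index, batch: Vec<Vec<f32>>, threads: usize) {
    let chunk = ((batch.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        for part in batch.chunks(chunk) {
            let part = part.to_vec();
            s.spawn(move || {
                for v in part {
                    index.insert(v);
                }
            });
        }
    });
}

fn main() {
    let index = Index { vectors: Mutex::new(Vec::new()) };
    let batch: Vec<Vec<f32>> = (0..1000).map(|i| vec![i as f32; 128]).collect();
    batch_insert_parallel(&index, batch, 4);
    assert_eq!(index.vectors.lock().unwrap().len(), 1000);
}
```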
File: `crates/ruvector-core/benches/quantization_bench.rs`
What it measures: Quantization performance and accuracy
Metrics:
- Quantization time
- Dequantization time
- Distance calculation with quantized vectors
- Recall impact
Run:

```bash
cargo bench quantization
```

Expected results:

```
scalar_quantize_128d        time: [234.56 ns 236.78 ns 239.01 ns]
product_quantize_128d       time: [1.234 µs 1.245 µs 1.256 µs]
search_with_scalar_quant    time: [678.90 µs 685.12 µs 691.34 µs]
                            recall: [97.3%]
search_with_product_quant   time: [523.45 µs 528.67 µs 533.89 µs]
                            recall: [92.8%]
```
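Scalar quantization maps each f32 coordinate into a single byte, which is where the 4x compression (and the small recall loss) comes from. A minimal sketch, not Ruvector's actual quantizer:

```rust
/// Scalar quantization: map each f32 in [min, max] to one of 256
/// buckets, shrinking 4 bytes per dimension down to 1.
fn scalar_quantize(v: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = 255.0 / (max - min);
    v.iter()
        .map(|&x| ((x - min) * scale).round().clamp(0.0, 255.0) as u8)
        .collect()
}

/// Approximate reconstruction; error is bounded by half a bucket width.
fn dequantize(q: &[u8], min: f32, max: f32) -> Vec<f32> {
    let scale = (max - min) / 255.0;
    q.iter().map(|&b| min + b as f32 * scale).collect()
}

fn main() {
    let v = vec![0.0_f32, 0.5, 1.0];
    let q = scalar_quantize(&v, 0.0, 1.0);
    assert_eq!(q, vec![0, 128, 255]);
    let back = dequantize(&q, 0.0, 1.0);
    for (a, b) in v.iter().zip(&back) {
        assert!((a - b).abs() <= 0.5 / 255.0 + 1e-6);
    }
}
```

The reconstruction error per coordinate is what nudges recall down; product quantization compresses harder (16-32x) at a larger recall cost, as the numbers above show.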
File: `crates/ruvector-core/benches/comprehensive_bench.rs`
What it measures: End-to-end system performance
Run:

```bash
cargo bench comprehensive
```

## Interpreting Results

Criterion reports each result in the form:

```
test_name    time:   [lower_bound mean upper_bound]
             thrpt:  [throughput]
             change: [% change from baseline]
```

Example:

```
search_100K_vectors    time:   [234.56 µs 238.91 µs 243.27 µs]
                       thrpt:  [4,111 queries/s]
                       change: [-5.2% -3.8% -2.1%] (faster)
```

Interpretation:
- Mean: 238.91 µs
- 95% confidence interval: [234.56 µs, 243.27 µs]
- Throughput: ~4,111 queries/second
- ~3.8% faster than the saved baseline
```bash
cargo run --release --bin latency_benchmark
```

Output:

```
Latency percentiles (100K queries):
  p50:  0.85 ms
  p90:  1.23 ms
  p95:  1.67 ms
  p99:  3.45 ms
  p999: 8.91 ms
```

Interpretation:
- 50% of queries complete in under 0.85 ms
- 95% of queries complete in under 1.67 ms
- 99% of queries complete in under 3.45 ms
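These percentiles are defined by rank over the recorded per-query latencies. A nearest-rank sketch of the computation (the actual binary may use a histogram internally, but the reported values mean the same thing):

```rust
/// Nearest-rank percentile over an ascending-sorted slice of
/// latencies: pN is the smallest value such that at least N% of
/// samples are <= it.
fn percentile(sorted: &[f64], p: f64) -> f64 {
    assert!(!sorted.is_empty());
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

fn main() {
    // 1000 synthetic latencies: 0.01 ms, 0.02 ms, ..., 10.00 ms.
    let mut latencies: Vec<f64> = (1..=1000).map(|i| i as f64 / 100.0).collect();
    latencies.sort_by(|a, b| a.partial_cmp(b).unwrap());
    assert_eq!(percentile(&latencies, 50.0), 5.0); // p50
    assert_eq!(percentile(&latencies, 99.0), 9.9); // p99
}
```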
```bash
cargo run --release --bin memory_benchmark
```

Output:

```
Memory usage (1M vectors, 128D):
  Vectors (full):   512.0 MB
  Vectors (scalar): 128.0 MB (4x compression)
  HNSW graph:       640.0 MB
  Metadata:          50.0 MB
  ──────────────────────────────
  Total:            818.0 MB
```

The total reflects the scalar-quantized configuration (128 + 640 + 50 MB); with full-precision vectors it would be 1,202 MB.
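The vector-storage lines follow directly from size arithmetic: 1M vectors × 128 dims × 4 bytes (f32) = 512 MB at full precision, and 1 byte per dimension after scalar quantization = 128 MB (decimal megabytes, as in the output above):

```rust
/// Raw vector storage in decimal megabytes:
/// n vectors x dims x bytes per dimension / 10^6.
fn vector_storage_mb(n: u64, dims: u64, bytes_per_dim: u64) -> f64 {
    (n * dims * bytes_per_dim) as f64 / 1_000_000.0
}

fn main() {
    // Full precision: 4-byte f32 per dimension.
    assert_eq!(vector_storage_mb(1_000_000, 128, 4), 512.0);
    // Scalar quantization: 1 byte per dimension (4x compression).
    assert_eq!(vector_storage_mb(1_000_000, 128, 1), 128.0);
}
```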
## Performance Targets

### Search Latency

| Dataset | Target p50 | Target p95 | Target QPS |
|---|---|---|---|
| 10K vectors | < 100 µs | < 200 µs | 10,000+ |
| 100K vectors | < 500 µs | < 1 ms | 2,000+ |
| 1M vectors | < 1 ms | < 2 ms | 1,000+ |
| 10M vectors | < 2 ms | < 5 ms | 500+ |
### Insert Throughput

| Operation | Target |
|---|---|
| Single insert | 1,000+ ops/sec |
| Batch insert (1K) | 10,000+ vectors/sec |
| Batch insert (10K) | 50,000+ vectors/sec |
### Memory Usage

| Configuration | Target Memory per Vector |
|---|---|
| Full precision | 512 bytes (128D) |
| Scalar quant | 128 bytes (4x compression) |
| Product quant | 16-32 bytes (16-32x compression) |
### Recall

| Configuration | Target Recall |
|---|---|
| ef_search=50 | 85%+ |
| ef_search=100 | 90%+ |
| ef_search=200 | 95%+ |
| ef_search=500 | 99%+ |
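The recall figures above are recall@k: the fraction of the true k nearest neighbours that the approximate search actually returned, averaged over queries. A minimal sketch of the per-query computation:

```rust
use std::collections::HashSet;

/// Recall@k for a single query: |approx ∩ exact| / k, where `exact`
/// holds the IDs of the true k nearest neighbours (from brute force)
/// and `approx` holds the IDs the HNSW search returned.
fn recall_at_k(approx: &[u32], exact: &[u32]) -> f64 {
    let truth: HashSet<u32> = exact.iter().copied().collect();
    let hits = approx.iter().filter(|&id| truth.contains(id)).count();
    hits as f64 / exact.len() as f64
}

fn main() {
    let exact = [1u32, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let approx = [1u32, 2, 3, 4, 5, 6, 7, 8, 9, 42]; // one miss
    assert_eq!(recall_at_k(&approx, &exact), 0.9);
}
```

Raising `ef_search` widens the candidate queue during graph traversal, which is why recall climbs toward 99%+ at the cost of the higher latencies shown in the HNSW benchmark results.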
## Comparison Methodology

```bash
cargo run --release --bin comparison_benchmark -- --system faiss
```

Metrics compared:
- Search latency (same dataset, same k)
- Memory usage
- Build time
- Recall@10

Example output:

```
Benchmark: 1M vectors, 128D, k=10

                Ruvector    FAISS      Speedup
────────────────────────────────────────────────
Build time      245s        312s       1.27x
Search (p50)    0.85ms      2.34ms     2.75x
Search (p95)    1.67ms      4.56ms     2.73x
Memory          818MB       1,245MB    1.52x
Recall@10       95.2%       95.8%      ~same
```
Track performance over time:

```bash
# Save baseline
git checkout v0.1.0
cargo bench -- --save-baseline v0.1.0

# Compare to a new version
git checkout v0.2.0
cargo bench -- --baseline v0.1.0
```

## Contributing Benchmarks

1. Create a benchmark file:
```rust
// crates/ruvector-core/benches/my_benchmark.rs
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use ruvector_core::*;

fn my_benchmark(c: &mut Criterion) {
    // Build the database and query input with your own setup helpers.
    let db = setup_test_db();
    let input = setup_test_input();

    c.bench_function("my_operation", |b| {
        b.iter(|| {
            // Operation to benchmark
            db.my_operation(black_box(&input))
        })
    });
}

criterion_group!(benches, my_benchmark);
criterion_main!(benches);
```

2. Register it in `Cargo.toml`:

```toml
[[bench]]
name = "my_benchmark"
harness = false
```

3. Run and verify:

```bash
cargo bench my_benchmark
```
Best practices:

- **Use `black_box`**: prevent the compiler from optimizing away the benchmarked work, e.g. `b.iter(|| db.search(black_box(&query)))`
- **Measure what matters**: focus on user-facing operations
- **Realistic workloads**: use representative data sizes
- **Multiple iterations**: Criterion handles this automatically
- **Isolate variables**: benchmark one thing at a time
- **Document context**: explain what is being measured
- **CI integration**: run benchmarks in CI to catch regressions
```bash
# Flamegraph
cargo flamegraph --bench hnsw_search

# perf (Linux)
perf record -g cargo bench hnsw_search
perf report

# Cachegrind (cache/memory profiling)
valgrind --tool=cachegrind cargo bench hnsw_search
```

Example GitHub Actions steps:

```yaml
- name: Run benchmarks
  run: |
    cargo bench --bench distance_metrics -- --save-baseline main
- name: Compare to baseline
  run: |
    cargo bench --bench distance_metrics -- --baseline main
```
### Performance Regression Detection
Fail CI if performance regresses > 5%:
```rust
// In benchmark code; `load_baseline` and `measure_current` are
// placeholder helpers for however timings are recorded and loaded.
let previous_mean = load_baseline("main");
let current_mean = measure_current();
let regression = (current_mean - previous_mean) / previous_mean;
assert!(regression < 0.05, "Performance regression > 5%");
```
- Criterion.rs documentation
- Rust Performance Book
- Benchmarking Rust programs
- ANN-Benchmarks - Standard vector search benchmarks
Open an issue: https://github.com/ruvnet/ruvector/issues