Comprehensive benchmarking suite for measuring Ruvector performance across different operations and configurations.
Professional-grade performance testing tools for validating sub-millisecond vector search, HNSW optimization, quantization efficiency, and cross-system comparisons. Built for developers who demand data-driven insights.
The ruvector-bench crate provides a complete benchmarking infrastructure to measure and analyze Ruvector's performance characteristics. It includes standardized test suites compatible with ann-benchmarks.com, comprehensive latency profiling, memory usage analysis, and cross-system performance comparison tools.
- ⚡ ANN-Benchmarks Compatible: Standard datasets (SIFT1M, GIST1M, Deep1M) and metrics
- 📊 Latency Profiling: High-precision measurement of p50, p95, p99, p99.9 percentiles
- 💾 Memory Analysis: Track memory usage with quantization and optimization techniques
- 🔬 AgenticDB Workloads: Simulate real-world AI agent memory patterns
- 🏆 Cross-System Comparison: Compare against Python baselines and other vector databases
- 📈 Comprehensive Reporting: JSON, CSV, and Markdown output formats
- 🔥 Performance Profiling: CPU flamegraphs and memory profiling support
Add to your Cargo.toml:
[dev-dependencies]
ruvector-bench = { path = "../ruvector-bench" }
# Optional: Enable profiling features
ruvector-bench = { path = "../ruvector-bench", features = ["profiling"] }
# Optional: Enable HDF5 dataset loading
ruvector-bench = { path = "../ruvector-bench", features = ["hdf5-datasets"] }

The suite includes six specialized benchmark binaries:
| Benchmark | Purpose | Metrics |
|---|---|---|
| ann-benchmark | ANN-Benchmarks compatibility | QPS, latency, recall@k, memory |
| agenticdb-benchmark | AI agent memory workloads | Insert/search/update latency, memory |
| latency-benchmark | Detailed latency profiling | p50/p95/p99/p99.9 latencies |
| memory-benchmark | Memory usage analysis | Memory per vector, quantization savings |
| comparison-benchmark | Cross-system performance | Ruvector vs baselines (10-100x faster) |
| profiling-benchmark | CPU/memory profiling | Flamegraphs, allocation tracking |
# Run ANN-Benchmarks suite with default settings
cargo run --bin ann-benchmark --release
# Run with custom parameters
cargo run --bin ann-benchmark --release -- \
--num-vectors 100000 \
--dimensions 384 \
--ef-search-values 50,100,200 \
--output bench_results
# Run latency profiling
cargo run --bin latency-benchmark --release
# Run AgenticDB workload simulation
cargo run --bin agenticdb-benchmark --release
# Run cross-system comparison
cargo run --bin comparison-benchmark --release

# Build with profiling enabled
cargo build --bin profiling-benchmark --release --features profiling
# Run and generate flamegraph
cargo run --bin profiling-benchmark --release --features profiling -- \
--enable-flamegraph \
--output profiling_results

Standard benchmarking compatible with the ann-benchmarks.com methodology.
Supported Datasets:
- SIFT1M: 1M vectors, 128 dimensions (image descriptors)
- GIST1M: 1M vectors, 960 dimensions (scene recognition)
- Deep1M: 1M vectors, 96 dimensions (deep learning embeddings)
- Synthetic: Configurable size and distribution
Usage:
# Test with synthetic data (default)
cargo run --bin ann-benchmark --release -- \
--dataset synthetic \
--num-vectors 100000 \
--dimensions 384 \
--k 10
# Test with SIFT1M (requires dataset download)
cargo run --bin ann-benchmark --release -- \
--dataset sift1m \
--ef-search-values 50,100,200,400

Measured Metrics:
- Queries per second (QPS)
- Latency percentiles (p50, p95, p99, p99.9)
- Recall@1, Recall@10, Recall@100
- Memory usage (MB)
- Build/index time
Example Output:
╔════════════════════════════════════════╗
║ Ruvector ANN-Benchmarks Suite ║
╚════════════════════════════════════════╝
✓ Dataset loaded: 100000 vectors, 1000 queries
============================================================
Testing with ef_search = 100
============================================================
┌───────────┬──────┬──────────┬──────────┬───────────┬─────────────┐
│ ef_search │ QPS │ p50 (ms) │ p99 (ms) │ Recall@10 │ Memory (MB) │
├───────────┼──────┼──────────┼──────────┼───────────┼─────────────┤
│ 100 │ 5243 │ 0.19 │ 0.45 │ 95.23% │ 246.8 │
└───────────┴──────┴──────────┴──────────┴───────────┴─────────────┘
Simulates real-world AI agent memory patterns with mixed read/write workloads.
Workload Types:
- Conversational AI: High read ratio (70/30 read/write)
- Learning Agents: Balanced read/write (50/50)
- Batch Processing: Write-heavy (30/70 read/write)
Usage:
cargo run --bin agenticdb-benchmark --release -- \
--workload conversational \
--num-vectors 50000 \
--num-operations 10000

Measured Operations:
- Insert latency
- Search latency
- Update latency
- Batch operation throughput
- Memory efficiency
Detailed latency analysis across different configurations and concurrency levels.
Test Scenarios:
- Single-threaded vs multi-threaded search
- Effect of the ef_search parameter on latency
- Effect of quantization on the latency/recall tradeoff
- Concurrent query handling
Usage:
# Test with different thread counts
cargo run --bin latency-benchmark --release -- \
--threads 1,4,8,16 \
--num-vectors 50000 \
--queries 1000

Example Output:
Test 1: Single-threaded Latency
- p50: 0.42ms
- p95: 1.23ms
- p99: 2.15ms
- p99.9: 4.87ms
Test 2: Multi-threaded Latency (8 threads)
- p50: 0.38ms
- p95: 1.05ms
- p99: 1.89ms
- p99.9: 3.92ms
Analyzes memory usage with different quantization strategies.
Quantization Tests:
- None: Full precision (baseline)
- Scalar: 4x compression
- Binary: 32x compression
Usage:
cargo run --bin memory-benchmark --release -- \
--num-vectors 100000 \
--dimensions 384

Measured Metrics:
- Memory per vector (bytes)
- Compression ratio
- Memory overhead
- Quantization impact on recall
Example Results:
┌──────────────┬─────────────┬───────────────┬────────────┐
│ Quantization │ Memory (MB) │ Bytes/Vector │ Recall@10 │
├──────────────┼─────────────┼───────────────┼────────────┤
│ None │ 147.5 │ 1536 │ 100.00% │
│ Scalar │ 38.2 │ 398 │ 95.80% │
│ Binary │ 4.7 │ 49 │ 87.20% │
└──────────────┴─────────────┴───────────────┴────────────┘
✓ Scalar quantization: 4.0x memory reduction, 4.2% recall loss
✓ Binary quantization: 31.4x memory reduction, 12.8% recall loss
Compare Ruvector against other implementations and baselines.
Comparison Targets:
- Ruvector (optimized: SIMD + Quantization + HNSW)
- Ruvector (no quantization)
- Simulated Python baseline (numpy)
- Simulated brute-force search
Usage:
cargo run --bin comparison-benchmark --release -- \
--num-vectors 50000 \
--dimensions 384

Example Results:
┌──────────────────────────┬──────┬──────────┬─────────────┬────────────┐
│ System │ QPS │ p50 (ms) │ Memory (MB) │ Speedup │
├──────────────────────────┼──────┼──────────┼─────────────┼────────────┤
│ Ruvector (optimized) │ 5243 │ 0.19 │ 38.2 │ 1.0x │
│ Ruvector (no quant) │ 4891 │ 0.20 │ 147.5 │ 0.93x │
│ Python baseline │ 89 │ 11.2 │ 153.6 │ 58.9x │
│ Brute-force │ 12 │ 83.3 │ 147.5 │ 437x │
└──────────────────────────┴──────┴──────────┴─────────────┴────────────┘
✓ Ruvector is 58.9x faster than Python baseline
✓ Ruvector uses 74.1% less memory with quantization
CPU and memory profiling with flamegraph generation (requires profiling feature).
Usage:
# Build with profiling support
cargo build --bin profiling-benchmark --release --features profiling
# Run with flamegraph generation
cargo run --bin profiling-benchmark --release --features profiling -- \
--enable-flamegraph \
--num-vectors 50000 \
--output profiling_results
# View flamegraph
open profiling_results/flamegraph.svg

Generated Artifacts:
- CPU flamegraph (SVG)
- Memory allocation profile
- Hotspot analysis
- Function-level timing breakdown
| Percentile | Meaning | Target |
|---|---|---|
| p50 | Median latency - typical query performance | <0.5ms |
| p95 | 95% of queries complete within this time | <1.5ms |
| p99 | 99% of queries complete within this time | <3.0ms |
| p99.9 | 99.9% of queries (tail latency) | <5.0ms |
- Recall@k: Fraction of true nearest neighbors found in top-k results
- Target Recall@10: ≥95% for most applications
- Trade-off: Higher ef_search → better recall, higher latency
Memory per vector = Total Memory / Number of Vectors
Typical values:
- No quantization: ~1536 bytes (384D float32)
- Scalar quantization: ~400 bytes (4x compression)
- Binary quantization: ~50 bytes (32x compression)
--num-vectors <N> # Number of vectors to index (default: 50000)
--dimensions <D> # Vector dimensions (default: 384)
--output <PATH>          # Output directory for results (default: bench_results)

--dataset <NAME>         # Dataset: sift1m, gist1m, deep1m, synthetic
--num-queries <N> # Number of search queries (default: 1000)
--k <K> # Number of nearest neighbors to retrieve (default: 10)
--m <M> # HNSW M parameter (default: 32)
--ef-construction <EF> # HNSW build parameter (default: 200)
--ef-search-values <EF> # Comma-separated ef_search values to test (default: 50,100,200,400)
--metric <METRIC> # Distance metric: cosine, euclidean, dot (default: cosine)
--quantization <TYPE>    # Quantization: none, scalar, binary (default: scalar)

--threads <THREADS>      # Comma-separated thread counts (default: 1,4,8,16)

--workload <TYPE>        # Workload type: conversational, learning, batch
--num-operations <N>     # Number of operations to perform (default: 10000)

--enable-flamegraph      # Generate CPU flamegraph (requires profiling feature)
--enable-memory-profile  # Enable detailed memory profiling

Create your own benchmarks using the ruvector-bench library:
use ruvector_bench::{
    BenchmarkResult, DatasetGenerator, LatencyStats,
    MemoryProfiler, ResultWriter, VectorDistribution,
};
use ruvector_core::{VectorDB, DbOptions, SearchQuery, VectorEntry};
use std::time::Instant;

fn my_custom_benchmark() -> anyhow::Result<()> {
    // Generate test data
    let gen = DatasetGenerator::new(384, VectorDistribution::Normal {
        mean: 0.0,
        std_dev: 1.0,
    });
    let vectors = gen.generate(10000);
    let queries = gen.generate(100);

    // Create database
    let db = VectorDB::new(DbOptions::default())?;

    // Measure indexing
    let mem_profiler = MemoryProfiler::new();
    let build_start = Instant::now();
    for (idx, vector) in vectors.iter().enumerate() {
        db.insert(VectorEntry {
            id: Some(idx.to_string()),
            vector: vector.clone(),
            metadata: None,
        })?;
    }
    let build_time = build_start.elapsed();

    // Measure search performance
    let mut latency_stats = LatencyStats::new()?;
    for query in &queries {
        let start = Instant::now();
        db.search(SearchQuery {
            vector: query.clone(),
            k: 10,
            filter: None,
            ef_search: None,
        })?;
        latency_stats.record(start.elapsed())?;
    }

    // Print results
    println!("Build time: {:.2}s", build_time.as_secs_f64());
    println!("p50 latency: {:.2}ms", latency_stats.percentile(0.50).as_secs_f64() * 1000.0);
    println!("p99 latency: {:.2}ms", latency_stats.percentile(0.99).as_secs_f64() * 1000.0);
    println!("Memory usage: {:.2}MB", mem_profiler.current_usage_mb());

    Ok(())
}

name: Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          profile: minimal

      - name: Run benchmarks
        run: |
          cd crates/ruvector-bench
          cargo run --bin ann-benchmark --release -- --output ci_results
          cargo run --bin latency-benchmark --release -- --output ci_results

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: crates/ruvector-bench/ci_results/

      - name: Check performance regression
        run: |
          python scripts/check_regression.py ci_results/ann_benchmark.json

Track performance over time using historical benchmark data:
# Run baseline benchmarks (on main branch)
git checkout main
cargo run --bin ann-benchmark --release -- --output baseline_results
# Run comparison benchmarks (on feature branch)
git checkout feature-branch
cargo run --bin ann-benchmark --release -- --output feature_results
# Compare results
python scripts/compare_benchmarks.py \
baseline_results/ann_benchmark.json \
feature_results/ann_benchmark.json

Regression Thresholds:
- ✅ Pass: <5% latency regression, <10% memory regression
- ⚠️ Warning: 5-10% latency regression, 10-20% memory regression
- ❌ Fail: >10% latency regression, >20% memory regression
Benchmark results are automatically saved in multiple formats:
{
"name": "ruvector-ef100",
"dataset": "synthetic",
"dimensions": 384,
"num_vectors": 100000,
"qps": 5243.2,
"latency_p50": 0.19,
"latency_p99": 2.15,
"recall_at_10": 0.9523,
"memory_mb": 38.2
}

name,dataset,dimensions,num_vectors,qps,p50,p99,recall@10,memory_mb
ruvector-ef100,synthetic,384,100000,5243.2,0.19,2.15,0.9523,38.2

Results include automatically generated Markdown reports with detailed performance analysis.
Generate performance charts using the provided data:
import pandas as pd
import matplotlib.pyplot as plt
# Load benchmark results
df = pd.read_csv('bench_results/ann_benchmark.csv')
# Plot QPS vs Recall tradeoff
plt.figure(figsize=(10, 6))
plt.scatter(df['recall@10'] * 100, df['qps'])
plt.xlabel('Recall@10 (%)')
plt.ylabel('Queries per Second')
plt.title('Ruvector Performance: QPS vs Recall')
plt.grid(True)
plt.savefig('qps_vs_recall.png')

- Latest Benchmark Results
- Performance Optimization Guide
- Implementation Summary
- ANN-Benchmarks.com - Standard vector search benchmarks
- Optimize for Latency (sub-millisecond queries):

  HnswConfig {
      m: 16,                 // Lower M = faster search, less recall
      ef_construction: 100,
      ef_search: 50,         // Lower ef_search = faster, less recall
      max_elements: 100000,
  }

- Optimize for Recall (95%+ accuracy):

  HnswConfig {
      m: 64,                 // Higher M = better recall
      ef_construction: 400,
      ef_search: 200,        // Higher ef_search = better recall
      max_elements: 100000,
  }

- Optimize for Memory (minimal footprint):

  DbOptions {
      quantization: Some(QuantizationConfig::Binary), // 32x compression
      ..Default::default()
  }
| Use Case | M | ef_construction | ef_search | Quantization | Expected Performance |
|---|---|---|---|---|---|
| Low-Latency Search | 16 | 100 | 50 | Scalar | <0.5ms p50, 90%+ recall |
| Balanced | 32 | 200 | 100 | Scalar | <1ms p50, 95%+ recall |
| High Accuracy | 64 | 400 | 200 | None | <2ms p50, 98%+ recall |
| Memory Constrained | 16 | 100 | 50 | Binary | <1ms p50, 85%+ recall, 32x compression |
# Run unit tests
cargo test -p ruvector-bench
# Run specific benchmark
cargo test -p ruvector-bench --test latency_stats_test

# Generate API documentation
cargo doc -p ruvector-bench --open

- Create a new binary in src/bin/:

  touch src/bin/my_benchmark.rs

- Add to Cargo.toml:

  [[bin]]
  name = "my-benchmark"
  path = "src/bin/my_benchmark.rs"

- Implement using ruvector-bench utilities:

  use ruvector_bench::{LatencyStats, ResultWriter};
- BenchmarkResult: Comprehensive benchmark result structure
- LatencyStats: HDR histogram-based latency measurement
- DatasetGenerator: Synthetic vector data generation
- MemoryProfiler: Memory usage tracking
- ResultWriter: Multi-format result output (JSON, CSV, Markdown)
- calculate_recall(): Compute recall@k metric
- create_progress_bar(): Terminal progress indication
- VectorDistribution: Uniform, Normal, or Clustered vector generation
See full API documentation for details.
We welcome contributions to improve the benchmarking suite!
- 📊 Additional benchmark scenarios (concurrent writes, updates, deletes)
- 🔌 Integration with other vector databases (Pinecone, Qdrant, Milvus)
- 📈 Enhanced visualization and reporting
- 🎯 Real-world dataset support (SIFT, GIST, Deep1M loaders)
- 🚀 Performance optimization insights
See Contributing Guidelines for details.
This crate is part of the Ruvector project and is licensed under the MIT License.
Part of Ruvector - Next-generation vector database built in Rust
Built by rUv • GitHub • Documentation