Ruvector-Bench


Comprehensive benchmarking suite for measuring Ruvector performance across different operations and configurations.

Professional-grade performance testing tools for validating sub-millisecond vector search, HNSW optimization, quantization efficiency, and cross-system comparisons. Built for developers who demand data-driven insights.

🎯 Overview

The ruvector-bench crate provides a complete benchmarking infrastructure to measure and analyze Ruvector's performance characteristics. It includes standardized test suites compatible with ann-benchmarks.com, comprehensive latency profiling, memory usage analysis, and cross-system performance comparison tools.

Key Features

  • ANN-Benchmarks Compatible: Standard datasets (SIFT1M, GIST1M, Deep1M) and metrics
  • 📊 Latency Profiling: High-precision measurement of p50, p95, p99, p99.9 percentiles
  • 💾 Memory Analysis: Track memory usage with quantization and optimization techniques
  • 🔬 AgenticDB Workloads: Simulate real-world AI agent memory patterns
  • 🏆 Cross-System Comparison: Compare against Python baselines and other vector databases
  • 📈 Comprehensive Reporting: JSON, CSV, and Markdown output formats
  • 🔥 Performance Profiling: CPU flamegraphs and memory profiling support

📦 Installation

Add to your Cargo.toml:

[dev-dependencies]
ruvector-bench = { path = "../ruvector-bench" }

# Optional: enable profiling features
# ruvector-bench = { path = "../ruvector-bench", features = ["profiling"] }

# Optional: enable HDF5 dataset loading
# ruvector-bench = { path = "../ruvector-bench", features = ["hdf5-datasets"] }

🚀 Available Benchmarks

The suite includes 6 specialized benchmark binaries:

| Benchmark | Purpose | Metrics |
|---|---|---|
| ann-benchmark | ANN-Benchmarks compatibility | QPS, latency, recall@k, memory |
| agenticdb-benchmark | AI agent memory workloads | Insert/search/update latency, memory |
| latency-benchmark | Detailed latency profiling | p50/p95/p99/p99.9 latencies |
| memory-benchmark | Memory usage analysis | Memory per vector, quantization savings |
| comparison-benchmark | Cross-system performance | QPS, latency, speedup vs. baselines |
| profiling-benchmark | CPU/memory profiling | Flamegraphs, allocation tracking |

⚡ Quick Start

Running Basic Benchmarks

# Run ANN-Benchmarks suite with default settings
cargo run --bin ann-benchmark --release

# Run with custom parameters
cargo run --bin ann-benchmark --release -- \
  --num-vectors 100000 \
  --dimensions 384 \
  --ef-search-values 50,100,200 \
  --output bench_results

# Run latency profiling
cargo run --bin latency-benchmark --release

# Run AgenticDB workload simulation
cargo run --bin agenticdb-benchmark --release

# Run cross-system comparison
cargo run --bin comparison-benchmark --release

Running with Profiling

# Build with profiling enabled
cargo build --bin profiling-benchmark --release --features profiling

# Run and generate flamegraph
cargo run --bin profiling-benchmark --release --features profiling -- \
  --enable-flamegraph \
  --output profiling_results

📊 Benchmark Categories

1. ANN-Benchmarks Suite (ann-benchmark)

Standard benchmarking compatible with ann-benchmarks.com methodology.

Supported Datasets:

  • SIFT1M: 1M vectors, 128 dimensions (image descriptors)
  • GIST1M: 1M vectors, 960 dimensions (scene recognition)
  • Deep1M: 1M vectors, 96 dimensions (deep learning embeddings)
  • Synthetic: Configurable size and distribution

Usage:

# Test with synthetic data (default)
cargo run --bin ann-benchmark --release -- \
  --dataset synthetic \
  --num-vectors 100000 \
  --dimensions 384 \
  --k 10

# Test with SIFT1M (requires dataset download)
cargo run --bin ann-benchmark --release -- \
  --dataset sift1m \
  --ef-search-values 50,100,200,400

Measured Metrics:

  • Queries per second (QPS)
  • Latency percentiles (p50, p95, p99, p99.9)
  • Recall@1, Recall@10, Recall@100
  • Memory usage (MB)
  • Build/index time

Example Output:

╔════════════════════════════════════════╗
║   Ruvector ANN-Benchmarks Suite       ║
╚════════════════════════════════════════╝

✓ Dataset loaded: 100000 vectors, 1000 queries

============================================================
Testing with ef_search = 100
============================================================

┌───────────┬──────┬──────────┬──────────┬───────────┬─────────────┐
│ ef_search │ QPS  │ p50 (ms) │ p99 (ms) │ Recall@10 │ Memory (MB) │
├───────────┼──────┼──────────┼──────────┼───────────┼─────────────┤
│ 100       │ 5243 │ 0.19     │ 0.45     │ 95.23%    │ 246.8       │
└───────────┴──────┴──────────┴──────────┴───────────┴─────────────┘

2. AgenticDB Workload Simulation (agenticdb-benchmark)

Simulates real-world AI agent memory patterns with mixed read/write workloads.

Workload Types:

  • Conversational AI: High read ratio (70/30 read/write)
  • Learning Agents: Balanced read/write (50/50)
  • Batch Processing: Write-heavy (30/70 read/write)
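
These ratios can be reproduced with a weighted coin flip per operation. A minimal sketch of the dispatch logic using the `rand` crate, shown only to illustrate the mix (not the benchmark's actual implementation):

use rand::Rng;

/// Dispatch a mixed read/write workload: `read_ratio` is the fraction
/// of operations that are searches (e.g. 0.7 for the conversational mix).
fn run_workload(read_ratio: f64, num_operations: usize) {
    let mut rng = rand::thread_rng();
    let (mut reads, mut writes) = (0usize, 0usize);

    for _ in 0..num_operations {
        if rng.gen_range(0.0..1.0) < read_ratio {
            reads += 1; // issue a search query here
        } else {
            writes += 1; // issue an insert or update here
        }
    }
    println!("reads: {reads}, writes: {writes}");
}

fn main() {
    run_workload(0.7, 10_000); // conversational: 70/30 read/write
}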

Usage:

cargo run --bin agenticdb-benchmark --release -- \
  --workload conversational \
  --num-vectors 50000 \
  --num-operations 10000

Measured Operations:

  • Insert latency
  • Search latency
  • Update latency
  • Batch operation throughput
  • Memory efficiency

3. Latency Profiling (latency-benchmark)

Detailed latency analysis across different configurations and concurrency levels.

Test Scenarios:

  • Single-threaded vs multi-threaded search
  • Effect of ef_search parameter on latency
  • Effect of quantization on latency/recall tradeoff
  • Concurrent query handling

Usage:

# Test with different thread counts
cargo run --bin latency-benchmark --release -- \
  --threads 1,4,8,16 \
  --num-vectors 50000 \
  --queries 1000

Example Output:

Test 1: Single-threaded Latency
- p50: 0.42ms
- p95: 1.23ms
- p99: 2.15ms
- p99.9: 4.87ms

Test 2: Multi-threaded Latency (8 threads)
- p50: 0.38ms
- p95: 1.05ms
- p99: 1.89ms
- p99.9: 3.92ms

4. Memory Benchmarks (memory-benchmark)

Analyzes memory usage with different quantization strategies.

Quantization Tests:

  • None: Full precision (baseline)
  • Scalar: 4x compression
  • Binary: 32x compression
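
Both compression factors follow from element width: scalar quantization replaces each 4-byte f32 with a 1-byte code, and binary quantization keeps one bit per dimension. A minimal sketch of the two encodings, shown to illustrate the arithmetic rather than Ruvector's actual internal format:

/// Scalar quantization: map each f32 to a u8 over the vector's [min, max]
/// range (4x). Sketch only; the crate's encoding may differ.
fn scalar_quantize(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = if max > min { 255.0 / (max - min) } else { 0.0 };
    let codes = v.iter().map(|&x| ((x - min) * scale) as u8).collect();
    (codes, min, max) // min/max are kept so codes can be dequantized later
}

/// Binary quantization: keep only the sign of each dimension, packed
/// 8 bits per byte (32x).
fn binary_quantize(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}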

Usage:

cargo run --bin memory-benchmark --release -- \
  --num-vectors 100000 \
  --dimensions 384

Measured Metrics:

  • Memory per vector (bytes)
  • Compression ratio
  • Memory overhead
  • Quantization impact on recall

Example Results:

┌──────────────┬─────────────┬───────────────┬────────────┐
│ Quantization │ Memory (MB) │ Bytes/Vector  │ Recall@10  │
├──────────────┼─────────────┼───────────────┼────────────┤
│ None         │ 147.5       │ 1536          │ 100.00%    │
│ Scalar       │ 38.2        │ 398           │ 95.80%     │
│ Binary       │ 4.7         │ 49            │ 87.20%     │
└──────────────┴─────────────┴───────────────┴────────────┘

✓ Scalar quantization: 4.0x memory reduction, 4.2% recall loss
✓ Binary quantization: 31.4x memory reduction, 12.8% recall loss

5. Cross-System Comparison (comparison-benchmark)

Compare Ruvector against other implementations and baselines.

Comparison Targets:

  • Ruvector (optimized: SIMD + Quantization + HNSW)
  • Ruvector (no quantization)
  • Simulated Python baseline (numpy)
  • Simulated brute-force search

Usage:

cargo run --bin comparison-benchmark --release -- \
  --num-vectors 50000 \
  --dimensions 384

Example Results:

┌──────────────────────────┬──────┬──────────┬─────────────┬────────────┐
│ System                   │ QPS  │ p50 (ms) │ Memory (MB) │ Speedup    │
├──────────────────────────┼──────┼──────────┼─────────────┼────────────┤
│ Ruvector (optimized)     │ 5243 │ 0.19     │ 38.2        │ 1.0x       │
│ Ruvector (no quant)      │ 4891 │ 0.20     │ 147.5       │ 1.07x      │
│ Python baseline          │ 89   │ 11.2     │ 153.6       │ 58.9x      │
│ Brute-force              │ 12   │ 83.3     │ 147.5       │ 437x       │
└──────────────────────────┴──────┴──────────┴─────────────┴────────────┘

✓ Ruvector is 58.9x faster than Python baseline
✓ Ruvector uses 74.1% less memory with quantization

6. Performance Profiling (profiling-benchmark)

CPU and memory profiling with flamegraph generation (requires profiling feature).

Usage:

# Build with profiling support
cargo build --bin profiling-benchmark --release --features profiling

# Run with flamegraph generation
cargo run --bin profiling-benchmark --release --features profiling -- \
  --enable-flamegraph \
  --num-vectors 50000 \
  --output profiling_results

# View flamegraph
open profiling_results/flamegraph.svg

Generated Artifacts:

  • CPU flamegraph (SVG)
  • Memory allocation profile
  • Hotspot analysis
  • Function-level timing breakdown

📈 Interpreting Results

Latency Metrics

| Percentile | Meaning | Target |
|---|---|---|
| p50 | Median latency; typical query performance | <0.5ms |
| p95 | 95% of queries complete within this time | <1.5ms |
| p99 | 99% of queries complete within this time | <3.0ms |
| p99.9 | 99.9% of queries complete within this time (tail latency) | <5.0ms |
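
The suite's LatencyStats type is HDR-histogram based (see API Reference). An equivalent standalone measurement loop, sketched here with the hdrhistogram crate, also shows how QPS falls out of the same timing data:

use hdrhistogram::Histogram;
use std::time::Instant;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Record latencies in microseconds: range 1µs..60s, 3 significant digits.
    let mut hist = Histogram::<u64>::new_with_bounds(1, 60_000_000, 3)?;
    let num_queries = 1_000;

    let total_start = Instant::now();
    for _ in 0..num_queries {
        let start = Instant::now();
        // ... run one search query here ...
        hist.record((start.elapsed().as_micros() as u64).max(1))?;
    }
    let qps = num_queries as f64 / total_start.elapsed().as_secs_f64();

    println!("QPS: {qps:.0}");
    for q in [0.50, 0.95, 0.99, 0.999] {
        // value_at_quantile returns microseconds; convert to ms for reporting
        println!("p{}: {:.2}ms", q * 100.0, hist.value_at_quantile(q) as f64 / 1000.0);
    }
    Ok(())
}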

Recall Metrics

  • Recall@k: Fraction of true nearest neighbors found in top-k results
  • Target Recall@10: ≥95% for most applications
  • Trade-off: Higher ef_search → better recall, higher latency
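
A standalone version of the metric (the suite ships this as calculate_recall(); this sketch only mirrors the definition above):

use std::collections::HashSet;

/// recall@k: fraction of the true k nearest neighbors (nearest first)
/// that appear in the returned top-k result IDs.
fn recall_at_k(ground_truth: &[u64], returned: &[u64], k: usize) -> f64 {
    let truth: HashSet<_> = ground_truth.iter().take(k).collect();
    let hits = returned.iter().take(k).filter(|id| truth.contains(id)).count();
    hits as f64 / k.min(ground_truth.len()) as f64
}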

Memory Efficiency

Memory per vector = Total Memory / Number of Vectors

Typical values:
- No quantization: ~1536 bytes (384D float32)
- Scalar quantization: ~400 bytes (4x compression)
- Binary quantization: ~50 bytes (32x compression)
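
These typical values follow from element width alone; a quick back-of-the-envelope check (measured figures run slightly higher because HNSW graph links and metadata add per-vector overhead):

/// Approximate raw storage per 384-dimensional vector, index overhead excluded.
fn main() {
    let dims = 384usize;
    let full = dims * 4;         // f32: 1536 bytes
    let scalar = dims;           // one u8 code per dim: 384 bytes (+ per-vector min/max)
    let binary = (dims + 7) / 8; // one bit per dim: 48 bytes
    println!("full: {full}B, scalar: {scalar}B, binary: {binary}B");
}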

🔧 Benchmark Configuration Options

Common Options (All Benchmarks)

--num-vectors <N>       # Number of vectors to index (default: 50000)
--dimensions <D>        # Vector dimensions (default: 384)
--output <PATH>         # Output directory for results (default: bench_results)

ANN-Benchmark Specific

--dataset <NAME>        # Dataset: sift1m, gist1m, deep1m, synthetic
--num-queries <N>       # Number of search queries (default: 1000)
--k <K>                 # Number of nearest neighbors to retrieve (default: 10)
--m <M>                 # HNSW M parameter (default: 32)
--ef-construction <EF>  # HNSW build parameter (default: 200)
--ef-search-values <EF> # Comma-separated ef_search values to test (default: 50,100,200,400)
--metric <METRIC>       # Distance metric: cosine, euclidean, dot (default: cosine)
--quantization <TYPE>   # Quantization: none, scalar, binary (default: scalar)

Latency-Benchmark Specific

--threads <THREADS>     # Comma-separated thread counts (default: 1,4,8,16)

AgenticDB-Benchmark Specific

--workload <TYPE>       # Workload type: conversational, learning, batch
--num-operations <N>    # Number of operations to perform (default: 10000)

Profiling-Benchmark Specific

--enable-flamegraph     # Generate CPU flamegraph (requires profiling feature)
--enable-memory-profile # Enable detailed memory profiling

🎨 Custom Benchmark Creation

Create your own benchmarks using the ruvector-bench library:

use ruvector_bench::{
    BenchmarkResult, DatasetGenerator, LatencyStats,
    MemoryProfiler, ResultWriter, VectorDistribution,
};
use ruvector_core::{VectorDB, DbOptions, SearchQuery, VectorEntry};
use std::time::Instant;

fn my_custom_benchmark() -> anyhow::Result<()> {
    // Generate test data
    let generator = DatasetGenerator::new(384, VectorDistribution::Normal {
        mean: 0.0,
        std_dev: 1.0,
    });
    let vectors = generator.generate(10000);
    let queries = generator.generate(100);

    // Create database
    let db = VectorDB::new(DbOptions::default())?;

    // Measure indexing
    let mem_profiler = MemoryProfiler::new();
    let build_start = Instant::now();

    for (idx, vector) in vectors.iter().enumerate() {
        db.insert(VectorEntry {
            id: Some(idx.to_string()),
            vector: vector.clone(),
            metadata: None,
        })?;
    }

    let build_time = build_start.elapsed();

    // Measure search performance
    let mut latency_stats = LatencyStats::new()?;

    for query in &queries {
        let start = Instant::now();
        db.search(SearchQuery {
            vector: query.clone(),
            k: 10,
            filter: None,
            ef_search: None,
        })?;
        latency_stats.record(start.elapsed())?;
    }

    // Print results
    println!("Build time: {:.2}s", build_time.as_secs_f64());
    println!("p50 latency: {:.2}ms", latency_stats.percentile(0.50).as_secs_f64() * 1000.0);
    println!("p99 latency: {:.2}ms", latency_stats.percentile(0.99).as_secs_f64() * 1000.0);
    println!("Memory usage: {:.2}MB", mem_profiler.current_usage_mb());

    Ok(())
}

🔄 CI/CD Integration

GitHub Actions Example

name: Benchmarks

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Install Rust
        uses: actions-rs/toolchain@v1
        with:
          toolchain: stable
          profile: minimal

      - name: Run benchmarks
        run: |
          cd crates/ruvector-bench
          cargo run --bin ann-benchmark --release -- --output ci_results
          cargo run --bin latency-benchmark --release -- --output ci_results

      - name: Upload results
        uses: actions/upload-artifact@v3
        with:
          name: benchmark-results
          path: crates/ruvector-bench/ci_results/

      - name: Check performance regression
        run: |
          python scripts/check_regression.py ci_results/ann_benchmark.json

📉 Performance Regression Testing

Track performance over time using historical benchmark data:

# Run baseline benchmarks (on main branch)
git checkout main
cargo run --bin ann-benchmark --release -- --output baseline_results

# Run comparison benchmarks (on feature branch)
git checkout feature-branch
cargo run --bin ann-benchmark --release -- --output feature_results

# Compare results
python scripts/compare_benchmarks.py \
  baseline_results/ann_benchmark.json \
  feature_results/ann_benchmark.json

Regression Thresholds:

  • ✅ Pass: <5% latency regression, <10% memory regression
  • ⚠️ Warning: 5-10% latency regression, 10-20% memory regression
  • ❌ Fail: >10% latency regression, >20% memory regression
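
The compare script itself is not shown here, but the threshold logic is simple. A sketch in Rust using serde and serde_json, with field names assumed to match the JSON format shown in the next section:

use serde::Deserialize;

// Fields assumed from the JSON example under "Results Visualization".
#[derive(Deserialize)]
struct BenchResult {
    latency_p50: f64,
    memory_mb: f64,
}

/// Compare a feature-branch result against a baseline and apply the
/// regression thresholds listed above. Sketch only, not the repo's script.
fn check(baseline_path: &str, feature_path: &str) -> anyhow::Result<()> {
    let baseline: BenchResult = serde_json::from_str(&std::fs::read_to_string(baseline_path)?)?;
    let feature: BenchResult = serde_json::from_str(&std::fs::read_to_string(feature_path)?)?;

    let latency = (feature.latency_p50 / baseline.latency_p50 - 1.0) * 100.0;
    let memory = (feature.memory_mb / baseline.memory_mb - 1.0) * 100.0;

    match (latency, memory) {
        (l, m) if l > 10.0 || m > 20.0 => println!("FAIL: {l:.1}% latency, {m:.1}% memory"),
        (l, m) if l > 5.0 || m > 10.0 => println!("WARN: {l:.1}% latency, {m:.1}% memory"),
        (l, m) => println!("PASS: {l:.1}% latency, {m:.1}% memory"),
    }
    Ok(())
}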

📊 Results Visualization

Benchmark results are automatically saved in multiple formats:

JSON Format

{
  "name": "ruvector-ef100",
  "dataset": "synthetic",
  "dimensions": 384,
  "num_vectors": 100000,
  "qps": 5243.2,
  "latency_p50": 0.19,
  "latency_p99": 2.15,
  "recall_at_10": 0.9523,
  "memory_mb": 38.2
}

CSV Format

name,dataset,dimensions,num_vectors,qps,p50,p99,recall@10,memory_mb
ruvector-ef100,synthetic,384,100000,5243.2,0.19,2.15,0.9523,38.2

Markdown Report

Results include automatically generated markdown reports with detailed performance analysis.

Custom Visualization

Generate performance charts using the provided data:

import pandas as pd
import matplotlib.pyplot as plt

# Load benchmark results
df = pd.read_csv('bench_results/ann_benchmark.csv')

# Plot QPS vs Recall tradeoff
plt.figure(figsize=(10, 6))
plt.scatter(df['recall@10'] * 100, df['qps'])
plt.xlabel('Recall@10 (%)')
plt.ylabel('Queries per Second')
plt.title('Ruvector Performance: QPS vs Recall')
plt.grid(True)
plt.savefig('qps_vs_recall.png')


🎯 Optimization Based on Benchmarks

Use Benchmark Results to Tune Performance

  1. Optimize for Latency (sub-millisecond queries):

    HnswConfig {
        m: 16,              // Lower M = faster search, less recall
        ef_construction: 100,
        ef_search: 50,      // Lower ef_search = faster, less recall
        max_elements: 100000,
    }
  2. Optimize for Recall (95%+ accuracy):

    HnswConfig {
        m: 64,              // Higher M = better recall
        ef_construction: 400,
        ef_search: 200,     // Higher ef_search = better recall
        max_elements: 100000,
    }
  3. Optimize for Memory (minimal footprint):

    DbOptions {
        quantization: Some(QuantizationConfig::Binary),  // 32x compression
        ..Default::default()
    }

Recommended Configurations by Use Case

| Use Case | M | ef_construction | ef_search | Quantization | Expected Performance |
|---|---|---|---|---|---|
| Low-Latency Search | 16 | 100 | 50 | Scalar | <0.5ms p50, 90%+ recall |
| Balanced | 32 | 200 | 100 | Scalar | <1ms p50, 95%+ recall |
| High Accuracy | 64 | 400 | 200 | None | <2ms p50, 98%+ recall |
| Memory Constrained | 16 | 100 | 50 | Binary | <1ms p50, 85%+ recall, 32x compression |

🛠️ Development

Running Tests

# Run unit tests
cargo test -p ruvector-bench

# Run specific benchmark
cargo test -p ruvector-bench --test latency_stats_test

Building Documentation

# Generate API documentation
cargo doc -p ruvector-bench --open

Adding New Benchmarks

  1. Create a new binary in src/bin/:

    touch src/bin/my_benchmark.rs
  2. Add to Cargo.toml:

    [[bin]]
    name = "my-benchmark"
    path = "src/bin/my_benchmark.rs"
  3. Implement using ruvector-bench utilities:

    use ruvector_bench::{LatencyStats, ResultWriter};

📚 API Reference

Core Types

  • BenchmarkResult: Comprehensive benchmark result structure
  • LatencyStats: HDR histogram-based latency measurement
  • DatasetGenerator: Synthetic vector data generation
  • MemoryProfiler: Memory usage tracking
  • ResultWriter: Multi-format result output (JSON, CSV, Markdown)

Utilities

  • calculate_recall(): Compute recall@k metric
  • create_progress_bar(): Terminal progress indication
  • VectorDistribution: Uniform, Normal, or Clustered vector generation

See full API documentation for details.

🤝 Contributing

We welcome contributions to improve the benchmarking suite!

Areas for Contribution

  • 📊 Additional benchmark scenarios (concurrent writes, updates, deletes)
  • 🔌 Integration with other vector databases (Pinecone, Qdrant, Milvus)
  • 📈 Enhanced visualization and reporting
  • 🎯 Real-world dataset support (SIFT, GIST, Deep1M loaders)
  • 🚀 Performance optimization insights

See Contributing Guidelines for details.

📜 License

This crate is part of the Ruvector project and is licensed under the MIT License.


Part of Ruvector - Next-generation vector database built in Rust

Built by rUv