Skip to content

Latest commit

 

History

History
787 lines (587 loc) · 23.5 KB

File metadata and controls

787 lines (587 loc) · 23.5 KB

ADR-001: Ruvector Core Architecture

Status: Proposed Date: 2026-01-18 Authors: ruv.io, RuVector Team Deciders: Architecture Review Board SDK: Claude-Flow

Note: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.

Version History

Version Date Author Changes
0.1 2026-01-18 ruv.io Initial architecture proposal

Context

The Vector Database Challenge

Modern AI applications require vector databases that can:

  1. Store high-dimensional embeddings from LLMs and embedding models
  2. Search with sub-millisecond latency for real-time inference
  3. Scale to billions of vectors while maintaining performance
  4. Deploy anywhere - edge devices, browsers (WASM), cloud servers
  5. Integrate seamlessly with LLM inference pipelines

Current State of Vector Databases

Existing solutions fall into several categories:

Category Examples Limitations
Cloud-only Pinecone No edge deployment, vendor lock-in
Heavy native Milvus, Qdrant Complex deployment, high memory
Python-first ChromaDB, FAISS Performance overhead, no WASM
Learning-capable None No existing solutions learn from usage

The Ruvector Vision

Ruvector is designed as a high-performance, learning-capable vector database implemented in Rust that:

  • Achieves 61us p50 latency for k=10 search on 384-dim vectors
  • Provides 2-32x memory compression through tiered quantization
  • Runs anywhere - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
  • Learns from usage via GNN layers that improve search quality over time
  • Integrates with AI agent memory systems for policy, session state, and audit logs

Decision

Adopt a Layered, SIMD-Optimized Architecture

We implement ruvector-core as the foundational vector database engine with the following architecture:

+-----------------------------------------------------------------------------+
|                              APPLICATION LAYER                               |
|  AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              INDEX LAYER                                     |
|  HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR            |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              QUANTIZATION LAYER                              |
|  Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction        |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              DISTANCE LAYER                                  |
|  Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              SIMD INTRINSICS LAYER                           |
|  AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback       |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              STORAGE LAYER                                   |
|  REDB (native) | Memory-only (WASM) | PostgreSQL Extension                  |
+-----------------------------------------------------------------------------+

Key Components

1. SIMD Intrinsics Layer (simd_intrinsics.rs)

The performance foundation of ruvector, providing hardware-accelerated distance calculations.

Architecture Dispatch

pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        unsafe { euclidean_distance_neon_impl(a, b) }
    }

    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        euclidean_distance_scalar(a, b)
    }
}

Supported Operations

Operation AVX2 (x86_64) NEON (ARM64) Scalar Fallback
Euclidean Distance 8 floats/cycle 4 floats/cycle 1 float/cycle
Dot Product 8 floats/cycle 4 floats/cycle 1 float/cycle
Cosine Similarity 8 floats/cycle 4 floats/cycle 1 float/cycle
Manhattan Distance N/A 4 floats/cycle 1 float/cycle

Performance Characteristics

Metric AVX2 NEON Scalar
512-dim Euclidean ~16M ops/sec ~8M ops/sec ~2M ops/sec
384-dim Cosine ~143ns ~200ns ~800ns
1536-dim Dot Product ~33ns ~50ns ~150ns

Security Guarantees

  • Bounds checking via assert_eq!(a.len(), b.len()) prevents buffer overflows
  • Unaligned loads (_mm256_loadu_ps, vld1q_f32) handle arbitrary alignment
  • Scalar fallback handles remainder elements after SIMD processing

2. Distance Metrics Layer (distance.rs)

High-level distance API with optional SimSIMD integration for additional acceleration.

Supported Metrics

pub enum DistanceMetric {
    Euclidean,   // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,      // 1 - cosine_similarity
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance: sum(|a[i] - b[i]|)
}

Feature Flags

Feature Description Use Case
simd SimSIMD acceleration Native builds
parallel Rayon batch processing Multi-core systems
None Pure Rust fallback WASM builds

Batch Distance API

pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM...
}

3. Index Structures (index/)

HNSW Index (index/hnsw.rs)

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

Configuration Parameters:

Parameter Default Description
m 32 Connections per layer (higher = better recall, more memory)
ef_construction 200 Build-time search depth (higher = better graph, slower build)
ef_search 100 Query-time search depth (higher = better recall, slower query)
max_elements 10M Pre-allocated capacity

Complexity Analysis:

Operation Time Complexity Space Complexity
Insert O(log n * m * ef_construction) O(m * log n) per vector
Search O(log n * m * ef_search) O(ef_search)
Delete O(1)* O(1)

*Note: HNSW deletion marks vectors as removed but does not restructure the graph.

Serialization:

pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}

Flat Index

Linear scan index for small datasets or exact search.

Use Cases:

  • Datasets < 10K vectors
  • Exact k-NN required
  • Benchmarking HNSW recall

4. Quantization Strategies (quantization.rs)

Memory compression techniques trading precision for storage efficiency.

Scalar Quantization (4x compression)

Quantizes f32 to u8 using min-max scaling.

pub struct ScalarQuantized {
    pub data: Vec<u8>,     // Quantized values
    pub min: f32,          // Minimum for dequantization
    pub scale: f32,        // Scale factor
}

Characteristics:

  • Compression: 4x (f32 -> u8)
  • Distance calculation: Uses average scale for symmetric distance
  • Reconstruction error: < 0.4% for typical embedding distributions

Product Quantization (8-16x compression)

Divides vectors into subspaces, each quantized independently via k-means codebooks.

pub struct ProductQuantized {
    pub codes: Vec<u8>,                    // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>,     // Learned centroids
}

Training:

  • K-means clustering on subspace vectors
  • Codebook size typically 256 (fits in u8)
  • Iterations: 10-100 for convergence

Binary Quantization (32x compression)

Single-bit representation based on sign.

pub struct BinaryQuantized {
    pub bits: Vec<u8>,      // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}

Characteristics:

  • Compression: 32x (f32 -> 1 bit)
  • Distance: Hamming distance (XOR + popcount)
  • Best for: Filtering stage before exact distance on candidates

Tiered Compression Strategy

Ruvector automatically manages compression based on access patterns:

Access Frequency Format Compression Latency
Hot (>80%) f32 1x Instant
Warm (40-80%) f16 2x ~1us
Cool (10-40%) Scalar 4x ~10us
Cold (1-10%) Product 8-16x ~100us
Archive (<1%) Binary 32x ~1ms

5. Memory Management

Arena Allocator (arena.rs)

Bump allocator for batch operations reducing allocation overhead.

Lock-Free Structures (lockfree.rs)

  • Crossbeam-based concurrent data structures
  • Lock-free queues for batch ingestion
  • Available only on parallel feature (not WASM)

Cache-Optimized Operations (cache_optimized.rs)

  • Prefetching hints for sequential access
  • Cache-line aligned storage
  • NUMA-aware allocation on supported platforms

6. Storage Layer (storage.rs)

Native Storage (REDB)

  • ACID transactions
  • Memory-mapped vectors
  • Configuration persistence
  • Connection pooling for multiple VectorDB instances
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");

Security:

  • Path traversal protection
  • Validates relative paths don't escape working directory

Memory-Only Storage (storage_memory.rs)

  • Pure in-memory for WASM
  • No persistence
  • DashMap for concurrent access

Integration Points

1. Policy Memory Store

Ruvector serves as the backing store for AI agent policy memory:

+-------------------+       +-------------------+       +-------------------+
|   AI Agent        |       |   Policy Memory   |       |   ruvector-core   |
|                   | ----> |   (AgenticDB)     | ----> |                   |
| "What action for  |       | Search similar    |       | HNSW search       |
|  this situation?" |       | past situations   |       | with metadata     |
+-------------------+       +-------------------+       +-------------------+

Use Cases:

  • Q-learning state-action lookups
  • Contextual bandit policy retrieval
  • Episodic memory for reasoning

2. Session State Index

Real-time session context for conversational AI:

+-------------------+       +-------------------+       +-------------------+
|   Chat Session    |       |   Session Index   |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Current context   |       | Find relevant     |       | Cosine similarity |
| embedding         |       | past turns        |       | top-k search      |
+-------------------+       +-------------------+       +-------------------+

Requirements:

  • < 10ms latency for interactive use
  • Session isolation via namespaces
  • TTL-based cleanup

3. Witness Log for Audit

Cryptographically-linked audit trail:

+-------------------+       +-------------------+       +-------------------+
|   Agent Action    |       |   Witness Log     |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Action embedding  |       | Store with hash   |       | Append-only       |
| + metadata        |       | chain reference   |       | with timestamps   |
+-------------------+       +-------------------+       +-------------------+

Properties:

  • Immutable entries
  • Hash-chain linking
  • Semantic searchability

Decision Drivers

1. Performance (Sub-millisecond Latency)

Requirement Implementation
61us p50 search SIMD-optimized distance + HNSW
16,400 QPS Parallel search with Rayon
Batch ingestion Lock-free queues + bulk insert

2. Memory Efficiency (Quantization Support)

Requirement Implementation
4x compression Scalar quantization
8-16x compression Product quantization
32x compression Binary quantization
Automatic tiering Access pattern tracking

3. Cross-Platform Portability (WASM, Native)

Platform Features Available
x86_64 Linux/macOS Full (SIMD, parallel, storage)
ARM64 macOS (Apple Silicon) Full (NEON, parallel, storage)
WASM (browser) Memory-only, scalar fallback
PostgreSQL extension Full + SQL integration

4. LLM Integration

Requirement Implementation
Embedding ingestion API-based and local providers
Semantic search Cosine/dot product metrics
RAG pipeline Hybrid search + metadata filtering

Alternatives Considered

Alternative 1: Pure Python Implementation (NumPy/FAISS)

Rejected because:

  • 10-100x slower than Rust SIMD
  • No WASM support
  • GIL contention in concurrent workloads

Alternative 2: C++ with Bindings

Rejected because:

  • Memory safety concerns
  • Complex cross-compilation
  • Build system complexity (CMake)

Alternative 3: Qdrant/Milvus Integration

Rejected because:

  • External service dependency
  • No WASM support
  • Complex deployment for edge use cases

Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

Rejected because:

  • Not portable to edge/mobile
  • Driver dependencies
  • Overkill for < 100M vectors

Consequences

Benefits

  1. Performance: Sub-millisecond latency enables real-time AI applications
  2. Portability: Single codebase runs native, WASM, and PostgreSQL
  3. Memory Efficiency: 2-32x compression makes large datasets practical on edge
  4. Integration: Native Rust means zero-cost abstractions for embedding in other systems
  5. Learning: GNN layers can improve search quality without reindexing

Risks and Mitigations

Risk Probability Impact Mitigation
HNSW recall < 100% High Medium ef_search tuning, hybrid with exact search
Quantization accuracy loss Medium Medium Conformal prediction bounds
WASM performance gap Medium Low Specialized WASM-optimized builds
API embeddings require external call High Low Local embedding option via ONNX

Performance Targets

Metric Target Achieved
HNSW Search (k=10, 384-dim) < 100us p50 61us
HNSW Search (k=100, 384-dim) < 200us p50 164us
Cosine Distance (1536-dim) < 200ns 143ns
Dot Product (384-dim) < 50ns 33ns
Batch Distance (1000 vectors) < 500us 237us
QPS (10K vectors, k=10) > 10K 16,400

Implementation Status

Completed (v0.1.x)

Module Status Description
simd_intrinsics Complete AVX2/NEON dispatch with scalar fallback
distance Complete All 4 metrics with SimSIMD integration
index/hnsw Complete Full HNSW with serialization
index/flat Complete Linear scan baseline
quantization Complete Scalar, Product, Binary
storage Complete REDB-based with connection pooling
storage_memory Complete In-memory for WASM
types Complete Core types with serde
error Complete Error types with thiserror
vector_db Complete High-level API
agenticdb Complete AI agent memory interface

Advanced Features

Module Status Description
advanced_features/filtered_search Complete Metadata-based filtering
advanced_features/hybrid_search Complete Dense + sparse (BM25)
advanced_features/mmr Complete Maximal Marginal Relevance
advanced_features/conformal_prediction Complete Uncertainty quantification
advanced_features/product_quantization Complete Enhanced PQ with training

Research Features (advanced/)

Module Status Description
hypergraph Experimental Hyperedge relationships
learned_index Experimental Neural index structures
neural_hash Experimental LSH with neural tuning
tda Experimental Topological data analysis

Feature Flags

Feature Default Description
default Yes simd, storage, hnsw, api-embeddings, parallel
simd Yes SimSIMD acceleration
parallel Yes Rayon parallel processing
storage Yes REDB file-based storage
hnsw Yes HNSW index support
api-embeddings Yes HTTP-based embedding providers
memory-only No Pure in-memory (WASM)
real-embeddings No Deprecated, use api-embeddings

Dependencies

Core Dependencies

Dependency Version Purpose
hnsw_rs workspace HNSW implementation
simsimd workspace SIMD distance functions
rayon workspace Parallel iteration
redb workspace Embedded database
bincode workspace Binary serialization
dashmap workspace Concurrent hash map
parking_lot workspace Optimized locks

Optional Dependencies

Dependency Feature Purpose
reqwest api-embeddings HTTP client for embedding APIs
memmap2 storage Memory-mapped files
crossbeam parallel Lock-free data structures

API Examples

Basic Vector Search

use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};

// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;

// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;

// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;

Quantized Search

use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};

// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);

// Distance in quantized space
let distance = quantized.distance(&other_quantized);

// Reconstruct if needed
let reconstructed = quantized.reconstruct();

Batch Operations

use ruvector_core::distance::batch_distances;

// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;

References

  1. Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.

  2. Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.

  3. RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/

  4. SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD


Appendix A: SIMD Register Usage

AVX2 (256-bit registers)

+-------+-------+-------+-------+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]

Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions

NEON (128-bit registers)

+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]

Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum

Appendix B: Memory Layout

VectorEntry

+------------------+------------------+------------------+
|     id: String   |  vector: Vec<f32>|  metadata: JSON  |
|     (optional)   |  (required)      |  (optional)      |
+------------------+------------------+------------------+

HNSW Graph Structure

Level 3:  [v0] -------- [v5]
            \            /
Level 2:  [v0] -- [v3] -- [v5] -- [v9]
            \    /    \    /    \
Level 1:  [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
            |    |    |    |    |    |    |
Level 0:  [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]

Appendix C: Benchmark Results

Platform: Apple M2 (ARM64 NEON)

HNSW Search k=10 (10K vectors, 384-dim):
  p50: 61us
  p95: 89us
  p99: 112us
  Throughput: 16,400 QPS

HNSW Search k=100 (10K vectors, 384-dim):
  p50: 164us
  p95: 203us
  p99: 245us
  Throughput: 6,100 QPS

Distance Operations (1536-dim):
  Cosine: 143ns
  Euclidean: 156ns
  Dot Product: 33ns (384-dim)

Batch Distance (1000 vectors, 384-dim):
  Parallel (Rayon): 237us
  Sequential: 890us

Platform: Intel i7 (AVX2)

HNSW Search k=10 (10K vectors, 384-dim):
  p50: 72us
  p95: 105us
  p99: 134us
  Throughput: 13,900 QPS

Distance Operations (1536-dim):
  Cosine: 128ns
  Euclidean: 141ns
  Dot Product: 29ns (384-dim)

Related Decisions

  • ADR-002: RuvLLM Integration with Ruvector
  • ADR-003: SIMD Optimization Strategy
  • ADR-004: KV Cache Management
  • ADR-005: WASM Runtime Integration
  • ADR-006: Memory Management
  • ADR-007: Security Review & Technical Debt

Implementation Status (v2.1)

Component Status Notes
HNSW Index ✅ Implemented M=32, ef_construct=256, 16K QPS
SIMD Distance ✅ Implemented AVX2/NEON with fallback
Scalar Quantization ✅ Implemented 8-bit with min/max scaling
Batch Operations ✅ Implemented Rayon parallel distances
Graph Store ✅ Implemented Adjacency list with metadata
Persistence ✅ Implemented Binary format with versioning

Security Status: Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).


Revision History

Version Date Author Changes
1.0 2026-01-18 Ruvector Architecture Team Initial version
1.1 2026-01-19 Security Review Agent Added implementation status, related decisions