ADR-001: Ruvector Core Architecture

Status: Proposed Date: 2026-01-18 Authors: ruv.io, RuVector Team Deciders: Architecture Review Board SDK: Claude-Flow

Note: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.

Version History

Version	Date	Author	Changes
0.1	2026-01-18	ruv.io	Initial architecture proposal

Context

The Vector Database Challenge

Modern AI applications require vector databases that can:

Store high-dimensional embeddings from LLMs and embedding models
Search with sub-millisecond latency for real-time inference
Scale to billions of vectors while maintaining performance
Deploy anywhere - edge devices, browsers (WASM), cloud servers
Integrate seamlessly with LLM inference pipelines

Current State of Vector Databases

Existing solutions fall into several categories:

Category	Examples	Limitations
Cloud-only	Pinecone	No edge deployment, vendor lock-in
Heavy native	Milvus, Qdrant	Complex deployment, high memory
Python-first	ChromaDB, FAISS	Performance overhead, no WASM
Learning-capable	None	No existing solutions learn from usage

The Ruvector Vision

Ruvector is designed as a high-performance, learning-capable vector database implemented in Rust that:

Achieves 61us p50 latency for k=10 search on 384-dim vectors
Provides 2-32x memory compression through tiered quantization
Runs anywhere - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
Learns from usage via GNN layers that improve search quality over time
Integrates with AI agent memory systems for policy, session state, and audit logs

Decision

Adopt a Layered, SIMD-Optimized Architecture

We implement ruvector-core as the foundational vector database engine with the following architecture:

+-----------------------------------------------------------------------------+
|                              APPLICATION LAYER                               |
|  AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              INDEX LAYER                                     |
|  HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR            |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              QUANTIZATION LAYER                              |
|  Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction        |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              DISTANCE LAYER                                  |
|  Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch               |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              SIMD INTRINSICS LAYER                           |
|  AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback       |
+-----------------------------------------------------------------------------+
                                    |
+-----------------------------------------------------------------------------+
|                              STORAGE LAYER                                   |
|  REDB (native) | Memory-only (WASM) | PostgreSQL Extension                  |
+-----------------------------------------------------------------------------+

Key Components

1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)

The performance foundation of ruvector, providing hardware-accelerated distance calculations.

Architecture Dispatch

pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            unsafe { euclidean_distance_avx2_impl(a, b) }
        } else {
            euclidean_distance_scalar(a, b)
        }
    }

    #[cfg(target_arch = "aarch64")]
    {
        unsafe { euclidean_distance_neon_impl(a, b) }
    }

    #[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
    {
        euclidean_distance_scalar(a, b)
    }
}

Supported Operations

Operation	AVX2 (x86_64)	NEON (ARM64)	Scalar Fallback
Euclidean Distance	8 floats/cycle	4 floats/cycle	1 float/cycle
Dot Product	8 floats/cycle	4 floats/cycle	1 float/cycle
Cosine Similarity	8 floats/cycle	4 floats/cycle	1 float/cycle
Manhattan Distance	N/A	4 floats/cycle	1 float/cycle

Performance Characteristics

Metric	AVX2	NEON	Scalar
512-dim Euclidean	~16M ops/sec	~8M ops/sec	~2M ops/sec
384-dim Cosine	~143ns	~200ns	~800ns
1536-dim Dot Product	~33ns	~50ns	~150ns

Security Guarantees

Bounds checking via assert_eq!(a.len(), b.len()) prevents buffer overflows
Unaligned loads (_mm256_loadu_ps, vld1q_f32) handle arbitrary alignment
Scalar fallback handles remainder elements after SIMD processing

2. Distance Metrics Layer (`distance.rs`)

High-level distance API with optional SimSIMD integration for additional acceleration.

Supported Metrics

pub enum DistanceMetric {
    Euclidean,   // L2 distance: sqrt(sum((a[i] - b[i])^2))
    Cosine,      // 1 - cosine_similarity
    DotProduct,  // Negative dot product (for maximization)
    Manhattan,   // L1 distance: sum(|a[i] - b[i]|)
}

Feature Flags

Feature	Description	Use Case
`simd`	SimSIMD acceleration	Native builds
`parallel`	Rayon batch processing	Multi-core systems
None	Pure Rust fallback	WASM builds

Batch Distance API

pub fn batch_distances(
    query: &[f32],
    vectors: &[Vec<f32>],
    metric: DistanceMetric,
) -> Result<Vec<f32>> {
    #[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
    {
        use rayon::prelude::*;
        vectors.par_iter()
            .map(|v| distance(query, v, metric))
            .collect()
    }
    // Sequential fallback for WASM...
}

3. Index Structures (`index/`)

HNSW Index (`index/hnsw.rs`)

Hierarchical Navigable Small World graph for approximate nearest neighbor search.

Configuration Parameters:

Parameter	Default	Description
`m`	32	Connections per layer (higher = better recall, more memory)
`ef_construction`	200	Build-time search depth (higher = better graph, slower build)
`ef_search`	100	Query-time search depth (higher = better recall, slower query)
`max_elements`	10M	Pre-allocated capacity

Complexity Analysis:

Operation	Time Complexity	Space Complexity
Insert	O(log n * m * ef_construction)	O(m * log n) per vector
Search	O(log n * m * ef_search)	O(ef_search)
Delete	O(1)*	O(1)

*Note: HNSW deletion marks vectors as removed but does not restructure the graph.

Serialization:

pub struct HnswState {
    vectors: Vec<(String, Vec<f32>)>,
    id_to_idx: Vec<(String, usize)>,
    idx_to_id: Vec<(usize, String)>,
    next_idx: usize,
    config: SerializableHnswConfig,
    dimensions: usize,
    metric: SerializableDistanceMetric,
}

Flat Index

Linear scan index for small datasets or exact search.

Use Cases:

Datasets < 10K vectors
Exact k-NN required
Benchmarking HNSW recall

4. Quantization Strategies (`quantization.rs`)

Memory compression techniques trading precision for storage efficiency.

Scalar Quantization (4x compression)

Quantizes f32 to u8 using min-max scaling.

pub struct ScalarQuantized {
    pub data: Vec<u8>,     // Quantized values
    pub min: f32,          // Minimum for dequantization
    pub scale: f32,        // Scale factor
}

Characteristics:

Compression: 4x (f32 -> u8)
Distance calculation: Uses average scale for symmetric distance
Reconstruction error: < 0.4% for typical embedding distributions

Product Quantization (8-16x compression)

Divides vectors into subspaces, each quantized independently via k-means codebooks.

pub struct ProductQuantized {
    pub codes: Vec<u8>,                    // One code per subspace
    pub codebooks: Vec<Vec<Vec<f32>>>,     // Learned centroids
}

Training:

K-means clustering on subspace vectors
Codebook size typically 256 (fits in u8)
Iterations: 10-100 for convergence

Binary Quantization (32x compression)

Single-bit representation based on sign.

pub struct BinaryQuantized {
    pub bits: Vec<u8>,      // Packed bits (8 dimensions per byte)
    pub dimensions: usize,
}

Characteristics:

Compression: 32x (f32 -> 1 bit)
Distance: Hamming distance (XOR + popcount)
Best for: Filtering stage before exact distance on candidates

Tiered Compression Strategy

Ruvector automatically manages compression based on access patterns:

Access Frequency	Format	Compression	Latency
Hot (>80%)	f32	1x	Instant
Warm (40-80%)	f16	2x	~1us
Cool (10-40%)	Scalar	4x	~10us
Cold (1-10%)	Product	8-16x	~100us
Archive (<1%)	Binary	32x	~1ms

5. Memory Management

Arena Allocator (`arena.rs`)

Bump allocator for batch operations reducing allocation overhead.

Lock-Free Structures (`lockfree.rs`)

Crossbeam-based concurrent data structures
Lock-free queues for batch ingestion
Available only on parallel feature (not WASM)

Cache-Optimized Operations (`cache_optimized.rs`)

Prefetching hints for sequential access
Cache-line aligned storage
NUMA-aware allocation on supported platforms

6. Storage Layer (`storage.rs`)

Native Storage (REDB)

ACID transactions
Memory-mapped vectors
Configuration persistence
Connection pooling for multiple VectorDB instances

const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");

Security:

Path traversal protection
Validates relative paths don't escape working directory

Memory-Only Storage (`storage_memory.rs`)

Pure in-memory for WASM
No persistence
DashMap for concurrent access

Integration Points

1. Policy Memory Store

Ruvector serves as the backing store for AI agent policy memory:

+-------------------+       +-------------------+       +-------------------+
|   AI Agent        |       |   Policy Memory   |       |   ruvector-core   |
|                   | ----> |   (AgenticDB)     | ----> |                   |
| "What action for  |       | Search similar    |       | HNSW search       |
|  this situation?" |       | past situations   |       | with metadata     |
+-------------------+       +-------------------+       +-------------------+

Use Cases:

Q-learning state-action lookups
Contextual bandit policy retrieval
Episodic memory for reasoning

2. Session State Index

Real-time session context for conversational AI:

+-------------------+       +-------------------+       +-------------------+
|   Chat Session    |       |   Session Index   |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Current context   |       | Find relevant     |       | Cosine similarity |
| embedding         |       | past turns        |       | top-k search      |
+-------------------+       +-------------------+       +-------------------+

Requirements:

< 10ms latency for interactive use
Session isolation via namespaces
TTL-based cleanup

3. Witness Log for Audit

Cryptographically-linked audit trail:

+-------------------+       +-------------------+       +-------------------+
|   Agent Action    |       |   Witness Log     |       |   ruvector-core   |
|                   | ----> |                   | ----> |                   |
| Action embedding  |       | Store with hash   |       | Append-only       |
| + metadata        |       | chain reference   |       | with timestamps   |
+-------------------+       +-------------------+       +-------------------+

Properties:

Immutable entries
Hash-chain linking
Semantic searchability

Decision Drivers

1. Performance (Sub-millisecond Latency)

Requirement	Implementation
61us p50 search	SIMD-optimized distance + HNSW
16,400 QPS	Parallel search with Rayon
Batch ingestion	Lock-free queues + bulk insert

2. Memory Efficiency (Quantization Support)

Requirement	Implementation
4x compression	Scalar quantization
8-16x compression	Product quantization
32x compression	Binary quantization
Automatic tiering	Access pattern tracking

3. Cross-Platform Portability (WASM, Native)

Platform	Features Available
x86_64 Linux/macOS	Full (SIMD, parallel, storage)
ARM64 macOS (Apple Silicon)	Full (NEON, parallel, storage)
WASM (browser)	Memory-only, scalar fallback
PostgreSQL extension	Full + SQL integration

4. LLM Integration

Requirement	Implementation
Embedding ingestion	API-based and local providers
Semantic search	Cosine/dot product metrics
RAG pipeline	Hybrid search + metadata filtering

Alternatives Considered

Alternative 1: Pure Python Implementation (NumPy/FAISS)

Rejected because:

10-100x slower than Rust SIMD
No WASM support
GIL contention in concurrent workloads

Alternative 2: C++ with Bindings

Rejected because:

Memory safety concerns
Complex cross-compilation
Build system complexity (CMake)

Alternative 3: Qdrant/Milvus Integration

Rejected because:

External service dependency
No WASM support
Complex deployment for edge use cases

Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

Rejected because:

Not portable to edge/mobile
Driver dependencies
Overkill for < 100M vectors

Consequences

Benefits

Performance: Sub-millisecond latency enables real-time AI applications
Portability: Single codebase runs native, WASM, and PostgreSQL
Memory Efficiency: 2-32x compression makes large datasets practical on edge
Integration: Native Rust means zero-cost abstractions for embedding in other systems
Learning: GNN layers can improve search quality without reindexing

Risks and Mitigations

Risk	Probability	Impact	Mitigation
HNSW recall < 100%	High	Medium	ef_search tuning, hybrid with exact search
Quantization accuracy loss	Medium	Medium	Conformal prediction bounds
WASM performance gap	Medium	Low	Specialized WASM-optimized builds
API embeddings require external call	High	Low	Local embedding option via ONNX

Performance Targets

Metric	Target	Achieved
HNSW Search (k=10, 384-dim)	< 100us p50	61us
HNSW Search (k=100, 384-dim)	< 200us p50	164us
Cosine Distance (1536-dim)	< 200ns	143ns
Dot Product (384-dim)	< 50ns	33ns
Batch Distance (1000 vectors)	< 500us	237us
QPS (10K vectors, k=10)	> 10K	16,400

Implementation Status

Completed (v0.1.x)

Module	Status	Description
`simd_intrinsics`	Complete	AVX2/NEON dispatch with scalar fallback
`distance`	Complete	All 4 metrics with SimSIMD integration
`index/hnsw`	Complete	Full HNSW with serialization
`index/flat`	Complete	Linear scan baseline
`quantization`	Complete	Scalar, Product, Binary
`storage`	Complete	REDB-based with connection pooling
`storage_memory`	Complete	In-memory for WASM
`types`	Complete	Core types with serde
`error`	Complete	Error types with thiserror
`vector_db`	Complete	High-level API
`agenticdb`	Complete	AI agent memory interface

Advanced Features

Module	Status	Description
`advanced_features/filtered_search`	Complete	Metadata-based filtering
`advanced_features/hybrid_search`	Complete	Dense + sparse (BM25)
`advanced_features/mmr`	Complete	Maximal Marginal Relevance
`advanced_features/conformal_prediction`	Complete	Uncertainty quantification
`advanced_features/product_quantization`	Complete	Enhanced PQ with training

Research Features (`advanced/`)

Module	Status	Description
`hypergraph`	Experimental	Hyperedge relationships
`learned_index`	Experimental	Neural index structures
`neural_hash`	Experimental	LSH with neural tuning
`tda`	Experimental	Topological data analysis

Feature Flags

Feature	Default	Description
`default`	Yes	simd, storage, hnsw, api-embeddings, parallel
`simd`	Yes	SimSIMD acceleration
`parallel`	Yes	Rayon parallel processing
`storage`	Yes	REDB file-based storage
`hnsw`	Yes	HNSW index support
`api-embeddings`	Yes	HTTP-based embedding providers
`memory-only`	No	Pure in-memory (WASM)
`real-embeddings`	No	Deprecated, use api-embeddings

Dependencies

Core Dependencies

Dependency	Version	Purpose
`hnsw_rs`	workspace	HNSW implementation
`simsimd`	workspace	SIMD distance functions
`rayon`	workspace	Parallel iteration
`redb`	workspace	Embedded database
`bincode`	workspace	Binary serialization
`dashmap`	workspace	Concurrent hash map
`parking_lot`	workspace	Optimized locks

Optional Dependencies

Dependency	Feature	Purpose
`reqwest`	api-embeddings	HTTP client for embedding APIs
`memmap2`	storage	Memory-mapped files
`crossbeam`	parallel	Lock-free data structures

API Examples

Basic Vector Search

use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};

// Create database
let config = HnswConfig {
    m: 32,
    ef_construction: 200,
    ef_search: 100,
    max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;

// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;

// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;

Quantized Search

use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};

// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);

// Distance in quantized space
let distance = quantized.distance(&other_quantized);

// Reconstruct if needed
let reconstructed = quantized.reconstruct();

Batch Operations

use ruvector_core::distance::batch_distances;

// Calculate distances to many vectors in parallel
let distances = batch_distances(
    &query,
    &corpus_vectors,
    DistanceMetric::Cosine,
)?;

References

Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.
Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.
RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/
SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD

Appendix A: SIMD Register Usage

AVX2 (256-bit registers)

+-------+-------+-------+-------+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]     [4]     [5]     [6]     [7]

Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions

NEON (128-bit registers)

+-------+-------+-------+-------+
|  f32  |  f32  |  f32  |  f32  |
+-------+-------+-------+-------+
   [0]     [1]     [2]     [3]

Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum

Appendix B: Memory Layout

VectorEntry

+------------------+------------------+------------------+
|     id: String   |  vector: Vec<f32>|  metadata: JSON  |
|     (optional)   |  (required)      |  (optional)      |
+------------------+------------------+------------------+

HNSW Graph Structure

Level 3:  [v0] -------- [v5]
            \            /
Level 2:  [v0] -- [v3] -- [v5] -- [v9]
            \    /    \    /    \
Level 1:  [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
            |    |    |    |    |    |    |
Level 0:  [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]

Appendix C: Benchmark Results

Platform: Apple M2 (ARM64 NEON)

HNSW Search k=10 (10K vectors, 384-dim):
  p50: 61us
  p95: 89us
  p99: 112us
  Throughput: 16,400 QPS

HNSW Search k=100 (10K vectors, 384-dim):
  p50: 164us
  p95: 203us
  p99: 245us
  Throughput: 6,100 QPS

Distance Operations (1536-dim):
  Cosine: 143ns
  Euclidean: 156ns
  Dot Product: 33ns (384-dim)

Batch Distance (1000 vectors, 384-dim):
  Parallel (Rayon): 237us
  Sequential: 890us

Platform: Intel i7 (AVX2)

HNSW Search k=10 (10K vectors, 384-dim):
  p50: 72us
  p95: 105us
  p99: 134us
  Throughput: 13,900 QPS

Distance Operations (1536-dim):
  Cosine: 128ns
  Euclidean: 141ns
  Dot Product: 29ns (384-dim)

Related Decisions

ADR-002: RuvLLM Integration with Ruvector
ADR-003: SIMD Optimization Strategy
ADR-004: KV Cache Management
ADR-005: WASM Runtime Integration
ADR-006: Memory Management
ADR-007: Security Review & Technical Debt

Implementation Status (v2.1)

Component	Status	Notes
HNSW Index	✅ Implemented	M=32, ef_construct=256, 16K QPS
SIMD Distance	✅ Implemented	AVX2/NEON with fallback
Scalar Quantization	✅ Implemented	8-bit with min/max scaling
Batch Operations	✅ Implemented	Rayon parallel distances
Graph Store	✅ Implemented	Adjacency list with metadata
Persistence	✅ Implemented	Binary format with versioning

Security Status: Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).

Revision History

Version	Date	Author	Changes
1.0	2026-01-18	Ruvector Architecture Team	Initial version
1.1	2026-01-19	Security Review Agent	Added implementation status, related decisions

FilesExpand file tree

ADR-001-ruvector-core-architecture.md

Latest commit

History

ADR-001-ruvector-core-architecture.md

File metadata and controls

ADR-001: Ruvector Core Architecture

Version History

Context

The Vector Database Challenge

Current State of Vector Databases

The Ruvector Vision

Decision

Adopt a Layered, SIMD-Optimized Architecture

Key Components

1. SIMD Intrinsics Layer (simd_intrinsics.rs)

Architecture Dispatch

Supported Operations

Performance Characteristics

Security Guarantees

2. Distance Metrics Layer (distance.rs)

Supported Metrics

Feature Flags

Batch Distance API

3. Index Structures (index/)

HNSW Index (index/hnsw.rs)

Flat Index

4. Quantization Strategies (quantization.rs)

Scalar Quantization (4x compression)

Product Quantization (8-16x compression)

Binary Quantization (32x compression)

Tiered Compression Strategy

5. Memory Management

Arena Allocator (arena.rs)

Lock-Free Structures (lockfree.rs)

Cache-Optimized Operations (cache_optimized.rs)

6. Storage Layer (storage.rs)

Native Storage (REDB)

Memory-Only Storage (storage_memory.rs)

Integration Points

1. Policy Memory Store

2. Session State Index

3. Witness Log for Audit

Decision Drivers

1. Performance (Sub-millisecond Latency)

2. Memory Efficiency (Quantization Support)

3. Cross-Platform Portability (WASM, Native)

4. LLM Integration

Alternatives Considered

Alternative 1: Pure Python Implementation (NumPy/FAISS)

Alternative 2: C++ with Bindings

Alternative 3: Qdrant/Milvus Integration

Alternative 4: GPU-Only Acceleration (CUDA/ROCm)

Consequences

Benefits

Risks and Mitigations

Performance Targets

Implementation Status

Completed (v0.1.x)

Advanced Features

Research Features (advanced/)

Feature Flags

Dependencies

Core Dependencies

Optional Dependencies

API Examples

Basic Vector Search

Quantized Search

Batch Operations

References

Appendix A: SIMD Register Usage

AVX2 (256-bit registers)

NEON (128-bit registers)

Appendix B: Memory Layout

VectorEntry

HNSW Graph Structure

Appendix C: Benchmark Results

Platform: Apple M2 (ARM64 NEON)

Platform: Intel i7 (AVX2)

Related Decisions

1. SIMD Intrinsics Layer (`simd_intrinsics.rs`)

2. Distance Metrics Layer (`distance.rs`)

3. Index Structures (`index/`)

HNSW Index (`index/hnsw.rs`)

4. Quantization Strategies (`quantization.rs`)

Arena Allocator (`arena.rs`)

Lock-Free Structures (`lockfree.rs`)

Cache-Optimized Operations (`cache_optimized.rs`)

6. Storage Layer (`storage.rs`)

Memory-Only Storage (`storage_memory.rs`)

Research Features (`advanced/`)