Status: Proposed Date: 2026-01-18 Authors: ruv.io, RuVector Team Deciders: Architecture Review Board SDK: Claude-Flow
Note: The storage layer described in this ADR is superseded by ADR-029 (RVF as Canonical Binary Format). All vector persistence now uses the RVF segment model.
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2026-01-18 | ruv.io | Initial architecture proposal |
Modern AI applications require vector databases that can:
- Store high-dimensional embeddings from LLMs and embedding models
- Search with sub-millisecond latency for real-time inference
- Scale to billions of vectors while maintaining performance
- Deploy anywhere - edge devices, browsers (WASM), cloud servers
- Integrate seamlessly with LLM inference pipelines
Existing solutions fall into several categories:
| Category | Examples | Limitations |
|---|---|---|
| Cloud-only | Pinecone | No edge deployment, vendor lock-in |
| Heavy native | Milvus, Qdrant | Complex deployment, high memory |
| Python-first | ChromaDB, FAISS | Performance overhead, no WASM |
| Learning-capable | None | No existing solutions learn from usage |
Ruvector is designed as a high-performance, learning-capable vector database implemented in Rust that:
- Achieves 61us p50 latency for k=10 search on 384-dim vectors
- Provides 2-32x memory compression through tiered quantization
- Runs anywhere - native (x86_64, ARM64), WASM (browser, edge), PostgreSQL extension
- Learns from usage via GNN layers that improve search quality over time
- Integrates with AI agent memory systems for policy, session state, and audit logs
We implement ruvector-core as the foundational vector database engine with the following architecture:
+-----------------------------------------------------------------------------+
| APPLICATION LAYER |
| AgenticDB | VectorDB API | Cypher Queries | REST/gRPC Server |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| INDEX LAYER |
| HNSW Index | Flat Index | Filtered Search | Hybrid Search | MMR |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| QUANTIZATION LAYER |
| Scalar (4x) | Product (8-16x) | Binary (32x) | Conformal Prediction |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| DISTANCE LAYER |
| Euclidean | Cosine | Dot Product | Manhattan | SIMD Dispatch |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| SIMD INTRINSICS LAYER |
| AVX2/AVX-512 (x86_64) | NEON (ARM64/Apple Silicon) | Scalar Fallback |
+-----------------------------------------------------------------------------+
|
+-----------------------------------------------------------------------------+
| STORAGE LAYER |
| REDB (native) | Memory-only (WASM) | PostgreSQL Extension |
+-----------------------------------------------------------------------------+
The performance foundation of ruvector, providing hardware-accelerated distance calculations.
pub fn euclidean_distance_simd(a: &[f32], b: &[f32]) -> f32 {
#[cfg(target_arch = "x86_64")]
{
if is_x86_feature_detected!("avx2") {
unsafe { euclidean_distance_avx2_impl(a, b) }
} else {
euclidean_distance_scalar(a, b)
}
}
#[cfg(target_arch = "aarch64")]
{
unsafe { euclidean_distance_neon_impl(a, b) }
}
#[cfg(not(any(target_arch = "x86_64", target_arch = "aarch64")))]
{
euclidean_distance_scalar(a, b)
}
}| Operation | AVX2 (x86_64) | NEON (ARM64) | Scalar Fallback |
|---|---|---|---|
| Euclidean Distance | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Dot Product | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Cosine Similarity | 8 floats/cycle | 4 floats/cycle | 1 float/cycle |
| Manhattan Distance | N/A | 4 floats/cycle | 1 float/cycle |
| Metric | AVX2 | NEON | Scalar |
|---|---|---|---|
| 512-dim Euclidean | ~16M ops/sec | ~8M ops/sec | ~2M ops/sec |
| 384-dim Cosine | ~143ns | ~200ns | ~800ns |
| 1536-dim Dot Product | ~33ns | ~50ns | ~150ns |
- Bounds checking via
assert_eq!(a.len(), b.len())prevents buffer overflows - Unaligned loads (
_mm256_loadu_ps,vld1q_f32) handle arbitrary alignment - Scalar fallback handles remainder elements after SIMD processing
High-level distance API with optional SimSIMD integration for additional acceleration.
pub enum DistanceMetric {
Euclidean, // L2 distance: sqrt(sum((a[i] - b[i])^2))
Cosine, // 1 - cosine_similarity
DotProduct, // Negative dot product (for maximization)
Manhattan, // L1 distance: sum(|a[i] - b[i]|)
}| Feature | Description | Use Case |
|---|---|---|
simd |
SimSIMD acceleration | Native builds |
parallel |
Rayon batch processing | Multi-core systems |
| None | Pure Rust fallback | WASM builds |
pub fn batch_distances(
query: &[f32],
vectors: &[Vec<f32>],
metric: DistanceMetric,
) -> Result<Vec<f32>> {
#[cfg(all(feature = "parallel", not(target_arch = "wasm32")))]
{
use rayon::prelude::*;
vectors.par_iter()
.map(|v| distance(query, v, metric))
.collect()
}
// Sequential fallback for WASM...
}Hierarchical Navigable Small World graph for approximate nearest neighbor search.
Configuration Parameters:
| Parameter | Default | Description |
|---|---|---|
m |
32 | Connections per layer (higher = better recall, more memory) |
ef_construction |
200 | Build-time search depth (higher = better graph, slower build) |
ef_search |
100 | Query-time search depth (higher = better recall, slower query) |
max_elements |
10M | Pre-allocated capacity |
Complexity Analysis:
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Insert | O(log n * m * ef_construction) | O(m * log n) per vector |
| Search | O(log n * m * ef_search) | O(ef_search) |
| Delete | O(1)* | O(1) |
*Note: HNSW deletion marks vectors as removed but does not restructure the graph.
Serialization:
pub struct HnswState {
vectors: Vec<(String, Vec<f32>)>,
id_to_idx: Vec<(String, usize)>,
idx_to_id: Vec<(usize, String)>,
next_idx: usize,
config: SerializableHnswConfig,
dimensions: usize,
metric: SerializableDistanceMetric,
}Linear scan index for small datasets or exact search.
Use Cases:
- Datasets < 10K vectors
- Exact k-NN required
- Benchmarking HNSW recall
Memory compression techniques trading precision for storage efficiency.
Quantizes f32 to u8 using min-max scaling.
pub struct ScalarQuantized {
pub data: Vec<u8>, // Quantized values
pub min: f32, // Minimum for dequantization
pub scale: f32, // Scale factor
}Characteristics:
- Compression: 4x (f32 -> u8)
- Distance calculation: Uses average scale for symmetric distance
- Reconstruction error: < 0.4% for typical embedding distributions
Divides vectors into subspaces, each quantized independently via k-means codebooks.
pub struct ProductQuantized {
pub codes: Vec<u8>, // One code per subspace
pub codebooks: Vec<Vec<Vec<f32>>>, // Learned centroids
}Training:
- K-means clustering on subspace vectors
- Codebook size typically 256 (fits in u8)
- Iterations: 10-100 for convergence
Single-bit representation based on sign.
pub struct BinaryQuantized {
pub bits: Vec<u8>, // Packed bits (8 dimensions per byte)
pub dimensions: usize,
}Characteristics:
- Compression: 32x (f32 -> 1 bit)
- Distance: Hamming distance (XOR + popcount)
- Best for: Filtering stage before exact distance on candidates
Ruvector automatically manages compression based on access patterns:
| Access Frequency | Format | Compression | Latency |
|---|---|---|---|
| Hot (>80%) | f32 | 1x | Instant |
| Warm (40-80%) | f16 | 2x | ~1us |
| Cool (10-40%) | Scalar | 4x | ~10us |
| Cold (1-10%) | Product | 8-16x | ~100us |
| Archive (<1%) | Binary | 32x | ~1ms |
Bump allocator for batch operations reducing allocation overhead.
- Crossbeam-based concurrent data structures
- Lock-free queues for batch ingestion
- Available only on
parallelfeature (not WASM)
- Prefetching hints for sequential access
- Cache-line aligned storage
- NUMA-aware allocation on supported platforms
- ACID transactions
- Memory-mapped vectors
- Configuration persistence
- Connection pooling for multiple VectorDB instances
const VECTORS_TABLE: TableDefinition<&str, &[u8]> = TableDefinition::new("vectors");
const METADATA_TABLE: TableDefinition<&str, &str> = TableDefinition::new("metadata");
const CONFIG_TABLE: TableDefinition<&str, &str> = TableDefinition::new("config");Security:
- Path traversal protection
- Validates relative paths don't escape working directory
- Pure in-memory for WASM
- No persistence
- DashMap for concurrent access
Ruvector serves as the backing store for AI agent policy memory:
+-------------------+ +-------------------+ +-------------------+
| AI Agent | | Policy Memory | | ruvector-core |
| | ----> | (AgenticDB) | ----> | |
| "What action for | | Search similar | | HNSW search |
| this situation?" | | past situations | | with metadata |
+-------------------+ +-------------------+ +-------------------+
Use Cases:
- Q-learning state-action lookups
- Contextual bandit policy retrieval
- Episodic memory for reasoning
Real-time session context for conversational AI:
+-------------------+ +-------------------+ +-------------------+
| Chat Session | | Session Index | | ruvector-core |
| | ----> | | ----> | |
| Current context | | Find relevant | | Cosine similarity |
| embedding | | past turns | | top-k search |
+-------------------+ +-------------------+ +-------------------+
Requirements:
- < 10ms latency for interactive use
- Session isolation via namespaces
- TTL-based cleanup
Cryptographically-linked audit trail:
+-------------------+ +-------------------+ +-------------------+
| Agent Action | | Witness Log | | ruvector-core |
| | ----> | | ----> | |
| Action embedding | | Store with hash | | Append-only |
| + metadata | | chain reference | | with timestamps |
+-------------------+ +-------------------+ +-------------------+
Properties:
- Immutable entries
- Hash-chain linking
- Semantic searchability
| Requirement | Implementation |
|---|---|
| 61us p50 search | SIMD-optimized distance + HNSW |
| 16,400 QPS | Parallel search with Rayon |
| Batch ingestion | Lock-free queues + bulk insert |
| Requirement | Implementation |
|---|---|
| 4x compression | Scalar quantization |
| 8-16x compression | Product quantization |
| 32x compression | Binary quantization |
| Automatic tiering | Access pattern tracking |
| Platform | Features Available |
|---|---|
| x86_64 Linux/macOS | Full (SIMD, parallel, storage) |
| ARM64 macOS (Apple Silicon) | Full (NEON, parallel, storage) |
| WASM (browser) | Memory-only, scalar fallback |
| PostgreSQL extension | Full + SQL integration |
| Requirement | Implementation |
|---|---|
| Embedding ingestion | API-based and local providers |
| Semantic search | Cosine/dot product metrics |
| RAG pipeline | Hybrid search + metadata filtering |
Rejected because:
- 10-100x slower than Rust SIMD
- No WASM support
- GIL contention in concurrent workloads
Rejected because:
- Memory safety concerns
- Complex cross-compilation
- Build system complexity (CMake)
Rejected because:
- External service dependency
- No WASM support
- Complex deployment for edge use cases
Rejected because:
- Not portable to edge/mobile
- Driver dependencies
- Overkill for < 100M vectors
- Performance: Sub-millisecond latency enables real-time AI applications
- Portability: Single codebase runs native, WASM, and PostgreSQL
- Memory Efficiency: 2-32x compression makes large datasets practical on edge
- Integration: Native Rust means zero-cost abstractions for embedding in other systems
- Learning: GNN layers can improve search quality without reindexing
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| HNSW recall < 100% | High | Medium | ef_search tuning, hybrid with exact search |
| Quantization accuracy loss | Medium | Medium | Conformal prediction bounds |
| WASM performance gap | Medium | Low | Specialized WASM-optimized builds |
| API embeddings require external call | High | Low | Local embedding option via ONNX |
| Metric | Target | Achieved |
|---|---|---|
| HNSW Search (k=10, 384-dim) | < 100us p50 | 61us |
| HNSW Search (k=100, 384-dim) | < 200us p50 | 164us |
| Cosine Distance (1536-dim) | < 200ns | 143ns |
| Dot Product (384-dim) | < 50ns | 33ns |
| Batch Distance (1000 vectors) | < 500us | 237us |
| QPS (10K vectors, k=10) | > 10K | 16,400 |
| Module | Status | Description |
|---|---|---|
simd_intrinsics |
Complete | AVX2/NEON dispatch with scalar fallback |
distance |
Complete | All 4 metrics with SimSIMD integration |
index/hnsw |
Complete | Full HNSW with serialization |
index/flat |
Complete | Linear scan baseline |
quantization |
Complete | Scalar, Product, Binary |
storage |
Complete | REDB-based with connection pooling |
storage_memory |
Complete | In-memory for WASM |
types |
Complete | Core types with serde |
error |
Complete | Error types with thiserror |
vector_db |
Complete | High-level API |
agenticdb |
Complete | AI agent memory interface |
| Module | Status | Description |
|---|---|---|
advanced_features/filtered_search |
Complete | Metadata-based filtering |
advanced_features/hybrid_search |
Complete | Dense + sparse (BM25) |
advanced_features/mmr |
Complete | Maximal Marginal Relevance |
advanced_features/conformal_prediction |
Complete | Uncertainty quantification |
advanced_features/product_quantization |
Complete | Enhanced PQ with training |
| Module | Status | Description |
|---|---|---|
hypergraph |
Experimental | Hyperedge relationships |
learned_index |
Experimental | Neural index structures |
neural_hash |
Experimental | LSH with neural tuning |
tda |
Experimental | Topological data analysis |
| Feature | Default | Description |
|---|---|---|
default |
Yes | simd, storage, hnsw, api-embeddings, parallel |
simd |
Yes | SimSIMD acceleration |
parallel |
Yes | Rayon parallel processing |
storage |
Yes | REDB file-based storage |
hnsw |
Yes | HNSW index support |
api-embeddings |
Yes | HTTP-based embedding providers |
memory-only |
No | Pure in-memory (WASM) |
real-embeddings |
No | Deprecated, use api-embeddings |
| Dependency | Version | Purpose |
|---|---|---|
hnsw_rs |
workspace | HNSW implementation |
simsimd |
workspace | SIMD distance functions |
rayon |
workspace | Parallel iteration |
redb |
workspace | Embedded database |
bincode |
workspace | Binary serialization |
dashmap |
workspace | Concurrent hash map |
parking_lot |
workspace | Optimized locks |
| Dependency | Feature | Purpose |
|---|---|---|
reqwest |
api-embeddings | HTTP client for embedding APIs |
memmap2 |
storage | Memory-mapped files |
crossbeam |
parallel | Lock-free data structures |
use ruvector_core::{VectorDB, DistanceMetric, HnswConfig};
// Create database
let config = HnswConfig {
m: 32,
ef_construction: 200,
ef_search: 100,
max_elements: 1_000_000,
};
let mut db = VectorDB::new(384, DistanceMetric::Cosine, config)?;
// Insert vectors
db.insert("doc_1".to_string(), vec![0.1; 384])?;
db.insert("doc_2".to_string(), vec![0.2; 384])?;
// Search
let query = vec![0.15; 384];
let results = db.search(&query, 10)?;use ruvector_core::quantization::{ScalarQuantized, QuantizedVector};
// Quantize vectors for storage
let quantized = ScalarQuantized::quantize(&vector);
// Distance in quantized space
let distance = quantized.distance(&other_quantized);
// Reconstruct if needed
let reconstructed = quantized.reconstruct();use ruvector_core::distance::batch_distances;
// Calculate distances to many vectors in parallel
let distances = batch_distances(
&query,
&corpus_vectors,
DistanceMetric::Cosine,
)?;-
Malkov, Y., & Yashunin, D. (2018). "Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs." arXiv:1603.09320.
-
Jegou, H., Douze, M., & Schmid, C. (2011). "Product quantization for nearest neighbor search." IEEE TPAMI.
-
RuVector Team. "ruvector-core Benchmarks." /crates/ruvector-core/benches/
-
SimSIMD Documentation. https://github.com/ashvardanian/SimSIMD
+-------+-------+-------+-------+-------+-------+-------+-------+
| f32 | f32 | f32 | f32 | f32 | f32 | f32 | f32 |
+-------+-------+-------+-------+-------+-------+-------+-------+
[0] [1] [2] [3] [4] [5] [6] [7]
Operations per cycle:
- _mm256_loadu_ps: Load 8 floats
- _mm256_sub_ps: 8 subtractions
- _mm256_mul_ps: 8 multiplications
- _mm256_add_ps: 8 additions
+-------+-------+-------+-------+
| f32 | f32 | f32 | f32 |
+-------+-------+-------+-------+
[0] [1] [2] [3]
Operations per cycle:
- vld1q_f32: Load 4 floats
- vsubq_f32: 4 subtractions
- vfmaq_f32: 4 fused multiply-add
- vaddvq_f32: Horizontal sum
+------------------+------------------+------------------+
| id: String | vector: Vec<f32>| metadata: JSON |
| (optional) | (required) | (optional) |
+------------------+------------------+------------------+
Level 3: [v0] -------- [v5]
\ /
Level 2: [v0] -- [v3] -- [v5] -- [v9]
\ / \ / \
Level 1: [v0]-[v1]-[v3]-[v4]-[v5]-[v7]-[v9]
| | | | | | |
Level 0: [v0]-[v1]-[v2]-[v3]-[v4]-[v5]-[v6]-[v7]-[v8]-[v9]
HNSW Search k=10 (10K vectors, 384-dim):
p50: 61us
p95: 89us
p99: 112us
Throughput: 16,400 QPS
HNSW Search k=100 (10K vectors, 384-dim):
p50: 164us
p95: 203us
p99: 245us
Throughput: 6,100 QPS
Distance Operations (1536-dim):
Cosine: 143ns
Euclidean: 156ns
Dot Product: 33ns (384-dim)
Batch Distance (1000 vectors, 384-dim):
Parallel (Rayon): 237us
Sequential: 890us
HNSW Search k=10 (10K vectors, 384-dim):
p50: 72us
p95: 105us
p99: 134us
Throughput: 13,900 QPS
Distance Operations (1536-dim):
Cosine: 128ns
Euclidean: 141ns
Dot Product: 29ns (384-dim)
- ADR-002: RuvLLM Integration with Ruvector
- ADR-003: SIMD Optimization Strategy
- ADR-004: KV Cache Management
- ADR-005: WASM Runtime Integration
- ADR-006: Memory Management
- ADR-007: Security Review & Technical Debt
| Component | Status | Notes |
|---|---|---|
| HNSW Index | ✅ Implemented | M=32, ef_construct=256, 16K QPS |
| SIMD Distance | ✅ Implemented | AVX2/NEON with fallback |
| Scalar Quantization | ✅ Implemented | 8-bit with min/max scaling |
| Batch Operations | ✅ Implemented | Rayon parallel distances |
| Graph Store | ✅ Implemented | Adjacency list with metadata |
| Persistence | ✅ Implemented | Binary format with versioning |
Security Status: Core components reviewed. No critical vulnerabilities in ruvector-core. See ADR-007 for full audit (RuvLLM-specific issues).
| Version | Date | Author | Changes |
|---|---|---|---|
| 1.0 | 2026-01-18 | Ruvector Architecture Team | Initial version |
| 1.1 | 2026-01-19 | Security Review Agent | Added implementation status, related decisions |