RuVector-Postgres is a high-performance, drop-in replacement for the pgvector extension, built in Rust using the pgrx framework. It provides SIMD-optimized vector similarity search with advanced indexing algorithms, quantization support, and hybrid search capabilities.
- pgvector API Compatibility: 100% compatible SQL interface with pgvector
- Superior Performance: 2-10x faster than pgvector through SIMD and algorithmic optimizations
- Memory Efficiency: Up to 32x memory reduction via quantization
- Neon Compatibility: Designed for serverless PostgreSQL (Neon, Supabase, etc.)
- Production Ready: Battle-tested algorithms from ruvector-core
┌─────────────────────────────────────────────────────────────────────────────┐
│ PostgreSQL Server │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────────────┐ │
│ │ RuVector-Postgres Extension │ │
│ ├─────────────────────────────────────────────────────────────────────────┤ │
│ │ │ │
│ │ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────────┐ │ │
│ │ │ Vector │ │ HNSW │ │ IVFFlat │ │ Flat Index │ │ │
│ │ │ Type │ │ Index │ │ Index │ │ (fallback) │ │ │
│ │ │ │ │ │ │ │ │ │ │ │
│ │ │ - ruvector │ │ - O(log n) │ │ - O(√n) │ │ - O(n) │ │ │
│ │ │ - halfvec │ │ - 95%+ rec │ │ - clusters │ │ - exact search │ │ │
│ │ │ - sparsevec │ │ - SIMD ops │ │ - training │ │ │ │ │
│ │ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ └────────┬────────┘ │ │
│ │ │ │ │ │ │ │
│ │ ┌──────┴────────────────┴────────────────┴───────────────────┴────────┐ │ │
│ │ │ SIMD Distance Layer │ │ │
│ │ │ │ │ │
│ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────┐ │ │ │
│ │ │ │ AVX-512 │ │ AVX2 │ │ NEON │ │ Scalar │ │ │ │
│ │ │ │ (x86_64) │ │ (x86_64) │ │ (ARM64) │ │ Fallback │ │ │ │
│ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Quantization Engine │ │ │
│ │ │ │ │ │
│ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────────┐ │ │ │
│ │ │ │ Scalar │ │ Product │ │ Binary │ │ Half-Prec │ │ │ │
│ │ │ │ (4x) │ │ (8-16x) │ │ (32x) │ │ (2x) │ │ │ │
│ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────────────────────────┐ │ │
│ │ │ Hybrid Search Engine │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────────────┐ ┌─────────────────────┐ ┌──────────────┐ │ │ │
│ │ │ │ Vector Similarity │ │ BM25 Text Search │ │ RRF Fusion │ │ │ │
│ │ │ │ (dense) │ │ (sparse) │ │ (ranking) │ │ │ │
│ │ │ └─────────────────────┘ └─────────────────────┘ └──────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘
Varlena Memory Layout (Zero-Copy Design)
┌─────────────────────────────────────────────────────────────────┐
│ RuVector Varlena Layout │
├─────────────────────────────────────────────────────────────────┤
│ Bytes 0-3 │ Bytes 4-5 │ Bytes 6-7 │ Bytes 8+ │
│ vl_len_ │ dimensions │ _unused │ f32 data... │
│ (varlena hdr)│ (u16) │ (padding) │ [dim0, dim1...] │
├─────────────────────────────────────────────────────────────────┤
│ 4 bytes │ 2 bytes │ 2 bytes │ 4*dims bytes │
│ PostgreSQL │ pgvector │ Alignment │ Vector data │
│ header │ compatible │ to 8 bytes │ (f32 floats) │
└─────────────────────────────────────────────────────────────────┘
Key Layout Features:
- Varlena Header (VARHDRSZ): Standard PostgreSQL variable-length type header (4 bytes)
- Dimensions (u16): Compatible with pgvector's 16-bit dimension count (max 16,000)
- Padding (2 bytes): Ensures f32 data is 8-byte aligned for efficient SIMD access
- Data Array: Contiguous f32 elements for zero-copy SIMD operations
Memory Alignment Requirements:
- Total header size: 8 bytes (4 + 2 + 2)
- Data alignment: 8-byte aligned for optimal performance
- SIMD alignment:
- AVX-512 prefers 64-byte alignment (checked at runtime)
- AVX2 prefers 32-byte alignment (checked at runtime)
- Unaligned loads used as fallback (minimal performance penalty)
Zero-Copy Access Pattern:
// Direct pointer access to varlena data (zero allocation)
pub unsafe fn as_ptr(&self) -> *const f32 {
// Skip varlena header (4 bytes) + RuVectorHeader (4 bytes)
let base = self as *const _ as *const u8;
base.add(VARHDRSZ + RuVectorHeader::SIZE) as *const f32
}
// SIMD functions operate directly on this pointer
let distance = l2_distance_ptr_avx512(vec_a.as_ptr(), vec_b.as_ptr(), dims);SQL Usage:
-- Dimensions: 1 to 16,000
-- Storage: 4 bytes per dimension (f32) + 8 bytes header
CREATE TABLE items (
id SERIAL PRIMARY KEY,
embedding ruvector(1536) -- OpenAI embedding dimensions
);
-- Total storage per vector: 8 + (1536 * 4) = 6,152 bytesVarlena Layout:
┌─────────────────────────────────────────────────────────────────┐
│ HalfVec Varlena Layout │
├─────────────────────────────────────────────────────────────────┤
│ Bytes 0-3 │ Bytes 4-5 │ Bytes 6-7 │ Bytes 8+ │
│ vl_len_ │ dimensions │ _unused │ f16 data... │
│ (varlena hdr)│ (u16) │ (padding) │ [dim0, dim1...] │
├─────────────────────────────────────────────────────────────────┤
│ 4 bytes │ 2 bytes │ 2 bytes │ 2*dims bytes │
│ PostgreSQL │ pgvector │ Alignment │ Half-precision │
│ header │ compatible │ to 8 bytes │ (f16 floats) │
└─────────────────────────────────────────────────────────────────┘
Storage Benefits:
- 50% memory savings vs ruvector
- Minimal accuracy loss (<0.01% for most embeddings)
- SIMD f16 support on modern CPUs (AVX-512 FP16, ARM Neon FP16)
-- Storage: 2 bytes per dimension (f16) + 8 bytes header
-- 50% memory savings, minimal accuracy loss
CREATE TABLE items (
id SERIAL PRIMARY KEY,
embedding halfvec(1536)
);
-- Total storage per vector: 8 + (1536 * 2) = 3,080 bytesVarlena Layout:
┌─────────────────────────────────────────────────────────────────┐
│ SparseVec Varlena Layout │
├─────────────────────────────────────────────────────────────────┤
│ Bytes 0-3 │ Bytes 4-7 │ Bytes 8-11 │ Bytes 12+ │
│ vl_len_ │ dimensions │ nnz │ indices+values │
│ (varlena hdr)│ (u32) │ (u32) │ [(idx,val)...] │
├─────────────────────────────────────────────────────────────────┤
│ 4 bytes │ 4 bytes │ 4 bytes │ 8*nnz bytes │
│ PostgreSQL │ Total dims │ Non-zero │ (u32,f32) pairs │
│ header │ (full size) │ count │ for sparse data │
└─────────────────────────────────────────────────────────────────┘
Storage: Only non-zero elements stored (u32 index + f32 value pairs)
-- Storage: Only non-zero elements stored
-- Ideal for high-dimensional sparse data (BM25, TF-IDF)
CREATE TABLE items (
id SERIAL PRIMARY KEY,
sparse_embedding sparsevec(50000)
);
-- Total storage: 12 + (nnz * 8) bytes
-- Example: 100 non-zero out of 50,000 = 12 + 800 = 812 bytes| Operator | Distance Metric | Description | SIMD Optimized |
|---|---|---|---|
<-> |
L2 (Euclidean) | sqrt(sum((a[i] - b[i])^2)) |
✓ |
<#> |
Inner Product | -sum(a[i] * b[i]) (negative for ORDER BY) |
✓ |
<=> |
Cosine | 1 - (a·b)/(‖a‖‖b‖) |
✓ |
<+> |
L1 (Manhattan) | sum(abs(a[i] - b[i])) |
✓ |
<~> |
Hamming | Bit differences (binary vectors) | ✓ |
<%> |
Jaccard | Set similarity (sparse vectors) | - |
Runtime Feature Detection:
/// Initialize SIMD dispatch table at extension load
pub fn init_simd_dispatch() {
#[cfg(target_arch = "x86_64")]
{
if is_x86_feature_detected!("avx512f") {
SIMD_LEVEL.store(SimdLevel::AVX512, Ordering::Relaxed);
return;
}
if is_x86_feature_detected!("avx2") {
SIMD_LEVEL.store(SimdLevel::AVX2, Ordering::Relaxed);
return;
}
}
#[cfg(target_arch = "aarch64")]
{
if is_aarch64_feature_detected!("neon") {
SIMD_LEVEL.store(SimdLevel::NEON, Ordering::Relaxed);
return;
}
}
SIMD_LEVEL.store(SimdLevel::Scalar, Ordering::Relaxed);
}Dispatch Flow:
┌─────────────────────────────────────────────────────────────────┐
│ Distance Function Call (SQL Operator) │
├─────────────────────────────────────────────────────────────────┤
│ ↓ │
│ ┌─────────────────────────────────────────────────────────────┐│
│ │ euclidean_distance(a: &[f32], b: &[f32]) -> f32 ││
│ │ ↓ ││
│ │ Check SIMD_LEVEL (atomic read, cached) ││
│ └─────────────────────────────────────────────────────────────┘│
│ ↓ │
│ ┌────────────────────┴────────────────────┐ │
│ ↓ ↓ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ AVX-512? │ │ AVX2? │ │ NEON/Scalar? │ │
│ └──────┬───────┘ └──────┬───────┘ └──────┬───────────────┘ │
│ ↓ ↓ ↓ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────────────┐ │
│ │ 16 floats/ │ │ 8 floats/ │ │ 4 floats (NEON) or │ │
│ │ iteration │ │ iteration │ │ 1 float (scalar) │ │
│ │ │ │ │ │ │ │
│ │ _mm512_* │ │ _mm256_* │ │ vaddq_f32/for loop │ │
│ │ FMA support │ │ FMA support │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────────────┘ │
│ ↓ ↓ ↓ │
│ └────────────────────┬─────────────────┘ │
│ ↓ │
│ ┌──────────────────┐ │
│ │ Return distance │ │
│ └──────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Performance Characteristics:
| SIMD Level | Floats/Iter | Relative Speed | Instruction Examples |
|---|---|---|---|
| AVX-512 | 16 | 16x | _mm512_loadu_ps, _mm512_fmadd_ps |
| AVX2 | 8 | 8x | _mm256_loadu_ps, _mm256_fmadd_ps |
| NEON | 4 | 4x | vld1q_f32, vmlaq_f32 |
| Scalar | 1 | 1x | Standard f32 operations |
TOAST (The Oversized-Attribute Storage Technique):
PostgreSQL automatically TOASTs values > ~2KB. RuVector handles this transparently:
/// Detoast varlena pointer if needed
#[inline]
unsafe fn detoast_vector(raw: *mut varlena) -> *mut varlena {
if VARATT_IS_EXTENDED(raw) {
// PostgreSQL automatically detoasts
pg_detoast_datum(raw as *const varlena) as *mut varlena
} else {
raw
}
}When TOAST Occurs:
- RuVector: ~512+ dimensions (2048+ bytes)
- HalfVec: ~1024+ dimensions (2048+ bytes)
- Automatic compression and external storage
Performance Impact:
- First access: Detoasting overhead (~10-50μs)
- Subsequent access: Cached in PostgreSQL buffer
- Index operations: Typically work with detoasted values
CREATE INDEX ON items USING ruhnsw (embedding ruvector_l2_ops)
WITH (m = 16, ef_construction = 200);Parameters:
m: Maximum connections per layer (default: 16, range: 2-100)ef_construction: Build-time search breadth (default: 64, range: 4-1000)
Characteristics:
- Search: O(log n)
- Insert: O(log n)
- Memory: ~1.5x index overhead
- Recall: 95-99%+ with tuned parameters
HNSW Index Layout:
┌─────────────────────────────────────────────────────────────────┐
│ HNSW Index Structure │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Layer L (top): ○──────○ │
│ │ │ │
│ Layer L-1: ○──○───○──○ │
│ │ │ │ │ │
│ Layer L-2: ○──○───○──○──○──○ │
│ │ │ │ │ │ │ │
│ Layer 0 (base): ○──○───○──○──○──○──○──○──○ │
│ │
│ Entry Point: Top layer node │
│ Search: Greedy descent + local beam search │
│ │
└─────────────────────────────────────────────────────────────────┘
CREATE INDEX ON items USING ruivfflat (embedding ruvector_l2_ops)
WITH (lists = 100);Parameters:
lists: Number of clusters (default: sqrt(n), recommended: rows/1000 to rows/10000)
Characteristics:
- Search: O(√n)
- Insert: O(1) after training
- Memory: Minimal overhead
- Recall: 90-95% with
probes = sqrt(lists)
┌─────────────────────────────────────────────────────────────────┐
│ Query: SELECT ... ORDER BY v <-> q │
├─────────────────────────────────────────────────────────────────┤
│ │
│ 1. Parse & Plan │
│ └─> Identify index scan opportunity │
│ │
│ 2. Index Selection │
│ └─> Choose HNSW/IVFFlat based on cost estimation │
│ │
│ 3. Index Scan (SIMD-accelerated) │
│ ├─> HNSW: Navigate layers, beam search at layer 0 │
│ └─> IVFFlat: Probe nearest centroids, scan cells │
│ │
│ 4. Distance Calculation (per candidate) │
│ ├─> Detoast vector if needed │
│ ├─> Zero-copy pointer access │
│ ├─> SIMD dispatch (AVX-512/AVX2/NEON/Scalar) │
│ └─> Full precision or quantized distance │
│ │
│ 5. Result Aggregation │
│ └─> Return top-k with distances │
│ │
└─────────────────────────────────────────────────────────────────┘
| Feature | pgvector 0.8.0 | RuVector-Postgres |
|---|---|---|
| Vector dimensions | 16,000 max | 16,000 max |
| HNSW index | ✓ | ✓ (optimized) |
| IVFFlat index | ✓ | ✓ (optimized) |
| Half-precision | ✓ | ✓ |
| Sparse vectors | ✓ | ✓ |
| Binary quantization | ✓ | ✓ |
| Product quantization | ✗ | ✓ |
| Scalar quantization | ✗ | ✓ |
| AVX-512 optimized | Partial | Full |
| ARM NEON optimized | ✗ | ✓ |
| Zero-copy access | ✗ | ✓ |
| Varlena alignment | Basic | Optimized (8-byte) |
| Hybrid search | ✗ | ✓ |
| Filtered HNSW | Partial | ✓ |
| Parallel queries | ✓ | ✓ (PARALLEL SAFE) |
RuVector-Postgres is fully thread-safe:
- Read operations: Lock-free concurrent reads
- Write operations: Fine-grained locking per graph layer
- Index builds: Parallel with work-stealing
// Internal synchronization primitives
pub struct HnswIndex {
layers: Vec<RwLock<Layer>>, // Per-layer locks
entry_point: AtomicUsize, // Lock-free entry point
node_count: AtomicUsize, // Lock-free counter
vectors: DashMap<NodeId, Vec<f32>>, // Concurrent hashmap
}[dependencies]
pgrx = "0.12" # PostgreSQL extension framework
simsimd = "5.9" # SIMD-accelerated distance functions
parking_lot = "0.12" # Fast synchronization primitives
dashmap = "6.0" # Concurrent hashmap
rayon = "1.10" # Data parallelism
half = "2.4" # Half-precision floats
bitflags = "2.6" # Compact flags storage-- Parallel index build (uses all available cores)
SET maintenance_work_mem = '8GB';
SET max_parallel_maintenance_workers = 8;
CREATE INDEX CONCURRENTLY ON items
USING ruhnsw (embedding ruvector_l2_ops)
WITH (m = 32, ef_construction = 400);-- Adjust search quality vs speed tradeoff
SET ruvector.ef_search = 200; -- Higher = better recall, slower
SET ruvector.probes = 10; -- For IVFFlat: more probes = better recall
-- Use iterative scan for filtered queries
SELECT * FROM items
WHERE category = 'electronics'
ORDER BY embedding <-> '[0.1, 0.2, ...]'::ruvector
LIMIT 10;crates/ruvector-postgres/
├── Cargo.toml # Rust dependencies
├── ruvector.control # Extension metadata
├── docs/
│ ├── ARCHITECTURE.md # This file
│ ├── NEON_COMPATIBILITY.md # Neon deployment guide
│ ├── SIMD_OPTIMIZATION.md # SIMD implementation details
│ ├── INSTALLATION.md # Installation instructions
│ ├── API.md # SQL API reference
│ └── MIGRATION.md # Migration from pgvector
├── sql/
│ ├── ruvector--0.1.0.sql # Extension SQL definitions
│ └── ruvector--0.0.0--0.1.0.sql # Migration script
├── src/
│ ├── lib.rs # Extension entry point
│ ├── types/
│ │ ├── mod.rs
│ │ ├── vector.rs # ruvector type (zero-copy varlena)
│ │ ├── halfvec.rs # Half-precision vector
│ │ └── sparsevec.rs # Sparse vector
│ ├── distance/
│ │ ├── mod.rs
│ │ ├── simd.rs # SIMD implementations (AVX-512/AVX2/NEON)
│ │ └── scalar.rs # Scalar fallbacks
│ ├── index/
│ │ ├── mod.rs
│ │ ├── hnsw.rs # HNSW implementation
│ │ ├── ivfflat.rs # IVFFlat implementation
│ │ └── scan.rs # Index scan operators
│ ├── quantization/
│ │ ├── mod.rs
│ │ ├── scalar.rs # SQ8 quantization
│ │ ├── product.rs # PQ quantization
│ │ └── binary.rs # Binary quantization
│ ├── operators.rs # SQL operators (<->, <=>, etc.)
│ └── functions.rs # SQL functions
└── tests/
├── integration_tests.rs
└── compatibility_tests.rs # pgvector compatibility
- 0.1.0: Initial release with pgvector compatibility
- HNSW and IVFFlat indexes
- SIMD-optimized distance functions
- Scalar quantization support
- Neon compatibility
- Zero-copy varlena access
- AVX-512/AVX2/NEON support
MIT License - Same as ruvector-core