RVF — RuVector Format
One file. Store vectors. Ship models. Boot services. Prove everything.
RVF (RuVector Format) is a universal binary substrate that merges database, model, graph engine, kernel, and attestation into a single deployable file.
A .rvf file can store vector embeddings, carry LoRA adapter deltas, embed GNN graph state, include a bootable Linux microkernel, run queries in a 5.5 KB WASM runtime, and prove every operation through a cryptographic witness chain — all in one file that runs anywhere from a browser to bare metal.
This is not a database format. It is an executable knowledge unit.
| Capability | How | Segment |
|---|---|---|
| 🖥️ Self-boot as a microservice | The file contains a real Linux kernel. Drop it on a VM and it boots as a running service in under 125 ms. No install, no dependencies. | KERNEL_SEG (0x0E) |
| ⚡ Hardware-speed lookups via eBPF | Hot vectors are served directly in the Linux kernel data path, bypassing userspace entirely. Three real C programs handle distance, filtering, and routing. | EBPF_SEG (0x0F) |
| 🌐 Runs in any browser | A 5.5 KB WebAssembly runtime lets the same file serve queries in a browser tab with zero backend. | WASM_SEG |
| Capability | How | Segment |
|---|---|---|
| 🧠 Ship models, graphs, and quantum state | One file carries LoRA fine-tune weights, graph neural network state, and quantum circuit snapshots alongside vectors. No separate model registry needed. | OVERLAY / GRAPH / SKETCH |
| 🌿 Git-like branching | Create a child file that shares all parent data. Only changed vectors are copied. A 1M-vector parent with 100 edits produces a ~2.5 MB child instead of a 512 MB copy. | COW_MAP / MEMBERSHIP (0x20-0x23) |
| 📊 Instant queries while loading | Start answering queries at roughly 70% recall@10 as soon as the first index layer is read. Recall improves to 95% or better as the full index loads in the background. No waiting. | INDEX_SEG |
| 🔍 Search with filters | Combine vector similarity with metadata conditions like "genre = sci-fi AND year > 2020" in a single query. | META_IDX_SEG (0x0D) |
| 💥 Never corrupts on crash | Power loss mid-write? The file is always readable. Append-only design means incomplete writes are simply ignored on recovery. No write-ahead log needed. | Format rule |
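The filtered-search row above combines similarity with a metadata predicate such as "genre = sci-fi AND year > 2020". As a rough sketch of how such a predicate can be evaluated against one vector's metadata, the snippet below uses a hypothetical `Filter` enum and `eval` function. These names are illustrative assumptions and are not the rvf-runtime API.

```rust
use std::collections::HashMap;

/// Hypothetical metadata value type (illustration only).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Int(i64),
}

/// Hypothetical boolean filter expression over metadata fields.
enum Filter {
    Eq(String, Value),
    Range(String, i64, i64), // inclusive bounds on an integer field
    In(String, Vec<Value>),
    And(Box<Filter>, Box<Filter>),
    Or(Box<Filter>, Box<Filter>),
    Not(Box<Filter>),
}

/// Returns true when the metadata of one vector satisfies the filter.
fn eval(f: &Filter, meta: &HashMap<String, Value>) -> bool {
    match f {
        Filter::Eq(k, v) => meta.get(k) == Some(v),
        Filter::Range(k, lo, hi) => {
            matches!(meta.get(k), Some(Value::Int(x)) if lo <= x && x <= hi)
        }
        Filter::In(k, vs) => meta.get(k).map_or(false, |v| vs.contains(v)),
        Filter::And(a, b) => eval(a, meta) && eval(b, meta),
        Filter::Or(a, b) => eval(a, meta) || eval(b, meta),
        Filter::Not(a) => !eval(a, meta),
    }
}
```

In this sketch, "year > 2020" is written as `Filter::Range("year".into(), 2021, i64::MAX)`. A real engine would consult an inverted index (META_IDX_SEG) rather than test every candidate.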
RVF treats security as a structural property of the format, not an afterthought. Every segment can be individually signed, every operation is hash-chained into a tamper-evident ledger, and every derived file carries a cryptographic link to its parent. The result: you can hand someone a .rvf file and they can independently verify what data is inside, who produced it, what operations were performed, and whether anything was altered — without trusting the sender.
| Capability | How | Segment |
|---|---|---|
| 🔗 Tamper-evident audit trail | Every insert, query, and deletion is recorded in a SHAKE-256 hash-linked chain. Change one byte anywhere and the entire chain fails verification. | WITNESS_SEG (0x0A) |
| 🔐 Kernel locked to its data | A 128-byte KernelBinding footer ties each signed kernel to its manifest hash. This prevents segment-swap attacks: the kernel only boots if the data it was built for is present and unmodified. | KERNEL_SEG + CRYPTO_SEG |
| 🛡️ Quantum-safe signatures | Segments can be signed with ML-DSA-65 (FIPS 204) and SLH-DSA-128s alongside Ed25519. Dual-signing means files stay trustworthy even after quantum computers break classical crypto. | CRYPTO_SEG (0x0C) |
| 🧬 Track where data came from | Every file records its parent, grandparent, and full derivation history with cryptographic hashes — DNA-style lineage. Verify that a child was legitimately derived from its parent without accessing the parent file. | MANIFEST_SEG |
| 🏛️ TEE attestation | Record hardware attestation quotes from Intel SGX, AMD SEV-SNP, Intel TDX, and ARM CCA. Proves vector operations ran inside a verified secure enclave. | CRYPTO_SEG |
| 🛡️ Adversarial hardening | Input validation, rate limiting, and resource exhaustion guards. Declarative SecurityPolicy configuration prevents denial-of-service and malformed-input attacks. | Runtime |
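To make the tamper-evident audit trail concrete, here is a minimal sketch of a hash-linked chain. It assumes a toy 64-bit FNV-1a hash as a stand-in for SHAKE-256, and the `WitnessEntry` layout is hypothetical. It shows only the linking idea, not the WITNESS_SEG wire format.

```rust
/// Stand-in 64-bit FNV-1a hash. NOT cryptographic; the format itself uses SHAKE-256.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical witness entry: a payload plus the hash of its predecessor.
#[derive(Debug, Clone)]
struct WitnessEntry {
    prev_hash: u64,
    payload: Vec<u8>,
    hash: u64,
}

/// Hash of an entry = H(prev_hash || payload).
fn link_hash(prev_hash: u64, payload: &[u8]) -> u64 {
    let mut buf = prev_hash.to_le_bytes().to_vec();
    buf.extend_from_slice(payload);
    fnv1a(&buf)
}

/// Append one operation record, linking it to the current chain tip.
fn append(chain: &mut Vec<WitnessEntry>, payload: &[u8]) {
    let prev_hash = chain.last().map_or(0, |e| e.hash);
    let hash = link_hash(prev_hash, payload);
    chain.push(WitnessEntry { prev_hash, payload: payload.to_vec(), hash });
}

/// Index of the first entry that fails verification, or None if the chain is intact.
fn first_broken(chain: &[WitnessEntry]) -> Option<usize> {
    let mut expected_prev = 0u64;
    for (i, e) in chain.iter().enumerate() {
        if e.prev_hash != expected_prev || e.hash != link_hash(e.prev_hash, &e.payload) {
            return Some(i);
        }
        expected_prev = e.hash;
    }
    None
}
```

Because each entry commits to the hash of the one before it, altering any earlier record invalidates verification from that point onward.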
| Capability | How | Segment |
|---|---|---|
| 🤖 Plug into AI agents | An MCP server lets Claude Code, Cursor, and other AI tools create, query, and manage vector stores directly. | npm package |
| 📦 Use from any language | Published as 14 Rust crates, 6 adapters, 4 npm packages, a CLI tool, and an HTTP server. Works from Rust, Node.js, browsers, and the command line. | 14 crates + 6 adapters + 4 npm |
| ♻️ Always backward-compatible | Old tools skip new segment types they don't understand. A file with COW branching still works in a reader that only knows basic vectors. | Format rule |
📦 Anatomy of a .rvf Cognitive Container (24 segment types)
┌─────────────────────────────────────────────────────────────┐
│ .rvf file │
├──────────────────────────┬──────────────────────────────────┤
│ 📋 Core Data │ 🧠 AI & Models │
│ MANIFEST (4 KB root) │ OVERLAY (LoRA deltas) │
│ VEC_SEG (embeddings) │ GRAPH (GNN state) │
│ INDEX_SEG (HNSW graph) │ SKETCH (quantum / VQE) │
│ QUANT (codebooks) │ META (key-value) │
│ HOT (promoted) │ PROFILE (domain config) │
│ META_IDX (filter idx) │ JOURNAL (mutations) │
├──────────────────────────┼──────────────────────────────────┤
│ 🌿 COW Branching │ 🔐 Security & Trust │
│ COW_MAP (ownership) │ WITNESS (audit chain) │
│ REFCOUNT (ref counts) │ CRYPTO (signatures) │
│ MEMBERSHIP (visibility) │ KERNEL (Linux + binding) │
│ DELTA (sparse patch)│ EBPF (XDP / TC / socket) │
│ │ WASM (5.5 KB runtime) │
├──────────────────────────┴──────────────────────────────────┤
│ │
│ Store it ─── single-file vector DB, no external deps │
│ Ship it ─── wire-format streaming, one file = one unit │
│ Run it ─── boots Linux, runs in browser, eBPF in kernel │
│ Trust it ─── witness chain + attestation + PQ signatures │
│ Branch it ── COW at cluster granularity, <3 ms │
│ Track it ─── DNA-style lineage from parent to child │
│ │
│ ┌──────────┐ ┌──────────┐ │
│ │ 🖥️ Boots │ │ 🌐 Runs │ │
│ │ as Linux │ │ in any │ │
│ │ microVM │ │ browser │ │
│ │ <125 ms │ │ 5.5 KB │ │
│ └──────────┘ └──────────┘ │
└─────────────────────────────────────────────────────────────┘
The same .rvf file runs on servers, browsers (WASM), edge devices, TEE enclaves, Firecracker microVMs, and in the Linux kernel data path (eBPF) — no conversion, no re-indexing, no external dependencies.
| Crate | Version | Description |
|---|---|---|
| rvf-types | 0.2.0 | Segment types, 24 headers, quality, security, AGI container types (no_std) |
| rvf-wire | 0.1.0 | Wire format read/write (no_std) |
| rvf-manifest | 0.1.0 | Two-level manifest, FileIdentity, COW pointers |
| rvf-quant | 0.1.0 | Scalar, product, and binary quantization |
| rvf-index | 0.1.0 | HNSW progressive indexing (Layer A/B/C) |
| rvf-crypto | 0.2.0 | SHAKE-256, Ed25519, witness chains, seed crypto |
| rvf-runtime | 0.2.0 | Full store API, COW engine, AGI containers, QR seeds, safety net |
| rvf-kernel | 0.1.0 | Linux kernel builder, initramfs, Docker pipeline |
| rvf-ebpf | 0.1.0 | BPF C compiler (XDP, socket filter, TC) |
| rvf-launch | 0.1.0 | QEMU microvm launcher, KVM/TCG, QMP |
| rvf-server | 0.1.0 | HTTP REST + TCP streaming server |
| rvf-import | 0.1.0 | JSON, CSV, NumPy importers |
| rvf-cli | 0.1.0 | Unified CLI with 17 subcommands |
| rvf-solver-wasm | 0.1.0 | Thompson Sampling temporal solver (WASM, no_std) |
| Package | Version | Description |
|---|---|---|
| @ruvector/rvf | 0.1.0 | Unified TypeScript SDK |
| @ruvector/rvf-node | 0.1.0 | Node.js N-API native bindings |
| @ruvector/rvf-wasm | 0.1.0 | WASM browser package |
| @ruvector/rvf-mcp-server | 0.1.0 | MCP server for AI agents |
| Platform | Status | Notes |
|---|---|---|
| Linux (x86_64, aarch64) | Full | KVM acceleration, eBPF, SIMD (AVX2/NEON) |
| macOS (x86_64, Apple Silicon) | Full | TCG fallback for QEMU, NEON SIMD on ARM |
| Windows (x86_64) | Core | Store, query, index, crypto work. QEMU launcher requires WSL or Windows QEMU. |
| WASM (browser, edge) | Full | 5.5 KB microkernel, ~46 KB control plane |
| no_std (embedded) | Types only | rvf-types and rvf-wire are no_std compatible |
# Rust crate (library)
cargo add rvf-runtime
# CLI tool
cargo install rvf-cli
# or build from source:
cd crates/rvf && cargo build -p rvf-cli --release
# Node.js / npm
npm install @ruvector/rvf-node
# WASM (browser / edge)
rustup target add wasm32-unknown-unknown
cargo build -p rvf-wasm --target wasm32-unknown-unknown --release
# → target/wasm32-unknown-unknown/release/rvf_wasm.wasm (~46 KB)
# MCP Server (for Claude Code, Cursor, etc.)
npx @ruvector/rvf-mcp-server --transport stdio

# Cargo.toml
[dependencies]
rvf-runtime = "0.2" # full store API
rvf-types = "0.2" # types only (no_std)
rvf-wire = "0.1" # wire format (no_std)
rvf-crypto = "0.2" # signatures + witness chains
rvf-import = "0.1" # JSON/CSV/NumPy importers

use rvf_runtime::{RvfStore, options::{RvfOptions, QueryOptions, DistanceMetric}};
use rvf_types::DerivationType; // needed for store.derive(...) below
let mut store = RvfStore::create("vectors.rvf", RvfOptions {
dimension: 384,
metric: DistanceMetric::Cosine,
..Default::default()
})?;
// Insert
store.ingest_batch(&[&embedding], &[1], None)?;
// Query
let results = store.query(&query, 10, &QueryOptions::default())?;
// Derive a child with lineage tracking
let child = store.derive("child.rvf", DerivationType::Filter, None)?;
// Embed a kernel — file now boots as a microservice
store.embed_kernel(0x00, 0x01, 0, &kernel_image, 8080, None)?;
store.close()?;

npm install @ruvector/rvf-node

const { RvfDatabase } = require('@ruvector/rvf-node');
// Create, insert, query
const db = RvfDatabase.create('vectors.rvf', { dimension: 384 });
db.ingestBatch(new Float32Array(384), [1]);
const results = db.query(new Float32Array(384), 10);
// Lineage & inspection
console.log(db.fileId()); // unique file UUID
console.log(db.dimension()); // 384
console.log(db.segments()); // [{ type, id, size }]
db.close();

<script type="module">
import init, { WasmRvfStore } from './rvf_wasm.js';
await init();
const store = WasmRvfStore.create(384);
store.ingest(1, new Float32Array(384));
const results = store.query(new Float32Array(384), 10);
console.log(results); // [{ id, distance }]
</script>

The WASM binary is ~46 KB (control plane with in-memory store) or ~5.5 KB (tile microkernel for Cognitum). No backend required.
# Full lifecycle from the command line
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf # show all segments
rvf compact vectors.rvf # reclaim deleted space
rvf derive parent.rvf child.rvf --type filter
rvf serve vectors.rvf --port 8080
# Machine-readable output
rvf status vectors.rvf --json

use rvf_adapter_rvlite::{RvliteCollection, RvliteConfig};
let mut col = RvliteCollection::create(RvliteConfig::new("vectors.rvf", 128))?;
col.add(1, &[0.1; 128])?;
let matches = col.search(&[0.15; 128], 5);

cd examples/rvf
cargo run --example generate_all
ls output/ # 46 .rvf files ready to inspect
rvf status output/sealed_engine.rvf
rvf inspect output/linux_microkernel.rvf

An RVF file is a sequence of typed segments. Each segment is self-describing, 64-byte aligned, and independently integrity-checked. The format supports 24 segment types that together constitute a complete cognitive runtime:
.rvf file (Sealed Cognitive Engine)
|
+-- MANIFEST_SEG .... 4 KB root manifest, segment directory, instant boot
+-- VEC_SEG ......... Vector embeddings (fp16/fp32/int8/int4/binary)
+-- INDEX_SEG ....... HNSW progressive index (Layer A/B/C)
+-- OVERLAY_SEG ..... LoRA adapter deltas, incremental updates
+-- GRAPH_SEG ....... GNN adjacency, edge weights, graph state
+-- QUANT_SEG ....... Quantization codebooks (scalar/PQ/binary)
+-- SKETCH_SEG ...... Access sketches, VQE snapshots, quantum state
+-- META_SEG ........ Key-value metadata, observation-state
+-- WITNESS_SEG ..... Tamper-evident audit trails, attestation records
+-- CRYPTO_SEG ...... ML-DSA-65 / Ed25519 signatures, sealed keys
+-- WASM_SEG ........ 5.5 KB query microkernel (Tier 1: browser/edge)
+-- EBPF_SEG ........ eBPF fast-path program (Tier 2: kernel acceleration)
+-- KERNEL_SEG ...... Compressed unikernel (Tier 3: self-booting service)
+-- PROFILE_SEG ..... Domain profile (RVDNA/RVText/RVGraph/RVVision)
+-- HOT_SEG ......... Temperature-promoted hot data
+-- META_IDX_SEG .... Metadata inverted indexes for filtered search
+-- COW_MAP_SEG ..... Cluster ownership map for COW branching (0x20)
+-- REFCOUNT_SEG .... Cluster reference counts, rebuildable (0x21)
+-- MEMBERSHIP_SEG .. Vector visibility filter for branches (0x22)
+-- DELTA_SEG ....... Sparse delta patches / LoRA overlays (0x23)
+-- TRANSFER_PRIOR .. Transfer learning priors (0x30)
+-- POLICY_KERNEL ... Thompson Sampling policy state (0x31)
+-- COST_CURVE ...... Cost/reward curves for solver (0x32)
When an RVF file combines vectors, models, compute, and trust segments, it becomes a deployable intelligence capsule:
ClinicalOncologyEngine.rvdna (one file, ~50 MB)
Contains:
-- Medical corpus embeddings VEC_SEG 384-dim, 2M vectors
-- MicroLoRA oncology fine-tune OVERLAY_SEG adapter deltas
-- Biological pathway GNN GRAPH_SEG pathway modeling
-- Molecular similarity state SKETCH_SEG quantum-enhanced
-- Linux microkernel service KERNEL_SEG boots on Firecracker
-- Browser query runtime WASM_SEG 5.5 KB, no backend
-- eBPF drug lookup accelerator EBPF_SEG sub-microsecond
-- Attested execution proof WITNESS_SEG tamper-evident chain
-- Post-quantum signature CRYPTO_SEG ML-DSA-65
This is not a database. It is a sealed, auditable, self-booting domain expert. Copy it to a Firecracker VM and it boots a Linux service. Open it in a browser and WASM serves queries locally. Ship it air-gapped and it produces identical results under audit.
RVF is the canonical binary format across 87+ Rust crates in the RuVector ecosystem:
| Domain | Crates | RVF Segment |
|---|---|---|
| LLM Inference | ruvllm, ruvllm-cli, ruvllm-wasm | VEC_SEG (KV cache), OVERLAY_SEG (LoRA) |
| Attention | ruvector-attention, coherence-gated transformer | VEC_SEG, INDEX_SEG |
| GNN | ruvector-gnn, ruvector-graph, graph-node/wasm | GRAPH_SEG |
| Quantum | ruQu, ruqu-core, ruqu-algorithms, ruqu-exotic | SKETCH_SEG (VQE, syndrome tables) |
| Min-Cut Coherence | ruvector-mincut, mincut-gated-transformer | GRAPH_SEG, INDEX_SEG |
| Delta Tracking | ruvector-delta-core, delta-graph, delta-index | OVERLAY_SEG, JOURNAL_SEG |
| Neural Routing | ruvector-tiny-dancer-core (FastGRNN) | VEC_SEG, META_SEG |
| Sparse Inference | ruvector-sparse-inference | VEC_SEG, QUANT_SEG |
| Temporal Tensors | ruvector-temporal-tensor | VEC_SEG, META_SEG |
| Cognitum Silicon | cognitum-gate-kernel, cognitum-gate-tilezero | WASM_SEG (64 KB tiles) |
| SONA Learning | sona (self-optimizing neural arch) | VEC_SEG, WITNESS_SEG |
| Agent Memory | claude-flow, agentdb, agentic-flow, ospipe | All segments via adapters |
The same .rvf file format runs on cloud servers, Firecracker microVMs, TEE enclaves, edge devices, Cognitum tiles, and in the browser.
| Feature | Description |
|---|---|
| Append-only segments | Crash-safe without WAL. Every write is atomic with per-segment integrity checksums. |
| Progressive indexing | Three-tier HNSW (Layer A/B/C). First query at 70% recall before full index loads. |
| Temperature-tiered quantization | Hot vectors stay fp16, warm use product quantization, cold use binary — automatically. |
| Metadata filtering | Filtered k-NN with boolean expressions (AND/OR/NOT/IN/RANGE). |
| 4 KB instant boot | Root manifest fits in one page read. Cold boot < 5 ms. |
| 24 segment types | VEC, INDEX, MANIFEST, QUANT, WITNESS, CRYPTO, KERNEL, EBPF, WASM, COW_MAP, MEMBERSHIP, DELTA, TRANSFER_PRIOR, POLICY_KERNEL, COST_CURVE, and 9 more. |
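The cold tier in the table above uses binary quantization. A minimal sketch of the idea, assuming the simplest sign-bit scheme (one bit per dimension, compared by Hamming distance), looks like this. The function names are illustrative and the actual rvf-quant codebooks may differ.

```rust
/// Pack the sign of each component into one bit (1 = non-negative).
fn binarize(v: &[f32]) -> Vec<u64> {
    let mut words = vec![0u64; (v.len() + 63) / 64];
    for (i, &x) in v.iter().enumerate() {
        if x >= 0.0 {
            words[i / 64] |= 1u64 << (i % 64);
        }
    }
    words
}

/// Hamming distance between two binary codes of equal length.
fn hamming(a: &[u64], b: &[u64]) -> u32 {
    a.iter().zip(b).map(|(x, y)| (x ^ y).count_ones()).sum()
}
```

Under this scheme a 384-dimension fp32 vector (1,536 bytes) shrinks to six 64-bit words (48 bytes), which is why binary codes suit rarely accessed data where approximate ranking is acceptable.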
| Feature | Description |
|---|---|
| COW branching | Git-like copy-on-write at cluster granularity. Derive child stores that share parent data; only changed clusters are copied. |
| Membership filters | Shared HNSW index across branches with bitmap visibility control. Include/exclude modes. |
| Snapshot freeze | Immutable snapshot at any generation. Metadata-only operation, no data copy. |
| Delta segments | Sparse patches for LoRA overlays. Hot-path guard upgrades to full slab. |
| Rebuildable refcounts | No WAL. Refcounts derived from COW map chain during compaction. |
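The copy-on-write behaviour in the table above can be sketched with an in-memory model: a child starts with every cluster pointing at the parent and copies a cluster only on its first write. The `Branch` and `ClusterRef` types are hypothetical stand-ins for COW_MAP_SEG, not the rvf-runtime implementation.

```rust
use std::collections::HashMap;

/// Where a cluster's data lives from the child's point of view.
enum ClusterRef {
    Parent,          // still shared with the parent file
    Local(Vec<f32>), // copied into the child on first write
}

/// Hypothetical child branch over a read-only parent.
struct Branch<'a> {
    parent: &'a HashMap<u32, Vec<f32>>, // parent clusters, never mutated
    map: HashMap<u32, ClusterRef>,      // stand-in for the COW map
    cow_events: usize,                  // number of cluster copies so far
}

impl<'a> Branch<'a> {
    /// Deriving copies no vector data, only the cluster map.
    fn derive(parent: &'a HashMap<u32, Vec<f32>>) -> Self {
        let map = parent.keys().map(|&id| (id, ClusterRef::Parent)).collect();
        Branch { parent, map, cow_events: 0 }
    }

    /// Read a cluster, following the map to the parent or the local copy.
    fn read(&self, cluster: u32) -> Option<&[f32]> {
        match self.map.get(&cluster)? {
            ClusterRef::Parent => self.parent.get(&cluster).map(|v| v.as_slice()),
            ClusterRef::Local(v) => Some(v.as_slice()),
        }
    }

    /// Write one value; the whole cluster is copied only on the first write to it.
    fn write(&mut self, cluster: u32, offset: usize, value: f32) -> bool {
        let Some(entry) = self.map.get_mut(&cluster) else { return false };
        if let ClusterRef::Parent = entry {
            let Some(data) = self.parent.get(&cluster) else { return false };
            *entry = ClusterRef::Local(data.clone());
            self.cow_events += 1;
        }
        match entry {
            ClusterRef::Local(v) if offset < v.len() => {
                v[offset] = value;
                true
            }
            _ => false,
        }
    }
}
```

Several writes that land in the same cluster trigger a single copy, which is the behaviour described as write coalescing.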
| Feature | Description |
|---|---|
| Domain profiles | .rvdna, .rvtext, .rvgraph, .rvvis extensions map to optimized profiles. |
| Unified CLI | 17 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve, launch, embed-kernel, embed-ebpf, filter, freeze, verify-witness, verify-attestation, rebuild-refcounts. |
| 6 library adapters | Drop-in integration for claude-flow, agentdb, ospipe, agentic-flow, rvlite, sona. |
| MCP server | Model Context Protocol integration for Claude Code, Cursor, and AI agents. |
| Node.js bindings | N-API bindings with lineage, kernel/eBPF, and inspection support. |
+-----------------------------------------------------------------+
| Cognitive Layer |
| ruvllm (LLM) | ruvector-gnn (GNN) | ruQu (Quantum) |
| ruvector-attention | sona (SONA) | ruvector-mincut |
+---+------------------+-----------------+-----------+------------+
| | | |
+---v------------------v-----------------v-----------v------------+
| Agent & Application Layer |
| claude-flow | agentdb | agentic-flow | ospipe | rvlite |
+---+------------------+-----------------+-----------+------------+
| | | |
+---v------------------v-----------------v-----------v------------+
| RVF SDK Layer |
| rvf-runtime | rvf-index | rvf-quant | rvf-crypto | rvf-wire |
| rvf-manifest | rvf-types | rvf-import | rvf-adapters |
+---+--------+---------+----------+-----------+------------------+
| | | | |
+---v---+ +--v----+ +--v-----+ +-v--------+ +v-----------+ +v------+
|server | | node | | wasm | | kernel | | ebpf | | cli |
|HTTP | | N-API | | ~46 KB | |bzImage+ | |clang BPF | |17 cmds|
|REST+ | | | | | |initramfs | |XDP/TC/sock | | |
|TCP | | | | | +----------+ +------------+ +-------+
+-------+ +-------+ +--------+ +-v--------+
| launch |
|QEMU+QMP |
+----------+
An .rvf file is a sequence of 64-byte-aligned segments. Each segment has a self-describing header:
+--------+------+-------+--------+-----------+-------+----------+
| Magic | Ver | Type | Flags | SegmentID | Size | Hash |
| 4B | 1B | 1B | 2B | 8B | 8B | 16B ... |
+--------+------+-------+--------+-----------+-------+----------+
| Payload (variable length, 64-byte aligned) |
+----------------------------------------------------------------+
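The header layout above, combined with the append-only rule, suggests how a reader can recover after a crash: walk the segments in order and stop at the first one that is incomplete. The sketch below makes several assumptions that the diagram does not settle: little-endian integers, a 64-byte header, field offsets in diagram order, and a made-up magic value `RVFS`. The hash field is not checked here.

```rust
const HEADER_LEN: usize = 64;
const MAGIC: [u8; 4] = *b"RVFS"; // hypothetical magic value, for illustration only

#[derive(Debug, PartialEq)]
struct SegmentHeader {
    version: u8,
    seg_type: u8,
    flags: u16,
    segment_id: u64,
    payload_len: u64,
}

/// Round up to the next multiple of 64.
fn align64(n: usize) -> usize {
    (n + 63) & !63
}

fn le_u64(b: &[u8]) -> u64 {
    let mut a = [0u8; 8];
    a.copy_from_slice(&b[..8]);
    u64::from_le_bytes(a)
}

/// Parse one header (assumed layout); None if too short or the magic does not match.
fn parse_header(buf: &[u8]) -> Option<SegmentHeader> {
    if buf.len() < HEADER_LEN || buf[0..4] != MAGIC {
        return None;
    }
    Some(SegmentHeader {
        version: buf[4],
        seg_type: buf[5],
        flags: u16::from_le_bytes([buf[6], buf[7]]),
        segment_id: le_u64(&buf[8..16]),
        payload_len: le_u64(&buf[16..24]),
    })
}

/// Walk the file and return every complete segment; an incomplete tail is ignored.
fn scan(file: &[u8]) -> Vec<SegmentHeader> {
    let mut out = Vec::new();
    let mut pos = 0usize;
    while let Some(h) = file.get(pos..).and_then(parse_header) {
        let end = pos
            .saturating_add(HEADER_LEN)
            .saturating_add(h.payload_len as usize);
        if end > file.len() {
            break; // payload cut short by a crash: drop this segment
        }
        pos = align64(end);
        out.push(h);
    }
    out
}

/// Test helper: encode one segment with zero padding up to the next 64-byte boundary.
fn encode(seg_type: u8, segment_id: u64, payload: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; HEADER_LEN];
    out[0..4].copy_from_slice(&MAGIC);
    out[4] = 1;
    out[5] = seg_type;
    out[8..16].copy_from_slice(&segment_id.to_le_bytes());
    out[16..24].copy_from_slice(&(payload.len() as u64).to_le_bytes());
    out.extend_from_slice(payload);
    out.resize(align64(out.len()), 0);
    out
}
```

A production reader would also verify each segment's hash before trusting the payload; that step is left out to keep the alignment and truncation logic visible.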
| Crate | Lines | Purpose |
|---|---|---|
| rvf-types | 7,000+ | 24 segment types, AGI container, quality, security, WASM bootstrap, QR seed (no_std) |
| rvf-wire | 2,011 | Wire format read/write (no_std) |
| rvf-manifest | 1,700+ | Two-level manifest with 4 KB root, FileIdentity codec, COW pointers, double-root scheme |
| rvf-index | 2,691 | HNSW progressive indexing (Layer A/B/C) |
| rvf-quant | 1,443 | Scalar, product, and binary quantization |
| rvf-crypto | 1,725 | SHAKE-256, Ed25519, witness chains, attestation, seed crypto |
| rvf-runtime | 8,000+ | Full store API, COW engine, AGI containers, QR seeds, safety net, adversarial defense |
| rvf-kernel | 2,400+ | Real Linux kernel builder, cpio/newc initramfs, Docker build, SHA3-256 verification |
| rvf-launch | 1,200+ | QEMU microvm launcher, KVM/TCG detection, QMP shutdown protocol |
| rvf-ebpf | 1,100+ | Real BPF C compiler (XDP, socket filter, TC), vmlinux.h generation |
| rvf-wasm | 1,700+ | WASM control plane: in-memory store, query, segment inspection, witness chain verification (~46 KB) |
| rvf-solver-wasm | 1,500+ | Thompson Sampling temporal solver, PolicyKernel, three-loop architecture (no_std) |
| rvf-node | 852 | Node.js N-API bindings with lineage, kernel/eBPF, and inspection |
| rvf-cli | 1,800+ | Unified CLI with 17 subcommands (create, ingest, query, delete, status, inspect, compact, derive, serve, launch, embed-kernel, embed-ebpf, filter, freeze, verify-witness, verify-attestation, rebuild-refcounts) |
| rvf-server | 1,165 | HTTP REST + TCP streaming server |
| rvf-import | 980 | JSON, CSV, NumPy (.npy) importers |
| Adapters | 6,493 | 6 library integrations (see below) |
| Metric | Target | Achieved |
|---|---|---|
| Cold boot (4 KB manifest read) | < 5 ms | 1.6 us |
| First query recall@10 (Layer A only) | >= 0.70 | >= 0.70 |
| Full quality recall@10 (Layer C) | >= 0.95 | >= 0.95 |
| WASM binary (tile microkernel) | < 8 KB | ~5.5 KB |
| WASM binary (control plane) | < 50 KB | ~46 KB |
| Segment header size | 64 bytes | 64 bytes |
| Minimum file overhead | < 1 KB | < 256 bytes |
| COW branch creation (10K vecs) | < 10 ms | 2.6 ms (child = 162 bytes) |
| COW branch creation (100K vecs) | < 50 ms | 6.8 ms (child = 162 bytes) |
| COW read (local cluster, pread) | < 5 us | 1,348 ns/vector |
| COW read (inherited from parent) | < 5 us | 1,442 ns/vector |
| Write coalescing (32 vecs, 1 cluster) | 1 COW event | 654 us, 1 event |
| CowMap lookup | < 100 ns | 28 ns |
| Membership filter contains() | < 100 ns | 23-33 ns |
| Snapshot freeze | < 100 ns | 30-52 ns |
RVF doesn't make you wait for the full index:
| Stage | Data Loaded | Recall@10 | Latency |
|---|---|---|---|
| Layer A | Entry points + centroids | >= 0.70 | < 5 ms |
| Layer B | Hot region adjacency | >= 0.85 | ~10 ms |
| Layer C | Full HNSW graph | >= 0.95 | ~50 ms |
| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|---|---|---|---|---|---|
| Single-file format | Yes | Yes | No | No | No |
| Crash-safe (no WAL) | Yes | No | No | Needs WAL | Needs WAL |
| Progressive loading | Yes (3 layers) | No | No | No | No |
| COW branching | Yes (cluster-level) | No | No | No | No |
| Membership filters | Yes (shared HNSW) | No | No | No | No |
| Snapshot freeze | Yes (zero-copy) | No | No | No | No |
| WASM support | Yes (5.5 KB) | No | No | No | No |
| Self-booting kernel | Yes (real Linux) | No | No | No | No |
| eBPF acceleration | Yes (XDP/TC/socket) | No | No | No | No |
| no_std compatible | Yes | No | No | No | No |
| Post-quantum sigs | Yes (ML-DSA-65) | No | No | No | No |
| TEE attestation | Yes | No | No | No | No |
| Metadata filtering | Yes | No | Yes | Yes | Yes |
| Temperature tiering | Automatic | No | Manual | No | No |
| Quantization | 3-tier auto | No | Yes (manual) | Yes | Yes |
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
| Domain profiles | 5 profiles | No | No | No | No |
| Append-only | Yes | Build-once | Build-once | Log-based | Log-based |
| | RVF Cognitive Container | Docker / OCI |
|---|---|---|
| File format | Single .rvf file | Layered tarball images |
| Boot target | QEMU microVM (microvm machine) | Container runtime (runc, containerd) |
| Vector data | Native segment, HNSW-indexed | External volume mount |
| Branching | Vector-native COW at cluster granularity | Layer-based COW (filesystem) |
| eBPF | Embedded in file, verified | Separate deployment |
| Attestation | Witness chain + KernelBinding | External signing (cosign, notary) |
| Size (hello world) | ~17 KB (with initramfs + vectors) | ~5 MB (Alpine) |
| | RVF | Pinecone / Milvus / Qdrant |
|---|---|---|
| Deployment | Single file, zero dependencies | Server process + storage |
| Branching | Native COW, 2.6 ms for 10K vectors | Copy entire collection |
| Multi-tenant | Membership filter on shared index | Separate collections |
| Edge deploy | scp file.rvf host: + boot | Install + configure + import |
| Provenance | Cryptographic witness chain | External audit logs |
| Compute | Embedded kernel + eBPF | N/A |
| | RVF COW | Git LFS / DVC |
|---|---|---|
| Granularity | Vector cluster (256 KB) | Whole file |
| Index sharing | Shared HNSW + membership filter | No index awareness |
| Query during branch | Yes, sub-microsecond | No query capability |
| Delta encoding | Sparse row patches (LoRA) | Binary diff |
| | RVF | SQLite | DuckDB |
|---|---|---|---|
| Vector-native | Yes (HNSW, quantization, COW) | No (extension needed) | No (extension needed) |
| Self-booting | Yes (KERNEL_SEG) | No | No |
| eBPF acceleration | Yes (XDP, TC, socket) | No | No |
| Cryptographic audit | Yes (witness chains) | No | No |
| Progressive loading | 3-tier HNSW (70% → 95% recall) | N/A | N/A |
| WASM support | 5.5 KB microkernel | Yes (via wasm) | No |
| Single file | Yes | Yes | Yes |
RVF supports DNA-style derivation chains for tracking how files were produced from one another. Each .rvf file carries a 68-byte FileIdentity recording its unique ID, its parent's ID, and a cryptographic hash of the parent's manifest. This enables tamper-evident provenance verification from any file back to its root ancestor.
parent.rvf child.rvf grandchild.rvf
(depth=0) (depth=1) (depth=2)
file_id: AAA file_id: BBB file_id: CCC
parent_id: 000 parent_id: AAA parent_id: BBB
parent_hash: 000 parent_hash: H(A) parent_hash: H(B)
| | |
+-------derive------+-------derive------+
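The derivation chain in the diagram can be checked link by link. The sketch below is a simplified model under loud assumptions: IDs and hashes are shrunk to `u64`, the hash is a toy FNV-1a rather than a cryptographic one, and the `FileIdentity` field layout is hypothetical, not the real 68-byte encoding.

```rust
/// Stand-in 64-bit FNV-1a hash. NOT cryptographic; used only to keep the sketch dependency-free.
fn fnv1a(data: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in data {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical, shrunken model of the identity record.
#[derive(Debug, Clone, PartialEq)]
struct FileIdentity {
    file_id: u64,
    parent_id: u64,   // 0 for a root file
    parent_hash: u64, // hash of the parent's manifest, 0 for a root file
    depth: u32,
}

struct RvfFile {
    identity: FileIdentity,
    manifest: Vec<u8>,
}

/// Create a child whose identity commits to the parent's ID and manifest hash.
fn derive(parent: &RvfFile, file_id: u64, manifest: Vec<u8>) -> RvfFile {
    RvfFile {
        identity: FileIdentity {
            file_id,
            parent_id: parent.identity.file_id,
            parent_hash: fnv1a(&parent.manifest),
            depth: parent.identity.depth + 1,
        },
        manifest,
    }
}

/// Verify a chain ordered root-first: every child must match its parent's ID, hash, and depth.
fn verify_lineage(chain: &[RvfFile]) -> bool {
    let Some(root) = chain.first() else { return false };
    if root.identity.parent_id != 0 || root.identity.depth != 0 {
        return false;
    }
    chain.windows(2).all(|w| {
        let (p, c) = (&w[0], &w[1]);
        c.identity.parent_id == p.identity.file_id
            && c.identity.parent_hash == fnv1a(&p.manifest)
            && c.identity.depth == p.identity.depth + 1
    })
}
```

This version recomputes each parent hash from the parent's manifest bytes; a verifier that only holds a trusted copy of the parent's manifest hash could compare against that value instead.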
Domain-specific extensions are automatically mapped to optimized profiles. The authoritative profile lives in the Level0Root.profile_id field; the file extension is a convenience hint:
| Extension | Domain Profile | Optimized For |
|---|---|---|
| .rvf | Generic | General-purpose vectors |
| .rvdna | RVDNA | Genomic sequence embeddings |
| .rvtext | RVText | Language model embeddings |
| .rvgraph | RVGraph | Graph/network node embeddings |
| .rvvis | RVVision | Image/vision model embeddings |
use rvf_runtime::{RvfStore, options::{RvfOptions, DistanceMetric}};
use rvf_types::DerivationType;
use std::path::Path;
let options = RvfOptions {
dimension: 384,
metric: DistanceMetric::Cosine,
..Default::default()
};
let parent = RvfStore::create(Path::new("parent.rvf"), options)?;
// Derive a filtered child -- inherits dimensions and options
let child = parent.derive(
Path::new("child.rvf"),
DerivationType::Filter,
None,
)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());

RVF supports an optional three-tier execution model that allows a single .rvf file to carry executable compute alongside its vector data. A file can serve queries from a browser (Tier 1 WASM), accelerate hot-path lookups in the Linux kernel (Tier 2 eBPF), or boot as a standalone microservice inside a Firecracker microVM or TEE enclave (Tier 3 unikernel) -- all from the same file.
| Tier | Segment | Size | Environment | Boot Time | Use Case |
|---|---|---|---|---|---|
| 1: WASM | WASM_SEG (existing) | 5.5 KB | Browser, edge, IoT | <1 ms | Portable queries everywhere |
| 2: eBPF | EBPF_SEG (0x0F) | 10-50 KB | Linux kernel (XDP, TC) | <20 ms | Sub-microsecond hot cache hits |
| 3: Unikernel | KERNEL_SEG (0x0E) | 200 KB - 2 MB | Firecracker, TEE, bare metal | <125 ms | Zero-dependency self-booting service |
Readers that do not recognize KERNEL_SEG or EBPF_SEG skip them per the RVF forward-compatibility rule. The computational capability is purely additive.
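The forward-compatibility rule can be sketched as a reader that maps type bytes to the segment kinds it understands and silently skips everything else. The type codes below are the ones quoted in this README; the `Known` enum and function names are illustrative and not part of rvf-types.

```rust
/// Segment kinds whose type codes are stated in this README.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Known {
    Witness,    // 0x0A
    Crypto,     // 0x0C
    MetaIdx,    // 0x0D
    Kernel,     // 0x0E
    Ebpf,       // 0x0F
    CowMap,     // 0x20
    Refcount,   // 0x21
    Membership, // 0x22
    Delta,      // 0x23
}

/// Map a type byte to a known kind; None means "not recognized".
fn classify(type_byte: u8) -> Option<Known> {
    match type_byte {
        0x0A => Some(Known::Witness),
        0x0C => Some(Known::Crypto),
        0x0D => Some(Known::MetaIdx),
        0x0E => Some(Known::Kernel),
        0x0F => Some(Known::Ebpf),
        0x20 => Some(Known::CowMap),
        0x21 => Some(Known::Refcount),
        0x22 => Some(Known::Membership),
        0x23 => Some(Known::Delta),
        _ => None,
    }
}

/// Segments a given reader will process; unrecognized or unsupported types are skipped, not errors.
fn readable_segments(types: &[u8], understands: &[Known]) -> Vec<Known> {
    types
        .iter()
        .filter_map(|&t| classify(t))
        .filter(|k| understands.contains(k))
        .collect()
}
```

An older reader that only understands witness and metadata-index segments would still open a file carrying KERNEL_SEG and EBPF_SEG. It simply never touches those payloads.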
use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use std::path::Path;
let mut store = RvfStore::open(Path::new("vectors.rvf"))?;
// Embed a compressed unikernel image
store.embed_kernel(
KernelArch::X86_64 as u8, // arch
KernelType::Hermit as u8, // kernel type
0x0018, // flags: HAS_QUERY_API | HAS_NETWORKING
&compressed_kernel_image, // kernel binary
8080, // API port
Some("console=ttyS0 quiet"), // cmdline (optional)
)?;
// Later, extract it
if let Some((header, image_data)) = store.extract_kernel()? {
println!("Kernel: {:?} ({} bytes)", header.kernel_arch(), image_data.len());
}

use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};
// Embed an eBPF XDP program for fast-path vector lookup
store.embed_ebpf(
EbpfProgramType::XdpDistance as u8, // program type
EbpfAttachType::XdpIngress as u8, // attach point
384, // max vector dimension
&ebpf_bytecode, // BPF ELF object
Some(&btf_section), // BTF data (optional)
)?;
if let Some((header, program_data)) = store.extract_ebpf()? {
println!("eBPF: {:?} ({} bytes)", header.program_type, program_data.len());
}

- 7-step fail-closed verification: hash, signature, TEE measurement, all must pass before kernel boot
- Authority boundary: guest kernel owns auth/audit/witness; host eBPF is acceleration-only
- Signing: Ed25519 for development, ML-DSA-65 (FIPS 204) for production
- TEE priority: SEV-SNP first, SGX second, ARM CCA third
- Size limits: kernel images capped at 128 MiB, eBPF programs at 16 MiB
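"Fail-closed" means any failed or missing check blocks the boot. The sketch below covers only the checks named in the list above (size cap, hash, signature, TEE measurement), not the full seven steps, which are specified in the ADR. The `BootEvidence` struct and error names are hypothetical.

```rust
const MAX_KERNEL_BYTES: u64 = 128 * 1024 * 1024; // 128 MiB cap on kernel images
const MAX_EBPF_BYTES: u64 = 16 * 1024 * 1024;    // 16 MiB cap on eBPF programs

#[derive(Debug, PartialEq)]
enum BootError {
    KernelTooLarge,
    HashMismatch,
    BadSignature,
    TeeMeasurementMismatch,
}

/// Hypothetical summary of checks performed elsewhere (hashing, signature verification, attestation).
struct BootEvidence {
    kernel_len: u64,
    hash_ok: bool,
    signature_ok: bool,
    tee_measurement_ok: bool,
}

/// Fail closed: the first failing check aborts, and Ok is returned only if all pass.
fn verify_before_boot(e: &BootEvidence) -> Result<(), BootError> {
    if e.kernel_len > MAX_KERNEL_BYTES {
        return Err(BootError::KernelTooLarge);
    }
    if !e.hash_ok {
        return Err(BootError::HashMismatch);
    }
    if !e.signature_ok {
        return Err(BootError::BadSignature);
    }
    if !e.tee_measurement_ok {
        return Err(BootError::TeeMeasurementMismatch);
    }
    Ok(())
}

/// Size guard for eBPF programs.
fn ebpf_size_ok(len: u64) -> bool {
    len <= MAX_EBPF_BYTES
}
```

The important property is that there is no "warn and continue" branch: every path other than passing all checks returns an error.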
For the full specification including wire formats, attestation binding, and implementation phases, see ADR-030: RVF Cognitive Container.
The claude_code_appliance example builds a complete self-booting AI development environment as a single .rvf file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.
Prerequisites: Docker (for kernel build), Rust 1.87+
# Build and run the example
cd examples/rvf
cargo run --example claude_code_appliance

What it produces (5.1 MB file):
claude_code_appliance.rvf
├── KERNEL_SEG Linux 6.8.12 bzImage (5.2 MB, x86_64)
├── EBPF_SEG Socket filter — allows ports 2222, 8080 only
├── VEC_SEG 20 package embeddings (128-dim)
├── INDEX_SEG HNSW graph for package search
├── WITNESS_SEG 6-entry tamper-evident audit trail
├── CRYPTO_SEG 3 Ed25519 SSH user keys (root, deploy, claude)
├── MANIFEST_SEG 4 KB root with segment directory
└── Snapshot v1 derived image with lineage tracking
Boot sequence (once launched on Firecracker/QEMU):
1. Firecracker loads KERNEL_SEG → Linux boots (<125 ms)
2. SSH server starts on port 2222
3. Claude Code is installed with: curl -fsSL https://claude.ai/install.sh | bash
4. RVF query server starts on port 8080
5. Claude Code ready for use
Connect and use:
# Boot the file (requires QEMU or Firecracker)
rvf launch claude_code_appliance.rvf
# SSH in
ssh -p 2222 deploy@localhost
# Query the package database
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'
# Or use the CLI
rvf query claude_code_appliance.rvf --vector "0.1,0.2,..." --k 5

Verified output from the example run:
=== Claude Code Appliance Summary ===
File size: 5,260,093 bytes (5.1 MB)
Segments: 8
Packages: 20 (203.1 MB manifest)
KERNEL_SEG: MicroLinux x86_64 (5,243,904 bytes)
EBPF_SEG: SocketFilter (3,805 bytes)
SSH users: 3 (Ed25519 signed, all verified)
Witness chain: 6 entries (tamper-evident, all verified)
Lineage: base + v1 snapshot (parent hash matches)
Final file: 5.1 MB single .rvf — boots Linux, serves queries, runs Claude Code.
One file. Boots Linux. Runs SSH. Serves vectors. Installs Claude Code. Proves every step.
# CLI launcher (auto-detects KVM or falls back to TCG)
rvf launch vectors.rvf
# Launcher with explicit memory, CPU, and port-forward settings (if you want control)
rvf launch vectors.rvf --memory 512M --cpus 2 --port-forward 2222:22,8080:8080
# Extract kernel for external use
rvf inspect vectors.rvf --segment kernel --output kernel.bin
qemu-system-x86_64 -M microvm -kernel kernel.bin -append "console=ttyS0" -nographic

Step-by-step to create a self-booting .rvf from scratch:
# 1. Create a vector store
rvf create myservice.rvf --dimension 384
# 2. Ingest your data
rvf ingest myservice.rvf --input embeddings.json --format json
# 3. Build and embed a Linux kernel (uses Docker)
rvf embed-kernel myservice.rvf --arch x86_64
# 4. Optionally embed an eBPF filter
rvf embed-ebpf myservice.rvf --program filter.c
# 5. Verify the result
rvf inspect myservice.rvf
# MANIFEST_SEG, VEC_SEG, INDEX_SEG, KERNEL_SEG, EBPF_SEG, WITNESS_SEG
# 6. Boot it
rvf launch myservice.rvf

RVF provides drop-in adapters for 6 libraries in the RuVector ecosystem:
| Adapter | Purpose | Key Feature |
|---|---|---|
| rvf-adapter-claude-flow | AI agent memory | WITNESS_SEG audit trails |
| rvf-adapter-agentdb | Agent vector database | Progressive HNSW indexing |
| rvf-adapter-ospipe | Observation-State pipeline | META_SEG for state vectors |
| rvf-adapter-agentic-flow | Swarm coordination | Inter-agent memory sharing |
| rvf-adapter-rvlite | Lightweight embedded store | Minimal API, edge-friendly |
| rvf-adapter-sona | Neural architecture | Experience replay + trajectories |
An AGI container packages a complete AI agent runtime into a single sealed .rvf file. Where the Self-Booting RVF section covers the compute tiers (WASM/eBPF/Kernel), the AGI container adds the intelligence layer on top: model identity, orchestration config, tool registries, evaluation harnesses, authority controls, and coherence gates.
AGI Cognitive Container (.rvf)
├── Identity ────── container UUID, build UUID, model ID hash
├── Orchestrator ── Claude Code / Claude Flow config (JSON)
├── Tools ──────── MCP tool adapter registry
├── Agent Prompts ─ role definitions per agent type
├── Eval Harness ── task suite + grading rules
├── Skills ──────── promoted skill library
├── Policy ──────── governance rules + authority config
├── Coherence ───── min score, contradiction rate, rollback ratio
├── Resources ───── time/token/cost budgets with clamping
├── Replay ──────── automation script for deterministic re-execution
├── Kernel Config ─ boot parameters, network, SSH
├── Domain Profile ─ coding / research / ops specialization
└── Signature ───── HMAC-SHA256 or Ed25519 tamper seal
| Mode | Purpose | Requires |
|---|---|---|
| Replay | Deterministic re-execution from witness logs | Witness chain |
| Verify | Validate container integrity and run eval harness | Kernel + world model, or WASM + vectors |
| Live | Full autonomous operation with tool use | Kernel + world model |
Authority is hierarchical — each level permits everything below it:
| Level | Allows |
|---|---|
| ReadOnly | Read vectors, run queries |
| WriteMemory | + Write to vector store, update index |
| ExecuteTools | + Invoke MCP tools, run commands |
| WriteExternal | + Network access, file I/O, push to git |
Default authority per mode: Replay → ReadOnly, Verify → ExecuteTools, Live → WriteMemory.
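The hierarchy and the per-mode defaults can be modelled with an ordered enum. This is a minimal sketch, not the rvf-types API: the names `Authority`, `Mode`, `default_authority`, and `permits` are hypothetical and chosen only for illustration.

```rust
/// Authority levels from the table above, declared lowest to highest.
/// Deriving `PartialOrd`/`Ord` makes declaration order the ranking.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Authority {
    ReadOnly,
    WriteMemory,
    ExecuteTools,
    WriteExternal,
}

/// Execution modes from the mode table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Replay,
    Verify,
    Live,
}

/// Default authority per mode, as listed above.
pub fn default_authority(mode: Mode) -> Authority {
    match mode {
        Mode::Replay => Authority::ReadOnly,
        Mode::Verify => Authority::ExecuteTools,
        Mode::Live => Authority::WriteMemory,
    }
}

/// A granted level permits an action when it is at least the required level.
pub fn permits(granted: Authority, required: Authority) -> bool {
    granted >= required
}
```

With these defaults, a Live container can write to the vector store but would need an explicit grant before invoking tools.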
Every container carries hard limits that are clamped to safety maximums:
| Resource | Max | Default |
|---|---|---|
| Time | 3,600 sec | 300 sec |
| Tokens | 1,000,000 | 100,000 |
| Cost | $10.00 | $1.00 |
| Tool calls | 500 | 100 |
| External writes | 50 | 10 |
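Clamping means a requested budget is never rejected, only capped at the safety maximum. The sketch below illustrates that rule with the values from the table. The `ResourceBudget` type and its field names are hypothetical, and cost is held in cents to avoid floating-point comparisons.

```rust
/// Hypothetical budget type; field names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ResourceBudget {
    pub time_secs: u32,
    pub tokens: u32,
    pub cost_cents: u32, // $10.00 is stored as 1_000
    pub tool_calls: u32,
    pub external_writes: u32,
}

impl ResourceBudget {
    /// Safety maximums from the "Max" column.
    pub const MAX: ResourceBudget = ResourceBudget {
        time_secs: 3_600,
        tokens: 1_000_000,
        cost_cents: 1_000,
        tool_calls: 500,
        external_writes: 50,
    };

    /// Cap every field at its safety maximum; smaller values pass through.
    pub fn clamped(self) -> ResourceBudget {
        ResourceBudget {
            time_secs: self.time_secs.min(Self::MAX.time_secs),
            tokens: self.tokens.min(Self::MAX.tokens),
            cost_cents: self.cost_cents.min(Self::MAX.cost_cents),
            tool_calls: self.tool_calls.min(Self::MAX.tool_calls),
            external_writes: self.external_writes.min(Self::MAX.external_writes),
        }
    }
}

impl Default for ResourceBudget {
    /// Values from the "Default" column.
    fn default() -> Self {
        ResourceBudget {
            time_secs: 300,
            tokens: 100_000,
            cost_cents: 100,
            tool_calls: 100,
            external_writes: 10,
        }
    }
}
```

A request for 24 hours of runtime, for example, would come back as 3,600 seconds while the other fields keep their requested values.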
Coherence thresholds halt execution when the agent's world model drifts:
- min_coherence_score (0.0–1.0): minimum quality gate
- max_contradiction_rate (0.0–1.0): tolerable contradiction frequency
- max_rollback_ratio (0.0–1.0): ratio of rolled-back decisions
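One plausible reading of these three thresholds is a gate that halts as soon as any of them is violated. The sketch below assumes exactly that; the `CoherenceThresholds`, `CoherenceSample`, and `Gate` types are hypothetical, and the real runtime may combine the signals differently.

```rust
/// Thresholds named in the list above.
#[derive(Debug, Clone, Copy)]
pub struct CoherenceThresholds {
    pub min_coherence_score: f32,
    pub max_contradiction_rate: f32,
    pub max_rollback_ratio: f32,
}

/// Hypothetical snapshot of the agent's current measurements.
#[derive(Debug, Clone, Copy)]
pub struct CoherenceSample {
    pub coherence_score: f32,
    pub contradiction_rate: f32,
    pub rollback_ratio: f32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Continue,
    Halt(&'static str),
}

/// Assumed rule: halt on the first threshold that is violated.
pub fn check(t: &CoherenceThresholds, s: &CoherenceSample) -> Gate {
    if s.coherence_score < t.min_coherence_score {
        Gate::Halt("coherence score below minimum")
    } else if s.contradiction_rate > t.max_contradiction_rate {
        Gate::Halt("contradiction rate above maximum")
    } else if s.rollback_ratio > t.max_rollback_ratio {
        Gate::Halt("rollback ratio above maximum")
    } else {
        Gate::Continue
    }
}
```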
use rvf_runtime::agi_container::AgiContainerBuilder;
use rvf_types::agi_container::*;
let (payload, header) = AgiContainerBuilder::new(container_id, build_id)
.with_model_id("claude-opus-4-6")
.with_orchestrator(b"{\"max_turns\":100}")
.with_tool_registry(b"[{\"name\":\"search\",\"type\":\"rvf_query\"}]")
.with_eval_tasks(b"[{\"id\":1,\"spec\":\"fix bug\"}]")
.with_eval_graders(b"[{\"type\":\"test_pass\"}]")
.with_authority_config(b"{\"level\":\"WriteMemory\"}")
.with_coherence_config(b"{\"min_cut\":0.7,\"rollback\":true}")
.with_project_instructions(b"# CLAUDE.md\nFix bugs, run tests.")
.with_segments(ContainerSegments {
kernel_present: true, manifest_present: true,
world_model_present: true, ..Default::default()
})
.build_and_sign(signing_key)?;
// Parse and validate
let manifest = ParsedAgiManifest::parse(&payload)?;
assert_eq!(manifest.model_id_str(), Some("claude-opus-4-6"));
assert!(manifest.is_autonomous_capable());
assert!(header.is_signed());
See ADR-036 for the full specification.
A QR Cognitive Seed (RVQS) encodes a portable intelligence capsule into a scannable QR code. It carries bootstrap hosts, layer hashes, and cryptographic signatures in a compact binary format.
use rvf_runtime::seed_crypto;
let hash = seed_crypto::seed_content_hash(data); // 8-byte SHAKE-256
let sig = seed_crypto::sign_seed(key, payload); // 32-byte HMAC
let ok = seed_crypto::verify_seed(key, payload, &sig);
Types: SeedHeader, HostEntry, LayerEntry (rvf-types), plus qr_encode for QR matrix generation (rvf-runtime).
The quality system tracks retrieval fidelity across progressive index layers and enforces graceful degradation when budgets are exceeded.
- RetrievalQuality: Full / Partial / Degraded / Failed
- ResponseQuality: per-query quality metadata with evidence
- SafetyNetBudget: time, token, and cost budgets with automatic clamping
- DegradationReport: structured fallback path and reason tracking
| Module | Crate | Purpose |
|---|---|---|
| SecurityPolicy / HardeningFields | rvf-types | Declarative per-file security configuration |
| adversarial | rvf-runtime | Input validation, dimension/size checks at write boundary |
| dos | rvf-runtime | Rate limiting, resource exhaustion guards |
| KernelBinding | rvf-types | Binds signed kernels to specific manifest hashes |
| verify_witness_chain | rvf-crypto | SHAKE-256 chain integrity verification |
WASM_SEG enables an RVF file to carry its own WASM interpreter, creating a three-layer bootstrap stack:
Raw bytes → WASM interpreter → microkernel → vector data
Types: WasmRole (Interpreter/Microkernel/Solver), WasmTarget (Browser/Node/Edge/Embedded), WasmHeader (rvf-types/wasm_bootstrap).
The rvf-solver-wasm crate implements a Thompson Sampling temporal solver as a no_std WASM module with dlmalloc, producing segment types TRANSFER_PRIOR (0x30), POLICY_KERNEL (0x31), and COST_CURVE (0x32).
46 Runnable Examples
Every example uses real RVF APIs end-to-end — no mocks, no stubs. Run any example with:
cd examples/rvf
cargo run --example <name>
| # | Example | What It Demonstrates |
|---|---|---|
| 1 | basic_store | Create, insert 100 vectors, k-NN query, close, reopen, verify persistence |
| 2 | progressive_index | Build three-layer HNSW, measure recall@10 progression (0.70 → 0.95) |
| 3 | quantization | Scalar, product, and binary quantization with temperature tiering |
| 4 | wire_format | Raw 64-byte segment I/O, CRC32c hash validation, manifest tail-scan |
| 5 | crypto_signing | Ed25519 segment signing, SHAKE-256 witness chains, tamper detection |
| 6 | filtered_search | Metadata-filtered queries: Eq, Ne, Gt, Range, In, And, Or |
| # | Example | What It Demonstrates |
|---|---|---|
| 7 | agent_memory | Persistent agent memory across sessions with witness audit trail |
| 8 | swarm_knowledge | Multi-agent shared knowledge base, cross-agent semantic search |
| 9 | reasoning_trace | Chain-of-thought lineage: parent → child → grandchild derivation |
| 10 | tool_cache | Tool call result caching with TTL expiry, delete_by_filter, compaction |
| 11 | agent_handoff | Transfer agent state between instances via derive + clone |
| 12 | experience_replay | Reinforcement learning replay buffer with priority sampling |
| # | Example | What It Demonstrates |
|---|---|---|
| 13 | semantic_search | Document search engine with 4 filter workflows |
| 14 | recommendation | Collaborative filtering with genre and quality filters |
| 15 | rag_pipeline | 5-step RAG: chunk, embed, retrieve, rerank, assemble context |
| 16 | embedding_cache | Zipf access patterns, 3-tier quantization, memory savings |
| 17 | dedup_detector | Near-duplicate detection, clustering, compaction |
| # | Example | What It Demonstrates |
|---|---|---|
| 18 | genomic_pipeline | DNA k-mer search with .rvdna profile and lineage tracking |
| 19 | financial_signals | Market signals with Ed25519 signing and TEE attestation |
| 20 | medical_imaging | Radiology embedding search with .rvvis profile |
| 21 | legal_discovery | Legal document similarity with .rvtext profile |
| # | Example | What It Demonstrates |
|---|---|---|
| 22 | self_booting | Embed/extract unikernel (KERNEL_SEG), header verification |
| 23 | ebpf_accelerator | Embed/extract eBPF (EBPF_SEG), XDP program co-existence |
| 24 | hyperbolic_taxonomy | Hierarchy-aware Poincaré embeddings, depth-filtered search |
| 25 | multimodal_fusion | Cross-modal text + image search with modality filtering |
| 26 | sealed_engine | Capstone: vectors + kernel + eBPF + witness + lineage in one file |
| # | Example | What It Demonstrates |
|---|---|---|
| 27 | browser_wasm | WASM-compatible API surface, raw wire segments, size budget |
| 28 | edge_iot | Constrained IoT device with binary quantization |
| 29 | serverless_function | Cold-start optimization, manifest tail-scan, progressive loading |
| 30 | ruvllm_inference | LLM KV cache + LoRA adapters + policy store via RVF |
| 31 | postgres_bridge | PostgreSQL export/import with lineage and witness audit |
| # | Example | What It Demonstrates |
|---|---|---|
| 32 | network_sync | Peer-to-peer vector store synchronization |
| 33 | tee_attestation | TEE platform attestation, sealed keys, computation proof |
| 34 | access_control | Role-based vector access control with audit trails |
| 35 | zero_knowledge | Zero-knowledge proofs for privacy-preserving vector ops |
| # | Example | What It Demonstrates |
|---|---|---|
| 36 | ruvbot | Autonomous agent with RVF memory, planning, and tool use |
| 37 | posix_fileops | POSIX raw I/O, atomic rename, advisory locking, segment access |
| 38 | linux_microkernel | 20-package Linux distro with SSH keys and kernel embed |
| 39 | mcp_in_rvf | MCP server runtime + eBPF filter embedded in RVF |
| 40 | network_interfaces | 6-chassis / 60-interface network telemetry with anomaly detection |
| # | Example | What It Demonstrates |
|---|---|---|
| 41 | cow_branching | COW derive, cluster-level copy, write coalescing, parent inheritance |
| 42 | membership_filter | Include/exclude bitmap filters for shared HNSW traversal |
| 43 | snapshot_freeze | Generation snapshots, immutable freeze, generation tracking |
| # | Example | What It Demonstrates |
|---|---|---|
| 44 | claude_code_appliance | Bootable AI dev environment: real kernel + eBPF + vectors + witness + crypto |
| 45 | live_boot_proof | Docker-boot an .rvf, SSH in, verify segments are live and operational |
| 46 | generate_all | Batch generation of all example .rvf files |
See the examples README for tutorials, usage patterns, and detailed walkthroughs.
Importing Data
use rvf_import::numpy::{parse_npy_file, NpyConfig};
use std::path::Path;
let records = parse_npy_file(
Path::new("embeddings.npy"),
&NpyConfig { start_id: 0 },
)?;
// records: Vec<VectorRecord> with id, vector, metadata
use rvf_import::csv_import::{parse_csv_file, CsvConfig};
use std::path::Path;
let config = CsvConfig {
id_column: Some("id".into()),
dimension: 128,
..Default::default()
};
let records = parse_csv_file(Path::new("vectors.csv"), &config)?;
use rvf_import::json::{parse_json_file, JsonConfig};
use std::path::Path;
let config = JsonConfig {
id_field: "id".into(),
vector_field: "embedding".into(),
..Default::default()
};
let records = parse_json_file(Path::new("vectors.json"), &config)?;
# Using rvf-import binary directly
cargo run --bin rvf-import -- \
--input data.npy \
--output vectors.rvf \
--format npy \
--dimension 384
# Or via the unified rvf CLI
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
HTTP Server API
cargo run --bin rvf-server -- --path vectors.rvf --port 8080
Ingest vectors:
curl -X POST http://localhost:8080/ingest \
-H "Content-Type: application/json" \
-d '{
"vectors": [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
"ids": [1, 2]
}'
Query nearest neighbors:
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"vector": [0.1, 0.2, 0.3, 0.4],
"k": 10
}'
Delete vectors:
curl -X POST http://localhost:8080/delete \
-H "Content-Type: application/json" \
-d '{"ids": [1, 2]}'
Get status:
curl http://localhost:8080/status
Compact (reclaim space):
curl -X POST http://localhost:8080/compact
MCP Server (Model Context Protocol)
The @ruvector/rvf-mcp-server package exposes RVF stores to AI agents via the Model Context Protocol. Supports stdio and SSE transports.
# stdio transport (for Claude Code, Cursor, etc.)
npx @ruvector/rvf-mcp-server --transport stdio
# SSE transport (for web clients)
npx @ruvector/rvf-mcp-server --transport sse --port 3100
Add to your Claude Code MCP config:
{
"mcpServers": {
"rvf": {
"command": "npx",
"args": ["@ruvector/rvf-mcp-server", "--transport", "stdio"]
}
}
}
| Tool | Description |
|---|---|
| rvf_create_store | Create a new RVF vector store |
| rvf_open_store | Open an existing store (read-write or read-only) |
| rvf_close_store | Close a store and release the writer lock |
| rvf_ingest | Insert vectors with optional metadata |
| rvf_query | k-NN similarity search with metadata filters |
| rvf_delete | Delete vectors by ID |
| rvf_delete_filter | Delete vectors matching a metadata filter |
| rvf_compact | Compact store to reclaim dead space |
| rvf_status | Get store status (dimensions, vector count, etc.) |
| rvf_list_stores | List all open stores |
| URI | Description |
|---|---|
| rvf://stores | JSON listing of all open stores and their status |
| Prompt | Description |
|---|---|
| rvf-search | Natural language similarity search |
| rvf-ingest | Data ingestion with auto-embedding |
Confidential Core Attestation
RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This proves that vector operations occurred inside a verified secure enclave.
| Platform | Enum Value | Quote Format |
|---|---|---|
| Intel SGX | TeePlatform::Sgx (0) | DCAP quote |
| AMD SEV-SNP | TeePlatform::SevSnp (1) | VCEK attestation report |
| Intel TDX | TeePlatform::Tdx (2) | TD quote |
| ARM CCA | TeePlatform::ArmCca (3) | CCA token |
| Software (testing) | TeePlatform::SoftwareTee (0xFE) | Synthetic |
| Type | Witness Code | Purpose |
|---|---|---|
| Platform Attestation | 0x05 | TEE identity and measurement verification |
| Key Binding | 0x06 | Encryption keys sealed to TEE measurement |
| Computation Proof | 0x07 | Proof that operations ran inside the enclave |
| Data Provenance | 0x08 | Chain of custody: model to TEE to RVF |
use rvf_crypto::attestation::*;
use rvf_types::attestation::*;
// Build attestation header
let mut header = AttestationHeader::new(
TeePlatform::SoftwareTee as u8,
AttestationWitnessType::PlatformAttestation as u8,
);
header.measurement = shake256_256(b"my-enclave-code");
header.nonce = [0x42; 16];
header.quote_length = 64;
header.timestamp_ns = 1_700_000_000_000_000_000;
// Encode the full record
let report_data = b"model=all-MiniLM-L6-v2";
let quote = vec![0xAA; 64]; // platform-specific quote bytes
let record = encode_attestation_record(&header, report_data, &quote);
// Create a witness chain entry binding this attestation
let entry = attestation_witness_entry(
&record,
header.timestamp_ns,
AttestationWitnessType::PlatformAttestation,
);
// entry.action_hash == SHAKE-256-256(record)
use rvf_crypto::attestation::*;
use rvf_types::attestation::*;
let key = TeeBoundKeyRecord {
key_type: KEY_TYPE_TEE_BOUND,
algorithm: 0, // Ed25519
sealed_key_length: 32,
key_id: shake256_128(b"my-public-key"),
measurement: shake256_256(b"my-enclave"),
platform: TeePlatform::Sgx as u8,
reserved: [0; 3],
valid_from: 0,
valid_until: 0, // no expiry
sealed_key: vec![0xBB; 32],
};
// Verify the key is accessible in the current environment
verify_key_binding(
&key,
TeePlatform::Sgx,
&shake256_256(b"my-enclave"),
current_time_ns,
)?; // Ok(()) if platform + measurement match
Any segment produced inside a TEE can set the ATTESTED flag for fast scanning:
use rvf_types::SegmentFlags;
let flags = SegmentFlags::empty()
.with(SegmentFlags::SIGNED)
.with(SegmentFlags::ATTESTED);
// bit 2 (SIGNED) + bit 10 (ATTESTED) = 0x0404
Progressive Indexing
Traditional vector databases make you wait for the full index before you can query. RVF uses a three-layer progressive model:
Layer A (Coarse Routing)
- Contains entry points and partition centroids
- Loads in microseconds from the manifest
- Provides approximate results immediately (recall >= 0.70)
Layer B (Hot Region)
- Contains adjacency lists for frequently-accessed vectors
- Loaded based on temperature heuristics
- Improves recall to >= 0.85
Layer C (Full Graph)
- Complete HNSW adjacency for all vectors
- Full recall >= 0.95
- Loaded in the background while queries are already being served
use rvf_index::progressive::ProgressiveIndex;
use rvf_index::layers::IndexLayer;
let mut adapter = RvfIndexAdapter::new(IndexAdapterConfig::default());
adapter.build(vectors, ids);
// Start with Layer A only (fastest)
adapter.load_progressive(&[IndexLayer::A]);
let fast_results = adapter.search(&query, 10);
// Add layers as they load
adapter.load_progressive(&[IndexLayer::A, IndexLayer::B, IndexLayer::C]);
let precise_results = adapter.search(&query, 10);
Quantization Tiers
RVF automatically assigns vectors to quantization tiers based on access frequency:
| Tier | Temperature | Method | Memory | Recall |
|---|---|---|---|---|
| Hot | Frequently accessed | fp16 / scalar | 2x per dim | ~0.999 |
| Warm | Moderate access | Product quantization | 8-16x compression | ~0.95 |
| Cold | Rarely accessed | Binary quantization | 32x compression | ~0.80 |
- A Count-Min Sketch tracks access frequency per vector
- Vectors are assigned to tiers based on configurable thresholds
- Hot vectors stay at full precision for fast, accurate retrieval
- Cold vectors are heavily compressed but still searchable
- Tier assignment is stored in SKETCH_SEG and updated periodically
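To make the frequency-tracking step concrete, here is a small Count-Min Sketch with a threshold-based tier decision. It is an illustration under stated assumptions rather than the rvf-quant implementation: the `CountMinSketch`, `Tier`, and `assign_tier` names, the width and depth parameters, and the threshold values are all hypothetical, and the standard library's `DefaultHasher` stands in for whatever hash the real sketch uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Count-Min Sketch: `depth` rows of `width` counters each.
/// Estimates never undercount; hash collisions can only inflate them.
pub struct CountMinSketch {
    width: usize,
    depth: usize,
    counters: Vec<u32>,
}

impl CountMinSketch {
    pub fn new(width: usize, depth: usize) -> Self {
        assert!(width > 0 && depth > 0);
        CountMinSketch { width, depth, counters: vec![0; width * depth] }
    }

    /// Index of the counter for `id` in `row`; the row number seeds the hash.
    fn slot(&self, row: usize, id: u64) -> usize {
        let mut h = DefaultHasher::new();
        (row as u64, id).hash(&mut h);
        row * self.width + (h.finish() as usize) % self.width
    }

    /// Record one access to vector `id`.
    pub fn record_access(&mut self, id: u64) {
        for row in 0..self.depth {
            let i = self.slot(row, id);
            self.counters[i] = self.counters[i].saturating_add(1);
        }
    }

    /// Estimated access count: the minimum over all rows.
    pub fn estimate(&self, id: u64) -> u32 {
        (0..self.depth)
            .map(|row| self.counters[self.slot(row, id)])
            .min()
            .unwrap_or(0)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Map an estimated access count to a tier using two configurable thresholds.
pub fn assign_tier(estimated_accesses: u32, hot_min: u32, warm_min: u32) -> Tier {
    if estimated_accesses >= hot_min {
        Tier::Hot
    } else if estimated_accesses >= warm_min {
        Tier::Warm
    } else {
        Tier::Cold
    }
}
```

The sketch uses fixed memory regardless of how many vectors exist, which is why it suits per-vector counters at the million-vector scale.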
use rvf_quant::scalar::ScalarQuantizer;
use rvf_quant::product::ProductQuantizer;
use rvf_quant::binary::{encode_binary, hamming_distance};
use rvf_quant::traits::Quantizer;
// Scalar quantization (Hot tier)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode(&vector);
let decoded = sq.decode(&encoded);
// Product quantization (Warm tier)
let pq = ProductQuantizer::train(&vectors, 8); // 8 subquantizers
let code = pq.encode(&vector);
// Binary quantization (Cold tier)
let bits = encode_binary(&vector);
let dist = hamming_distance(&bits_a, &bits_b);
Wire Format Specification
| Offset | Size | Field | Description |
|---|---|---|---|
| 0x00 | 4 | magic | 0x52564653 ("RVFS") |
| 0x04 | 1 | version | Format version (currently 1) |
| 0x05 | 1 | seg_type | Segment type (see enum below) |
| 0x06 | 2 | flags | Bitfield (COMPRESSED, ENCRYPTED, SIGNED, SEALED, ATTESTED, ...) |
| 0x08 | 8 | segment_id | Monotonically increasing ID |
| 0x10 | 8 | payload_length | Byte length of payload |
| 0x18 | 8 | timestamp_ns | Nanosecond UNIX timestamp |
| 0x20 | 1 | checksum_algo | 0=CRC32C, 1=XXH3-128, 2=SHAKE-256 |
| 0x21 | 1 | compression | 0=none, 1=LZ4, 2=ZSTD |
| 0x22 | 2 | reserved_0 | Must be zero |
| 0x24 | 4 | reserved_1 | Must be zero |
| 0x28 | 16 | content_hash | First 128 bits of payload hash |
| 0x38 | 4 | uncompressed_len | Original size before compression |
| 0x3C | 4 | alignment_pad | Padding to 64-byte boundary |
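As a reading aid for the layout above, the following sketch decodes the 64-byte header from a byte slice using only the standard library. The `SegmentHeader` struct and `parse_header` function are hypothetical and are not the rvf-types API. It also assumes the magic is stored as a little-endian u32 like the other integer fields.

```rust
pub const SEGMENT_MAGIC: u32 = 0x5256_4653; // "RVFS"
pub const HEADER_LEN: usize = 64;

/// Hypothetical in-memory view of the fields in the table above.
#[derive(Debug, PartialEq, Eq)]
pub struct SegmentHeader {
    pub version: u8,
    pub seg_type: u8,
    pub flags: u16,
    pub segment_id: u64,
    pub payload_length: u64,
    pub timestamp_ns: u64,
    pub checksum_algo: u8,
    pub compression: u8,
    pub content_hash: [u8; 16],
    pub uncompressed_len: u32,
}

fn read_u16(buf: &[u8], off: usize) -> u16 {
    u16::from_le_bytes([buf[off], buf[off + 1]])
}

fn read_u32(buf: &[u8], off: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[off..off + 4]);
    u32::from_le_bytes(b)
}

fn read_u64(buf: &[u8], off: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[off..off + 8]);
    u64::from_le_bytes(b)
}

/// Decode a header, checking length, magic, and the must-be-zero fields.
pub fn parse_header(buf: &[u8]) -> Result<SegmentHeader, &'static str> {
    if buf.len() < HEADER_LEN {
        return Err("buffer shorter than 64 bytes");
    }
    // Assumption: the magic is a little-endian u32, like the other integers.
    if read_u32(buf, 0x00) != SEGMENT_MAGIC {
        return Err("bad magic");
    }
    if read_u16(buf, 0x22) != 0 || read_u32(buf, 0x24) != 0 {
        return Err("reserved fields must be zero");
    }
    let mut content_hash = [0u8; 16];
    content_hash.copy_from_slice(&buf[0x28..0x38]);
    Ok(SegmentHeader {
        version: buf[0x04],
        seg_type: buf[0x05],
        flags: read_u16(buf, 0x06),
        segment_id: read_u64(buf, 0x08),
        payload_length: read_u64(buf, 0x10),
        timestamp_ns: read_u64(buf, 0x18),
        checksum_algo: buf[0x20],
        compression: buf[0x21],
        content_hash,
        uncompressed_len: read_u32(buf, 0x38),
    })
}
```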
| Code | Name | Description |
|---|---|---|
| 0x01 | VEC | Raw vector embeddings |
| 0x02 | INDEX | HNSW adjacency and routing |
| 0x03 | OVERLAY | Graph overlay deltas |
| 0x04 | JOURNAL | Metadata mutations, deletions |
| 0x05 | MANIFEST | Segment directory, epoch state |
| 0x06 | QUANT | Quantization dictionaries |
| 0x07 | META | Key-value metadata |
| 0x08 | HOT | Temperature-promoted data |
| 0x09 | SKETCH | Access counter sketches |
| 0x0A | WITNESS | Audit trails, attestation proofs |
| 0x0B | PROFILE | Domain profile declarations |
| 0x0C | CRYPTO | Key material, signature chains |
| 0x0D | META_IDX | Metadata inverted indexes |
| 0x0E | KERNEL | Compressed unikernel image (self-booting) |
| 0x0F | EBPF | eBPF program for kernel-level acceleration |
| 0x10 | WASM | WASM microkernel / self-bootstrapping bytecode |
| 0x20 | COW_MAP | Cluster ownership map (local vs parent) |
| 0x21 | REFCOUNT | Cluster reference counts (rebuildable) |
| 0x22 | MEMBERSHIP | Vector visibility filter for branches |
| 0x23 | DELTA | Sparse delta patches (LoRA overlays) |
| 0x30 | TRANSFER_PRIOR | Transfer learning prior distributions |
| 0x31 | POLICY_KERNEL | Thompson Sampling policy kernels |
| 0x32 | COST_CURVE | Cost/reward curves for solver |
| Bit | Name | Description |
|---|---|---|
| 0 | COMPRESSED | Payload is compressed |
| 1 | ENCRYPTED | Payload is encrypted |
| 2 | SIGNED | Signature footer follows payload |
| 3 | SEALED | Immutable (compaction output) |
| 4 | PARTIAL | Streaming/partial write |
| 5 | TOMBSTONE | Logical deletion |
| 6 | HOT | Temperature-promoted |
| 7 | OVERLAY | Contains delta data |
| 8 | SNAPSHOT | Full snapshot |
| 9 | CHECKPOINT | Safe rollback point |
| 10 | ATTESTED | Produced inside attested TEE |
| 11 | HAS_LINEAGE | File carries FileIdentity lineage data |
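Each flag is a single bit in the 16-bit flags field, so bit n has the value 1 << n. The sketch below shows that arithmetic with plain integers; the constant and function names are illustrative, not the `SegmentFlags` API from rvf-types.

```rust
/// Flag names indexed by bit position, taken from the table above.
pub const FLAG_NAMES: [&str; 12] = [
    "COMPRESSED", "ENCRYPTED", "SIGNED", "SEALED", "PARTIAL", "TOMBSTONE",
    "HOT", "OVERLAY", "SNAPSHOT", "CHECKPOINT", "ATTESTED", "HAS_LINEAGE",
];

pub const SIGNED: u16 = 1 << 2; // 0x0004
pub const ATTESTED: u16 = 1 << 10; // 0x0400

/// List the names of all flags set in `flags`, lowest bit first.
pub fn flag_names(flags: u16) -> Vec<&'static str> {
    let mut names = Vec::new();
    for (bit, name) in FLAG_NAMES.iter().enumerate() {
        if (flags & (1u16 << bit)) != 0 {
            names.push(*name);
        }
    }
    names
}
```

A signed segment produced inside a TEE therefore carries `SIGNED | ATTESTED`, which is 0x0404.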
RVF uses a two-fsync protocol:
- Write data segment + payload, then fsync
- Write MANIFEST_SEG with updated state, then fsync
If the process crashes between fsyncs, the incomplete segment is ignored on recovery (no valid manifest references it). No write-ahead log is needed.
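The ordering can be sketched with the standard library, where `File::sync_all` plays the role of fsync. This is a simplified illustration: the `append_durably` function is hypothetical and takes already-encoded segment bytes, whereas the real writer also builds headers and updates the manifest contents.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append a data segment and then a manifest segment, syncing after each.
/// A crash before the second sync leaves data that no manifest points to,
/// so recovery can ignore it.
pub fn append_durably(
    path: &Path,
    data_segment: &[u8],
    manifest_segment: &[u8],
) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;

    // Step 1: data segment + payload, then the first fsync.
    file.write_all(data_segment)?;
    file.sync_all()?;

    // Step 2: manifest that references the new segment, then the second fsync.
    file.write_all(manifest_segment)?;
    file.sync_all()?;
    Ok(())
}
```

The manifest is written last so that it only ever references data that has already reached stable storage.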
When the SIGNED flag is set, a signature footer follows the payload:
| Offset | Size | Field |
|---|---|---|
| 0x00 | 2 | sig_algo (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s) |
| 0x02 | 2 | sig_length |
| 0x04 | var | signature (64 to 7,856 bytes) |
| var | 4 | footer_length (for backward scan) |
Witness Chains & Audit Trails
A witness chain is a tamper-evident linked list of events, stored in WITNESS_SEG payloads. Each entry is 73 bytes:
| Field | Size | Description |
|---|---|---|
| prev_hash | 32 | SHAKE-256 of previous entry (zero for genesis) |
| action_hash | 32 | SHAKE-256 of the action being witnessed |
| timestamp_ns | 8 | Nanosecond timestamp |
| witness_type | 1 | Event type discriminator |
Changing any byte in any entry causes all subsequent prev_hash values to fail verification. This provides tamper-evidence without a blockchain.
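The linking rule can be shown without any crypto dependency by passing the hash in as a function. The sketch below is not rvf-crypto: `Entry`, `link`, `verify`, and `toy_hash` are hypothetical names, the fields are assumed to be serialized in table order, and `toy_hash` is a non-cryptographic FNV-style stand-in for SHAKE-256 that exists only so the example runs with the standard library.

```rust
pub const ENTRY_LEN: usize = 73; // 32 + 32 + 8 + 1

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entry {
    pub prev_hash: [u8; 32],
    pub action_hash: [u8; 32],
    pub timestamp_ns: u64,
    pub witness_type: u8,
}

impl Entry {
    /// Serialize in table order (assumed), integers little-endian.
    pub fn encode(&self) -> [u8; ENTRY_LEN] {
        let mut out = [0u8; ENTRY_LEN];
        out[0..32].copy_from_slice(&self.prev_hash);
        out[32..64].copy_from_slice(&self.action_hash);
        out[64..72].copy_from_slice(&self.timestamp_ns.to_le_bytes());
        out[72] = self.witness_type;
        out
    }
}

/// Fill in each `prev_hash` from the encoded previous entry (zero for genesis).
pub fn link<H: Fn(&[u8]) -> [u8; 32]>(entries: &mut [Entry], hash: H) {
    let mut prev = [0u8; 32];
    for e in entries.iter_mut() {
        e.prev_hash = prev;
        prev = hash(&e.encode()[..]);
    }
}

/// Return Err(index) of the first entry whose `prev_hash` does not match.
pub fn verify<H: Fn(&[u8]) -> [u8; 32]>(entries: &[Entry], hash: H) -> Result<(), usize> {
    let mut expected = [0u8; 32];
    for (i, e) in entries.iter().enumerate() {
        if e.prev_hash != expected {
            return Err(i);
        }
        expected = hash(&e.encode()[..]);
    }
    Ok(())
}

/// NOT cryptographic: FNV-1a style stand-in for SHAKE-256, for the demo only.
pub fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let mut state: u64 = 0xcbf2_9ce4_8422_2325;
    for chunk in out.chunks_mut(8) {
        for &b in data {
            state ^= b as u64;
            state = state.wrapping_mul(0x0000_0100_0000_01b3);
        }
        chunk.copy_from_slice(&state.to_le_bytes());
    }
    out
}
```

Editing an entry after linking changes its encoded bytes, so the next entry's stored `prev_hash` no longer matches and `verify` reports that index.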
| Code | Name | Usage |
|---|---|---|
| 0x01 | PROVENANCE | Data origin tracking |
| 0x02 | COMPUTATION | Operation recording |
| 0x03 | SEARCH | Query audit logging |
| 0x04 | DELETION | Deletion audit logging |
| 0x05 | PLATFORM_ATTESTATION | TEE attestation quote |
| 0x06 | KEY_BINDING | Key sealed to TEE |
| 0x07 | COMPUTATION_PROOF | Verified enclave computation |
| 0x08 | DATA_PROVENANCE | Model-to-TEE-to-RVF chain |
| 0x09 | DERIVATION | File lineage derivation event |
| 0x0A | LINEAGE_MERGE | Multi-parent lineage merge |
| 0x0B | LINEAGE_SNAPSHOT | Lineage snapshot checkpoint |
| 0x0C | LINEAGE_TRANSFORM | Lineage transform operation |
| 0x0D | LINEAGE_VERIFY | Lineage verification event |
| 0x0E | CLUSTER_COW | COW cluster copy event |
| 0x0F | CLUSTER_DELTA | Delta patch applied to cluster |
use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry};
use rvf_crypto::shake256_256;
let entries = vec![
WitnessEntry {
prev_hash: [0; 32],
action_hash: shake256_256(b"inserted 1000 vectors"),
timestamp_ns: 1_700_000_000_000_000_000,
witness_type: 0x01,
},
WitnessEntry {
prev_hash: [0; 32], // set by create_witness_chain
action_hash: shake256_256(b"queried top-10"),
timestamp_ns: 1_700_000_001_000_000_000,
witness_type: 0x03,
},
];
let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes)?;
assert_eq!(verified.len(), 2);
Building from Source
- Rust 1.87+ (rustup update stable)
- For WASM: rustup target add wasm32-unknown-unknown
- For Node.js bindings: Node.js 18+ and npm
cd crates/rvf
cargo build --workspace
cargo test --workspace
cargo clippy --all-targets --workspace --exclude rvf-wasm
cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasm
cargo build -p rvf-cli
./target/debug/rvf --help
cd rvf-node
npm install
npm run build
cargo bench --bench rvf_benchmarks
Domain Profiles
Domain profiles optimize RVF behavior for specific data types:
| Profile | Code | Optimized For |
|---|---|---|
| Generic | 0x00 | General-purpose vectors |
| RVDNA | 0x01 | Genomic sequence embeddings |
| RVText | 0x02 | Language model embeddings (default for agentdb) |
| RVGraph | 0x03 | Graph/network node embeddings |
| RVVision | 0x04 | Image/vision model embeddings |
| Profile | Level | Description |
|---|---|---|
| Generic | 0 | Minimal features, fits anywhere |
| Core | 1 | Moderate resources, good defaults |
| Hot | 2 | Memory-rich, high-performance |
| Full | 3 | All features enabled |
File Format Reference
- .rvf: Standard RuVector Format file
- .rvf.cold.N: Cold shard N (multi-file mode)
- .rvf.idx.N: Index shard N (multi-file mode)
MIME type: application/x-ruvector-format
Magic number: 0x52564653 (ASCII: "RVFS")
All multi-byte integers are little-endian.
All segments are 64-byte aligned (cache-line friendly).
The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via seek(EOF - scan) and provides:
- Segment directory (offsets to all segments)
- Hotset pointers (entry points, top layer, centroids, quant dicts)
- Epoch counter
- Vector count and dimension
- Profile identifiers
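The tail-scan idea can be sketched as a seek to the end of the file followed by one fixed-size read. This assumes the most recent MANIFEST_SEG is the final segment in the file with nothing after it, so the root manifest is simply the last 4,096 bytes. The `read_root_manifest_tail` function is hypothetical and does not parse the manifest contents.

```rust
use std::io::{self, Read, Seek, SeekFrom};

pub const ROOT_MANIFEST_LEN: u64 = 4096;

/// Read the final 4,096 bytes of `src` without scanning the rest of the file.
pub fn read_root_manifest_tail<R: Read + Seek>(src: &mut R) -> io::Result<Vec<u8>> {
    // Seeking to the end returns the total length.
    let len = src.seek(SeekFrom::End(0))?;
    if len < ROOT_MANIFEST_LEN {
        return Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "file is smaller than a root manifest",
        ));
    }
    src.seek(SeekFrom::Start(len - ROOT_MANIFEST_LEN))?;
    let mut buf = vec![0u8; ROOT_MANIFEST_LEN as usize];
    src.read_exact(&mut buf)?;
    Ok(buf)
}
```

Because the read size is constant, the cost of locating the segment directory does not grow with the number of vectors in the file.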
RVF supports copy-on-write branching at cluster granularity (ADR-031). Instead of copying an entire file to create a variant, a derived file stores only the clusters that changed. This enables Git-like branching for vector databases.
A COW child inherits all vector data from its parent by reference. Writes only allocate local clusters as needed (one slab copy per modified cluster). A 1M-vector parent (~512 MB) with 100 modified vectors produces a child of ~10 clusters (~2.5 MB).
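The size figures follow from simple arithmetic. The constants below are inferred from the numbers quoted above and are not taken from the specification: ~512 MB over 1M vectors gives roughly 512 bytes per vector, and ~2.5 MB over ~10 clusters gives roughly 256 KiB per cluster, or about 512 vectors per cluster. Mapping a vector to a cluster by integer division of its ID is also an assumption made for illustration.

```rust
use std::collections::BTreeSet;

// Inferred from the figures in the text, not from the spec.
pub const VECTORS_PER_CLUSTER: u64 = 512;
pub const CLUSTER_BYTES: u64 = 256 * 1024;

/// Number of distinct clusters touched by a set of modified vector IDs,
/// assuming cluster index = id / VECTORS_PER_CLUSTER.
pub fn touched_clusters(modified_ids: &[u64]) -> usize {
    modified_ids
        .iter()
        .map(|id| id / VECTORS_PER_CLUSTER)
        .collect::<BTreeSet<u64>>()
        .len()
}

/// Approximate child payload: one full cluster copy per touched cluster.
pub fn child_payload_bytes(modified_ids: &[u64]) -> u64 {
    touched_clusters(modified_ids) as u64 * CLUSTER_BYTES
}
```

Under these assumptions, 100 edits spread over 10 clusters cost 10 × 256 KiB = 2,621,440 bytes, which matches the ~2.5 MB quoted. The child's size depends on how many clusters the edits touch, not on how many vectors change.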
use rvf_runtime::RvfStore;
use std::path::Path;
// Create parent with vectors
let parent = RvfStore::create(Path::new("parent.rvf"), options)?;
// ... ingest vectors ...
// Derive a COW child — inherits all data, stores only changes
let child = parent.branch(Path::new("child.rvf"))?;
// COW statistics
if let Some(stats) = child.cow_stats() {
println!("Clusters: {} total, {} local", stats.cluster_count, stats.local_cluster_count);
}
Branches share the parent's HNSW index. A membership filter (dense bitmap) controls which vectors are visible per branch. Excluded nodes still serve as routing waypoints during graph traversal but are never returned in results.
- Include mode (default): vector visible iff filter.contains(id). Empty filter = empty view (fail-safe).
- Exclude mode: vector visible iff !filter.contains(id). Empty filter = full view.
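The two modes differ only in how a bitmap hit is interpreted. The sketch below shows a dense bitmap with both interpretations; `BitmapFilter` and `FilterMode` are hypothetical names for illustration and are separate from the `MembershipFilter` type in rvf-runtime.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FilterMode {
    Include,
    Exclude,
}

/// Dense bitmap: one bit per vector ID, packed into 64-bit words.
pub struct BitmapFilter {
    mode: FilterMode,
    capacity: u64,
    words: Vec<u64>,
}

impl BitmapFilter {
    pub fn new(mode: FilterMode, capacity: u64) -> Self {
        let word_count = ((capacity + 63) / 64) as usize;
        BitmapFilter { mode, capacity, words: vec![0; word_count] }
    }

    /// Set the bit for `id`.
    pub fn add(&mut self, id: u64) {
        assert!(id < self.capacity, "id out of range");
        self.words[(id / 64) as usize] |= 1u64 << (id % 64);
    }

    /// True when the bit for `id` is set; out-of-range IDs are never set.
    pub fn contains(&self, id: u64) -> bool {
        id < self.capacity && (self.words[(id / 64) as usize] & (1u64 << (id % 64))) != 0
    }

    /// Include: visible iff the bit is set. Exclude: visible iff it is not.
    pub fn is_visible(&self, id: u64) -> bool {
        match self.mode {
            FilterMode::Include => self.contains(id),
            FilterMode::Exclude => !self.contains(id),
        }
    }
}
```

An empty Include filter hides everything, which is the fail-safe behaviour described above, while an empty Exclude filter shows everything.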
use rvf_runtime::membership::MembershipFilter;
let mut filter = MembershipFilter::new_include(1_000_000);
filter.add(42); // vector 42 is now visible
filter.contains(42); // true
filter.contains(100); // false
Freeze creates an immutable snapshot of the current generation. Further writes require deriving a new branch. Freeze is a metadata-only operation (no data copy).
let mut branch = parent.branch(Path::new("snapshot.rvf"))?;
branch.freeze()?;
// Writes now fail:
assert!(branch.ingest_batch(&[&vec], &[1], None).is_err());
// Continue on a new branch:
let next = parent.branch(Path::new("next.rvf"))?;
The KernelBinding footer (128 bytes, padded) cryptographically ties a KERNEL_SEG to its manifest. This prevents segment-swap attacks where a signed kernel from one file is embedded into a different file.
use rvf_types::kernel_binding::KernelBinding;
let binding = KernelBinding {
manifest_root_hash: manifest_hash, // SHAKE-256-256 of Level0Root
policy_hash: policy_hash, // SHAKE-256-256 of security policy
binding_version: 1,
..Default::default()
};
store.embed_kernel_with_binding(arch, ktype, flags, &image, port, cmdline, &binding)?;
| Code | Name | Size | Purpose |
|---|---|---|---|
| 0x20 | COW_MAP | 64B header | Cluster ownership map (local vs parent) |
| 0x21 | REFCOUNT | 32B header | Cluster reference counts (rebuildable) |
| 0x22 | MEMBERSHIP | 96B header | Vector visibility filter for branches |
| 0x23 | DELTA | 64B header | Sparse delta patches between clusters |
rvf launch <file> # Boot RVF in QEMU microVM
rvf embed-kernel <file> [--arch x86_64] # Embed kernel image
rvf embed-ebpf <file> --program <src.c> # Compile and embed eBPF
rvf filter <file> --include <id-list> # Create membership filter
rvf freeze <file> # Snapshot-freeze current state
rvf verify-witness <file> # Verify witness chain
rvf verify-attestation <file> # Verify KernelBinding + attestation
rvf rebuild-refcounts <file> # Recompute refcounts from COW map
For the full specification, see ADR-031: RVCOW Branching and Real Cognitive Containers.
Verified end-to-end workflows that demonstrate real capabilities:
# Create a store, ingest 100 vectors, query, derive a child
rvf create demo.rvf --dimension 128
rvf ingest demo.rvf --input data.json --format json
rvf query demo.rvf --vector "0.1,0.2,0.3,..." --k 5
rvf derive demo.rvf child.rvf --type filter
rvf inspect demo.rvf
# MANIFEST_SEG (4 KB), VEC_SEG (51 KB), INDEX_SEG (12 KB)
cargo run --example self_booting
# Output:
# Ingested 50 vectors (128 dims)
# Pre-kernel query: top-5 results OK (nearest ID=25)
# Kernel: 4,640 bytes embedded (x86_64, Hermit)
# Extracted kernel: arch=X86_64, api_port=8080
# Witness chain: 5 entries, all verified ✓
# File size: 31 KB — data + kernel + witness in one file
cargo run --example linux_microkernel
# Output:
# 20 packages installed as vector embeddings
# Kernel: Linux x86_64 (4,640 bytes)
# SSH: Ed25519 keys signed and verified ✓
# Witness chain: 22 entries, all verified ✓
# Package search: "build tool" → found gcc, make, cmake
# File size: 14 KB — bootable system image
cargo run --example claude_code_appliance
# Output:
# 20 dev packages (rust, node, python, docker, ...)
# Kernel: Linux x86_64 with SSH on port 2222
# eBPF: XDP distance program embedded
# Witness chain: 6 entries, all verified ✓
# Ed25519 signed, tamper-evident
# File size: 17 KB — sealed cognitive container
cargo test --workspace
# agi_e2e .................. 12 passed
# adr033_integration ....... 34 passed
# qr_seed_e2e .............. 11 passed
# witness_e2e .............. 10 passed
# attestation .............. 6 passed
# crypto ................... 10 passed
# computational_container .. 8 passed
# cow_branching ............ 8 passed
# cross_platform ........... 6 passed
# lineage .................. 4 passed
# smoke .................... 4 passed
# + unit tests across all crates
# Total: 1,156 tests passed
cd examples/rvf && cargo run --example generate_all
ls output/ # 45 .rvf files (~11 MB total)
rvf inspect output/sealed_engine.rvf
rvf inspect output/linux_microkernel.rvf
git clone https://github.com/ruvnet/ruvector
cd ruvector/crates/rvf
cargo test --workspace
All contributions must pass cargo clippy --all-targets with zero warnings and maintain the existing test count (currently 1,156+).
| ADR | Title |
|---|---|
| ADR-030 | RVF Cognitive Container (Kernel, eBPF, WASM tiers) |
| ADR-031 | RVCOW Branching & Real Cognitive Containers |
| ADR-033 | Progressive Indexing Hardening |
| ADR-034 | QR Cognitive Seed (RVQS) |
| ADR-035 | Capability Report |
| ADR-036 | AGI Cognitive Container |
| ADR-037 | Publishable RVF Acceptance Tests |
| ADR-038 | npx ruvector rvlite Witness Integration |
| ADR-039 | RVF Solver WASM AGI Integration |
Dual-licensed under MIT or Apache-2.0 at your option.
Built with Rust. Not a database — a portable cognitive runtime.