RuVector/crates/rvf at main · ruvnet/RuVector

Name	Name	Last commit message	Last commit date
parent directory ..
benches	benches
docs	docs
rvf-adapters	rvf-adapters
rvf-cli	rvf-cli
rvf-crypto	rvf-crypto
rvf-ebpf	rvf-ebpf
rvf-federation	rvf-federation
rvf-import	rvf-import
rvf-index	rvf-index
rvf-kernel	rvf-kernel
rvf-launch	rvf-launch
rvf-manifest	rvf-manifest
rvf-node	rvf-node
rvf-quant	rvf-quant
rvf-runtime	rvf-runtime
rvf-server	rvf-server
rvf-solver-wasm	rvf-solver-wasm
rvf-types	rvf-types
rvf-wasm	rvf-wasm
rvf-wire	rvf-wire
tests/rvf-integration	tests/rvf-integration
Cargo.lock	Cargo.lock
Cargo.toml	Cargo.toml
README.md	README.md

RVF — RuVector Format

One file. Store vectors. Ship models. Boot services. Prove everything.

🚀 Quick Start • 📦 What It Contains • 🧠 Cognitive Engines • 🏗️ Architecture • ⚡ Performance • 📊 Comparison

dsp

🧠 What is RVF? A Cognitive Container

RVF (RuVector Format) is a universal binary substrate that merges database, model, graph engine, kernel, and attestation into a single deployable file.

A .rvf file can store vector embeddings, carry LoRA adapter deltas, embed GNN graph state, include a bootable Linux microkernel, run queries in a 5.5 KB WASM runtime, and prove every operation through a cryptographic witness chain — all in one file that runs anywhere from a browser to bare metal.

This is not a database format. It is an executable knowledge unit.

🖥️ Compute & Execution

Capability	How	Segment
🖥️ Self-boot as a microservice	The file contains a real Linux kernel. Drop it on a VM and it boots as a running service in under 125 ms. No install, no dependencies.	`KERNEL_SEG` (0x0E)
⚡ Hardware-speed lookups via eBPF	Hot vectors are served directly in the Linux kernel data path, bypassing userspace entirely. Three real C programs handle distance, filtering, and routing.	`EBPF_SEG` (0x0F)
🌐 Runs in any browser	A 5.5 KB WebAssembly runtime lets the same file serve queries in a browser tab with zero backend.	`WASM_SEG`

🧠 AI & Data Storage

Capability	How	Segment
🧠 Ship models, graphs, and quantum state	One file carries LoRA fine-tune weights, graph neural network state, and quantum circuit snapshots alongside vectors. No separate model registry needed.	`OVERLAY` / `GRAPH` / `SKETCH`
🌿 Git-like branching	Create a child file that shares all parent data. Only changed vectors are copied. A 1M-vector parent with 100 edits produces a ~2.5 MB child instead of a 512 MB copy.	`COW_MAP` / `MEMBERSHIP` (0x20-0x23)
📊 Instant queries while loading	Start answering queries at 70% accuracy immediately. Accuracy improves to 95%+ as the full index loads in the background. No waiting.	`INDEX_SEG`
🔍 Search with filters	Combine vector similarity with metadata conditions like "genre = sci-fi AND year > 2020" in a single query.	`META_IDX_SEG` (0x0D)
💥 Never corrupts on crash	Power loss mid-write? The file is always readable. Append-only design means incomplete writes are simply ignored on recovery. No write-ahead log needed.	Format rule

🔐 Security & Trust

RVF treats security as a structural property of the format, not an afterthought. Every segment can be individually signed, every operation is hash-chained into a tamper-evident ledger, and every derived file carries a cryptographic link to its parent. The result: you can hand someone a .rvf file and they can independently verify what data is inside, who produced it, what operations were performed, and whether anything was altered — without trusting the sender.

Capability	How	Segment
🔗 Tamper-evident audit trail	Every insert, query, and deletion is recorded in a SHAKE-256 hash-linked chain. Change one byte anywhere and the entire chain fails verification.	`WITNESS_SEG` (0x0A)
🔐 Kernel locked to its data	A 128-byte `KernelBinding` footer ties each signed kernel to its manifest hash. Prevents segment-swap attacks — the kernel only boots if the data it was built for is present and unmodified.	`KERNEL_SEG` + `CRYPTO_SEG`
🛡️ Quantum-safe signatures	Segments can be signed with ML-DSA-65 (FIPS 204) and SLH-DSA-128s alongside Ed25519. Dual-signing means files stay trustworthy even after quantum computers break classical crypto.	`CRYPTO_SEG` (0x0C)
🧬 Track where data came from	Every file records its parent, grandparent, and full derivation history with cryptographic hashes — DNA-style lineage. Verify that a child was legitimately derived from its parent without accessing the parent file.	`MANIFEST_SEG`
🏛️ TEE attestation	Record hardware attestation quotes from Intel SGX, AMD SEV-SNP, Intel TDX, and ARM CCA. Proves vector operations ran inside a verified secure enclave.	`CRYPTO_SEG`
🛡️ Adversarial hardening	Input validation, rate limiting, and resource exhaustion guards. Declarative `SecurityPolicy` configuration prevents denial-of-service and malformed-input attacks.	Runtime

📦 Ecosystem & Tooling

Capability	How	Segment
🤖 Plug into AI agents	An MCP server lets Claude Code, Cursor, and other AI tools create, query, and manage vector stores directly.	npm package
📦 Use from any language	Published as 14 Rust crates, 6 adapters, 4 npm packages, a CLI tool, and an HTTP server. Works from Rust, Node.js, browsers, and the command line.	14 crates + 6 adapters + 4 npm
♻️ Always backward-compatible	Old tools skip new segment types they don't understand. A file with COW branching still works in a reader that only knows basic vectors.	Format rule

         📦 Anatomy of a .rvf Cognitive Container (24 segment types)
       ┌─────────────────────────────────────────────────────────────┐
       │                       .rvf file                             │
       ├──────────────────────────┬──────────────────────────────────┤
       │  📋 Core Data            │  🧠 AI & Models                 │
       │  MANIFEST  (4 KB root)   │  OVERLAY   (LoRA deltas)         │
       │  VEC_SEG   (embeddings)  │  GRAPH     (GNN state)           │
       │  INDEX_SEG (HNSW graph)  │  SKETCH    (quantum / VQE)       │
       │  QUANT     (codebooks)   │  META      (key-value)           │
       │  HOT       (promoted)    │  PROFILE   (domain config)       │
       │  META_IDX  (filter idx)  │  JOURNAL   (mutations)           │
       ├──────────────────────────┼──────────────────────────────────┤
       │  🌿 COW Branching        │  🔐 Security & Trust            │
       │  COW_MAP   (ownership)   │  WITNESS   (audit chain)         │
       │  REFCOUNT  (ref counts)  │  CRYPTO    (signatures)          │
       │  MEMBERSHIP (visibility) │  KERNEL    (Linux + binding)     │
       │  DELTA     (sparse patch)│  EBPF      (XDP / TC / socket)   │
       │                          │  WASM      (5.5 KB runtime)      │
       ├──────────────────────────┴──────────────────────────────────┤
       │                                                             │
       │   Store it ─── single-file vector DB, no external deps      │
       │   Ship it  ─── wire-format streaming, one file = one unit   │
       │   Run it   ─── boots Linux, runs in browser, eBPF in kernel │
       │   Trust it ─── witness chain + attestation + PQ signatures  │
       │   Branch it ── COW at cluster granularity, <3 ms            │
       │   Track it ─── DNA-style lineage from parent to child       │
       │                                                             │
       │         ┌──────────┐              ┌──────────┐              │
       │         │ 🖥️ Boots │             │ 🌐 Runs  │              │
       │         │ as Linux │              │ in any   │              │
       │         │ microVM  │              │ browser  │              │
       │         │ <125 ms  │              │ 5.5 KB   │              │
       │         └──────────┘              └──────────┘              │
       └─────────────────────────────────────────────────────────────┘

The same .rvf file runs on servers, browsers (WASM), edge devices, TEE enclaves, Firecracker microVMs, and in the Linux kernel data path (eBPF) — no conversion, no re-indexing, no external dependencies.

📦 Published Packages

Rust Crates (crates.io)

Crate	Version	Description
`rvf-types`	0.2.0	Segment types, 24 headers, quality, security, AGI container types (`no_std`)
`rvf-wire`	0.1.0	Wire format read/write (`no_std`)
`rvf-manifest`	0.1.0	Two-level manifest, FileIdentity, COW pointers
`rvf-quant`	0.1.0	Scalar, product, and binary quantization
`rvf-index`	0.1.0	HNSW progressive indexing (Layer A/B/C)
`rvf-crypto`	0.2.0	SHAKE-256, Ed25519, witness chains, seed crypto
`rvf-runtime`	0.2.0	Full store API, COW engine, AGI containers, QR seeds, safety net
`rvf-kernel`	0.1.0	Linux kernel builder, initramfs, Docker pipeline
`rvf-ebpf`	0.1.0	BPF C compiler (XDP, socket filter, TC)
`rvf-launch`	0.1.0	QEMU microvm launcher, KVM/TCG, QMP
`rvf-server`	0.1.0	HTTP REST + TCP streaming server
`rvf-import`	0.1.0	JSON, CSV, NumPy importers
`rvf-cli`	0.1.0	Unified CLI with 17 subcommands
`rvf-solver-wasm`	0.1.0	Thompson Sampling temporal solver (WASM, `no_std`)

npm Packages (npmjs.org)

Package	Version	Description
`@ruvector/rvf`	0.1.0	Unified TypeScript SDK
`@ruvector/rvf-node`	0.1.0	Node.js N-API native bindings
`@ruvector/rvf-wasm`	0.1.0	WASM browser package
`@ruvector/rvf-mcp-server`	0.1.0	MCP server for AI agents

Platform Support

Platform	Status	Notes
Linux (x86_64, aarch64)	Full	KVM acceleration, eBPF, SIMD (AVX2/NEON)
macOS (x86_64, Apple Silicon)	Full	TCG fallback for QEMU, NEON SIMD on ARM
Windows (x86_64)	Core	Store, query, index, crypto work. QEMU launcher requires WSL or Windows QEMU.
WASM (browser, edge)	Full	5.5 KB microkernel, ~46 KB control plane
no_std (embedded)	Types only	`rvf-types` and `rvf-wire` are `no_std` compatible

🚀 Quick Start

Install

# Rust crate (library)
cargo add rvf-runtime

# CLI tool
cargo install rvf-cli
# or build from source:
cd crates/rvf && cargo build -p rvf-cli --release

# Node.js / npm
npm install @ruvector/rvf-node

# WASM (browser / edge)
rustup target add wasm32-unknown-unknown
cargo build -p rvf-wasm --target wasm32-unknown-unknown --release
# → target/wasm32-unknown-unknown/release/rvf_wasm.wasm (~46 KB)

# MCP Server (for Claude Code, Cursor, etc.)
npx @ruvector/rvf-mcp-server --transport stdio

Rust Crate

# Cargo.toml
[dependencies]
rvf-runtime = "0.2"          # full store API
rvf-types   = "0.2"          # types only (no_std)
rvf-wire    = "0.1"          # wire format (no_std)
rvf-crypto  = "0.2"          # signatures + witness chains
rvf-import  = "0.1"          # JSON/CSV/NumPy importers

use rvf_runtime::{RvfStore, options::{RvfOptions, QueryOptions, DistanceMetric}};

let mut store = RvfStore::create("vectors.rvf", RvfOptions {
    dimension: 384,
    metric: DistanceMetric::Cosine,
    ..Default::default()
})?;

// Insert
store.ingest_batch(&[&embedding], &[1], None)?;

// Query
let results = store.query(&query, 10, &QueryOptions::default())?;

// Derive a child with lineage tracking
let child = store.derive("child.rvf", DerivationType::Filter, None)?;

// Embed a kernel — file now boots as a microservice
store.embed_kernel(0x00, 0x01, 0, &kernel_image, 8080, None)?;

store.close()?;

Node.js / npm

npm install @ruvector/rvf-node

const { RvfDatabase } = require('@ruvector/rvf-node');

// Create, insert, query
const db = RvfDatabase.create('vectors.rvf', { dimension: 384 });
db.ingestBatch(new Float32Array(384), [1]);
const results = db.query(new Float32Array(384), 10);

// Lineage & inspection
console.log(db.fileId());       // unique file UUID
console.log(db.dimension());    // 384
console.log(db.segments());     // [{ type, id, size }]

db.close();

WASM (Browser / Edge)

<script type="module">
  import init, { WasmRvfStore } from './rvf_wasm.js';
  await init();

  const store = WasmRvfStore.create(384);
  store.ingest(1, new Float32Array(384));
  const results = store.query(new Float32Array(384), 10);
  console.log(results); // [{ id, distance }]
</script>

The WASM binary is ~46 KB (control plane with in-memory store) or ~5.5 KB (tile microkernel for Cognitum). No backend required.

CLI

# Full lifecycle from the command line
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query  vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf          # show all segments
rvf compact vectors.rvf          # reclaim deleted space
rvf derive parent.rvf child.rvf --type filter
rvf serve  vectors.rvf --port 8080

# Machine-readable output
rvf status vectors.rvf --json

Lightweight (rvlite)

use rvf_adapter_rvlite::{RvliteCollection, RvliteConfig};

let mut col = RvliteCollection::create(RvliteConfig::new("vectors.rvf", 128))?;
col.add(1, &[0.1; 128])?;
let matches = col.search(&[0.15; 128], 5);

Generate Sample Files

cd examples/rvf
cargo run --example generate_all
ls output/   # 46 .rvf files ready to inspect
rvf status output/sealed_engine.rvf
rvf inspect output/linux_microkernel.rvf

📋 What RVF Contains

An RVF file is a sequence of typed segments. Each segment is self-describing, 64-byte aligned, and independently integrity-checked. The format supports 24 segment types that together constitute a complete cognitive runtime:

.rvf file (Sealed Cognitive Engine)
  |
  +-- MANIFEST_SEG .... 4 KB root manifest, segment directory, instant boot
  +-- VEC_SEG ......... Vector embeddings (fp16/fp32/int8/int4/binary)
  +-- INDEX_SEG ....... HNSW progressive index (Layer A/B/C)
  +-- OVERLAY_SEG ..... LoRA adapter deltas, incremental updates
  +-- GRAPH_SEG ....... GNN adjacency, edge weights, graph state
  +-- QUANT_SEG ....... Quantization codebooks (scalar/PQ/binary)
  +-- SKETCH_SEG ...... Access sketches, VQE snapshots, quantum state
  +-- META_SEG ........ Key-value metadata, observation-state
  +-- WITNESS_SEG ..... Tamper-evident audit trails, attestation records
  +-- CRYPTO_SEG ...... ML-DSA-65 / Ed25519 signatures, sealed keys
  +-- WASM_SEG ........ 5.5 KB query microkernel (Tier 1: browser/edge)
  +-- EBPF_SEG ........ eBPF fast-path program (Tier 2: kernel acceleration)
  +-- KERNEL_SEG ...... Compressed unikernel (Tier 3: self-booting service)
  +-- PROFILE_SEG ..... Domain profile (RVDNA/RVText/RVGraph/RVVision)
  +-- HOT_SEG ......... Temperature-promoted hot data
  +-- META_IDX_SEG .... Metadata inverted indexes for filtered search
  +-- COW_MAP_SEG ..... Cluster ownership map for COW branching (0x20)
  +-- REFCOUNT_SEG .... Cluster reference counts, rebuildable (0x21)
  +-- MEMBERSHIP_SEG .. Vector visibility filter for branches (0x22)
  +-- DELTA_SEG ....... Sparse delta patches / LoRA overlays (0x23)
  +-- TRANSFER_PRIOR .. Transfer learning priors (0x30)
  +-- POLICY_KERNEL ... Thompson Sampling policy state (0x31)
  +-- COST_CURVE ...... Cost/reward curves for solver (0x32)

🧠 Sealed Cognitive Engines

When an RVF file combines vectors, models, compute, and trust segments, it becomes a deployable intelligence capsule:

Example: Domain Intelligence Unit

ClinicalOncologyEngine.rvdna           (one file, ~50 MB)
  Contains:
  -- Medical corpus embeddings          VEC_SEG      384-dim, 2M vectors
  -- MicroLoRA oncology fine-tune       OVERLAY_SEG  adapter deltas
  -- Biological pathway GNN             GRAPH_SEG    pathway modeling
  -- Molecular similarity state         SKETCH_SEG   quantum-enhanced
  -- Linux microkernel service          KERNEL_SEG   boots on Firecracker
  -- Browser query runtime              WASM_SEG     5.5 KB, no backend
  -- eBPF drug lookup accelerator       EBPF_SEG     sub-microsecond
  -- Attested execution proof           WITNESS_SEG  tamper-evident chain
  -- Post-quantum signature             CRYPTO_SEG   ML-DSA-65

This is not a database. It is a sealed, auditable, self-booting domain expert. Copy it to a Firecracker VM and it boots a Linux service. Open it in a browser and WASM serves queries locally. Ship it air-gapped and it produces identical results under audit.

🔌 RuVector Ecosystem Integration

RVF is the canonical binary format across 87+ Rust crates in the RuVector ecosystem:

Domain	Crates	RVF Segment
LLM Inference	`ruvllm`, `ruvllm-cli`, `ruvllm-wasm`	VEC_SEG (KV cache), OVERLAY_SEG (LoRA)
Attention	`ruvector-attention`, coherence-gated transformer	VEC_SEG, INDEX_SEG
GNN	`ruvector-gnn`, `ruvector-graph`, graph-node/wasm	GRAPH_SEG
Quantum	`ruQu`, `ruqu-core`, `ruqu-algorithms`, `ruqu-exotic`	SKETCH_SEG (VQE, syndrome tables)
Min-Cut Coherence	`ruvector-mincut`, mincut-gated-transformer	GRAPH_SEG, INDEX_SEG
Delta Tracking	`ruvector-delta-core`, delta-graph, delta-index	OVERLAY_SEG, JOURNAL_SEG
Neural Routing	`ruvector-tiny-dancer-core` (FastGRNN)	VEC_SEG, META_SEG
Sparse Inference	`ruvector-sparse-inference`	VEC_SEG, QUANT_SEG
Temporal Tensors	`ruvector-temporal-tensor`	VEC_SEG, META_SEG
Cognitum Silicon	`cognitum-gate-kernel`, `cognitum-gate-tilezero`	WASM_SEG (64 KB tiles)
SONA Learning	`sona` (self-optimizing neural arch)	VEC_SEG, WITNESS_SEG
Agent Memory	claude-flow, agentdb, agentic-flow, ospipe	All segments via adapters

The same .rvf file format runs on cloud servers, Firecracker microVMs, TEE enclaves, edge devices, Cognitum tiles, and in the browser.

✨ Features

Storage & Indexing

Feature	Description
Append-only segments	Crash-safe without WAL. Every write is atomic with per-segment integrity checksums.
Progressive indexing	Three-tier HNSW (Layer A/B/C). First query at 70% recall before full index loads.
Temperature-tiered quantization	Hot vectors stay fp16, warm use product quantization, cold use binary — automatically.
Metadata filtering	Filtered k-NN with boolean expressions (AND/OR/NOT/IN/RANGE).
4 KB instant boot	Root manifest fits in one page read. Cold boot < 5 ms.
24 segment types	VEC, INDEX, MANIFEST, QUANT, WITNESS, CRYPTO, KERNEL, EBPF, WASM, COW_MAP, MEMBERSHIP, DELTA, TRANSFER_PRIOR, POLICY_KERNEL, COST_CURVE, and 9 more.

COW Branching (RVCOW)

Feature	Description
COW branching	Git-like copy-on-write at cluster granularity. Derive child stores that share parent data; only changed clusters are copied.
Membership filters	Shared HNSW index across branches with bitmap visibility control. Include/exclude modes.
Snapshot freeze	Immutable snapshot at any generation. Metadata-only operation, no data copy.
Delta segments	Sparse patches for LoRA overlays. Hot-path guard upgrades to full slab.
Rebuildable refcounts	No WAL. Refcounts derived from COW map chain during compaction.

Ecosystem & Tooling

Feature	Description
Domain profiles	`.rvdna`, `.rvtext`, `.rvgraph`, `.rvvis` extensions map to optimized profiles.
Unified CLI	17 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve, launch, embed-kernel, embed-ebpf, filter, freeze, verify-witness, verify-attestation, rebuild-refcounts.
6 library adapters	Drop-in integration for claude-flow, agentdb, ospipe, agentic-flow, rvlite, sona.
MCP server	Model Context Protocol integration for Claude Code, Cursor, and AI agents.
Node.js bindings	N-API bindings with lineage, kernel/eBPF, and inspection support.

🏗️ Architecture

  +-----------------------------------------------------------------+
  |                    Cognitive Layer                                |
  |  ruvllm (LLM)  | ruvector-gnn (GNN) | ruQu (Quantum)           |
  |  ruvector-attention | sona (SONA) | ruvector-mincut             |
  +---+------------------+-----------------+-----------+------------+
      |                  |                 |           |
  +---v------------------v-----------------v-----------v------------+
  |                    Agent & Application Layer                     |
  |  claude-flow | agentdb | agentic-flow | ospipe | rvlite         |
  +---+------------------+-----------------+-----------+------------+
      |                  |                 |           |
  +---v------------------v-----------------v-----------v------------+
  |                    RVF SDK Layer                                  |
  |  rvf-runtime | rvf-index | rvf-quant | rvf-crypto | rvf-wire    |
  |  rvf-manifest | rvf-types | rvf-import | rvf-adapters            |
  +---+--------+---------+----------+-----------+------------------+
      |        |         |          |           |
  +---v---+ +--v----+ +--v-----+ +-v--------+ +v-----------+ +v------+
  |server | | node  | | wasm   | | kernel   | | ebpf       | | cli   |
  |HTTP   | | N-API | | ~46 KB | |bzImage+  | |clang BPF   | |17 cmds|
  |REST+  | |       | |        | |initramfs | |XDP/TC/sock | |       |
  |TCP    | |       | |        | +----------+ +------------+ +-------+
  +-------+ +-------+ +--------+ +-v--------+
                                  | launch   |
                                  |QEMU+QMP  |
                                  +----------+

Segment Model

An .rvf file is a sequence of 64-byte-aligned segments. Each segment has a self-describing header:

+--------+------+-------+--------+-----------+-------+----------+
| Magic  | Ver  | Type  | Flags  | SegmentID | Size  | Hash     |
| 4B     | 1B   | 1B    | 2B     | 8B        | 8B    | 16B ...  |
+--------+------+-------+--------+-----------+-------+----------+
| Payload (variable length, 64-byte aligned)                     |
+----------------------------------------------------------------+

Crate Map

Crate	Lines	Purpose
`rvf-types`	7,000+	24 segment types, AGI container, quality, security, WASM bootstrap, QR seed (`no_std`)
`rvf-wire`	2,011	Wire format read/write (`no_std`)
`rvf-manifest`	1,700+	Two-level manifest with 4 KB root, FileIdentity codec, COW pointers, double-root scheme
`rvf-index`	2,691	HNSW progressive indexing (Layer A/B/C)
`rvf-quant`	1,443	Scalar, product, and binary quantization
`rvf-crypto`	1,725	SHAKE-256, Ed25519, witness chains, attestation, seed crypto
`rvf-runtime`	8,000+	Full store API, COW engine, AGI containers, QR seeds, safety net, adversarial defense
`rvf-kernel`	2,400+	Real Linux kernel builder, cpio/newc initramfs, Docker build, SHA3-256 verification
`rvf-launch`	1,200+	QEMU microvm launcher, KVM/TCG detection, QMP shutdown protocol
`rvf-ebpf`	1,100+	Real BPF C compiler (XDP, socket filter, TC), vmlinux.h generation
`rvf-wasm`	1,700+	WASM control plane: in-memory store, query, segment inspection, witness chain verification (~46 KB)
`rvf-solver-wasm`	1,500+	Thompson Sampling temporal solver, PolicyKernel, three-loop architecture (`no_std`)
`rvf-node`	852	Node.js N-API bindings with lineage, kernel/eBPF, and inspection
`rvf-cli`	1,800+	Unified CLI with 17 subcommands (create, ingest, query, delete, status, inspect, compact, derive, serve, launch, embed-kernel, embed-ebpf, filter, freeze, verify-witness, verify-attestation, rebuild-refcounts)
`rvf-server`	1,165	HTTP REST + TCP streaming server
`rvf-import`	980	JSON, CSV, NumPy (.npy) importers
Adapters	6,493	6 library integrations (see below)

⚡ Performance

Metric	Target	Achieved
Cold boot (4 KB manifest read)	< 5 ms	1.6 us
First query recall@10 (Layer A only)	>= 0.70	>= 0.70
Full quality recall@10 (Layer C)	>= 0.95	>= 0.95
WASM binary (tile microkernel)	< 8 KB	~5.5 KB
WASM binary (control plane)	< 50 KB	~46 KB
Segment header size	64 bytes	64 bytes
Minimum file overhead	< 1 KB	< 256 bytes
COW branch creation (10K vecs)	< 10 ms	2.6 ms (child = 162 bytes)
COW branch creation (100K vecs)	< 50 ms	6.8 ms (child = 162 bytes)
COW read (local cluster, pread)	< 5 us	1,348 ns/vector
COW read (inherited from parent)	< 5 us	1,442 ns/vector
Write coalescing (32 vecs, 1 cluster)	1 COW event	654 us, 1 event
CowMap lookup	< 100 ns	28 ns
Membership filter contains()	< 100 ns	23-33 ns
Snapshot freeze	< 100 ns	30-52 ns

Progressive Loading

RVF doesn't make you wait for the full index:

Stage	Data Loaded	Recall@10	Latency
Layer A	Entry points + centroids	>= 0.70	< 5 ms
Layer B	Hot region adjacency	>= 0.85	~10 ms
Layer C	Full HNSW graph	>= 0.95	~50 ms

📊 Comparison

Feature	RVF	Annoy	FAISS	Qdrant	Milvus
Single-file format	Yes	Yes	No	No	No
Crash-safe (no WAL)	Yes	No	No	Needs WAL	Needs WAL
Progressive loading	Yes (3 layers)	No	No	No	No
COW branching	Yes (cluster-level)	No	No	No	No
Membership filters	Yes (shared HNSW)	No	No	No	No
Snapshot freeze	Yes (zero-copy)	No	No	No	No
WASM support	Yes (5.5 KB)	No	No	No	No
Self-booting kernel	Yes (real Linux)	No	No	No	No
eBPF acceleration	Yes (XDP/TC/socket)	No	No	No	No
`no_std` compatible	Yes	No	No	No	No
Post-quantum sigs	Yes (ML-DSA-65)	No	No	No	No
TEE attestation	Yes	No	No	No	No
Metadata filtering	Yes	No	Yes	Yes	Yes
Temperature tiering	Automatic	No	Manual	No	No
Quantization	3-tier auto	No	Yes (manual)	Yes	Yes
Lineage provenance	Yes (DNA-style)	No	No	No	No
Domain profiles	5 profiles	No	No	No	No
Append-only	Yes	Build-once	Build-once	Log-based	Log-based

vs Docker / OCI Containers

	RVF Cognitive Container	Docker / OCI
File format	Single `.rvf` file	Layered tarball images
Boot target	QEMU microVM (microvm machine)	Container runtime (runc, containerd)
Vector data	Native segment, HNSW-indexed	External volume mount
Branching	Vector-native COW at cluster granularity	Layer-based COW (filesystem)
eBPF	Embedded in file, verified	Separate deployment
Attestation	Witness chain + KernelBinding	External signing (cosign, notary)
Size (hello world)	~17 KB (with initramfs + vectors)	~5 MB (Alpine)

vs Traditional Vector Databases

	RVF	Pinecone / Milvus / Qdrant
Deployment	Single file, zero dependencies	Server process + storage
Branching	Native COW, 2.6 ms for 10K vectors	Copy entire collection
Multi-tenant	Membership filter on shared index	Separate collections
Edge deploy	`scp file.rvf host:` + boot	Install + configure + import
Provenance	Cryptographic witness chain	External audit logs
Compute	Embedded kernel + eBPF	N/A

vs Git LFS / DVC

	RVF COW	Git LFS / DVC
Granularity	Vector cluster (256 KB)	Whole file
Index sharing	Shared HNSW + membership filter	No index awareness
Query during branch	Yes, sub-microsecond	No query capability
Delta encoding	Sparse row patches (LoRA)	Binary diff

vs SQLite / DuckDB

	RVF	SQLite	DuckDB
Vector-native	Yes (HNSW, quantization, COW)	No (extension needed)	No (extension needed)
Self-booting	Yes (KERNEL_SEG)	No	No
eBPF acceleration	Yes (XDP, TC, socket)	No	No
Cryptographic audit	Yes (witness chains)	No	No
Progressive loading	3-tier HNSW (70% → 95% recall)	N/A	N/A
WASM support	5.5 KB microkernel	Yes (via wasm)	No
Single file	Yes	Yes	Yes

🧬 Lineage Provenance

RVF supports DNA-style derivation chains for tracking how files were produced from one another. Each .rvf file carries a 68-byte FileIdentity recording its unique ID, its parent's ID, and a cryptographic hash of the parent's manifest. This enables tamper-evident provenance verification from any file back to its root ancestor.

  parent.rvf          child.rvf          grandchild.rvf
  (depth=0)           (depth=1)          (depth=2)
  file_id: AAA        file_id: BBB       file_id: CCC
  parent_id: 000      parent_id: AAA     parent_id: BBB
  parent_hash: 000    parent_hash: H(A)  parent_hash: H(B)
       |                   |                   |
       +-------derive------+-------derive------+

Domain Profiles & Extension Aliasing

Domain-specific extensions are automatically mapped to optimized profiles. The authoritative profile lives in the Level0Root.profile_id field; the file extension is a convenience hint:

Extension	Domain Profile	Optimized For
`.rvf`	Generic	General-purpose vectors
`.rvdna`	RVDNA	Genomic sequence embeddings
`.rvtext`	RVText	Language model embeddings
`.rvgraph`	RVGraph	Graph/network node embeddings
`.rvvis`	RVVision	Image/vision model embeddings

Deriving a Child Store

use rvf_runtime::{RvfStore, options::{RvfOptions, DistanceMetric}};
use rvf_types::DerivationType;
use std::path::Path;

let options = RvfOptions {
    dimension: 384,
    metric: DistanceMetric::Cosine,
    ..Default::default()
};
let parent = RvfStore::create(Path::new("parent.rvf"), options)?;

// Derive a filtered child -- inherits dimensions and options
let child = parent.derive(
    Path::new("child.rvf"),
    DerivationType::Filter,
    None,
)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());

🖥️ Self-Booting RVF (Cognitive Container)

RVF supports an optional three-tier execution model that allows a single .rvf file to carry executable compute alongside its vector data. A file can serve queries from a browser (Tier 1 WASM), accelerate hot-path lookups in the Linux kernel (Tier 2 eBPF), or boot as a standalone microservice inside a Firecracker microVM or TEE enclave (Tier 3 unikernel) -- all from the same file.

Tier	Segment	Size	Environment	Boot Time	Use Case
1: WASM	WASM_SEG (existing)	5.5 KB	Browser, edge, IoT	<1 ms	Portable queries everywhere
2: eBPF	EBPF_SEG (`0x0F`)	10-50 KB	Linux kernel (XDP, TC)	<20 ms	Sub-microsecond hot cache hits
3: Unikernel	KERNEL_SEG (`0x0E`)	200 KB - 2 MB	Firecracker, TEE, bare metal	<125 ms	Zero-dependency self-booting service

Readers that do not recognize KERNEL_SEG or EBPF_SEG skip them per the RVF forward-compatibility rule. The computational capability is purely additive.

Embedding a Kernel

use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use std::path::Path;

let mut store = RvfStore::open(Path::new("vectors.rvf"))?;

// Embed a compressed unikernel image
store.embed_kernel(
    KernelArch::X86_64 as u8,       // arch
    KernelType::Hermit as u8,        // kernel type
    0x0018,                          // flags: HAS_QUERY_API | HAS_NETWORKING
    &compressed_kernel_image,        // kernel binary
    8080,                            // API port
    Some("console=ttyS0 quiet"),     // cmdline (optional)
)?;

// Later, extract it
if let Some((header, image_data)) = store.extract_kernel()? {
    println!("Kernel: {:?} ({} bytes)", header.kernel_arch(), image_data.len());
}

Embedding an eBPF Program

use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

// Embed an eBPF XDP program for fast-path vector lookup
store.embed_ebpf(
    EbpfProgramType::XdpDistance as u8,   // program type
    EbpfAttachType::XdpIngress as u8,     // attach point
    384,                                   // max vector dimension
    &ebpf_bytecode,                        // BPF ELF object
    Some(&btf_section),                    // BTF data (optional)
)?;

if let Some((header, program_data)) = store.extract_ebpf()? {
    println!("eBPF: {:?} ({} bytes)", header.program_type, program_data.len());
}

Security Model

7-step fail-closed verification: hash, signature, TEE measurement, all must pass before kernel boot
Authority boundary: guest kernel owns auth/audit/witness; host eBPF is acceleration-only
Signing: Ed25519 for development, ML-DSA-65 (FIPS 204) for production
TEE priority: SEV-SNP first, SGX second, ARM CCA third
Size limits: kernel images capped at 128 MiB, eBPF programs at 16 MiB

For the full specification including wire formats, attestation binding, and implementation phases, see ADR-030: RVF Cognitive Container.

End-to-End: Claude Code Appliance

The claude_code_appliance example builds a complete self-booting AI development environment as a single .rvf file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.

Prerequisites: Docker (for kernel build), Rust 1.87+

# Build and run the example
cd examples/rvf
cargo run --example claude_code_appliance

What it produces (5.1 MB file):

claude_code_appliance.rvf
  ├── KERNEL_SEG    Linux 6.8.12 bzImage (5.2 MB, x86_64)
  ├── EBPF_SEG      Socket filter — allows ports 2222, 8080 only
  ├── VEC_SEG       20 package embeddings (128-dim)
  ├── INDEX_SEG     HNSW graph for package search
  ├── WITNESS_SEG   6-entry tamper-evident audit trail
  ├── CRYPTO_SEG    3 Ed25519 SSH user keys (root, deploy, claude)
  ├── MANIFEST_SEG  4 KB root with segment directory
  └── Snapshot      v1 derived image with lineage tracking

Boot sequence (once launched on Firecracker/QEMU):

1. Firecracker loads KERNEL_SEG → Linux boots (<125 ms)
2. SSH server starts on port 2222
3. curl -fsSL https://claude.ai/install.sh | bash
4. RVF query server starts on port 8080
5. Claude Code ready for use

Connect and use:

# Boot the file (requires QEMU or Firecracker)
rvf launch claude_code_appliance.rvf

# SSH in
ssh -p 2222 deploy@localhost

# Query the package database
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'

# Or use the CLI
rvf query claude_code_appliance.rvf --vector "0.1,0.2,..." --k 5

Verified output from the example run:

=== Claude Code Appliance Summary ===
  File size:       5,260,093 bytes (5.1 MB)
  Segments:        8
  Packages:        20 (203.1 MB manifest)
  KERNEL_SEG:      MicroLinux x86_64 (5,243,904 bytes)
  EBPF_SEG:        SocketFilter (3,805 bytes)
  SSH users:       3 (Ed25519 signed, all verified)
  Witness chain:   6 entries (tamper-evident, all verified)
  Lineage:         base + v1 snapshot (parent hash matches)

Final file: 5.1 MB single .rvf — boots Linux, serves queries, runs Claude Code.

One file. Boots Linux. Runs SSH. Serves vectors. Installs Claude Code. Proves every step.

Launching with QEMU

# CLI launcher (auto-detects KVM or falls back to TCG)
rvf launch vectors.rvf

# Manual QEMU (if you want control)
rvf launch vectors.rvf --memory 512M --cpus 2 --port-forward 2222:22,8080:8080

# Extract kernel for external use
rvf inspect vectors.rvf --segment kernel --output kernel.bin
qemu-system-x86_64 -M microvm -kernel kernel.bin -append "console=ttyS0" -nographic

Building Your Own Bootable RVF

Step-by-step to create a self-booting .rvf from scratch:

# 1. Create a vector store
rvf create myservice.rvf --dimension 384

# 2. Ingest your data
rvf ingest myservice.rvf --input embeddings.json --format json

# 3. Build and embed a Linux kernel (uses Docker)
rvf embed-kernel myservice.rvf --arch x86_64

# 4. Optionally embed an eBPF filter
rvf embed-ebpf myservice.rvf --program filter.c

# 5. Verify the result
rvf inspect myservice.rvf
# MANIFEST_SEG, VEC_SEG, INDEX_SEG, KERNEL_SEG, EBPF_SEG, WITNESS_SEG

# 6. Boot it
rvf launch myservice.rvf

🔗 Library Adapters

RVF provides drop-in adapters for 6 libraries in the RuVector ecosystem:

Adapter	Purpose	Key Feature
`rvf-adapter-claude-flow`	AI agent memory	WITNESS_SEG audit trails
`rvf-adapter-agentdb`	Agent vector database	Progressive HNSW indexing
`rvf-adapter-ospipe`	Observation-State pipeline	META_SEG for state vectors
`rvf-adapter-agentic-flow`	Swarm coordination	Inter-agent memory sharing
`rvf-adapter-rvlite`	Lightweight embedded store	Minimal API, edge-friendly
`rvf-adapter-sona`	Neural architecture	Experience replay + trajectories

🤖 AGI Cognitive Container (ADR-036)

An AGI container packages a complete AI agent runtime into a single sealed .rvf file. Where the Self-Booting RVF section covers the compute tiers (WASM/eBPF/Kernel), the AGI container adds the intelligence layer on top: model identity, orchestration config, tool registries, evaluation harnesses, authority controls, and coherence gates.

AGI Cognitive Container (.rvf)
├── Identity ────── container UUID, build UUID, model ID hash
├── Orchestrator ── Claude Code / Claude Flow config (JSON)
├── Tools ──────── MCP tool adapter registry
├── Agent Prompts ─ role definitions per agent type
├── Eval Harness ── task suite + grading rules
├── Skills ──────── promoted skill library
├── Policy ──────── governance rules + authority config
├── Coherence ───── min score, contradiction rate, rollback ratio
├── Resources ───── time/token/cost budgets with clamping
├── Replay ──────── automation script for deterministic re-execution
├── Kernel Config ─ boot parameters, network, SSH
├── Domain Profile ─ coding / research / ops specialization
└── Signature ───── HMAC-SHA256 or Ed25519 tamper seal

Execution Modes

Mode	Purpose	Requires
Replay	Deterministic re-execution from witness logs	Witness chain
Verify	Validate container integrity and run eval harness	Kernel + world model, or WASM + vectors
Live	Full autonomous operation with tool use	Kernel + world model

Authority Levels

Authority is hierarchical — each level permits everything below it:

Level	Allows
`ReadOnly`	Read vectors, run queries
`WriteMemory`	+ Write to vector store, update index
`ExecuteTools`	+ Invoke MCP tools, run commands
`WriteExternal`	+ Network access, file I/O, push to git

Default authority per mode: Replay → ReadOnly, Verify → ExecuteTools, Live → WriteMemory.

Resource Budgets

Every container carries hard limits that are clamped to safety maximums:

Resource	Max	Default
Time	3,600 sec	300 sec
Tokens	1,000,000	100,000
Cost	$10.00	$1.00
Tool calls	500	100
External writes	50	10

Coherence Gates

Coherence thresholds halt execution when the agent's world model drifts:

min_coherence_score (0.0–1.0) — minimum quality gate
max_contradiction_rate (0.0–1.0) — tolerable contradiction frequency
max_rollback_ratio (0.0–1.0) — ratio of rolled-back decisions

Building a Container

use rvf_runtime::agi_container::AgiContainerBuilder;
use rvf_types::agi_container::*;

let (payload, header) = AgiContainerBuilder::new(container_id, build_id)
    .with_model_id("claude-opus-4-6")
    .with_orchestrator(b"{\"max_turns\":100}")
    .with_tool_registry(b"[{\"name\":\"search\",\"type\":\"rvf_query\"}]")
    .with_eval_tasks(b"[{\"id\":1,\"spec\":\"fix bug\"}]")
    .with_eval_graders(b"[{\"type\":\"test_pass\"}]")
    .with_authority_config(b"{\"level\":\"WriteMemory\"}")
    .with_coherence_config(b"{\"min_cut\":0.7,\"rollback\":true}")
    .with_project_instructions(b"# CLAUDE.md\nFix bugs, run tests.")
    .with_segments(ContainerSegments {
        kernel_present: true, manifest_present: true,
        world_model_present: true, ..Default::default()
    })
    .build_and_sign(signing_key)?;

// Parse and validate
let manifest = ParsedAgiManifest::parse(&payload)?;
assert_eq!(manifest.model_id_str(), Some("claude-opus-4-6"));
assert!(manifest.is_autonomous_capable());
assert!(header.is_signed());

See ADR-036 for the full specification.

📱 QR Cognitive Seed (ADR-034)

A QR Cognitive Seed (RVQS) encodes a portable intelligence capsule into a scannable QR code. It carries bootstrap hosts, layer hashes, and cryptographic signatures in a compact binary format.

use rvf_runtime::seed_crypto;

let hash = seed_crypto::seed_content_hash(data);       // 8-byte SHAKE-256
let sig  = seed_crypto::sign_seed(key, payload);        // 32-byte HMAC
let ok   = seed_crypto::verify_seed(key, payload, &sig);

Types: SeedHeader, HostEntry, LayerEntry (rvf-types), plus qr_encode for QR matrix generation (rvf-runtime).

🔒 Quality & Safety Net

The quality system tracks retrieval fidelity across progressive index layers and enforces graceful degradation when budgets are exceeded.

RetrievalQuality — Full / Partial / Degraded / Failed
ResponseQuality — per-query quality metadata with evidence
SafetyNetBudget — time, token, and cost budgets with automatic clamping
DegradationReport — structured fallback path and reason tracking

🛡️ Security Modules

Module	Crate	Purpose
`SecurityPolicy` / `HardeningFields`	rvf-types	Declarative per-file security configuration
`adversarial`	rvf-runtime	Input validation, dimension/size checks at write boundary
`dos`	rvf-runtime	Rate limiting, resource exhaustion guards
`KernelBinding`	rvf-types	Binds signed kernels to specific manifest hashes
`verify_witness_chain`	rvf-crypto	SHAKE-256 chain integrity verification

🧬 WASM Self-Bootstrapping (0x10)

WASM_SEG enables an RVF file to carry its own WASM interpreter, creating a three-layer bootstrap stack:

Raw bytes → WASM interpreter → microkernel → vector data

Types: WasmRole (Interpreter/Microkernel/Solver), WasmTarget (Browser/Node/Edge/Embedded), WasmHeader (rvf-types/wasm_bootstrap).

The rvf-solver-wasm crate implements a Thompson Sampling temporal solver as a no_std WASM module with dlmalloc, producing segment types TRANSFER_PRIOR (0x30), POLICY_KERNEL (0x31), and COST_CURVE (0x32).

46 Runnable Examples

Every example uses real RVF APIs end-to-end — no mocks, no stubs. Run any example with:

cd examples/rvf
cargo run --example <name>

Core Fundamentals (6)

#	Example	What It Demonstrates
1	`basic_store`	Create, insert 100 vectors, k-NN query, close, reopen, verify persistence
2	`progressive_index`	Build three-layer HNSW, measure recall@10 progression (0.70 → 0.95)
3	`quantization`	Scalar, product, and binary quantization with temperature tiering
4	`wire_format`	Raw 64-byte segment I/O, CRC32c hash validation, manifest tail-scan
5	`crypto_signing`	Ed25519 segment signing, SHAKE-256 witness chains, tamper detection
6	`filtered_search`	Metadata-filtered queries: Eq, Ne, Gt, Range, In, And, Or

Agentic AI (6)

#	Example	What It Demonstrates
7	`agent_memory`	Persistent agent memory across sessions with witness audit trail
8	`swarm_knowledge`	Multi-agent shared knowledge base, cross-agent semantic search
9	`reasoning_trace`	Chain-of-thought lineage: parent → child → grandchild derivation
10	`tool_cache`	Tool call result caching with TTL expiry, delete_by_filter, compaction
11	`agent_handoff`	Transfer agent state between instances via derive + clone
12	`experience_replay`	Reinforcement learning replay buffer with priority sampling

Production Patterns (5)

#	Example	What It Demonstrates
13	`semantic_search`	Document search engine with 4 filter workflows
14	`recommendation`	Collaborative filtering with genre and quality filters
15	`rag_pipeline`	5-step RAG: chunk, embed, retrieve, rerank, assemble context
16	`embedding_cache`	Zipf access patterns, 3-tier quantization, memory savings
17	`dedup_detector`	Near-duplicate detection, clustering, compaction

Industry Verticals (4)

#	Example	What It Demonstrates
18	`genomic_pipeline`	DNA k-mer search with `.rvdna` profile and lineage tracking
19	`financial_signals`	Market signals with Ed25519 signing and TEE attestation
20	`medical_imaging`	Radiology embedding search with `.rvvis` profile
21	`legal_discovery`	Legal document similarity with `.rvtext` profile

Cognitive Containers (5)

#	Example	What It Demonstrates
22	`self_booting`	Embed/extract unikernel (KERNEL_SEG), header verification
23	`ebpf_accelerator`	Embed/extract eBPF (EBPF_SEG), XDP program co-existence
24	`hyperbolic_taxonomy`	Hierarchy-aware Poincaré embeddings, depth-filtered search
25	`multimodal_fusion`	Cross-modal text + image search with modality filtering
26	`sealed_engine`	Capstone: vectors + kernel + eBPF + witness + lineage in one file

Runtime Targets (5)

#	Example	What It Demonstrates
27	`browser_wasm`	WASM-compatible API surface, raw wire segments, size budget
28	`edge_iot`	Constrained IoT device with binary quantization
29	`serverless_function`	Cold-start optimization, manifest tail-scan, progressive loading
30	`ruvllm_inference`	LLM KV cache + LoRA adapters + policy store via RVF
31	`postgres_bridge`	PostgreSQL export/import with lineage and witness audit

Network & Security (4)

#	Example	What It Demonstrates
32	`network_sync`	Peer-to-peer vector store synchronization
33	`tee_attestation`	TEE platform attestation, sealed keys, computation proof
34	`access_control`	Role-based vector access control with audit trails
35	`zero_knowledge`	Zero-knowledge proofs for privacy-preserving vector ops

Systems & Integration (5)

#	Example	What It Demonstrates
36	`ruvbot`	Autonomous agent with RVF memory, planning, and tool use
37	`posix_fileops`	POSIX raw I/O, atomic rename, advisory locking, segment access
38	`linux_microkernel`	20-package Linux distro with SSH keys and kernel embed
39	`mcp_in_rvf`	MCP server runtime + eBPF filter embedded in RVF
40	`network_interfaces`	6-chassis / 60-interface network telemetry with anomaly detection

COW Branching & Generation (3)

#	Example	What It Demonstrates
41	`cow_branching`	COW derive, cluster-level copy, write coalescing, parent inheritance
42	`membership_filter`	Include/exclude bitmap filters for shared HNSW traversal
43	`snapshot_freeze`	Generation snapshots, immutable freeze, generation tracking

Appliance & Generation (3)

#	Example	What It Demonstrates
44	`claude_code_appliance`	Bootable AI dev environment: real kernel + eBPF + vectors + witness + crypto
45	`live_boot_proof`	Docker-boot an `.rvf`, SSH in, verify segments are live and operational
46	`generate_all`	Batch generation of all example `.rvf` files

See the examples README for tutorials, usage patterns, and detailed walkthroughs.

Importing Data

From NumPy (.npy)

use rvf_import::numpy::{parse_npy_file, NpyConfig};
use std::path::Path;

let records = parse_npy_file(
    Path::new("embeddings.npy"),
    &NpyConfig { start_id: 0 },
)?;
// records: Vec<VectorRecord> with id, vector, metadata

From CSV

use rvf_import::csv_import::{parse_csv_file, CsvConfig};
use std::path::Path;

let config = CsvConfig {
    id_column: Some("id".into()),
    dimension: 128,
    ..Default::default()
};
let records = parse_csv_file(Path::new("vectors.csv"), &config)?;

From JSON

use rvf_import::json::{parse_json_file, JsonConfig};
use std::path::Path;

let config = JsonConfig {
    id_field: "id".into(),
    vector_field: "embedding".into(),
    ..Default::default()
};
let records = parse_json_file(Path::new("vectors.json"), &config)?;

CLI Import Tool

# Using rvf-import binary directly
cargo run --bin rvf-import -- \
    --input data.npy \
    --output vectors.rvf \
    --format npy \
    --dimension 384

# Or via the unified rvf CLI
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json

HTTP Server API

Starting the Server

cargo run --bin rvf-server -- --path vectors.rvf --port 8080

REST Endpoints

Ingest vectors:

curl -X POST http://localhost:8080/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "vectors": [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]],
    "ids": [1, 2]
  }'

Query nearest neighbors:

curl -X POST http://localhost:8080/query \
  -H "Content-Type: application/json" \
  -d '{
    "vector": [0.1, 0.2, 0.3, 0.4],
    "k": 10
  }'

Delete vectors:

curl -X POST http://localhost:8080/delete \
  -H "Content-Type: application/json" \
  -d '{"ids": [1, 2]}'

Get status:

curl http://localhost:8080/status

Compact (reclaim space):

curl -X POST http://localhost:8080/compact

MCP Server (Model Context Protocol)

Overview

The @ruvector/rvf-mcp-server package exposes RVF stores to AI agents via the Model Context Protocol. Supports stdio and SSE transports.

Starting the MCP Server

# stdio transport (for Claude Code, Cursor, etc.)
npx @ruvector/rvf-mcp-server --transport stdio

# SSE transport (for web clients)
npx @ruvector/rvf-mcp-server --transport sse --port 3100

Claude Code Integration

Add to your Claude Code MCP config:

{
  "mcpServers": {
    "rvf": {
      "command": "npx",
      "args": ["@ruvector/rvf-mcp-server", "--transport", "stdio"]
    }
  }
}

Available MCP Tools

Tool	Description
`rvf_create_store`	Create a new RVF vector store
`rvf_open_store`	Open an existing store (read-write or read-only)
`rvf_close_store`	Close a store and release the writer lock
`rvf_ingest`	Insert vectors with optional metadata
`rvf_query`	k-NN similarity search with metadata filters
`rvf_delete`	Delete vectors by ID
`rvf_delete_filter`	Delete vectors matching a metadata filter
`rvf_compact`	Compact store to reclaim dead space
`rvf_status`	Get store status (dimensions, vector count, etc.)
`rvf_list_stores`	List all open stores

MCP Resources

URI	Description
`rvf://stores`	JSON listing of all open stores and their status

MCP Prompts

Prompt	Description
`rvf-search`	Natural language similarity search
`rvf-ingest`	Data ingestion with auto-embedding

Confidential Core Attestation

Overview

RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This proves that vector operations occurred inside a verified secure enclave.

Supported Platforms

Platform	Enum Value	Quote Format
Intel SGX	`TeePlatform::Sgx` (0)	DCAP quote
AMD SEV-SNP	`TeePlatform::SevSnp` (1)	VCEK attestation report
Intel TDX	`TeePlatform::Tdx` (2)	TD quote
ARM CCA	`TeePlatform::ArmCca` (3)	CCA token
Software (testing)	`TeePlatform::SoftwareTee` (0xFE)	Synthetic

Attestation Types

Type	Witness Code	Purpose
Platform Attestation	`0x05`	TEE identity and measurement verification
Key Binding	`0x06`	Encryption keys sealed to TEE measurement
Computation Proof	`0x07`	Proof that operations ran inside the enclave
Data Provenance	`0x08`	Chain of custody: model to TEE to RVF

Recording an Attestation

use rvf_crypto::attestation::*;
use rvf_types::attestation::*;

// Build attestation header
let mut header = AttestationHeader::new(
    TeePlatform::SoftwareTee as u8,
    AttestationWitnessType::PlatformAttestation as u8,
);
header.measurement = shake256_256(b"my-enclave-code");
header.nonce = [0x42; 16];
header.quote_length = 64;
header.timestamp_ns = 1_700_000_000_000_000_000;

// Encode the full record
let report_data = b"model=all-MiniLM-L6-v2";
let quote = vec![0xAA; 64]; // platform-specific quote bytes
let record = encode_attestation_record(&header, report_data, &quote);

// Create a witness chain entry binding this attestation
let entry = attestation_witness_entry(
    &record,
    header.timestamp_ns,
    AttestationWitnessType::PlatformAttestation,
);
// entry.action_hash == SHAKE-256-256(record)

Key Binding to TEE

use rvf_crypto::attestation::*;
use rvf_types::attestation::*;

let key = TeeBoundKeyRecord {
    key_type: KEY_TYPE_TEE_BOUND,
    algorithm: 0, // Ed25519
    sealed_key_length: 32,
    key_id: shake256_128(b"my-public-key"),
    measurement: shake256_256(b"my-enclave"),
    platform: TeePlatform::Sgx as u8,
    reserved: [0; 3],
    valid_from: 0,
    valid_until: 0, // no expiry
    sealed_key: vec![0xBB; 32],
};

// Verify the key is accessible in the current environment
verify_key_binding(
    &key,
    TeePlatform::Sgx,
    &shake256_256(b"my-enclave"),
    current_time_ns,
)?; // Ok(()) if platform + measurement match

Attested Segment Flag

Any segment produced inside a TEE can set the ATTESTED flag for fast scanning:

use rvf_types::SegmentFlags;

let flags = SegmentFlags::empty()
    .with(SegmentFlags::SIGNED)
    .with(SegmentFlags::ATTESTED);
// bit 2 (SIGNED) + bit 10 (ATTESTED) = 0x0404

Progressive Indexing

How It Works

Traditional vector databases make you wait for the full index before you can query. RVF uses a three-layer progressive model:

Layer A (Coarse Routing)

Contains entry points and partition centroids
Loads in microseconds from the manifest
Provides approximate results immediately (recall >= 0.70)

Layer B (Hot Region)

Contains adjacency lists for frequently-accessed vectors
Loaded based on temperature heuristics
Improves recall to >= 0.85

Layer C (Full Graph)

Complete HNSW adjacency for all vectors
Full recall >= 0.95
Loaded in the background while queries are already being served

Using Progressive Indexing

use rvf_index::progressive::ProgressiveIndex;
use rvf_index::layers::IndexLayer;

let mut adapter = RvfIndexAdapter::new(IndexAdapterConfig::default());
adapter.build(vectors, ids);

// Start with Layer A only (fastest)
adapter.load_progressive(&[IndexLayer::A]);
let fast_results = adapter.search(&query, 10);

// Add layers as they load
adapter.load_progressive(&[IndexLayer::A, IndexLayer::B, IndexLayer::C]);
let precise_results = adapter.search(&query, 10);

Quantization Tiers

Temperature-Based Quantization

RVF automatically assigns vectors to quantization tiers based on access frequency:

Tier	Temperature	Method	Memory	Recall
Hot	Frequently accessed	fp16 / scalar	2x per dim	~0.999
Warm	Moderate access	Product quantization	8-16x compression	~0.95
Cold	Rarely accessed	Binary quantization	32x compression	~0.80

How It Works

A Count-Min Sketch tracks access frequency per vector
Vectors are assigned to tiers based on configurable thresholds
Hot vectors stay at full precision for fast, accurate retrieval
Cold vectors are heavily compressed but still searchable
Tier assignment is stored in SKETCH_SEG and updated periodically

Using Quantization

use rvf_quant::scalar::ScalarQuantizer;
use rvf_quant::product::ProductQuantizer;
use rvf_quant::binary::{encode_binary, hamming_distance};
use rvf_quant::traits::Quantizer;

// Scalar quantization (Hot tier)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode(&vector);
let decoded = sq.decode(&encoded);

// Product quantization (Warm tier)
let pq = ProductQuantizer::train(&vectors, 8); // 8 subquantizers
let code = pq.encode(&vector);

// Binary quantization (Cold tier)
let bits = encode_binary(&vector);
let dist = hamming_distance(&bits_a, &bits_b);

Wire Format Specification

Segment Header (64 bytes, `repr(C)`)

Offset	Size	Field	Description
0x00	4	`magic`	`0x52564653` ("RVFS")
0x04	1	`version`	Format version (currently 1)
0x05	1	`seg_type`	Segment type (see enum below)
0x06	2	`flags`	Bitfield (COMPRESSED, ENCRYPTED, SIGNED, SEALED, ATTESTED, ...)
0x08	8	`segment_id`	Monotonically increasing ID
0x10	8	`payload_length`	Byte length of payload
0x18	8	`timestamp_ns`	Nanosecond UNIX timestamp
0x20	1	`checksum_algo`	0=CRC32C, 1=XXH3-128, 2=SHAKE-256
0x21	1	`compression`	0=none, 1=LZ4, 2=ZSTD
0x22	2	`reserved_0`	Must be zero
0x24	4	`reserved_1`	Must be zero
0x28	16	`content_hash`	First 128 bits of payload hash
0x38	4	`uncompressed_len`	Original size before compression
0x3C	4	`alignment_pad`	Padding to 64-byte boundary

Segment Types

Code	Name	Description
`0x01`	VEC	Raw vector embeddings
`0x02`	INDEX	HNSW adjacency and routing
`0x03`	OVERLAY	Graph overlay deltas
`0x04`	JOURNAL	Metadata mutations, deletions
`0x05`	MANIFEST	Segment directory, epoch state
`0x06`	QUANT	Quantization dictionaries
`0x07`	META	Key-value metadata
`0x08`	HOT	Temperature-promoted data
`0x09`	SKETCH	Access counter sketches
`0x0A`	WITNESS	Audit trails, attestation proofs
`0x0B`	PROFILE	Domain profile declarations
`0x0C`	CRYPTO	Key material, signature chains
`0x0D`	META_IDX	Metadata inverted indexes
`0x0E`	KERNEL	Compressed unikernel image (self-booting)
`0x0F`	EBPF	eBPF program for kernel-level acceleration
`0x10`	WASM	WASM microkernel / self-bootstrapping bytecode
`0x20`	COW_MAP	Cluster ownership map (local vs parent)
`0x21`	REFCOUNT	Cluster reference counts (rebuildable)
`0x22`	MEMBERSHIP	Vector visibility filter for branches
`0x23`	DELTA	Sparse delta patches (LoRA overlays)
`0x30`	TRANSFER_PRIOR	Transfer learning prior distributions
`0x31`	POLICY_KERNEL	Thompson Sampling policy kernels
`0x32`	COST_CURVE	Cost/reward curves for solver

Segment Flags

Bit	Name	Description
0	COMPRESSED	Payload is compressed
1	ENCRYPTED	Payload is encrypted
2	SIGNED	Signature footer follows payload
3	SEALED	Immutable (compaction output)
4	PARTIAL	Streaming/partial write
5	TOMBSTONE	Logical deletion
6	HOT	Temperature-promoted
7	OVERLAY	Contains delta data
8	SNAPSHOT	Full snapshot
9	CHECKPOINT	Safe rollback point
10	ATTESTED	Produced inside attested TEE
11	HAS_LINEAGE	File carries FileIdentity lineage data

Crash Safety

RVF uses a two-fsync protocol:

Write data segment + payload, then fsync
Write MANIFEST_SEG with updated state, then fsync

If the process crashes between fsyncs, the incomplete segment is ignored on recovery (no valid manifest references it). No write-ahead log is needed.

Signature Footer

When SIGNED flag is set, a signature footer follows the payload:

Offset	Size	Field
0x00	2	`sig_algo` (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s)
0x02	2	`sig_length`
0x04	var	`signature` (64 to 7,856 bytes)
var	4	`footer_length` (for backward scan)

Witness Chains & Audit Trails

How Witness Chains Work

A witness chain is a tamper-evident linked list of events, stored in WITNESS_SEG payloads. Each entry is 73 bytes:

Field	Size	Description
`prev_hash`	32	SHAKE-256 of previous entry (zero for genesis)
`action_hash`	32	SHAKE-256 of the action being witnessed
`timestamp_ns`	8	Nanosecond timestamp
`witness_type`	1	Event type discriminator

Changing any byte in any entry causes all subsequent prev_hash values to fail verification. This provides tamper-evidence without a blockchain.

Witness Types

Code	Name	Usage
`0x01`	PROVENANCE	Data origin tracking
`0x02`	COMPUTATION	Operation recording
`0x03`	SEARCH	Query audit logging
`0x04`	DELETION	Deletion audit logging
`0x05`	PLATFORM_ATTESTATION	TEE attestation quote
`0x06`	KEY_BINDING	Key sealed to TEE
`0x07`	COMPUTATION_PROOF	Verified enclave computation
`0x08`	DATA_PROVENANCE	Model-to-TEE-to-RVF chain
`0x09`	DERIVATION	File lineage derivation event
`0x0A`	LINEAGE_MERGE	Multi-parent lineage merge
`0x0B`	LINEAGE_SNAPSHOT	Lineage snapshot checkpoint
`0x0C`	LINEAGE_TRANSFORM	Lineage transform operation
`0x0D`	LINEAGE_VERIFY	Lineage verification event
`0x0E`	CLUSTER_COW	COW cluster copy event
`0x0F`	CLUSTER_DELTA	Delta patch applied to cluster

Creating a Witness Chain

use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry};
use rvf_crypto::shake256_256;

let entries = vec![
    WitnessEntry {
        prev_hash: [0; 32],
        action_hash: shake256_256(b"inserted 1000 vectors"),
        timestamp_ns: 1_700_000_000_000_000_000,
        witness_type: 0x01,
    },
    WitnessEntry {
        prev_hash: [0; 32], // set by create_witness_chain
        action_hash: shake256_256(b"queried top-10"),
        timestamp_ns: 1_700_000_001_000_000_000,
        witness_type: 0x03,
    },
];

let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes)?;
assert_eq!(verified.len(), 2);

Building from Source

Prerequisites

Rust 1.87+ (rustup update stable)
For WASM: rustup target add wasm32-unknown-unknown
For Node.js bindings: Node.js 18+ and npm

Build All Crates

cd crates/rvf
cargo build --workspace

Run All Tests

cargo test --workspace

Run Clippy

cargo clippy --all-targets --workspace --exclude rvf-wasm

Build WASM Microkernel

cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasm

Build CLI

cargo build -p rvf-cli
./target/debug/rvf --help

Build Node.js Bindings

cd rvf-node
npm install
npm run build

Run Benchmarks

cargo bench --bench rvf_benchmarks

Domain Profiles

What Are Profiles?

Domain profiles optimize RVF behavior for specific data types:

Profile	Code	Optimized For
Generic	`0x00`	General-purpose vectors
RVDNA	`0x01`	Genomic sequence embeddings
RVText	`0x02`	Language model embeddings (default for agentdb)
RVGraph	`0x03`	Graph/network node embeddings
RVVision	`0x04`	Image/vision model embeddings

Hardware Profiles

Profile	Level	Description
Generic	0	Minimal features, fits anywhere
Core	1	Moderate resources, good defaults
Hot	2	Memory-rich, high-performance
Full	3	All features enabled

File Format Reference

File Extension

.rvf — Standard RuVector Format file
.rvf.cold.N — Cold shard N (multi-file mode)
.rvf.idx.N — Index shard N (multi-file mode)

MIME Type

application/x-ruvector-format

Magic Number

0x52564653 (ASCII: "RVFS")

Byte Order

All multi-byte integers are little-endian.

Alignment

All segments are 64-byte aligned (cache-line friendly).

Root Manifest

The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via seek(EOF - scan) and provides:

Segment directory (offsets to all segments)
Hotset pointers (entry points, top layer, centroids, quant dicts)
Epoch counter
Vector count and dimension
Profile identifiers

🌿 RVCOW: Vector-Native Copy-on-Write Branching

RVF supports copy-on-write branching at cluster granularity (ADR-031). Instead of copying an entire file to create a variant, a derived file stores only the clusters that changed. This enables Git-like branching for vector databases.

COW Branching

A COW child inherits all vector data from its parent by reference. Writes only allocate local clusters as needed (one slab copy per modified cluster). A 1M-vector parent (~512 MB) with 100 modified vectors produces a child of ~10 clusters (~2.5 MB).

use rvf_runtime::RvfStore;

// Create parent with vectors
let parent = RvfStore::create(Path::new("parent.rvf"), options)?;
// ... ingest vectors ...

// Derive a COW child — inherits all data, stores only changes
let child = parent.branch(Path::new("child.rvf"))?;

// COW statistics
if let Some(stats) = child.cow_stats() {
    println!("Clusters: {} total, {} local", stats.cluster_count, stats.local_cluster_count);
}

Membership Filters

Branches share the parent's HNSW index. A membership filter (dense bitmap) controls which vectors are visible per branch. Excluded nodes still serve as routing waypoints during graph traversal but are never returned in results.

Include mode (default): vector visible iff filter.contains(id). Empty filter = empty view (fail-safe).
Exclude mode: vector visible iff !filter.contains(id). Empty filter = full view.

use rvf_runtime::membership::MembershipFilter;

let mut filter = MembershipFilter::new_include(1_000_000);
filter.add(42);        // vector 42 is now visible
filter.contains(42);   // true
filter.contains(100);  // false

Snapshot Freeze

Freeze creates an immutable snapshot of the current generation. Further writes require deriving a new branch. Freeze is a metadata-only operation (no data copy).

let mut branch = parent.branch(Path::new("snapshot.rvf"))?;
branch.freeze()?;

// Writes now fail:
assert!(branch.ingest_batch(&[&vec], &[1], None).is_err());

// Continue on a new branch:
let next = parent.branch(Path::new("next.rvf"))?;

Kernel Binding (128 bytes)

The KernelBinding footer (128 bytes, padded) cryptographically ties a KERNEL_SEG to its manifest. This prevents segment-swap attacks where a signed kernel from one file is embedded into a different file.

use rvf_types::kernel_binding::KernelBinding;

let binding = KernelBinding {
    manifest_root_hash: manifest_hash,   // SHAKE-256-256 of Level0Root
    policy_hash: policy_hash,            // SHAKE-256-256 of security policy
    binding_version: 1,
    ..Default::default()
};

store.embed_kernel_with_binding(arch, ktype, flags, &image, port, cmdline, &binding)?;

New Segment Types

Code	Name	Size	Purpose
`0x20`	COW_MAP	64B header	Cluster ownership map (local vs parent)
`0x21`	REFCOUNT	32B header	Cluster reference counts (rebuildable)
`0x22`	MEMBERSHIP	96B header	Vector visibility filter for branches
`0x23`	DELTA	64B header	Sparse delta patches between clusters

New CLI Commands

rvf launch <file>                        # Boot RVF in QEMU microVM
rvf embed-kernel <file> [--arch x86_64]  # Embed kernel image
rvf embed-ebpf <file> --program <src.c>  # Compile and embed eBPF
rvf filter <file> --include <id-list>    # Create membership filter
rvf freeze <file>                        # Snapshot-freeze current state
rvf verify-witness <file>                # Verify witness chain
rvf verify-attestation <file>            # Verify KernelBinding + attestation
rvf rebuild-refcounts <file>             # Recompute refcounts from COW map

For the full specification, see ADR-031: RVCOW Branching and Real Cognitive Containers.

🔬 Proof of Operations

Verified end-to-end workflows that demonstrate real capabilities:

CLI: Full Lifecycle

# Create a store, ingest 100 vectors, query, derive a child
rvf create demo.rvf --dimension 128
rvf ingest demo.rvf --input data.json --format json
rvf query demo.rvf --vector "0.1,0.2,0.3,..." --k 5
rvf derive demo.rvf child.rvf --type filter
rvf inspect demo.rvf
# MANIFEST_SEG (4 KB), VEC_SEG (51 KB), INDEX_SEG (12 KB)

Self-Booting: Vectors + Kernel in One File

cargo run --example self_booting
# Output:
#   Ingested 50 vectors (128 dims)
#   Pre-kernel query: top-5 results OK (nearest ID=25)
#   Kernel: 4,640 bytes embedded (x86_64, Hermit)
#   Extracted kernel: arch=X86_64, api_port=8080
#   Witness chain: 5 entries, all verified ✓
#   File size: 31 KB — data + kernel + witness in one file

Linux Microkernel: Bootable OS Image

cargo run --example linux_microkernel
# Output:
#   20 packages installed as vector embeddings
#   Kernel: Linux x86_64 (4,640 bytes)
#   SSH: Ed25519 keys signed and verified ✓
#   Witness chain: 22 entries, all verified ✓
#   Package search: "build tool" → found gcc, make, cmake
#   File size: 14 KB — bootable system image

Claude Code Appliance: Sealed AI Dev Environment

cargo run --example claude_code_appliance
# Output:
#   20 dev packages (rust, node, python, docker, ...)
#   Kernel: Linux x86_64 with SSH on port 2222
#   eBPF: XDP distance program embedded
#   Witness chain: 6 entries, all verified ✓
#   Ed25519 signed, tamper-evident
#   File size: 17 KB — sealed cognitive container

Test Suite: 1,156 Passing

cargo test --workspace
# agi_e2e .................. 12 passed
# adr033_integration ....... 34 passed
# qr_seed_e2e .............. 11 passed
# witness_e2e .............. 10 passed
# attestation .............. 6 passed
# crypto ................... 10 passed
# computational_container .. 8 passed
# cow_branching ............ 8 passed
# cross_platform ........... 6 passed
# lineage .................. 4 passed
# smoke .................... 4 passed
# + unit tests across all crates
# Total: 1,156 tests passed

Generate All 46 Example Files

cd examples/rvf && cargo run --example generate_all
ls output/  # 45 .rvf files (~11 MB total)
rvf inspect output/sealed_engine.rvf
rvf inspect output/linux_microkernel.rvf

🤝 Contributing

git clone https://github.com/ruvnet/ruvector
cd ruvector/crates/rvf
cargo test --workspace

All contributions must pass cargo clippy --all-targets with zero warnings and maintain the existing test count (currently 1,156+).

Architecture Decision Records

ADR	Title
ADR-030	RVF Cognitive Container (Kernel, eBPF, WASM tiers)
ADR-031	RVCOW Branching & Real Cognitive Containers
ADR-033	Progressive Indexing Hardening
ADR-034	QR Cognitive Seed (RVQS)
ADR-035	Capability Report
ADR-036	AGI Cognitive Container
ADR-037	Publishable RVF Acceptance Tests
ADR-038	npx ruvector rvlite Witness Integration
ADR-039	RVF Solver WASM AGI Integration

📄 License

Dual-licensed under MIT or Apache-2.0 at your option.

_{Built with Rust. Not a database — a portable cognitive runtime.}

FilesExpand file tree

rvf

Directory actions

More options