RVF Examples — Learn by Running
Hands-on examples for the unified agentic AI format — store it, send it, run it
Quick Start • Examples • Features • Performance • Comparison
RVF (RuVector Format) is the unified agentic AI file format. One .rvf file does three jobs:
- Store — vectors, indexes, metadata, and cryptographic proofs live in one file. No database server required.
- Transfer — the same file streams over a network. Query, insert, and delete operations work over the wire with zero conversion.
- Run — pack model weights, graph neural networks, WASM code, or even a bootable OS kernel into the file. Now it's not just data — it's a self-contained intelligence unit you can deploy anywhere.
Today, an AI agent's state is scattered: embeddings in one database, model weights in another, graph structure in a third, config in a fourth. Nothing talks to anything else. Moving between tools means re-indexing from scratch. There's no standard way to prove any of it was computed securely — and no way to hand an agent its complete knowledge as a single portable artifact.
RVF solves this. It gives agentic AI a universal substrate — one file that works everywhere:
| What it does | Where it runs | What you get |
|---|---|---|
| Stores vectors | Server (HNSW index) | Sub-millisecond search over millions of vectors |
| Stores vectors | Browser (5.5 KB WASM) | Same file, no backend needed |
| Stores vectors | Edge / IoT / mobile | Lightweight API, tiny footprint |
| Transfers data | Over the network | Batched query/ingest/delete via TCP |
| Runs code | Inside a TEE | Cryptographic proof of secure computation |
| Runs code | Bare metal / VM | File boots itself as a microservice |
| Runs code | Linux kernel (eBPF) | Sub-microsecond hot-path acceleration |
| Runs intelligence | Anywhere | Model + data + graph + trust chain in one file |
- Crash-safe — no write-ahead log needed; if power dies mid-write, the file stays consistent
- Self-describing — the schema is in the file; no external catalog required
- Progressive loading — start answering queries before the full index is loaded
- Domain profiles —
.rvdnafor genomics,.rvtextfor language,.rvgraphfor networks,.rvvisfor vision — same format underneath - Lineage tracking — every derived file records its parent's hash, like DNA inheritance
- Tamper-evident — witness chains and post-quantum signatures prove nothing was altered
These examples walk you through every major feature, from the simplest "insert and query" to wire format inspection, witness chains, and sealed cognitive engines.
| Use case | What goes in the file | Result |
|---|---|---|
| Semantic search | Vectors + HNSW index | Single-file vector database, no server needed |
| Agent memory | Vectors + metadata + witness chain | Portable, auditable AI agent knowledge base |
| Sealed LoRA distribution | Base embeddings + OVERLAY_SEG adapter deltas | Ship fine-tuned models as one versioned file |
| Portable graph intelligence | Node embeddings + GRAPH_SEG adjacency | GNN state that transfers between systems |
| Self-booting AI service | Vectors + index + KERNEL_SEG unikernel | File boots as a microservice on bare metal or Firecracker |
| Kernel-accelerated cache | Hot vectors + EBPF_SEG XDP program | Sub-microsecond lookups in the Linux kernel data path |
| Confidential AI | Any of the above + TEE attestation | Cryptographic proof everything ran inside a secure enclave |
| Genomic analysis | DNA k-mer embeddings + variant tensors | .rvdna file with lineage tracking across analysis pipeline |
| Firmware-style AI versioning | Full cognitive state + lineage chain | Parent → child derivation with hash verification, like DNA |
# Clone the repo
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
# Run your first example
cargo run --example basic_storeThat's it. You'll see a store created, 100 vectors inserted, nearest neighbors found, and persistence verified — all in under a second.
You can also work with RVF stores from the command line without writing any Rust:
# Build the CLI
cd crates/rvf && cargo build -p rvf-cli
# Create a store, ingest data, and query
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf
rvf compact vectors.rvf
# Derive a child store with lineage tracking
rvf derive parent.rvf child.rvf --type filter
# All commands support --json for machine-readable output
rvf status vectors.rvf --jsonRun All 40 Examples
Core (6):
cargo run --example basic_store # Store lifecycle + k-NN
cargo run --example progressive_index # Three-layer HNSW recall
cargo run --example quantization # Scalar / product / binary
cargo run --example wire_format # Raw segment I/O
cargo run --example crypto_signing # Ed25519 + witness chains
cargo run --example filtered_search # Metadata-filtered queriesAgentic AI (6):
cargo run --example agent_memory # Persistent agent memory + witness audit
cargo run --example swarm_knowledge # Multi-agent shared knowledge base
cargo run --example reasoning_trace # Chain-of-thought with lineage derivation
cargo run --example tool_cache # Tool call result cache with TTL
cargo run --example agent_handoff # Transfer agent state between instances
cargo run --example experience_replay # RL experience replay bufferPractical Production (5):
cargo run --example semantic_search # Document search with metadata filters
cargo run --example recommendation # Item recommendations (collaborative filtering)
cargo run --example rag_pipeline # Retrieval-augmented generation pipeline
cargo run --example embedding_cache # LRU cache with temperature tiering
cargo run --example dedup_detector # Near-duplicate detection + compactionVertical Domains (4):
cargo run --example genomic_pipeline # DNA k-mer search (.rvdna profile)
cargo run --example financial_signals # Market signals with TEE attestation
cargo run --example medical_imaging # Radiology search (.rvvis profile)
cargo run --example legal_discovery # Legal doc similarity (.rvtext profile)Exotic Capabilities (5):
cargo run --example self_booting # RVF with embedded unikernel
cargo run --example ebpf_accelerator # eBPF hot-path acceleration
cargo run --example hyperbolic_taxonomy # Hierarchy-aware search
cargo run --example multimodal_fusion # Cross-modal text + image search
cargo run --example sealed_engine # Full cognitive engine (capstone)Runtime Targets (4) + Postgres (1):
cargo run --example browser_wasm # Browser-side WASM vector search
cargo run --example edge_iot # IoT device with binary quantization
cargo run --example serverless_function # Cold-start optimized for Lambda
cargo run --example ruvllm_inference # LLM KV cache + LoRA via RVF
cargo run --example postgres_bridge # PostgreSQL ↔ RVF export/importNetwork & Security (4):
cargo run --example network_sync # Peer-to-peer vector store sync
cargo run --example tee_attestation # TEE attestation + sealed keys
cargo run --example access_control # Role-based vector access control
cargo run --example zero_knowledge # Zero-knowledge proof integrationAutonomous Agent (1):
cargo run --example ruvbot # Autonomous RVF-powered agent botPOSIX & Systems (3):
cargo run --example posix_fileops # POSIX file operations with RVF
cargo run --example linux_microkernel # Linux microkernel distribution
cargo run --example mcp_in_rvf # MCP server embedded in RVFNetwork Operations (1):
cargo run --example network_interfaces # Network OS telemetry (60 interfaces)- Rust 1.87+ — install via rustup
- No other dependencies needed — everything builds from source
- All examples use deterministic pseudo-random data, so results are reproducible across runs
Examples at a Glance (40 examples)
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 1 | basic_store | Beginner | Create, insert, query, persist, reopen |
| 2 | progressive_index | Intermediate | Three-layer HNSW, recall measurement |
| 3 | quantization | Intermediate | Scalar/product/binary quantization, tiering |
| 4 | wire_format | Advanced | Raw segment I/O, hash validation, tail-scan |
| 5 | crypto_signing | Advanced | Ed25519 signing, witness chains, tamper detection |
| 6 | filtered_search | Intermediate | Metadata filters: Eq, Range, AND/OR/IN |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 7 | agent_memory | Intermediate | Persistent agent memory, session recall, witness audit |
| 8 | swarm_knowledge | Intermediate | Multi-agent shared knowledge, cross-agent search |
| 9 | reasoning_trace | Advanced | Chain-of-thought lineage (parent → child → grandchild) |
| 10 | tool_cache | Intermediate | Tool call caching, TTL, delete_by_filter, compaction |
| 11 | agent_handoff | Advanced | Transfer agent state, derive clone, lineage verification |
| 12 | experience_replay | Intermediate | RL replay buffer, priority sampling, tiering |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 13 | semantic_search | Beginner | Document search engine, 4 filter workflows |
| 14 | recommendation | Intermediate | Collaborative filtering, genre/quality filters |
| 15 | rag_pipeline | Advanced | 5-step RAG: chunk, embed, retrieve, rerank, assemble |
| 16 | embedding_cache | Advanced | Zipf access patterns, 3-tier quantization, memory savings |
| 17 | dedup_detector | Intermediate | Near-duplicate detection, clustering, compaction |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 18 | genomic_pipeline | Advanced | DNA k-mer search, .rvdna profile, lineage |
| 19 | financial_signals | Advanced | Market signals, Ed25519 signing, attestation |
| 20 | medical_imaging | Intermediate | Radiology search, .rvvis profile, audit trail |
| 21 | legal_discovery | Intermediate | Legal similarity, .rvtext profile, discovery audit |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 22 | self_booting | Advanced | Embed/extract unikernel, kernel header verification |
| 23 | ebpf_accelerator | Advanced | Embed/extract eBPF, XDP program, co-existence |
| 24 | hyperbolic_taxonomy | Intermediate | Hierarchy-aware embeddings, depth-filtered search |
| 25 | multimodal_fusion | Intermediate | Cross-modal text+image search, modality filtering |
| 26 | sealed_engine | Advanced | Capstone: vectors + kernel + eBPF + witness + lineage |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 27 | browser_wasm | Intermediate | WASM-compatible API, raw wire segments, size targets |
| 28 | edge_iot | Beginner | Constrained device, binary quantization, memory budget |
| 29 | serverless_function | Intermediate | Cold start, manifest tail-scan, progressive loading |
| 30 | ruvllm_inference | Advanced | KV cache + LoRA adapters + policy store via RVF |
| 31 | postgres_bridge | Intermediate | PG export/import, offline query, lineage, witness audit |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 32 | network_sync | Advanced | Peer-to-peer sync, vector exchange, conflict resolution |
| 33 | tee_attestation | Advanced | TEE platform attestation, sealed keys, computation proof |
| 34 | access_control | Intermediate | Role-based access, permission checks, audit trails |
| 35 | zero_knowledge | Advanced | ZK proofs for vector operations, privacy-preserving search |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 36 | ruvbot | Advanced | Autonomous agent with RVF memory, planning, tool use |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 37 | posix_fileops | Intermediate | Raw I/O, atomic rename, locking, segment random access |
| 38 | linux_microkernel | Advanced | Package management, SSH keys, kernel embed, lineage updates |
| 39 | mcp_in_rvf | Advanced | MCP server runtime embedded in RVF, eBPF filter, tools |
| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 40 | network_interfaces | Intermediate | Multi-chassis telemetry, anomaly detection, filtered queries |
Features Covered
| Feature | Example | Description |
|---|---|---|
| k-NN Search | basic_store | Find nearest neighbors by L2 or cosine distance |
| Persistence | basic_store | Close a store, reopen it, verify results match |
| Metadata Filters | filtered_search | Eq, Ne, Gt, Lt, Range, In, And, Or expressions |
| Combined Filters | filtered_search | Multi-condition queries (category + score range) |
| Feature | Example | Description |
|---|---|---|
| Progressive Indexing | progressive_index | Three-tier HNSW: Layer A (fast), B (better), C (best) |
| Recall Measurement | progressive_index | Compare approximate results against brute-force ground truth |
| Feature | Example | Description |
|---|---|---|
| Scalar Quantization | quantization | fp32 → u8 (4x compression, Hot tier) |
| Product Quantization | quantization | fp32 → PQ codes (8-32x compression, Warm tier) |
| Binary Quantization | quantization | fp32 → 1-bit (32x compression, Cold tier) |
| Temperature Tiering | quantization | Count-Min Sketch access tracking + automatic tier assignment |
| Feature | Example | Description |
|---|---|---|
| Segment I/O | wire_format | Write/read 64-byte-aligned segments with type/flags/hash |
| Hash Validation | wire_format | CRC32c / XXH3 integrity checks on every segment |
| Tail-Scan | wire_format | Find latest manifest by scanning backward from EOF |
| Feature | Example | Description |
|---|---|---|
| Ed25519 Signing | crypto_signing | Sign segments, verify signatures, detect tampering |
| Witness Chains | crypto_signing | SHAKE-256 linked audit trails (73-byte entries) |
| Tamper Detection | crypto_signing | Any byte flip breaks chain verification |
| Feature | Example | Description |
|---|---|---|
| DNA-Style Lineage | (API) | Every derived file records its parent's hash and derivation type |
| Domain Profiles | (API) | .rvdna, .rvtext, .rvgraph, .rvvis — same format, domain-specific hints |
| Computational Container | claude_code_appliance |
Embed a WASM microkernel, eBPF program, or bootable unikernel |
| Self-Booting Appliance | claude_code_appliance |
5.1 MB .rvf — boots Linux, serves queries, runs Claude Code |
| Import (JSON/CSV/NumPy) | (API) | Load embeddings from .json, .csv, or .npy files via rvf-import or rvf ingest CLI |
| Unified CLI | rvf |
9 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve |
| Compaction | (API) | Garbage-collect tombstoned vectors and reclaim disk space |
| Batch Delete | (API) | Delete vectors by ID with tombstone markers |
The claude_code_appliance example builds a complete self-booting AI development environment as a single .rvf file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.
cd examples/rvf
cargo run --example claude_code_applianceWhat it produces (5.1 MB file):
claude_code_appliance.rvf
├── KERNEL_SEG Linux 6.8.12 bzImage (5.2 MB, x86_64)
├── EBPF_SEG Socket filter — allows ports 2222, 8080 only
├── VEC_SEG 20 package embeddings (128-dim)
├── INDEX_SEG HNSW graph for package search
├── WITNESS_SEG 6-entry tamper-evident audit trail
├── CRYPTO_SEG 3 Ed25519 SSH user keys (root, deploy, claude)
├── MANIFEST_SEG 4 KB root with segment directory
└── Snapshot v1 derived image with lineage tracking
Boot and connect:
rvf launch claude_code_appliance.rvf # Boot on QEMU/Firecracker
ssh -p 2222 deploy@localhost # SSH in
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'Final file: 5.1 MB single .rvf — boots Linux, serves queries, runs Claude Code.
What RVF Contains
An RVF file is built from segments — self-describing blocks that can be combined freely. Here are all 16 types, grouped by purpose:
Data Indexing Compression Runtime
+-----------+ +-----------+ +-----------+ +-----------+
| VEC 0x01 | | INDEX 0x02| | QUANT 0x06| | WASM |
| (vectors) | | (HNSW) | | (SQ/PQ/BQ)| | (5.5 KB) |
+-----------+ +-----------+ +-----------+ +-----------+
| META 0x07 | | META_IDX | | HOT 0x08 | | KERNEL |
| (key-val) | | 0x0D | | (promoted) | | 0x0E |
+-----------+ +-----------+ +-----------+ +-----------+
| JOURNAL | | OVERLAY | | SKETCH | | EBPF |
| 0x04 | | 0x03 | | 0x09 | | 0x0F |
+-----------+ +-----------+ +-----------+ +-----------+
Trust State Domain
+-----------+ +-----------+ +-----------+
| WITNESS | | MANIFEST | | PROFILE |
| 0x0A | | 0x05 | | 0x0B |
+-----------+ +-----------+ +-----------+
| CRYPTO |
| 0x0C |
+-----------+
Any segment you don't need is simply absent. A basic vector store uses VEC + INDEX + MANIFEST. A sealed cognitive engine might use all 16.
RVF is the universal substrate for the entire RuVector ecosystem. Here's how the 75+ Rust crates map onto RVF segments:
| Domain | Crates | RVF Segments Used |
|---|---|---|
| LLM inference | ruvllm, ruvllm-cli |
VEC (KV cache), OVERLAY (LoRA), WITNESS (audit) |
| Self-optimizing learning | sona |
OVERLAY (micro-LoRA), META (EWC++ weights) |
| Graph neural networks | ruvector-gnn, ruvector-graph |
INDEX (HNSW topology), META (edge weights) |
| Quantum computing | ruQu, ruqu-core, ruqu-algorithms |
SKETCH (VQE snapshots), META (syndrome tables) |
| Attention mechanisms | ruvector-attention, ruvector-mincut-gated-transformer |
VEC (attention matrices), QUANT (INT4/FP16) |
| Coherence systems | cognitum-gate-kernel, prime-radiant |
WITNESS (tile witnesses), WASM (64 KB tiles) |
| Neuromorphic | ruvector-nervous-system, micro-hnsw-wasm |
VEC (spike trains), INDEX (spiking HNSW) |
| Agent memory | agentdb, claude-flow, agentic-flow |
VEC + INDEX + WITNESS (full agent state) |
| Edge / browser | rvlite, rvf-wasm |
VEC + INDEX via 5.5 KB WASM microkernel |
| Hyperbolic geometry | ruvector-hyperbolic-hnsw, ruvector-math |
INDEX (Poincaré ball HNSW) |
| Routing / inference | ruvector-tiny-dancer-core, ruvector-sparse-inference |
VEC (feature vectors), META (routing policies) |
| Observation pipeline | ospipe |
META (state vectors), WITNESS (provenance) |
Performance & Comparison
RVF is designed for speed at every layer:
| Metric | Value | Example |
|---|---|---|
| Cold boot (4 KB manifest) | < 5 ms | wire_format |
| First query (Layer A only) | recall >= 0.70 | progressive_index |
| Full recall (Layer C) | >= 0.95 | progressive_index |
| WASM binary size | ~5.5 KB | — |
| Segment header | 64 bytes | wire_format |
| Witness chain entry | 73 bytes | crypto_signing |
| Scalar quantization | 4x compression | quantization |
| Product quantization | 8-32x compression | quantization |
| Binary quantization | 32x compression | quantization |
Instead of waiting for the full index, RVF serves queries immediately:
Layer A ─────> Layer B ─────> Layer C
(microsecs) (~10 ms) (~50 ms)
recall ~0.70 recall ~0.85 recall ~0.95
The progressive_index example measures this recall progression with brute-force ground truth.
| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|---|---|---|---|---|---|
| Single-file format | Yes | Yes | No | No | No |
| Crash-safe (no WAL) | Yes | No | No | WAL | WAL |
| Progressive loading | 3 layers | No | No | No | No |
| WASM support | 5.5 KB | No | No | No | No |
no_std compatible |
Yes | No | No | No | No |
| Post-quantum sigs | ML-DSA-65 | No | No | No | No |
| TEE attestation | Yes | No | No | No | No |
| Metadata filtering | Yes | No | Yes | Yes | Yes |
| Auto quantization | 3-tier | No | Manual | Yes | Yes |
| Append-only | Yes | Build-once | Build-once | Log | Log |
| Witness chains | Yes | No | No | No | No |
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
| Computational container | Yes (WASM/eBPF/unikernel) | No | No | No | No |
| Domain profiles | 5 profiles | No | No | No | No |
| Language bindings | Rust, Node, WASM | C++, Python | C++, Python | Rust, Python | Go, Python |
RVF replaces multiple tools because it carries data, model, graph, runtime, and trust chain together:
| Capability | RVF | GGUF | ONNX | SafeTensors | Neo4j | Docker/OCI |
|---|---|---|---|---|---|---|
| Vector storage + search | Yes | No | No | No | No | No |
| Model weight deltas (LoRA) | OVERLAY_SEG | Full weights | Full graph | Weights only | No | No |
| Graph neural state | GRAPH_SEG | No | No | No | Yes | No |
| Cryptographic audit trail | WITNESS_SEG | No | No | No | No | No |
| Self-booting runtime | KERNEL_SEG | No | No | No | No | Yes |
| Kernel-level acceleration | EBPF_SEG | No | No | No | No | No |
| File lineage / versioning | DNA-style | No | No | No | No | Image layers |
| TEE attestation | Built-in | No | No | No | No | No |
| Single portable file | Yes | Yes | Yes | Yes | No | Image tarball |
| Runs in browser | 5.5 KB WASM | No | ONNX.js | No | No | No |
Usage Patterns (8 patterns)
The most common use case. Create a store, add embeddings, query nearest neighbors.
use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;
let options = RvfOptions {
dimension: 384,
metric: DistanceMetric::L2,
..Default::default()
};
let mut store = RvfStore::create("vectors.rvf", options)?;
// Insert embeddings
store.ingest_batch(&[&embedding], &[1], None)?;
// Query top-10 nearest neighbors
let results = store.query(&query, 10, &QueryOptions::default())?;
for r in &results {
println!("id={}, distance={:.4}", r.id, r.distance);
}See: basic_store.rs
Attach metadata to vectors, then filter during queries.
use rvf_runtime::{FilterExpr, MetadataEntry, MetadataValue};
use rvf_runtime::filter::FilterValue;
// Add metadata during ingestion
let metadata = vec![
MetadataEntry { field_id: 0, value: MetadataValue::String("science".into()) },
MetadataEntry { field_id: 1, value: MetadataValue::U64(95) },
];
store.ingest_batch(&[&vec], &[42], Some(&metadata))?;
// Query with filter: category == "science" AND score > 80
let filter = FilterExpr::And(vec![
FilterExpr::Eq(0, FilterValue::String("science".into())),
FilterExpr::Gt(1, FilterValue::U64(80)),
]);
let opts = QueryOptions { filter: Some(filter), ..Default::default() };
let results = store.query(&query, 10, &opts)?;See: filtered_search.rs
Start serving queries instantly, improve quality as more data loads.
use rvf_index::{build_full_index, build_layer_a, build_layer_c, ProgressiveIndex};
// Build HNSW graph
let graph = build_full_index(&store, n, &config, &rng, &l2_distance);
// Layer A: instant but approximate
let layer_a = build_layer_a(&graph, ¢roids, &assignments, n as u64);
let idx = ProgressiveIndex { layer_a: Some(layer_a), layer_b: None, layer_c: None };
let fast_results = idx.search(&query, 10, 200, &store); // recall ~0.70
// Layer C: full precision
let layer_c = build_layer_c(&graph);
let idx_full = ProgressiveIndex { layer_a: Some(layer_a), layer_b: None, layer_c: Some(layer_c) };
let precise_results = idx_full.search(&query, 10, 200, &store); // recall ~0.95See: progressive_index.rs
Sign segments and build tamper-evident audit trails.
use rvf_crypto::{sign_segment, verify_segment, create_witness_chain, WitnessEntry, shake256_256};
use ed25519_dalek::SigningKey;
// Sign a segment
let footer = sign_segment(&header, &payload, &signing_key);
// Verify signature
assert!(verify_segment(&header, &payload, &footer, &verifying_key));
// Build an audit trail
let entries = vec![WitnessEntry {
prev_hash: [0; 32],
action_hash: shake256_256(b"inserted 1000 vectors"),
timestamp_ns: 1_700_000_000_000_000_000,
witness_type: 0x01, // PROVENANCE
}];
let chain = create_witness_chain(&entries);See: crypto_signing.rs
Load embeddings from common formats without writing a parser.
use rvf_import::{import_json, import_csv, import_npy};
// From a JSON array of vectors
import_json("embeddings.json", &mut store)?;
// From a CSV file (one vector per row)
import_csv("embeddings.csv", &mut store)?;
// From a NumPy .npy file
import_npy("embeddings.npy", &mut store)?;Remove vectors by ID, then reclaim disk space.
// Delete specific vectors (marks as tombstones)
store.delete_batch(&[42, 99, 1001])?;
// Compact: rewrite the file without tombstoned data
store.compact()?;Create derived files that track their ancestry.
use rvf_types::DerivationType;
// Create a parent store
let parent = RvfStore::create("parent.rvf", options)?;
// Derive a filtered child — records parent's hash automatically
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());
// Derive a grandchild
let grandchild = child.derive("grandchild.rvdna", DerivationType::Quantize, None)?;
assert_eq!(grandchild.lineage_depth(), 2);Pack a bootable kernel or eBPF program into the file.
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};
// Embed a unikernel — file can now boot as a standalone service
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &kernel_image, 8080)?;
// Embed an eBPF program — enables kernel-level acceleration
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;
// Extract later
let (hdr, img) = store.extract_kernel()?.unwrap();
let (hdr, prog) = store.extract_ebpf()?.unwrap();Tutorial: Your First RVF Store (Step by Step)
Create a new Rust project and add the dependency:
cargo new my_vectors
cd my_vectorsAdd to Cargo.toml:
[dependencies]
rvf-runtime = { path = "../crates/rvf/rvf-runtime" }
tempfile = "3"use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;
use tempfile::TempDir;
fn main() {
let tmp = TempDir::new().unwrap();
let path = tmp.path().join("my.rvf");
let opts = RvfOptions {
dimension: 128,
metric: DistanceMetric::L2,
..Default::default()
};
let mut store = RvfStore::create(&path, opts).unwrap();Vectors are inserted in batches. Each vector needs a unique u64 ID.
let vec_a = vec![0.1f32; 128];
let vec_b = vec![0.2f32; 128];
let vecs: Vec<&[f32]> = vec![&vec_a, &vec_b];
let ids = vec![1u64, 2];
let result = store.ingest_batch(&vecs, &ids, None).unwrap();
println!("Accepted: {}, Rejected: {}", result.accepted, result.rejected); let query = vec![0.15f32; 128];
let results = store.query(&query, 5, &QueryOptions::default()).unwrap();
for r in &results {
println!(" id={}, dist={:.6}", r.id, r.distance);
} store.close().unwrap();
let reopened = RvfStore::open(&path).unwrap();
let results2 = reopened.query(&query, 5, &QueryOptions::default()).unwrap();
assert_eq!(results.len(), results2.len());
println!("Persistence verified!");
}Accepted: 2, Rejected: 0
id=1, dist=0.064000
id=2, dist=0.032000
Persistence verified!
Tutorial: Understanding Quantization Tiers
A million 384-dim vectors at full precision (fp32) takes 1.5 GB of RAM. Not all vectors are accessed equally — most are rarely touched. Why keep them all at full precision?
RVF assigns vectors to three compression levels based on how often they're accessed:
| Tier | Access Pattern | Compression | Memory per Vector (384d) |
|---|---|---|---|
| Hot | Frequently queried | Scalar (fp32 -> u8) | 384 bytes (4x smaller) |
| Warm | Occasionally queried | Product quantization | 48 bytes (32x smaller) |
| Cold | Rarely accessed | Binary (1-bit) | 48 bytes (32x smaller) |
| Raw | No compression | fp32 | 1,536 bytes |
1. Track access patterns using a Count-Min Sketch (a probabilistic counter):
let mut sketch = CountMinSketch::default_sketch();
// Every time a vector is accessed, increment its counter
sketch.increment(vector_id);
// Check how often a vector has been accessed
let count = sketch.estimate(vector_id);2. Assign tiers based on configurable thresholds:
let tier = assign_tier(count);
// Hot: count >= 100
// Warm: count >= 10
// Cold: count < 103. Encode at the appropriate level:
// Hot: Scalar (fast, low error)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode_vec(&vector); // 384 bytes
// Warm: Product (balanced)
let pq = ProductQuantizer::train(&vectors, 48, 64, 20);
let encoded = pq.encode_vec(&vector); // 48 bytes
// Cold: Binary (smallest, approximate)
let bits = encode_binary(&vector); // 48 bytescargo run --example quantizationYou'll see a comparison table showing compression ratio, reconstruction error (MSE), and bytes per vector for each tier.
Tutorial: Building Witness Chains for Audit Trails
A witness chain is a tamper-evident log of events. Each entry links to the previous one through a cryptographic hash. If any entry is modified, all subsequent hash links break — making tampering detectable without a blockchain.
Entry 0 (genesis) Entry 1 Entry 2
+-------------------+ +-------------------+ +-------------------+
| prev_hash: 0x00.. | | prev_hash: H(E0) | | prev_hash: H(E1) |
| action: H(data) | | action: H(data) | | action: H(data) |
| timestamp: T0 | | timestamp: T1 | | timestamp: T2 |
| type: PROVENANCE | | type: COMPUTATION | | type: SEARCH |
+-------------------+ +-------------------+ +-------------------+
73 bytes 73 bytes 73 bytes
- prev_hash: SHAKE-256 hash of the previous entry (zeroed for genesis)
- action_hash: SHAKE-256 hash of whatever action is being recorded
- timestamp_ns: Nanosecond UNIX timestamp
- witness_type: What kind of event (see table below)
| Code | Name | When to Use |
|---|---|---|
0x01 |
PROVENANCE | Data origin tracking (e.g., "loaded from model X") |
0x02 |
COMPUTATION | Operation recording (e.g., "built HNSW index") |
0x03 |
SEARCH | Query audit (e.g., "searched for query Q, got results R") |
0x04 |
DELETION | Deletion audit (e.g., "deleted vectors 1-100") |
0x05 |
PLATFORM_ATTESTATION | TEE attestation (e.g., "enclave measured as M") |
0x06 |
KEY_BINDING | Sealed key (e.g., "key K bound to enclave M") |
0x07 |
COMPUTATION_PROOF | Verified computation (e.g., "search ran inside enclave") |
0x08 |
DATA_PROVENANCE | Full chain (e.g., "model -> TEE -> RVF file") |
0x09 |
DERIVATION | File lineage derivation event |
0x0A |
LINEAGE_MERGE | Multi-parent lineage merge |
0x0B |
LINEAGE_SNAPSHOT | Lineage snapshot checkpoint |
0x0C |
LINEAGE_TRANSFORM | Lineage transform operation |
0x0D |
LINEAGE_VERIFY | Lineage verification event |
use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry, shake256_256};
// Record three events
let entries = vec![
WitnessEntry {
prev_hash: [0; 32], // genesis
action_hash: shake256_256(b"loaded embeddings from model-v2"),
timestamp_ns: 1_700_000_000_000_000_000,
witness_type: 0x01,
},
WitnessEntry {
prev_hash: [0; 32], // filled by create_witness_chain
action_hash: shake256_256(b"built HNSW index (M=16, ef=200)"),
timestamp_ns: 1_700_000_001_000_000_000,
witness_type: 0x02,
},
WitnessEntry {
prev_hash: [0; 32],
action_hash: shake256_256(b"query: top-10 for user request #42"),
timestamp_ns: 1_700_000_002_000_000_000,
witness_type: 0x03,
},
];
let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes).unwrap();
assert_eq!(verified.len(), 3);Flip any byte in the chain and verification fails:
let mut tampered = chain_bytes.clone();
tampered[100] ^= 0xFF; // flip one byte
assert!(verify_witness_chain(&tampered).is_err()); // detected!cargo run --example crypto_signingThe example creates a 5-entry chain, verifies it, then demonstrates tamper and truncation detection.
Tutorial: Wire Format Deep Dive
Every piece of data in an RVF file is wrapped in a self-describing segment. The header is always exactly 64 bytes:
Offset Size Field Description
------ ---- ----- -----------
0x00 4 magic 0x52564653 ("RVFS")
0x04 1 version Format version (currently 1)
0x05 1 seg_type Segment type (VEC, INDEX, MANIFEST, ...)
0x06 2 flags Bitfield (COMPRESSED, SIGNED, ATTESTED, ...)
0x08 8 segment_id Monotonically increasing ID
0x10 8 payload_length Byte length of payload
0x18 8 timestamp_ns Nanosecond UNIX timestamp
0x20 1 checksum_algo 0=CRC32C, 1=XXH3-128, 2=SHAKE-256
0x21 1 compression 0=none, 1=LZ4, 2=ZSTD
0x22 2 reserved_0 Must be zero
0x24 4 reserved_1 Must be zero
0x28 16 content_hash First 128 bits of payload hash
0x38 4 uncompressed_len Original size before compression
0x3C 4 alignment_pad Padding to 64-byte boundary
| Code | Name | Purpose |
|---|---|---|
0x01 |
VEC | Raw vector embeddings |
0x02 |
INDEX | HNSW adjacency and routing tables |
0x03 |
OVERLAY | Graph overlay deltas |
0x04 |
JOURNAL | Metadata mutations, deletions |
0x05 |
MANIFEST | Segment directory, epoch state |
0x06 |
QUANT | Quantization dictionaries (scalar/PQ/binary) |
0x07 |
META | Key-value metadata |
0x08 |
HOT | Temperature-promoted data |
0x09 |
SKETCH | Access counter sketches (Count-Min) |
0x0A |
WITNESS | Audit trails, attestation proofs |
0x0B |
PROFILE | Domain profile declarations |
0x0C |
CRYPTO | Key material, signature chains |
0x0D |
META_IDX | Metadata inverted indexes |
0x0E |
KERNEL | Compressed unikernel image (self-booting) |
0x0F |
EBPF | eBPF program for kernel-level acceleration |
| Bit | Name | Description |
|---|---|---|
| 0 | COMPRESSED | Payload is compressed (LZ4 or ZSTD) |
| 1 | ENCRYPTED | Payload is encrypted |
| 2 | SIGNED | Signature footer follows payload |
| 3 | SEALED | Immutable (compaction output) |
| 4 | PARTIAL | Streaming / partial write |
| 5 | TOMBSTONE | Logical deletion marker |
| 6 | HOT | Temperature-promoted |
| 7 | OVERLAY | Contains delta data |
| 8 | SNAPSHOT | Full snapshot |
| 9 | CHECKPOINT | Safe rollback point |
| 10 | ATTESTED | Produced inside attested TEE |
| 11 | HAS_LINEAGE | File carries FileIdentity lineage data |
RVF doesn't need a write-ahead log. Instead:
- Write data segment + payload, then
fsync - Write MANIFEST_SEG with updated state, then
fsync
If the process crashes between fsyncs, the incomplete segment has no manifest reference — it's ignored on recovery. Simple, safe, fast.
To find the current state, scan backward from the end of the file for the latest MANIFEST_SEG. The root manifest fits in 4 KB, so cold boot takes < 5 ms.
cargo run --example wire_formatYou'll see three segments written, read back, hash-validated, corruption detected, and a tail-scan for the manifest.
Tutorial: Metadata Filtering Patterns
| Expression | Syntax | Description |
|---|---|---|
Eq |
FilterExpr::Eq(field_id, value) |
Exact match |
Ne |
FilterExpr::Ne(field_id, value) |
Not equal |
Gt |
FilterExpr::Gt(field_id, value) |
Greater than |
Lt |
FilterExpr::Lt(field_id, value) |
Less than |
Range |
FilterExpr::Range(field_id, low, high) |
Value in [low, high) |
In |
FilterExpr::In(field_id, values) |
Value is one of |
And |
FilterExpr::And(vec![...]) |
All conditions must match |
Or |
FilterExpr::Or(vec![...]) |
Any condition matches |
| Type | Rust | Use Case |
|---|---|---|
String |
MetadataValue::String("cat".into()) |
Categories, labels, tags |
U64 |
MetadataValue::U64(95) |
Scores, counts, timestamps |
Bytes |
MetadataValue::Bytes(vec![...]) |
Binary data, hashes |
Category filter:
FilterExpr::Eq(0, FilterValue::String("science".into()))Score range:
FilterExpr::Range(1, FilterValue::U64(30), FilterValue::U64(90))Multi-category:
FilterExpr::In(0, vec![
FilterValue::String("science".into()),
FilterValue::String("tech".into()),
])Combined (AND):
FilterExpr::And(vec![
FilterExpr::Eq(0, FilterValue::String("science".into())),
FilterExpr::Gt(1, FilterValue::U64(80)),
])cargo run --example filtered_searchThe example creates 500 vectors with category and score metadata, then runs 7 different filter queries showing selectivity and verification.
Tutorial: Progressive Index Recall Measurement
Recall@K measures how many of the true K nearest neighbors your approximate algorithm actually returns. A recall of 0.95 means 95% of results are correct.
recall@K = |approximate_results ∩ exact_results| / K
RVF builds an HNSW (Hierarchical Navigable Small World) graph, then splits it into three loadable layers:
Layer A: Coarse Routing
- Entry points (topmost HNSW nodes)
- Partition centroids for guided search
- Loads in microseconds
- Recall: ~0.40-0.70
Layer B: Hot Region
- Adjacency lists for the most frequently accessed vectors
- Covers the "working set" of your data
- Recall: ~0.70-0.85
Layer C: Full Graph
- Complete HNSW adjacency for all vectors
- Loaded in background while queries are already being served
- Recall: >= 0.95
The progressive_index example:
- Generates 5,000 vectors (128 dims)
- Builds the full HNSW graph (M=16, ef_construction=200)
- Splits into Layer A, B, C
- Runs 50 queries at each stage
- Computes recall@10 against brute-force ground truth
cargo run --example progressive_indexExpected output:
=== Recall Progression Summary ===
Layers Recall@10
A only 0.xxx
A + B 0.xxx
A + B + C 0.9xx
The ef_search parameter controls how many candidates HNSW explores during search. Higher values improve recall at the cost of latency:
| ef_search | Recall@10 | Relative Speed |
|---|---|---|
| 10 | ~0.75 | Fastest |
| 50 | ~0.90 | Balanced |
| 200 | ~0.97 | Most accurate |
Technical Reference: Signature Footer Format
When the SIGNED flag is set on a segment, a signature footer follows the payload:
| Offset | Size | Field |
|---|---|---|
| 0x00 | 2 | sig_algo (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s) |
| 0x02 | 2 | sig_length |
| 0x04 | var | signature (64 to 7,856 bytes) |
| var | 4 | footer_length (for backward scan) |
| Algorithm | Signature Size | Security Level | Standard |
|---|---|---|---|
| Ed25519 | 64 bytes | 128-bit classical | RFC 8032 |
| ML-DSA-65 | 3,309 bytes | NIST Level 3 (post-quantum) | FIPS 204 |
| SLH-DSA-128s | 7,856 bytes | NIST Level 1 (post-quantum, stateless) | FIPS 205 |
- Serialize the segment header (64 bytes) and payload into a signing buffer
- Compute SHAKE-256 hash of the buffer
- Sign the hash with the chosen algorithm
- Append the signature footer after the payload (before padding)
- Set the
SIGNEDflag in the header
- Read segment header and payload
- Recompute SHAKE-256 hash of header + payload
- Read signature footer (scan backward from segment end using
footer_length) - Verify signature against the public key
Technical Reference: Confidential Core Attestation
RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This provides cryptographic proof that:
- The platform is genuine (e.g., real Intel SGX hardware)
- The code running inside the enclave matches a known measurement
- Encryption keys are sealed to the enclave identity
- Vector operations were computed inside the secure environment
| Platform | Enum Value | Quote Format |
|---|---|---|
| Intel SGX | TeePlatform::Sgx (0) |
DCAP attestation quote |
| AMD SEV-SNP | TeePlatform::SevSnp (1) |
VCEK attestation report |
| Intel TDX | TeePlatform::Tdx (2) |
TD quote |
| ARM CCA | TeePlatform::ArmCca (3) |
CCA token |
| Software (testing) | TeePlatform::SoftwareTee (0xFE) |
Synthetic (no hardware) |
Offset Size Field
------ ---- -----
0x00 1 platform TeePlatform enum value
0x01 1 attestation_type AttestationWitnessType enum value
0x02 4 quote_length Length of the platform-specific quote
0x06 2 reserved
0x08 32 measurement SHAKE-256 hash of enclave code
0x28 32 signer_id SHAKE-256 hash of signing identity
0x48 8 timestamp_ns Nanosecond UNIX timestamp
0x50 16 nonce Anti-replay nonce
0x60 2 svn Security Version Number
0x62 1 sig_algo Signature algorithm for the quote
0x63 1 flags Attestation flags
0x64 4 report_data_len Length of additional report data
0x68 8 reserved
| Type | Witness Code | Purpose |
|---|---|---|
| Platform Attestation | 0x05 |
TEE identity + measurement verification |
| Key Binding | 0x06 |
Keys sealed to enclave measurement |
| Computation Proof | 0x07 |
Proof that operations ran inside enclave |
| Data Provenance | 0x08 |
Full chain: model -> TEE -> RVF file |
Any segment produced inside a TEE should set bit 10 (ATTESTED) in the segment header flags. This enables fast scanning to identify attested segments without parsing payloads.
The verification interface is pluggable:
pub trait QuoteVerifier {
fn platform(&self) -> TeePlatform;
fn verify_quote(
&self,
quote: &[u8],
report_data: &[u8],
expected_measurement: &[u8; 32],
) -> Result<(), String>;
}Implement this trait for your TEE platform to enable hardware-backed verification. The SoftwareTee variant allows testing without real hardware.
Technical Reference: Computational Container (Self-Booting RVF)
RVF files can optionally carry executable compute alongside vector data:
| Tier | Segment | Size | Environment | Boot Time | Use Case |
|---|---|---|---|---|---|
| 1: WASM | WASM_SEG (existing) | 5.5 KB | Browser, edge, IoT | <1 ms | Portable queries everywhere |
| 2: eBPF | EBPF_SEG (0x0F) |
10-50 KB | Linux kernel (XDP, TC) | <20 ms | Sub-microsecond hot cache hits |
| 3: Unikernel | KERNEL_SEG (0x0E) |
200 KB - 2 MB | Firecracker, TEE, bare metal | <125 ms | Zero-dependency self-booting service |
| Field | Size | Description |
|---|---|---|
kernel_magic |
4 | 0x52564B4E ("RVKN") |
header_version |
2 | Currently 1 |
kernel_arch |
1 | x86_64 (0), AArch64 (1), RISC-V (2), WASM (3) |
kernel_type |
1 | HermitOS (0), Unikraft (1), Custom (2), TestStub (0xFE) |
image_size |
4 | Uncompressed kernel size |
compressed_size |
4 | Compressed (ZSTD) size |
image_hash |
32 | SHAKE-256-256 of uncompressed image |
api_port |
2 | HTTP API port (network byte order) |
api_transport |
1 | HTTP (0), gRPC (1), virtio-vsock (2) |
kernel_flags |
8 | Feature flags (read-only, metrics, TEE, etc.) |
cmdline_len |
2 | Length of kernel command line |
| Field | Size | Description |
|---|---|---|
ebpf_magic |
4 | 0x52564250 ("RVBP") |
program_type |
1 | XDP (0), TC (1), Tracepoint (2), Socket (3) |
attach_type |
1 | XdpIngress (0), TcIngress (1), etc. |
max_dimension |
4 | Maximum vector dimension (eBPF verifier loop bound) |
bytecode_size |
4 | Size of BPF ELF object |
btf_size |
4 | Size of BTF section |
map_count |
4 | Number of BPF maps |
use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};
let mut store = RvfStore::open("vectors.rvf")?;
// Embed a kernel
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &image, 8080)?;
// Embed an eBPF program
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;
// Extract later
let (kernel_hdr, kernel_img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, ebpf_prog) = store.extract_ebpf()?.unwrap();Files with KERNEL_SEG or EBPF_SEG work with older readers -- unknown segment types are skipped per the RVF forward-compatibility rule. The computational capability is purely additive.
See ADR-030 for the full specification.
Technical Reference: DNA-Style Lineage Provenance
Every RVF file carries a 68-byte FileIdentity in its root manifest:
| Field | Size | Description |
|---|---|---|
file_id |
16 | Unique UUID for this file |
parent_id |
16 | UUID of the parent file (all zeros for root) |
parent_hash |
32 | SHAKE-256-256 of parent's manifest |
lineage_depth |
4 | Generation count (0 for root) |
Parent.rvf ──derive()──> Child.rvf ──derive()──> Grandchild.rvdna
file_id: A file_id: B file_id: C
parent_id: [0;16] parent_id: A parent_id: B
parent_hash: [0;32] parent_hash: hash(A) parent_hash: hash(B)
depth: 0 depth: 1 depth: 2
| Code | Type | Description |
|---|---|---|
| 0 | Clone | Exact copy |
| 1 | Filter | Subset of parent's vectors |
| 2 | Merge | Multi-parent merge |
| 3 | Quantize | Re-quantized version |
| 4 | Reindex | Re-indexed with different parameters |
| 5 | Transform | Transformed embeddings |
| 6 | Snapshot | Point-in-time snapshot |
| 0xFF | UserDefined | Application-specific derivation |
use rvf_runtime::RvfStore;
use rvf_types::DerivationType;
let parent = RvfStore::create("parent.rvf", options)?;
// Derive a filtered child
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());| Extension | Domain Profile | Optimized For |
|---|---|---|
.rvf |
Generic | General-purpose vectors |
.rvdna |
RVDNA | Genomic sequence embeddings |
.rvtext |
RVText | Language model embeddings |
.rvgraph |
RVGraph | Graph/network node embeddings |
.rvvis |
RVVision | Image/vision model embeddings |
See ADR-029 for the full format specification.
Technical Reference: Crate Architecture
+-----------------------------------------+
| Cognitive Layer |
| ruvllm | gnn | ruQu | attention | sona |
| mincut | prime-radiant | nervous-system |
+---+-------------+---------------+-------+
| | |
+-----------------------------------------+
| Application Layer |
| claude-flow | agentdb | agentic-flow |
| ospipe | rvlite | sona | your-app |
+---+-------------+---------------+-------+
| | |
+---v-------------v---------------v-------+
| RVF SDK Layer |
| rvf-runtime | rvf-index | rvf-quant |
| rvf-manifest | rvf-crypto | rvf-wire |
+---+-------------+---------------+-------+
| | |
+--------v------+ +---v--------+ +----v-------+ +----v------+
| rvf-server | | rvf-node | | rvf-wasm | | rvf-cli |
| HTTP + TCP | | N-API | | ~46 KB | | clap |
+---------------+ +------------+ +------------+ +-----------+
| Crate | Lines | no_std | Purpose |
|---|---|---|---|
rvf-types |
3,184 | Yes | Segment types, kernel/eBPF headers, lineage, enums |
rvf-wire |
2,011 | Yes | Wire format read/write, hash validation |
rvf-manifest |
1,580 | No | Two-level manifest with 4 KB root, FileIdentity codec |
rvf-index |
2,691 | No | HNSW progressive indexing (Layer A/B/C) |
rvf-quant |
1,443 | No | Scalar, product, and binary quantization |
rvf-crypto |
1,725 | Partial | SHAKE-256, Ed25519, witness chains, attestation, lineage |
rvf-runtime |
3,607 | No | Full store API, compaction, lineage, kernel/eBPF embed |
rvf-import |
980 | No | JSON, CSV, NumPy (.npy) importers |
rvf-wasm |
1,616 | Yes | WASM control plane: in-memory store, query, segment inspection |
rvf-node |
852 | No | Node.js N-API bindings with lineage, kernel/eBPF, inspection |
rvf-cli |
665 | No | Unified CLI: create, ingest, query, delete, status, inspect, compact, derive, serve |
rvf-server |
1,165 | No | HTTP REST + TCP streaming server |
| Adapter | Purpose | Key Feature |
|---|---|---|
rvf-adapter-claude-flow |
AI agent memory | WITNESS_SEG audit trails |
rvf-adapter-agentdb |
Agent vector database | Progressive HNSW indexing |
rvf-adapter-ospipe |
Observation-State pipeline | META_SEG for state vectors |
rvf-adapter-agentic-flow |
Swarm coordination | Inter-agent memory sharing |
rvf-adapter-rvlite |
Lightweight embedded store | Minimal API, edge-friendly |
rvf-adapter-sona |
Neural architecture | Experience replay + trajectories |
Technical Reference: File Format Specification
| Extension | Usage |
|---|---|
.rvf |
Standard RuVector Format file |
.rvf.cold.N |
Cold shard N (multi-file mode) |
.rvf.idx.N |
Index shard N (multi-file mode) |
application/x-ruvector-format
0x52564653 (ASCII: "RVFS")
All multi-byte integers are little-endian.
All segments are 64-byte aligned (cache-line friendly). Payloads are padded to the next 64-byte boundary.
The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via backward scan:
let (offset, header) = find_latest_manifest(&file_data)?;The root manifest provides:
- Segment directory (offsets to all segments)
- Hotset pointers (entry points, top layer, centroids, quant dicts)
- Epoch counter
- Vector count and dimension
- Profile identifiers
| Profile | Code | Optimized For |
|---|---|---|
| Generic | 0x00 |
General-purpose vectors |
| RVDNA | 0x01 |
Genomic sequence embeddings |
| RVText | 0x02 |
Language model embeddings |
| RVGraph | 0x03 |
Graph/network node embeddings |
| RVVision | 0x04 |
Image/vision model embeddings |
Building from Source
- Rust 1.87+ via rustup (
rustup update stable) - For WASM:
rustup target add wasm32-unknown-unknown - For Node.js bindings: Node.js 18+ and
npm
cd examples/rvf
cargo buildcd crates/rvf
cargo build --workspacecd crates/rvf
cargo test --workspacecd crates/rvf
cargo clippy --all-targets --workspace --exclude rvf-wasmcd crates/rvf
cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasmcd crates/rvf/rvf-node
npm install && npm run buildcd crates/rvf
cargo bench --bench rvf_benchmarksProject Structure
examples/rvf/
Cargo.toml # Standalone workspace
src/lib.rs # Shared utilities
examples/
# Core (6)
basic_store.rs # Store lifecycle, insert, query, persistence
progressive_index.rs # Three-layer HNSW, recall measurement
quantization.rs # Scalar, product, binary quantization + tiering
wire_format.rs # Raw segment I/O, hash validation, tail-scan
crypto_signing.rs # Ed25519 signing, witness chains, tamper detection
filtered_search.rs # Metadata-filtered vector search
# Agentic AI (6)
agent_memory.rs # Persistent agent memory + witness audit
swarm_knowledge.rs # Multi-agent shared knowledge base
reasoning_trace.rs # Chain-of-thought with lineage derivation
tool_cache.rs # Tool call result cache with TTL + compaction
agent_handoff.rs # Transfer agent state between instances
experience_replay.rs # RL experience replay buffer
# Practical Production (5)
semantic_search.rs # Document search engine (4 filter workflows)
recommendation.rs # Item recommendations (collaborative filtering)
rag_pipeline.rs # Retrieval-augmented generation pipeline
embedding_cache.rs # LRU cache with temperature tiering
dedup_detector.rs # Near-duplicate detection + compaction
# Vertical Domains (4)
genomic_pipeline.rs # DNA k-mer search (.rvdna profile)
financial_signals.rs # Market signals with attestation
medical_imaging.rs # Radiology embedding search (.rvvis)
legal_discovery.rs # Legal document similarity (.rvtext)
# Exotic Capabilities (5)
self_booting.rs # RVF with embedded unikernel
ebpf_accelerator.rs # eBPF hot-path acceleration
hyperbolic_taxonomy.rs # Hierarchy-aware search
multimodal_fusion.rs # Cross-modal text + image search
sealed_engine.rs # Full cognitive engine (capstone)
# Runtime Targets + Postgres (5)
browser_wasm.rs # Browser-side WASM vector search
edge_iot.rs # IoT device with binary quantization
serverless_function.rs # Cold-start optimized for Lambda
ruvllm_inference.rs # LLM KV cache + LoRA via RVF
postgres_bridge.rs # PostgreSQL ↔ RVF export/import
# Network & Security (4)
network_sync.rs # Peer-to-peer vector store sync
tee_attestation.rs # TEE attestation + sealed keys
access_control.rs # Role-based vector access control
zero_knowledge.rs # Zero-knowledge proof integration
# Autonomous Agent (1)
ruvbot.rs # Autonomous RVF-powered agent bot
# POSIX & Systems (3)
posix_fileops.rs # POSIX file operations with RVF
linux_microkernel.rs # Linux microkernel distribution
mcp_in_rvf.rs # MCP server embedded in RVF
# Network Operations (1)
network_interfaces.rs # Network OS telemetry (60 interfaces)
| Resource | Description |
|---|---|
| RVF Format Specification | Full format documentation, architecture, and API reference |
| ADR-029 | Architecture decision record for the canonical format |
| ADR-030 | Computational container (KERNEL_SEG, EBPF_SEG) specification |
| ADR-031 | Example repository design (this collection of 40 examples) |
| Benchmarks | Performance benchmarks (HNSW build, quantization, wire I/O) |
| Integration Tests | E2E test suite (progressive recall, quantization, wire interop) |
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
cargo build && cargo run --example basic_storeAll contributions must pass cargo clippy with zero warnings and maintain the existing test count (currently 543+).
Dual-licensed under MIT or Apache-2.0 at your option.
Built with Rust. One file — store it, send it, run it.