RVF Examples — Learn by Running

Hands-on examples for the unified agentic AI format — store it, send it, run it

Quick Start | Examples | Features | Performance | Comparison


What is RVF?

RVF (RuVector Format) is the unified agentic AI file format. One .rvf file does three jobs:

  1. Store — vectors, indexes, metadata, and cryptographic proofs live in one file. No database server required.
  2. Transfer — the same file streams over a network. Query, insert, and delete operations work over the wire with zero conversion.
  3. Run — pack model weights, graph neural networks, WASM code, or even a bootable OS kernel into the file. Now it's not just data — it's a self-contained intelligence unit you can deploy anywhere.

Why does this matter?

Today, an AI agent's state is scattered: embeddings in one database, model weights in another, graph structure in a third, config in a fourth. Nothing talks to anything else. Moving between tools means re-indexing from scratch. There's no standard way to prove any of it was computed securely — and no way to hand an agent its complete knowledge as a single portable artifact.

RVF solves this. It gives agentic AI a universal substrate — one file that works everywhere:

| What it does | Where it runs | What you get |
|---|---|---|
| Stores vectors | Server (HNSW index) | Sub-millisecond search over millions of vectors |
| Stores vectors | Browser (5.5 KB WASM) | Same file, no backend needed |
| Stores vectors | Edge / IoT / mobile | Lightweight API, tiny footprint |
| Transfers data | Over the network | Batched query/ingest/delete via TCP |
| Runs code | Inside a TEE | Cryptographic proof of secure computation |
| Runs code | Bare metal / VM | File boots itself as a microservice |
| Runs code | Linux kernel (eBPF) | Sub-microsecond hot-path acceleration |
| Runs intelligence | Anywhere | Model + data + graph + trust chain in one file |

Key properties

  • Crash-safe — no write-ahead log needed; if power dies mid-write, the file stays consistent
  • Self-describing — the schema is in the file; no external catalog required
  • Progressive loading — start answering queries before the full index is loaded
  • Domain profiles: .rvdna for genomics, .rvtext for language, .rvgraph for networks, .rvvis for vision, all the same format underneath
  • Lineage tracking — every derived file records its parent's hash, like DNA inheritance
  • Tamper-evident — witness chains and post-quantum signatures prove nothing was altered

These examples walk you through every major feature, from the simplest "insert and query" to wire format inspection, witness chains, and sealed cognitive engines.

What you can build with RVF

| Use case | What goes in the file | Result |
|---|---|---|
| Semantic search | Vectors + HNSW index | Single-file vector database, no server needed |
| Agent memory | Vectors + metadata + witness chain | Portable, auditable AI agent knowledge base |
| Sealed LoRA distribution | Base embeddings + OVERLAY_SEG adapter deltas | Ship fine-tuned models as one versioned file |
| Portable graph intelligence | Node embeddings + GRAPH_SEG adjacency | GNN state that transfers between systems |
| Self-booting AI service | Vectors + index + KERNEL_SEG unikernel | File boots as a microservice on bare metal or Firecracker |
| Kernel-accelerated cache | Hot vectors + EBPF_SEG XDP program | Sub-microsecond lookups in the Linux kernel data path |
| Confidential AI | Any of the above + TEE attestation | Cryptographic proof everything ran inside a secure enclave |
| Genomic analysis | DNA k-mer embeddings + variant tensors | .rvdna file with lineage tracking across analysis pipeline |
| Firmware-style AI versioning | Full cognitive state + lineage chain | Parent → child derivation with hash verification, like DNA |

Quick Start

# Clone the repo
git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf

# Run your first example
cargo run --example basic_store

That's it. You'll see a store created, 100 vectors inserted, nearest neighbors found, and persistence verified — all in under a second.

Using the CLI

You can also work with RVF stores from the command line without writing any Rust:

# Build the CLI
cd crates/rvf && cargo build -p rvf-cli

# Create a store, ingest data, and query
rvf create vectors.rvf --dimension 384
rvf ingest vectors.rvf --input data.json --format json
rvf query vectors.rvf --vector "0.1,0.2,..." --k 10
rvf status vectors.rvf
rvf inspect vectors.rvf
rvf compact vectors.rvf

# Derive a child store with lineage tracking
rvf derive parent.rvf child.rvf --type filter

# All commands support --json for machine-readable output
rvf status vectors.rvf --json

Run All 40 Examples

Core (6):

cargo run --example basic_store          # Store lifecycle + k-NN
cargo run --example progressive_index    # Three-layer HNSW recall
cargo run --example quantization         # Scalar / product / binary
cargo run --example wire_format          # Raw segment I/O
cargo run --example crypto_signing       # Ed25519 + witness chains
cargo run --example filtered_search      # Metadata-filtered queries

Agentic AI (6):

cargo run --example agent_memory         # Persistent agent memory + witness audit
cargo run --example swarm_knowledge      # Multi-agent shared knowledge base
cargo run --example reasoning_trace      # Chain-of-thought with lineage derivation
cargo run --example tool_cache           # Tool call result cache with TTL
cargo run --example agent_handoff        # Transfer agent state between instances
cargo run --example experience_replay    # RL experience replay buffer

Practical Production (5):

cargo run --example semantic_search      # Document search with metadata filters
cargo run --example recommendation       # Item recommendations (collaborative filtering)
cargo run --example rag_pipeline         # Retrieval-augmented generation pipeline
cargo run --example embedding_cache      # LRU cache with temperature tiering
cargo run --example dedup_detector       # Near-duplicate detection + compaction

Vertical Domains (4):

cargo run --example genomic_pipeline     # DNA k-mer search (.rvdna profile)
cargo run --example financial_signals    # Market signals with TEE attestation
cargo run --example medical_imaging      # Radiology search (.rvvis profile)
cargo run --example legal_discovery      # Legal doc similarity (.rvtext profile)

Exotic Capabilities (5):

cargo run --example self_booting         # RVF with embedded unikernel
cargo run --example ebpf_accelerator     # eBPF hot-path acceleration
cargo run --example hyperbolic_taxonomy  # Hierarchy-aware search
cargo run --example multimodal_fusion    # Cross-modal text + image search
cargo run --example sealed_engine        # Full cognitive engine (capstone)

Runtime Targets (4) + Postgres (1):

cargo run --example browser_wasm         # Browser-side WASM vector search
cargo run --example edge_iot             # IoT device with binary quantization
cargo run --example serverless_function  # Cold-start optimized for Lambda
cargo run --example ruvllm_inference     # LLM KV cache + LoRA via RVF
cargo run --example postgres_bridge      # PostgreSQL ↔ RVF export/import

Network & Security (4):

cargo run --example network_sync         # Peer-to-peer vector store sync
cargo run --example tee_attestation      # TEE attestation + sealed keys
cargo run --example access_control       # Role-based vector access control
cargo run --example zero_knowledge       # Zero-knowledge proof integration

Autonomous Agent (1):

cargo run --example ruvbot               # Autonomous RVF-powered agent bot

POSIX & Systems (3):

cargo run --example posix_fileops        # POSIX file operations with RVF
cargo run --example linux_microkernel    # Linux microkernel distribution
cargo run --example mcp_in_rvf           # MCP server embedded in RVF

Network Operations (1):

cargo run --example network_interfaces   # Network OS telemetry (60 interfaces)

Prerequisites

  • Rust 1.87+ — install via rustup
  • No other dependencies needed — everything builds from source
  • All examples use deterministic pseudo-random data, so results are reproducible across runs

Examples at a Glance (40 examples)

Core

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 1 | basic_store | Beginner | Create, insert, query, persist, reopen |
| 2 | progressive_index | Intermediate | Three-layer HNSW, recall measurement |
| 3 | quantization | Intermediate | Scalar/product/binary quantization, tiering |
| 4 | wire_format | Advanced | Raw segment I/O, hash validation, tail-scan |
| 5 | crypto_signing | Advanced | Ed25519 signing, witness chains, tamper detection |
| 6 | filtered_search | Intermediate | Metadata filters: Eq, Range, AND/OR/IN |

Agentic AI

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 7 | agent_memory | Intermediate | Persistent agent memory, session recall, witness audit |
| 8 | swarm_knowledge | Intermediate | Multi-agent shared knowledge, cross-agent search |
| 9 | reasoning_trace | Advanced | Chain-of-thought lineage (parent → child → grandchild) |
| 10 | tool_cache | Intermediate | Tool call caching, TTL, delete_by_filter, compaction |
| 11 | agent_handoff | Advanced | Transfer agent state, derive clone, lineage verification |
| 12 | experience_replay | Intermediate | RL replay buffer, priority sampling, tiering |

Practical Production

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 13 | semantic_search | Beginner | Document search engine, 4 filter workflows |
| 14 | recommendation | Intermediate | Collaborative filtering, genre/quality filters |
| 15 | rag_pipeline | Advanced | 5-step RAG: chunk, embed, retrieve, rerank, assemble |
| 16 | embedding_cache | Advanced | Zipf access patterns, 3-tier quantization, memory savings |
| 17 | dedup_detector | Intermediate | Near-duplicate detection, clustering, compaction |

Vertical Domains

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 18 | genomic_pipeline | Advanced | DNA k-mer search, .rvdna profile, lineage |
| 19 | financial_signals | Advanced | Market signals, Ed25519 signing, attestation |
| 20 | medical_imaging | Intermediate | Radiology search, .rvvis profile, audit trail |
| 21 | legal_discovery | Intermediate | Legal similarity, .rvtext profile, discovery audit |

Exotic Capabilities

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 22 | self_booting | Advanced | Embed/extract unikernel, kernel header verification |
| 23 | ebpf_accelerator | Advanced | Embed/extract eBPF, XDP program, co-existence |
| 24 | hyperbolic_taxonomy | Intermediate | Hierarchy-aware embeddings, depth-filtered search |
| 25 | multimodal_fusion | Intermediate | Cross-modal text+image search, modality filtering |
| 26 | sealed_engine | Advanced | Capstone: vectors + kernel + eBPF + witness + lineage |

Runtime Targets + Postgres

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 27 | browser_wasm | Intermediate | WASM-compatible API, raw wire segments, size targets |
| 28 | edge_iot | Beginner | Constrained device, binary quantization, memory budget |
| 29 | serverless_function | Intermediate | Cold start, manifest tail-scan, progressive loading |
| 30 | ruvllm_inference | Advanced | KV cache + LoRA adapters + policy store via RVF |
| 31 | postgres_bridge | Intermediate | PG export/import, offline query, lineage, witness audit |

Network & Security

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 32 | network_sync | Advanced | Peer-to-peer sync, vector exchange, conflict resolution |
| 33 | tee_attestation | Advanced | TEE platform attestation, sealed keys, computation proof |
| 34 | access_control | Intermediate | Role-based access, permission checks, audit trails |
| 35 | zero_knowledge | Advanced | ZK proofs for vector operations, privacy-preserving search |

Autonomous Agent

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 36 | ruvbot | Advanced | Autonomous agent with RVF memory, planning, tool use |

POSIX & Systems

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 37 | posix_fileops | Intermediate | Raw I/O, atomic rename, locking, segment random access |
| 38 | linux_microkernel | Advanced | Package management, SSH keys, kernel embed, lineage updates |
| 39 | mcp_in_rvf | Advanced | MCP server runtime embedded in RVF, eBPF filter, tools |

Network Operations

| # | Example | Difficulty | What You'll Learn |
|---|---|---|---|
| 40 | network_interfaces | Intermediate | Multi-chassis telemetry, anomaly detection, filtered queries |

Features Covered

Storage — vectors in, answers out

| Feature | Example | Description |
|---|---|---|
| k-NN Search | basic_store | Find nearest neighbors by L2 or cosine distance |
| Persistence | basic_store | Close a store, reopen it, verify results match |
| Metadata Filters | filtered_search | Eq, Ne, Gt, Lt, Range, In, And, Or expressions |
| Combined Filters | filtered_search | Multi-condition queries (category + score range) |

Indexing — speed vs. accuracy trade-offs

| Feature | Example | Description |
|---|---|---|
| Progressive Indexing | progressive_index | Three-tier HNSW: Layer A (fast), B (better), C (best) |
| Recall Measurement | progressive_index | Compare approximate results against brute-force ground truth |

Compression — fit more vectors in less memory

| Feature | Example | Description |
|---|---|---|
| Scalar Quantization | quantization | fp32 → u8 (4x compression, Hot tier) |
| Product Quantization | quantization | fp32 → PQ codes (8-32x compression, Warm tier) |
| Binary Quantization | quantization | fp32 → 1-bit (32x compression, Cold tier) |
| Temperature Tiering | quantization | Count-Min Sketch access tracking + automatic tier assignment |

Wire format — what the bytes look like on disk and over the network

| Feature | Example | Description |
|---|---|---|
| Segment I/O | wire_format | Write/read 64-byte-aligned segments with type/flags/hash |
| Hash Validation | wire_format | CRC32c / XXH3 integrity checks on every segment |
| Tail-Scan | wire_format | Find latest manifest by scanning backward from EOF |

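The tail-scan idea can be sketched with the standard library alone. This is an illustration under stated assumptions, not the rvf wire implementation: it assumes every segment header starts on a 64-byte boundary, that the magic is stored little-endian, and that seg_type sits at byte offset 5 with MANIFEST = 0x05 (as listed in the header layout later in this README). The names find_latest_manifest and fake_header are hypothetical.

```rust
/// Segment magic "RVFS" as given in the header table.
const MAGIC: u32 = 0x5256_4653;
/// MANIFEST segment type code.
const SEG_MANIFEST: u8 = 0x05;
/// Assumption: headers start on 64-byte boundaries.
const ALIGN: usize = 64;

/// Scan backward from the end of `file` and return the offset of the
/// last aligned header whose segment type is MANIFEST.
fn find_latest_manifest(file: &[u8]) -> Option<usize> {
    if file.len() < ALIGN {
        return None;
    }
    // Highest aligned offset that still leaves room for a full header.
    let mut offset = (file.len() - ALIGN) / ALIGN * ALIGN;
    loop {
        let magic = u32::from_le_bytes([
            file[offset],
            file[offset + 1],
            file[offset + 2],
            file[offset + 3],
        ]);
        if magic == MAGIC && file[offset + 5] == SEG_MANIFEST {
            return Some(offset);
        }
        if offset == 0 {
            return None;
        }
        offset -= ALIGN;
    }
}

/// Build a zeroed 64-byte header carrying only magic and segment type.
fn fake_header(seg_type: u8) -> [u8; 64] {
    let mut h = [0u8; 64];
    h[0..4].copy_from_slice(&MAGIC.to_le_bytes());
    h[5] = seg_type;
    h
}

fn main() {
    // Layout: VEC header, old MANIFEST, VEC header, new MANIFEST.
    let mut file = Vec::new();
    for seg_type in [0x01, SEG_MANIFEST, 0x01, SEG_MANIFEST] {
        file.extend_from_slice(&fake_header(seg_type));
    }
    println!("{:?}", find_latest_manifest(&file)); // Some(192)
}
```

Payload bytes can match the magic by coincidence, so a real reader would also validate the candidate header (for example against its content hash) before trusting it.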
Trust — signatures, audit trails, and tamper detection

| Feature | Example | Description |
|---|---|---|
| Ed25519 Signing | crypto_signing | Sign segments, verify signatures, detect tampering |
| Witness Chains | crypto_signing | SHAKE-256 linked audit trails (73-byte entries) |
| Tamper Detection | crypto_signing | Any byte flip breaks chain verification |

Agentic AI — lineage, domains, and self-booting intelligence

| Feature | Example | Description |
|---|---|---|
| DNA-Style Lineage | (API) | Every derived file records its parent's hash and derivation type |
| Domain Profiles | (API) | .rvdna, .rvtext, .rvgraph, .rvvis: same format, domain-specific hints |
| Computational Container | claude_code_appliance | Embed a WASM microkernel, eBPF program, or bootable unikernel |
| Self-Booting Appliance | claude_code_appliance | 5.1 MB .rvf: boots Linux, serves queries, runs Claude Code |
| Import (JSON/CSV/NumPy) | (API) | Load embeddings from .json, .csv, or .npy files via rvf-import or rvf ingest CLI |
| Unified CLI | rvf | 9 subcommands: create, ingest, query, delete, status, inspect, compact, derive, serve |
| Compaction | (API) | Garbage-collect tombstoned vectors and reclaim disk space |
| Batch Delete | (API) | Delete vectors by ID with tombstone markers |

Self-Booting RVF — Claude Code Appliance

The claude_code_appliance example builds a complete self-booting AI development environment as a single .rvf file. It uses real infrastructure — a Docker-built Linux kernel, Ed25519 SSH keys, a BPF C socket filter, and a cryptographic witness chain.

cd examples/rvf
cargo run --example claude_code_appliance

What it produces (5.1 MB file):

claude_code_appliance.rvf
  ├── KERNEL_SEG    Linux 6.8.12 bzImage (5.2 MB, x86_64)
  ├── EBPF_SEG      Socket filter — allows ports 2222, 8080 only
  ├── VEC_SEG       20 package embeddings (128-dim)
  ├── INDEX_SEG     HNSW graph for package search
  ├── WITNESS_SEG   6-entry tamper-evident audit trail
  ├── CRYPTO_SEG    3 Ed25519 SSH user keys (root, deploy, claude)
  ├── MANIFEST_SEG  4 KB root with segment directory
  └── Snapshot      v1 derived image with lineage tracking

Boot and connect:

rvf launch claude_code_appliance.rvf        # Boot on QEMU/Firecracker
ssh -p 2222 deploy@localhost                 # SSH in
curl -s localhost:8080/query -d '{"vector":[0.1,...], "k":5}'

Final file: 5.1 MB single .rvf — boots Linux, serves queries, runs Claude Code.

What RVF Contains

An RVF file is built from segments — self-describing blocks that can be combined freely. Here are all 16 types, grouped by purpose:

 Data              Indexing           Compression        Runtime
+-----------+     +-----------+     +-----------+     +-----------+
| VEC  0x01 |     | INDEX 0x02|     | QUANT 0x06|     | WASM      |
| (vectors) |     | (HNSW)    |     | (SQ/PQ/BQ)|     | (5.5 KB)  |
+-----------+     +-----------+     +-----------+     +-----------+
| META 0x07 |     | META_IDX  |     | HOT  0x08 |     | KERNEL    |
| (key-val) |     | 0x0D      |     | (promoted) |     | 0x0E      |
+-----------+     +-----------+     +-----------+     +-----------+
| JOURNAL   |     | OVERLAY   |     | SKETCH    |     | EBPF      |
| 0x04      |     | 0x03      |     | 0x09      |     | 0x0F      |
+-----------+     +-----------+     +-----------+     +-----------+

 Trust             State              Domain
+-----------+     +-----------+     +-----------+
| WITNESS   |     | MANIFEST  |     | PROFILE   |
| 0x0A      |     | 0x05      |     | 0x0B      |
+-----------+     +-----------+     +-----------+
| CRYPTO    |
| 0x0C      |
+-----------+

Any segment you don't need is simply absent. A basic vector store uses VEC + INDEX + MANIFEST. A sealed cognitive engine might use all 16.

RuVector Ecosystem Integration

RVF is the universal substrate for the entire RuVector ecosystem. Here's how the 75+ Rust crates map onto RVF segments:

| Domain | Crates | RVF Segments Used |
|---|---|---|
| LLM inference | ruvllm, ruvllm-cli | VEC (KV cache), OVERLAY (LoRA), WITNESS (audit) |
| Self-optimizing learning | sona | OVERLAY (micro-LoRA), META (EWC++ weights) |
| Graph neural networks | ruvector-gnn, ruvector-graph | INDEX (HNSW topology), META (edge weights) |
| Quantum computing | ruQu, ruqu-core, ruqu-algorithms | SKETCH (VQE snapshots), META (syndrome tables) |
| Attention mechanisms | ruvector-attention, ruvector-mincut-gated-transformer | VEC (attention matrices), QUANT (INT4/FP16) |
| Coherence systems | cognitum-gate-kernel, prime-radiant | WITNESS (tile witnesses), WASM (64 KB tiles) |
| Neuromorphic | ruvector-nervous-system, micro-hnsw-wasm | VEC (spike trains), INDEX (spiking HNSW) |
| Agent memory | agentdb, claude-flow, agentic-flow | VEC + INDEX + WITNESS (full agent state) |
| Edge / browser | rvlite, rvf-wasm | VEC + INDEX via 5.5 KB WASM microkernel |
| Hyperbolic geometry | ruvector-hyperbolic-hnsw, ruvector-math | INDEX (Poincaré ball HNSW) |
| Routing / inference | ruvector-tiny-dancer-core, ruvector-sparse-inference | VEC (feature vectors), META (routing policies) |
| Observation pipeline | ospipe | META (state vectors), WITNESS (provenance) |

Performance & Comparison

RVF is designed for speed at every layer:

| Metric | Value | Example |
|---|---|---|
| Cold boot (4 KB manifest) | < 5 ms | wire_format |
| First query (Layer A only) | recall >= 0.70 | progressive_index |
| Full recall (Layer C) | >= 0.95 | progressive_index |
| WASM binary size | ~5.5 KB | |
| Segment header | 64 bytes | wire_format |
| Witness chain entry | 73 bytes | crypto_signing |
| Scalar quantization | 4x compression | quantization |
| Product quantization | 8-32x compression | quantization |
| Binary quantization | 32x compression | quantization |

Progressive Loading

Instead of waiting for the full index, RVF serves queries immediately:

Layer A ─────> Layer B ─────> Layer C
(microsecs)    (~10 ms)       (~50 ms)
recall ~0.70   recall ~0.85   recall ~0.95

The progressive_index example measures this recall progression with brute-force ground truth.

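Recall against brute-force ground truth can be computed as in the sketch below. It is a standalone illustration, not the code of the progressive_index example: recall@k is taken as the fraction of the exact top-k IDs that the approximate result also returned, and the helper names (l2_sq, brute_force_top_k, recall) are hypothetical.

```rust
use std::collections::HashSet;

/// Squared L2 distance between two equal-length vectors.
fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Exact top-k IDs by brute force: the ground truth for recall.
fn brute_force_top_k(data: &[Vec<f32>], query: &[f32], k: usize) -> Vec<usize> {
    let mut scored: Vec<(usize, f32)> = data
        .iter()
        .enumerate()
        .map(|(id, v)| (id, l2_sq(v, query)))
        .collect();
    scored.sort_by(|a, b| a.1.total_cmp(&b.1));
    scored.into_iter().take(k).map(|(id, _)| id).collect()
}

/// recall@k = |approx ∩ exact| / |exact|.
fn recall(approx: &[usize], exact: &[usize]) -> f32 {
    let truth: HashSet<_> = exact.iter().collect();
    let hits = approx.iter().filter(|id| truth.contains(id)).count();
    hits as f32 / exact.len() as f32
}

fn main() {
    // Ten 1-D points at 0.0, 1.0, ..., 9.0; the query sits at 0.2.
    let data: Vec<Vec<f32>> = (0..10).map(|i| vec![i as f32]).collect();
    let exact = brute_force_top_k(&data, &[0.2], 4); // [0, 1, 2, 3]

    // Pretend an approximate index returned these IDs (one miss: 7).
    let approx = [0, 1, 2, 7];
    println!("recall@4 = {:.2}", recall(&approx, &exact)); // recall@4 = 0.75
}
```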
Comparison

vs. vector databases

| Feature | RVF | Annoy | FAISS | Qdrant | Milvus |
|---|---|---|---|---|---|
| Single-file format | Yes | Yes | No | No | No |
| Crash-safe (no WAL) | Yes | No | No | WAL | WAL |
| Progressive loading | 3 layers | No | No | No | No |
| WASM support | 5.5 KB | No | No | No | No |
| no_std compatible | Yes | No | No | No | No |
| Post-quantum sigs | ML-DSA-65 | No | No | No | No |
| TEE attestation | Yes | No | No | No | No |
| Metadata filtering | Yes | No | Yes | Yes | Yes |
| Auto quantization | 3-tier | No | Manual | Yes | Yes |
| Append-only | Yes | Build-once | Build-once | Log | Log |
| Witness chains | Yes | No | No | No | No |
| Lineage provenance | Yes (DNA-style) | No | No | No | No |
| Computational container | Yes (WASM/eBPF/unikernel) | No | No | No | No |
| Domain profiles | 5 profiles | No | No | No | No |
| Language bindings | Rust, Node, WASM | C++, Python | C++, Python | Rust, Python | Go, Python |

vs. model registries, graph DBs, and container formats

RVF replaces multiple tools because it carries data, model, graph, runtime, and trust chain together:

| Capability | RVF | GGUF | ONNX | SafeTensors | Neo4j | Docker/OCI |
|---|---|---|---|---|---|---|
| Vector storage + search | Yes | No | No | No | No | No |
| Model weight deltas (LoRA) | OVERLAY_SEG | Full weights | Full graph | Weights only | No | No |
| Graph neural state | GRAPH_SEG | No | No | No | Yes | No |
| Cryptographic audit trail | WITNESS_SEG | No | No | No | No | No |
| Self-booting runtime | KERNEL_SEG | No | No | No | No | Yes |
| Kernel-level acceleration | EBPF_SEG | No | No | No | No | No |
| File lineage / versioning | DNA-style | No | No | No | No | Image layers |
| TEE attestation | Built-in | No | No | No | No | No |
| Single portable file | Yes | Yes | Yes | Yes | No | Image tarball |
| Runs in browser | 5.5 KB WASM | No | ONNX.js | No | No | No |

Usage Patterns (8 patterns)

Pattern 1: Simple Vector Store

The most common use case. Create a store, add embeddings, query nearest neighbors.

use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;

let options = RvfOptions {
    dimension: 384,
    metric: DistanceMetric::L2,
    ..Default::default()
};
let mut store = RvfStore::create("vectors.rvf", options)?;

// Insert embeddings
store.ingest_batch(&[&embedding], &[1], None)?;

// Query top-10 nearest neighbors
let results = store.query(&query, 10, &QueryOptions::default())?;
for r in &results {
    println!("id={}, distance={:.4}", r.id, r.distance);
}

See: basic_store.rs

Pattern 2: Filtered Search

Attach metadata to vectors, then filter during queries.

use rvf_runtime::{FilterExpr, MetadataEntry, MetadataValue};
use rvf_runtime::filter::FilterValue;

// Add metadata during ingestion
let metadata = vec![
    MetadataEntry { field_id: 0, value: MetadataValue::String("science".into()) },
    MetadataEntry { field_id: 1, value: MetadataValue::U64(95) },
];
store.ingest_batch(&[&vec], &[42], Some(&metadata))?;

// Query with filter: category == "science" AND score > 80
let filter = FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
]);
let opts = QueryOptions { filter: Some(filter), ..Default::default() };
let results = store.query(&query, 10, &opts)?;

See: filtered_search.rs

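To make the semantics of a filter expression concrete, here is a minimal evaluator built only on the standard library. The Value and Filter types are local stand-ins invented for this sketch, not the rvf_runtime types, and treating a missing field or a type mismatch as "no match" is an assumption.

```rust
use std::collections::HashMap;

/// Local stand-in for a metadata value (not the rvf_runtime type).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    U64(u64),
}

/// Local stand-in for a filter expression tree.
enum Filter {
    Eq(u16, Value),
    Gt(u16, u64),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Evaluate a filter against one vector's metadata (field_id -> value).
/// A missing field or a type mismatch makes the leaf evaluate to false.
fn eval(filter: &Filter, meta: &HashMap<u16, Value>) -> bool {
    match filter {
        Filter::Eq(field, want) => meta.get(field) == Some(want),
        Filter::Gt(field, bound) => {
            matches!(meta.get(field), Some(Value::U64(v)) if v > bound)
        }
        Filter::And(parts) => parts.iter().all(|f| eval(f, meta)),
        Filter::Or(parts) => parts.iter().any(|f| eval(f, meta)),
    }
}

fn main() {
    let mut meta = HashMap::new();
    meta.insert(0, Value::Str("science".into()));
    meta.insert(1, Value::U64(95));

    // category == "science" AND score > 80
    let both = Filter::And(vec![
        Filter::Eq(0, Value::Str("science".into())),
        Filter::Gt(1, 80),
    ]);
    // score > 99 OR category == "history"
    let either = Filter::Or(vec![
        Filter::Gt(1, 99),
        Filter::Eq(0, Value::Str("history".into())),
    ]);
    println!("{} {}", eval(&both, &meta), eval(&either, &meta)); // true false
}
```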
Pattern 3: Progressive Recall

Start serving queries instantly, improve quality as more data loads.

use rvf_index::{build_full_index, build_layer_a, build_layer_c, ProgressiveIndex};

// Build HNSW graph
let graph = build_full_index(&store, n, &config, &rng, &l2_distance);

// Layer A: instant but approximate
let layer_a = build_layer_a(&graph, &centroids, &assignments, n as u64);
let mut idx = ProgressiveIndex { layer_a: Some(layer_a), layer_b: None, layer_c: None };
let fast_results = idx.search(&query, 10, 200, &store); // recall ~0.70

// Layer C: full precision, added to the same index once it is built
// (reusing idx avoids moving layer_a a second time)
idx.layer_c = Some(build_layer_c(&graph));
let precise_results = idx.search(&query, 10, 200, &store); // recall ~0.95

See: progressive_index.rs

Pattern 4: Cryptographic Integrity

Sign segments and build tamper-evident audit trails.

use rvf_crypto::{sign_segment, verify_segment, create_witness_chain, WitnessEntry, shake256_256};
use ed25519_dalek::SigningKey;

// Sign a segment
let footer = sign_segment(&header, &payload, &signing_key);

// Verify signature
assert!(verify_segment(&header, &payload, &footer, &verifying_key));

// Build an audit trail
let entries = vec![WitnessEntry {
    prev_hash: [0; 32],
    action_hash: shake256_256(b"inserted 1000 vectors"),
    timestamp_ns: 1_700_000_000_000_000_000,
    witness_type: 0x01, // PROVENANCE
}];
let chain = create_witness_chain(&entries);

See: crypto_signing.rs

Pattern 5: Import from JSON / CSV / NumPy

Load embeddings from common formats without writing a parser.

use rvf_import::{import_json, import_csv, import_npy};

// From a JSON array of vectors
import_json("embeddings.json", &mut store)?;

// From a CSV file (one vector per row)
import_csv("embeddings.csv", &mut store)?;

// From a NumPy .npy file
import_npy("embeddings.npy", &mut store)?;

Pattern 6: Delete and Compact

Remove vectors by ID, then reclaim disk space.

// Delete specific vectors (marks as tombstones)
store.delete_batch(&[42, 99, 1001])?;

// Compact: rewrite the file without tombstoned data
store.compact()?;

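The delete-then-compact flow can be pictured with a toy in-memory model. This is a sketch of the general tombstone technique, not how RvfStore lays out bytes: ToyStore and its fields are hypothetical, and re-ingesting a deleted ID is deliberately not handled.

```rust
use std::collections::HashSet;

/// Toy append-only store: records are never edited in place.
struct ToyStore {
    records: Vec<(u64, Vec<f32>)>, // (id, vector), in append order
    tombstones: HashSet<u64>,      // IDs that are logically deleted
}

impl ToyStore {
    fn new() -> Self {
        ToyStore { records: Vec::new(), tombstones: HashSet::new() }
    }

    fn ingest(&mut self, id: u64, v: Vec<f32>) {
        self.records.push((id, v));
    }

    /// Deleting only records a tombstone; the data stays where it is.
    fn delete_batch(&mut self, ids: &[u64]) {
        self.tombstones.extend(ids.iter().copied());
    }

    /// Live vectors are those without a tombstone.
    fn live_count(&self) -> usize {
        self.records
            .iter()
            .filter(|(id, _)| !self.tombstones.contains(id))
            .count()
    }

    /// Compaction rewrites the data without tombstoned records and
    /// returns how many records were physically dropped.
    fn compact(&mut self) -> usize {
        let before = self.records.len();
        let dead = std::mem::take(&mut self.tombstones);
        self.records.retain(|(id, _)| !dead.contains(id));
        before - self.records.len()
    }
}

fn main() {
    let mut store = ToyStore::new();
    for id in [1, 42, 99] {
        store.ingest(id, vec![0.0; 4]);
    }
    store.delete_batch(&[42, 99, 1001]); // 1001 was never stored
    println!("live={} stored={}", store.live_count(), store.records.len()); // live=1 stored=3
    println!("reclaimed={}", store.compact()); // reclaimed=2
}
```

Until compact runs, deleted vectors still occupy space; queries simply skip them.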
Pattern 7: File Lineage (Parent → Child Derivation)

Create derived files that track their ancestry.

use rvf_types::DerivationType;

// Create a parent store
let parent = RvfStore::create("parent.rvf", options)?;

// Derive a filtered child — records parent's hash automatically
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());

// Derive a grandchild
let grandchild = child.derive("grandchild.rvdna", DerivationType::Quantize, None)?;
assert_eq!(grandchild.lineage_depth(), 2);

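What the assertions above check can be modeled in a few lines. The sketch below is a simplification under stated assumptions: FileIdentity, derive, and verify_lineage are hypothetical names, and plain integer IDs stand in for the parent hash that a real RVF file records.

```rust
/// Local stand-in for the identity block of an RVF file.
#[derive(Debug, Clone, PartialEq)]
struct FileIdentity {
    file_id: u64,
    parent_id: Option<u64>, // None for a root file
    lineage_depth: u32,
}

/// Derive a child identity: link to the parent and go one level deeper.
fn derive(parent: &FileIdentity, child_id: u64) -> FileIdentity {
    FileIdentity {
        file_id: child_id,
        parent_id: Some(parent.file_id),
        lineage_depth: parent.lineage_depth + 1,
    }
}

/// Check that `chain` is ordered root -> ... -> leaf with consistent links.
fn verify_lineage(chain: &[FileIdentity]) -> bool {
    let root_ok = chain
        .first()
        .map_or(true, |r| r.parent_id.is_none() && r.lineage_depth == 0);
    let links_ok = chain.windows(2).all(|w| {
        w[1].parent_id == Some(w[0].file_id)
            && w[1].lineage_depth == w[0].lineage_depth + 1
    });
    root_ok && links_ok
}

fn main() {
    let parent = FileIdentity { file_id: 1, parent_id: None, lineage_depth: 0 };
    let child = derive(&parent, 2);
    let grandchild = derive(&child, 3);
    println!("depth={}", grandchild.lineage_depth); // depth=2

    // The full chain verifies; skipping the child breaks the link.
    println!("{}", verify_lineage(&[parent.clone(), child, grandchild.clone()])); // true
    println!("{}", verify_lineage(&[parent, grandchild])); // false
}
```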
Pattern 8: Embed a Computational Container

Pack a bootable kernel or eBPF program into the file.

use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

// Embed a unikernel — file can now boot as a standalone service
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &kernel_image, 8080)?;

// Embed an eBPF program — enables kernel-level acceleration
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;

// Extract later (distinct names so the kernel header is not shadowed)
let (kernel_hdr, img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, prog) = store.extract_ebpf()?.unwrap();

Tutorial: Your First RVF Store (Step by Step)

Step 1: Set Up

Create a new Rust project and add the dependency:

cargo new my_vectors
cd my_vectors

Add to Cargo.toml:

[dependencies]
rvf-runtime = { path = "../crates/rvf/rvf-runtime" }
tempfile = "3"

Step 2: Create a Store

use rvf_runtime::{RvfStore, RvfOptions, QueryOptions};
use rvf_runtime::options::DistanceMetric;
use tempfile::TempDir;

fn main() {
    let tmp = TempDir::new().unwrap();
    let path = tmp.path().join("my.rvf");

    let opts = RvfOptions {
        dimension: 128,
        metric: DistanceMetric::L2,
        ..Default::default()
    };
    let mut store = RvfStore::create(&path, opts).unwrap();

Step 3: Insert Vectors

Vectors are inserted in batches. Each vector needs a unique u64 ID.

    let vec_a = vec![0.1f32; 128];
    let vec_b = vec![0.2f32; 128];
    let vecs: Vec<&[f32]> = vec![&vec_a, &vec_b];
    let ids = vec![1u64, 2];

    let result = store.ingest_batch(&vecs, &ids, None).unwrap();
    println!("Accepted: {}, Rejected: {}", result.accepted, result.rejected);

Step 4: Query

    let query = vec![0.15f32; 128];
    let results = store.query(&query, 5, &QueryOptions::default()).unwrap();

    for r in &results {
        println!("  id={}, dist={:.6}", r.id, r.distance);
    }

Step 5: Verify Persistence

    store.close().unwrap();

    let reopened = RvfStore::open(&path).unwrap();
    let results2 = reopened.query(&query, 5, &QueryOptions::default()).unwrap();
    assert_eq!(results.len(), results2.len());
    println!("Persistence verified!");
}

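Expected Output

Both stored vectors differ from the query by 0.05 in every component, so they are equidistant from it: the squared L2 distance is 128 × 0.05² = 0.32 (about 0.566 after taking the square root). The output has the shape below, where <d> is the same value on both lines up to f32 rounding and the order of the two IDs is not significant:

Accepted: 2, Rejected: 0
  id=1, dist=<d>
  id=2, dist=<d>
Persistence verified!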

Tutorial: Understanding Quantization Tiers

The Problem

A million 384-dim vectors at full precision (fp32) take about 1.5 GB of RAM (1,000,000 × 384 × 4 bytes). Not all vectors are accessed equally; most are rarely touched. Why keep them all at full precision?

The Solution: Temperature Tiering

RVF assigns vectors to three compression levels based on how often they're accessed:

| Tier | Access Pattern | Compression | Memory per Vector (384d) |
|---|---|---|---|
| Hot | Frequently queried | Scalar (fp32 -> u8) | 384 bytes (4x smaller) |
| Warm | Occasionally queried | Product quantization | 48 bytes (32x smaller) |
| Cold | Rarely accessed | Binary (1-bit) | 48 bytes (32x smaller) |
| Raw | No compression | fp32 | 1,536 bytes |

How It Works

1. Track access patterns using a Count-Min Sketch (a probabilistic counter):

let mut sketch = CountMinSketch::default_sketch();

// Every time a vector is accessed, increment its counter
sketch.increment(vector_id);

// Check how often a vector has been accessed
let count = sketch.estimate(vector_id);

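For readers who have not met the data structure, a Count-Min Sketch fits in a few dozen lines. The version below is a generic textbook sketch, not the CountMinSketch type from the RVF crates: the struct name, the width/depth parameters, and the use of the standard library's DefaultHasher seeded by the row number are all assumptions made for illustration.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Count-Min Sketch: `depth` rows of `width` counters.
struct CountMin {
    width: usize,
    rows: Vec<Vec<u32>>,
}

impl CountMin {
    fn new(width: usize, depth: usize) -> Self {
        CountMin { width, rows: vec![vec![0; width]; depth] }
    }

    /// Column for `id` in row `row`; the row number seeds the hash.
    fn column(&self, row: usize, id: u64) -> usize {
        let mut h = DefaultHasher::new();
        (row as u64, id).hash(&mut h);
        (h.finish() % self.width as u64) as usize
    }

    /// Record one access: bump one counter in every row.
    fn increment(&mut self, id: u64) {
        for row in 0..self.rows.len() {
            let col = self.column(row, id);
            self.rows[row][col] = self.rows[row][col].saturating_add(1);
        }
    }

    /// Estimated access count: the minimum over all rows.
    /// Collisions can only inflate a counter, so this never undercounts.
    fn estimate(&self, id: u64) -> u32 {
        (0..self.rows.len())
            .map(|row| self.rows[row][self.column(row, id)])
            .min()
            .unwrap_or(0)
    }
}

fn main() {
    let mut sketch = CountMin::new(1024, 4);
    for _ in 0..5 {
        sketch.increment(7);
    }
    sketch.increment(9);
    // At least 5 for id 7 and at least 1 for id 9; overestimates are
    // possible when IDs collide, underestimates are not.
    println!("id 7: {}, id 9: {}", sketch.estimate(7), sketch.estimate(9));
}
```

Memory is fixed at width × depth counters regardless of how many distinct vector IDs are tracked, which is why the structure suits access tracking over millions of vectors.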
2. Assign tiers based on configurable thresholds:

let tier = assign_tier(count);
// Hot:  count >= 100
// Warm: count >= 10
// Cold: count < 10

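The threshold rule in the comments above amounts to a two-step comparison. The sketch below spells it out with the thresholds passed in explicitly; the Tier enum and this three-argument signature are hypothetical and differ from the one-argument assign_tier shown above.

```rust
#[derive(Debug, PartialEq)]
enum Tier {
    Hot,
    Warm,
    Cold,
}

/// Map an access count to a tier. `hot_min` and `warm_min` are the
/// configurable thresholds (100 and 10 in the comments above).
fn assign_tier(count: u32, hot_min: u32, warm_min: u32) -> Tier {
    if count >= hot_min {
        Tier::Hot
    } else if count >= warm_min {
        Tier::Warm
    } else {
        Tier::Cold
    }
}

fn main() {
    for count in [250, 100, 42, 10, 3] {
        println!("{:>3} -> {:?}", count, assign_tier(count, 100, 10));
    }
}
```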
3. Encode at the appropriate level:

// Hot: Scalar (fast, low error)
let sq = ScalarQuantizer::train(&vectors);
let encoded = sq.encode_vec(&vector);  // 384 bytes

// Warm: Product (balanced)
let pq = ProductQuantizer::train(&vectors, 48, 64, 20);
let encoded = pq.encode_vec(&vector);  // 48 bytes

// Cold: Binary (smallest, approximate)
let bits = encode_binary(&vector);     // 48 bytes

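The byte counts in the comments above follow directly from the encodings. The sketch below implements scalar and binary quantization from scratch to show where 384 and 48 bytes come from; it is not the ScalarQuantizer or encode_binary code from the RVF crates. The function names, the fixed [min, max] range passed by the caller (a trained quantizer would learn it from data), and the "positive means 1" sign rule are assumptions. Product quantization is omitted because it needs a trained codebook.

```rust
/// Scalar quantization: map each f32 in [min, max] to one u8.
fn scalar_encode(v: &[f32], min: f32, max: f32) -> Vec<u8> {
    let scale = 255.0 / (max - min);
    v.iter()
        .map(|&x| ((x.clamp(min, max) - min) * scale).round() as u8)
        .collect()
}

/// Inverse mapping; the error per component is at most half a step.
fn scalar_decode(codes: &[u8], min: f32, max: f32) -> Vec<f32> {
    let step = (max - min) / 255.0;
    codes.iter().map(|&c| min + c as f32 * step).collect()
}

/// Binary quantization: keep only the sign, packed 8 dimensions per byte.
fn binary_encode(v: &[f32]) -> Vec<u8> {
    let mut bits = vec![0u8; (v.len() + 7) / 8];
    for (i, &x) in v.iter().enumerate() {
        if x > 0.0 {
            bits[i / 8] |= 1 << (i % 8);
        }
    }
    bits
}

fn main() {
    // A 384-dim vector with values spread evenly over [-1, 1].
    let v: Vec<f32> = (0..384).map(|i| (i as f32 / 383.0) * 2.0 - 1.0).collect();

    let sq = scalar_encode(&v, -1.0, 1.0);
    let bq = binary_encode(&v);
    println!("raw={} B, scalar={} B, binary={} B", v.len() * 4, sq.len(), bq.len());
    // raw=1536 B, scalar=384 B, binary=48 B

    let restored = scalar_decode(&sq, -1.0, 1.0);
    let max_err = v
        .iter()
        .zip(&restored)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0, f32::max);
    println!("max scalar error = {:.5}", max_err); // below half a step (~0.0039)
}
```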
Run the Example

cargo run --example quantization

You'll see a comparison table showing compression ratio, reconstruction error (MSE), and bytes per vector for each tier.

Tutorial: Building Witness Chains for Audit Trails

What Is a Witness Chain?

A witness chain is a tamper-evident log of events. Each entry links to the previous one through a cryptographic hash. If any entry is modified, all subsequent hash links break — making tampering detectable without a blockchain.

Chain Structure

  Entry 0 (genesis)         Entry 1                  Entry 2
+-------------------+   +-------------------+   +-------------------+
| prev_hash: 0x00.. |   | prev_hash: H(E0)  |   | prev_hash: H(E1)  |
| action:   H(data) |   | action:   H(data) |   | action:   H(data) |
| timestamp: T0     |   | timestamp: T1     |   | timestamp: T2     |
| type: PROVENANCE  |   | type: COMPUTATION |   | type: SEARCH      |
+-------------------+   +-------------------+   +-------------------+
        73 bytes                73 bytes                73 bytes

  • prev_hash: SHAKE-256 hash of the previous entry (zeroed for genesis)
  • action_hash: SHAKE-256 hash of whatever action is being recorded
  • timestamp_ns: Nanosecond UNIX timestamp
  • witness_type: What kind of event (see table below)

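The 73-byte figure is the sum of the four fields: 32 (prev_hash) + 32 (action_hash) + 8 (timestamp_ns) + 1 (witness_type). The sketch below builds and checks such a chain using only the standard library. It is a model of the linking technique, not the rvf_crypto implementation: toy_hash is a non-cryptographic stand-in for SHAKE-256, and the field order, little-endian timestamp, and all function names are assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

const ENTRY_LEN: usize = 32 + 32 + 8 + 1; // = 73 bytes

/// Stand-in for SHAKE-256: NOT cryptographic, for illustration only.
fn toy_hash(data: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (lane, chunk) in out.chunks_mut(8).enumerate() {
        let mut h = DefaultHasher::new();
        h.write_u64(lane as u64); // a different seed per 8-byte lane
        h.write(data);
        chunk.copy_from_slice(&h.finish().to_le_bytes());
    }
    out
}

/// Serialize one entry; field order and endianness are assumptions.
fn encode_entry(prev: &[u8; 32], action: &[u8; 32], ts_ns: u64, kind: u8) -> [u8; ENTRY_LEN] {
    let mut e = [0u8; ENTRY_LEN];
    e[..32].copy_from_slice(prev);
    e[32..64].copy_from_slice(action);
    e[64..72].copy_from_slice(&ts_ns.to_le_bytes());
    e[72] = kind;
    e
}

/// Build a chain: each entry stores the hash of the entry before it.
fn build_chain(events: &[(&[u8], u64, u8)]) -> Vec<u8> {
    let mut chain = Vec::new();
    let mut prev = [0u8; 32]; // genesis has a zeroed prev_hash
    for (data, ts, kind) in events {
        let entry = encode_entry(&prev, &toy_hash(data), *ts, *kind);
        prev = toy_hash(&entry);
        chain.extend_from_slice(&entry);
    }
    chain
}

/// Recompute every link; false on a bad length or a broken link.
fn verify_chain(chain: &[u8]) -> bool {
    if chain.len() % ENTRY_LEN != 0 {
        return false;
    }
    let mut expected_prev = [0u8; 32];
    for entry in chain.chunks(ENTRY_LEN) {
        if entry[..32] != expected_prev {
            return false;
        }
        expected_prev = toy_hash(entry);
    }
    true
}

fn main() {
    let events = [
        ("loaded embeddings".as_bytes(), 1_700_000_000_000_000_000u64, 0x01u8),
        ("built HNSW index".as_bytes(), 1_700_000_001_000_000_000, 0x02),
        ("query: top-10".as_bytes(), 1_700_000_002_000_000_000, 0x03),
    ];
    let chain = build_chain(&events);
    println!("{} bytes, valid={}", chain.len(), verify_chain(&chain)); // 219 bytes, valid=true

    let mut tampered = chain.clone();
    tampered[40] ^= 0xFF; // inside entry 0's action_hash
    println!("tampered valid={}", verify_chain(&tampered)); // tampered valid=false
}
```

Link checking alone only catches a change to an entry that has a successor. Protecting the final entry as well requires something outside the chain, such as a signature or a separately stored head hash.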
Witness Types

| Code | Name | When to Use |
|---|---|---|
| 0x01 | PROVENANCE | Data origin tracking (e.g., "loaded from model X") |
| 0x02 | COMPUTATION | Operation recording (e.g., "built HNSW index") |
| 0x03 | SEARCH | Query audit (e.g., "searched for query Q, got results R") |
| 0x04 | DELETION | Deletion audit (e.g., "deleted vectors 1-100") |
| 0x05 | PLATFORM_ATTESTATION | TEE attestation (e.g., "enclave measured as M") |
| 0x06 | KEY_BINDING | Sealed key (e.g., "key K bound to enclave M") |
| 0x07 | COMPUTATION_PROOF | Verified computation (e.g., "search ran inside enclave") |
| 0x08 | DATA_PROVENANCE | Full chain (e.g., "model -> TEE -> RVF file") |
| 0x09 | DERIVATION | File lineage derivation event |
| 0x0A | LINEAGE_MERGE | Multi-parent lineage merge |
| 0x0B | LINEAGE_SNAPSHOT | Lineage snapshot checkpoint |
| 0x0C | LINEAGE_TRANSFORM | Lineage transform operation |
| 0x0D | LINEAGE_VERIFY | Lineage verification event |

Creating and Verifying

use rvf_crypto::{create_witness_chain, verify_witness_chain, WitnessEntry, shake256_256};

// Record three events
let entries = vec![
    WitnessEntry {
        prev_hash: [0; 32], // genesis
        action_hash: shake256_256(b"loaded embeddings from model-v2"),
        timestamp_ns: 1_700_000_000_000_000_000,
        witness_type: 0x01,
    },
    WitnessEntry {
        prev_hash: [0; 32], // filled by create_witness_chain
        action_hash: shake256_256(b"built HNSW index (M=16, ef=200)"),
        timestamp_ns: 1_700_000_001_000_000_000,
        witness_type: 0x02,
    },
    WitnessEntry {
        prev_hash: [0; 32],
        action_hash: shake256_256(b"query: top-10 for user request #42"),
        timestamp_ns: 1_700_000_002_000_000_000,
        witness_type: 0x03,
    },
];

let chain_bytes = create_witness_chain(&entries);
let verified = verify_witness_chain(&chain_bytes).unwrap();
assert_eq!(verified.len(), 3);

Tamper Detection

Flip any byte in the chain and verification fails:

let mut tampered = chain_bytes.clone();
tampered[100] ^= 0xFF; // flip one byte

assert!(verify_witness_chain(&tampered).is_err()); // detected!

Run the Example

cargo run --example crypto_signing

The example creates a 5-entry chain, verifies it, then demonstrates tamper and truncation detection.

Tutorial: Wire Format Deep Dive

Segment Header (64 bytes)

Every piece of data in an RVF file is wrapped in a self-describing segment. The header is always exactly 64 bytes:

Offset  Size  Field             Description
------  ----  -----             -----------
0x00    4     magic             0x52564653 ("RVFS")
0x04    1     version           Format version (currently 1)
0x05    1     seg_type          Segment type (VEC, INDEX, MANIFEST, ...)
0x06    2     flags             Bitfield (COMPRESSED, SIGNED, ATTESTED, ...)
0x08    8     segment_id        Monotonically increasing ID
0x10    8     payload_length    Byte length of payload
0x18    8     timestamp_ns      Nanosecond UNIX timestamp
0x20    1     checksum_algo     0=CRC32C, 1=XXH3-128, 2=SHAKE-256
0x21    1     compression       0=none, 1=LZ4, 2=ZSTD
0x22    2     reserved_0        Must be zero
0x24    4     reserved_1        Must be zero
0x28    16    content_hash      First 128 bits of payload hash
0x38    4     uncompressed_len  Original size before compression
0x3C    4     alignment_pad     Padding to 64-byte boundary
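As a rough illustration of the layout above, the following sketch decodes a 64-byte header by hand. It is not the rvf-wire parser: `SegmentHeader` and `parse_header` are hypothetical names, and it assumes every multi-byte field (including the magic) is stored little-endian.

```rust
/// Hypothetical in-memory view of the 64-byte segment header.
#[derive(Debug, PartialEq)]
pub struct SegmentHeader {
    pub version: u8,
    pub seg_type: u8,
    pub flags: u16,
    pub segment_id: u64,
    pub payload_length: u64,
    pub timestamp_ns: u64,
    pub checksum_algo: u8,
    pub compression: u8,
    pub content_hash: [u8; 16],
    pub uncompressed_len: u32,
}

pub const MAGIC: u32 = 0x5256_4653; // "RVFS"

fn le_u32(buf: &[u8], off: usize) -> u32 {
    let mut b = [0u8; 4];
    b.copy_from_slice(&buf[off..off + 4]);
    u32::from_le_bytes(b)
}

fn le_u64(buf: &[u8], off: usize) -> u64 {
    let mut b = [0u8; 8];
    b.copy_from_slice(&buf[off..off + 8]);
    u64::from_le_bytes(b)
}

/// Parse a header, assuming little-endian fields at the offsets in the table.
pub fn parse_header(buf: &[u8]) -> Result<SegmentHeader, &'static str> {
    if buf.len() < 64 {
        return Err("need 64 bytes");
    }
    if le_u32(buf, 0x00) != MAGIC {
        return Err("bad magic");
    }
    // reserved_0 and reserved_1 (0x22..0x28) must be zero.
    if buf[0x22..0x28].iter().any(|&b| b != 0) {
        return Err("reserved bytes not zero");
    }
    let mut content_hash = [0u8; 16];
    content_hash.copy_from_slice(&buf[0x28..0x38]);
    Ok(SegmentHeader {
        version: buf[0x04],
        seg_type: buf[0x05],
        flags: u16::from_le_bytes([buf[0x06], buf[0x07]]),
        segment_id: le_u64(buf, 0x08),
        payload_length: le_u64(buf, 0x10),
        timestamp_ns: le_u64(buf, 0x18),
        checksum_algo: buf[0x20],
        compression: buf[0x21],
        content_hash,
        uncompressed_len: le_u32(buf, 0x38),
    })
}
```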

The 16 Segment Types

Code  Name      Purpose
----  ----      -------
0x01  VEC       Raw vector embeddings
0x02  INDEX     HNSW adjacency and routing tables
0x03  OVERLAY   Graph overlay deltas
0x04  JOURNAL   Metadata mutations, deletions
0x05  MANIFEST  Segment directory, epoch state
0x06  QUANT     Quantization dictionaries (scalar/PQ/binary)
0x07  META      Key-value metadata
0x08  HOT       Temperature-promoted data
0x09  SKETCH    Access counter sketches (Count-Min)
0x0A  WITNESS   Audit trails, attestation proofs
0x0B  PROFILE   Domain profile declarations
0x0C  CRYPTO    Key material, signature chains
0x0D  META_IDX  Metadata inverted indexes
0x0E  KERNEL    Compressed unikernel image (self-booting)
0x0F  EBPF      eBPF program for kernel-level acceleration

Segment Flags

Bit  Name         Description
---  ----         -----------
0    COMPRESSED   Payload is compressed (LZ4 or ZSTD)
1    ENCRYPTED    Payload is encrypted
2    SIGNED       Signature footer follows payload
3    SEALED       Immutable (compaction output)
4    PARTIAL      Streaming / partial write
5    TOMBSTONE    Logical deletion marker
6    HOT          Temperature-promoted
7    OVERLAY      Contains delta data
8    SNAPSHOT     Full snapshot
9    CHECKPOINT   Safe rollback point
10   ATTESTED     Produced inside attested TEE
11   HAS_LINEAGE  File carries FileIdentity lineage data
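A minimal sketch of working with the flag bits listed above, treating them as masks over the 16-bit flags field. The constant and function names here are illustrative, not the rvf-types API.

```rust
// Bit masks derived from the bit positions in the flags table.
pub const COMPRESSED: u16 = 1 << 0;
pub const ENCRYPTED: u16 = 1 << 1;
pub const SIGNED: u16 = 1 << 2;
pub const SEALED: u16 = 1 << 3;
pub const TOMBSTONE: u16 = 1 << 5;
pub const ATTESTED: u16 = 1 << 10;
pub const HAS_LINEAGE: u16 = 1 << 11;

/// True when every bit in `mask` is set in `flags`.
pub fn has_all(flags: u16, mask: u16) -> bool {
    flags & mask == mask
}

/// Names of the set flags, in bit order (hypothetical helper for logging).
pub fn flag_names(flags: u16) -> Vec<&'static str> {
    const NAMES: [&'static str; 12] = [
        "COMPRESSED", "ENCRYPTED", "SIGNED", "SEALED", "PARTIAL", "TOMBSTONE",
        "HOT", "OVERLAY", "SNAPSHOT", "CHECKPOINT", "ATTESTED", "HAS_LINEAGE",
    ];
    (0..12usize)
        .filter(|&bit| flags & (1u16 << bit) != 0)
        .map(|bit| NAMES[bit])
        .collect()
}
```

A scanner that only wants attested, signed segments could then test `has_all(flags, SIGNED | ATTESTED)` on each header without touching the payload.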

Crash Safety: Two-fsync Protocol

RVF doesn't need a write-ahead log. Instead:

  1. Write data segment + payload, then fsync
  2. Write MANIFEST_SEG with updated state, then fsync

If the process crashes between the two fsyncs, the data segment is on disk but no manifest references it, so recovery ignores it. Simple, safe, fast.
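The two steps can be sketched with `std::fs` as follows. The function name is hypothetical and the caller is assumed to pass already-encoded segment bytes (header plus payload). `File::sync_all` is the standard-library call that requests an fsync.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Append a data segment, then a manifest segment, with an fsync after each.
/// `segment` and `manifest` are pre-encoded bytes; encoding is out of scope.
pub fn append_with_two_fsyncs(path: &Path, segment: &[u8], manifest: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;

    // Step 1: the data segment must be durable before anything references it.
    file.write_all(segment)?;
    file.sync_all()?;

    // Step 2: only now publish the new state by appending the manifest.
    file.write_all(manifest)?;
    file.sync_all()?;
    Ok(())
}
```

The ordering is the whole point: a reader that finds a manifest can rely on everything it references having been synced first.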

Tail-Scan

To find the current state, scan backward from the end of the file for the latest MANIFEST_SEG. The root manifest fits in 4 KB, so cold boot takes < 5 ms.
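A simplified sketch of such a backward scan is shown below. It assumes segment headers start on 64-byte boundaries, that the magic is stored little-endian, and that MANIFEST has type code 0x05. The function name is hypothetical and this is not the library's own manifest lookup.

```rust
pub const MAGIC: u32 = 0x5256_4653; // "RVFS"
pub const MANIFEST: u8 = 0x05;

/// Walk backward over 64-byte-aligned offsets and return the offset of the
/// last block that looks like a manifest segment header.
pub fn find_latest_manifest_offset(file: &[u8]) -> Option<usize> {
    let magic = MAGIC.to_le_bytes();
    // Start at the last complete 64-byte block; a trailing partial block is ignored.
    let mut offset = (file.len() / 64) * 64;
    while offset >= 64 {
        offset -= 64;
        let hdr = &file[offset..offset + 64];
        if &hdr[0..4] == &magic[..] && hdr[5] == MANIFEST {
            return Some(offset);
        }
    }
    None
}
```

Payload bytes can coincidentally match the magic, so a real reader would also validate the candidate header (for example its content hash) before trusting it.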

Run the Example

cargo run --example wire_format

The example writes three segments, reads them back, validates their hashes, detects corruption, and tail-scans for the manifest.

Tutorial: Metadata Filtering Patterns

Available Filter Expressions

Expression  Syntax                                  Description
----------  ------                                  -----------
Eq          FilterExpr::Eq(field_id, value)         Exact match
Ne          FilterExpr::Ne(field_id, value)         Not equal
Gt          FilterExpr::Gt(field_id, value)         Greater than
Lt          FilterExpr::Lt(field_id, value)         Less than
Range       FilterExpr::Range(field_id, low, high)  Value in [low, high)
In          FilterExpr::In(field_id, values)        Value is one of
And         FilterExpr::And(vec![...])              All conditions must match
Or          FilterExpr::Or(vec![...])               Any condition matches

Metadata Types

Type    Rust                                 Use Case
----    ----                                 --------
String  MetadataValue::String("cat".into())  Categories, labels, tags
U64     MetadataValue::U64(95)               Scores, counts, timestamps
Bytes   MetadataValue::Bytes(vec![...])      Binary data, hashes

Common Patterns

Category filter:

FilterExpr::Eq(0, FilterValue::String("science".into()))

Score range:

FilterExpr::Range(1, FilterValue::U64(30), FilterValue::U64(90))

Multi-category:

FilterExpr::In(0, vec![
    FilterValue::String("science".into()),
    FilterValue::String("tech".into()),
])

Combined (AND):

FilterExpr::And(vec![
    FilterExpr::Eq(0, FilterValue::String("science".into())),
    FilterExpr::Gt(1, FilterValue::U64(80)),
])
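To make the semantics of these expressions concrete, here is a small stand-alone evaluator. It deliberately uses its own `Filter` and `Value` types rather than the real `FilterExpr` and `FilterValue`, covers only a subset of the operators, and assumes field ids fit in a `u16`, so treat it as an illustration of the matching rules (in particular the half-open `Range`), not as the library's implementation.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, PartialOrd)]
pub enum Value {
    U64(u64),
    Str(String),
}

pub enum Filter {
    Eq(u16, Value),
    Gt(u16, Value),
    Range(u16, Value, Value), // low inclusive, high exclusive
    In(u16, Vec<Value>),
    And(Vec<Filter>),
    Or(Vec<Filter>),
}

/// Ordering comparisons only make sense between values of the same variant.
fn same_kind(a: &Value, b: &Value) -> bool {
    std::mem::discriminant(a) == std::mem::discriminant(b)
}

/// Evaluate a filter against one vector's metadata (field id -> value).
pub fn matches(f: &Filter, meta: &HashMap<u16, Value>) -> bool {
    match f {
        Filter::Eq(id, v) => meta.get(id) == Some(v),
        Filter::Gt(id, v) => meta.get(id).map_or(false, |m| same_kind(m, v) && m > v),
        Filter::Range(id, lo, hi) => meta.get(id).map_or(false, |m| {
            same_kind(m, lo) && same_kind(m, hi) && m >= lo && m < hi
        }),
        Filter::In(id, vs) => meta.get(id).map_or(false, |m| vs.contains(m)),
        Filter::And(fs) => fs.iter().all(|sub| matches(sub, meta)),
        Filter::Or(fs) => fs.iter().any(|sub| matches(sub, meta)),
    }
}
```

With this evaluator a score of 90 does not match `Range(1, 30, 90)` because the upper bound is exclusive, while a vector with a missing field never matches a condition on that field.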

Run the Example

cargo run --example filtered_search

The example creates 500 vectors with category and score metadata, then runs 7 filter queries, reporting the selectivity of each and verifying the results.

Tutorial: Progressive Index Recall Measurement

What Is Recall?

Recall@K is the fraction of the true K nearest neighbors that your approximate algorithm actually returns. A recall of 0.95 means the approximate search found 95% of the true nearest neighbors.

recall@K = |approximate_results ∩ exact_results| / K
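The formula translates directly into a few lines of Rust. The function below is a sketch (the name and the use of `u64` ids are assumptions) that compares the first K ids of an approximate result list with the first K ids of the brute-force ground truth.

```rust
use std::collections::HashSet;

/// recall@K = |approximate ∩ exact| / K, using the first `k` ids of each list.
pub fn recall_at_k(approximate: &[u64], exact: &[u64], k: usize) -> f64 {
    if k == 0 {
        return 0.0;
    }
    // Sets make the intersection order-independent and count repeated ids once.
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    let approx: HashSet<u64> = approximate.iter().take(k).copied().collect();
    let hits = approx.intersection(&truth).count();
    hits as f64 / k as f64
}
```

For example, if 9 of the 10 returned ids appear in the true top 10, the function returns 0.9. Averaging this value over many queries gives the recall figures reported by the example.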

How Progressive Indexing Achieves This

RVF builds an HNSW (Hierarchical Navigable Small World) graph, then splits it into three loadable layers:

Layer A: Coarse Routing

  • Entry points (topmost HNSW nodes)
  • Partition centroids for guided search
  • Loads in microseconds
  • Recall: ~0.40-0.70

Layer B: Hot Region

  • Adjacency lists for the most frequently accessed vectors
  • Covers the "working set" of your data
  • Recall: ~0.70-0.85

Layer C: Full Graph

  • Complete HNSW adjacency for all vectors
  • Loaded in background while queries are already being served
  • Recall: >= 0.95

Measuring Recall in the Example

The progressive_index example:

  1. Generates 5,000 vectors (128 dims)
  2. Builds the full HNSW graph (M=16, ef_construction=200)
  3. Splits into Layer A, B, C
  4. Runs 50 queries at each stage
  5. Computes recall@10 against brute-force ground truth

cargo run --example progressive_index

Expected output:

=== Recall Progression Summary ===
        Layers  Recall@10
  A only         0.xxx
  A + B          0.xxx
  A + B + C      0.9xx

Tuning ef_search

The ef_search parameter controls how many candidates HNSW explores during search. Higher values improve recall at the cost of latency:

ef_search  Recall@10  Relative Speed
---------  ---------  --------------
10         ~0.75      Fastest
50         ~0.90      Balanced
200        ~0.97      Most accurate

Technical Reference: Signature Footer Format

When the SIGNED flag is set on a segment, a signature footer follows the payload:

Offset  Size  Field
------  ----  -----
0x00    2     sig_algo (0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s)
0x02    2     sig_length
0x04    var   signature (64 to 7,856 bytes)
var     4     footer_length (for backward scan)
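The trailing footer_length field is what makes the footer findable from the end of a segment. The sketch below shows one way that backward read could work. It assumes little-endian fields, no padding after the footer in the buffer passed in, and that footer_length counts the entire footer including itself (sig_length + 8). None of these details are confirmed by the table, and the type and function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub struct SignatureFooter<'a> {
    pub sig_algo: u16, // 0=Ed25519, 1=ML-DSA-65, 2=SLH-DSA-128s
    pub signature: &'a [u8],
}

/// Locate and decode the footer at the end of `segment`
/// (header + payload + footer, with any padding already stripped).
pub fn read_footer<'a>(segment: &'a [u8]) -> Result<SignatureFooter<'a>, &'static str> {
    let n = segment.len();
    if n < 8 {
        return Err("too short");
    }
    // The last 4 bytes say how far back the footer starts (assumed to
    // include sig_algo, sig_length, the signature, and footer_length itself).
    let mut len4 = [0u8; 4];
    len4.copy_from_slice(&segment[n - 4..]);
    let footer_length = u32::from_le_bytes(len4) as usize;
    if footer_length < 8 || footer_length > n {
        return Err("bad footer_length");
    }
    let footer = &segment[n - footer_length..];
    let sig_algo = u16::from_le_bytes([footer[0], footer[1]]);
    let sig_length = u16::from_le_bytes([footer[2], footer[3]]) as usize;
    if sig_length + 8 != footer_length {
        return Err("length mismatch");
    }
    Ok(SignatureFooter { sig_algo, signature: &footer[4..4 + sig_length] })
}
```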

Supported Algorithms

Algorithm     Signature Size  Security Level                          Standard
---------     --------------  --------------                          --------
Ed25519       64 bytes        128-bit classical                       RFC 8032
ML-DSA-65     3,309 bytes     NIST Level 3 (post-quantum)             FIPS 204
SLH-DSA-128s  7,856 bytes     NIST Level 1 (post-quantum, stateless)  FIPS 205

Signing Flow

  1. Serialize the segment header (64 bytes) and payload into a signing buffer
  2. Compute SHAKE-256 hash of the buffer
  3. Sign the hash with the chosen algorithm
  4. Append the signature footer after the payload (before padding)
  5. Set the SIGNED flag in the header

Verification Flow

  1. Read segment header and payload
  2. Recompute SHAKE-256 hash of header + payload
  3. Read signature footer (scan backward from segment end using footer_length)
  4. Verify signature against the public key

Technical Reference: Confidential Core Attestation

Overview

RVF can record hardware TEE (Trusted Execution Environment) attestation quotes alongside vector data. This provides cryptographic proof that:

  • The platform is genuine (e.g., real Intel SGX hardware)
  • The code running inside the enclave matches a known measurement
  • Encryption keys are sealed to the enclave identity
  • Vector operations were computed inside the secure environment

Supported TEE Platforms

Platform            Enum Value                       Quote Format
--------            ----------                       ------------
Intel SGX           TeePlatform::Sgx (0)             DCAP attestation quote
AMD SEV-SNP         TeePlatform::SevSnp (1)          VCEK attestation report
Intel TDX           TeePlatform::Tdx (2)             TD quote
ARM CCA             TeePlatform::ArmCca (3)          CCA token
Software (testing)  TeePlatform::SoftwareTee (0xFE)  Synthetic (no hardware)

Attestation Header (112 bytes, repr(C))

Offset  Size  Field
------  ----  -----
0x00    1     platform           TeePlatform enum value
0x01    1     attestation_type   AttestationWitnessType enum value
0x02    4     quote_length       Length of the platform-specific quote
0x06    2     reserved
0x08    32    measurement        SHAKE-256 hash of enclave code
0x28    32    signer_id          SHAKE-256 hash of signing identity
0x48    8     timestamp_ns       Nanosecond UNIX timestamp
0x50    16    nonce              Anti-replay nonce
0x60    2     svn                Security Version Number
0x62    1     sig_algo           Signature algorithm for the quote
0x63    1     flags              Attestation flags
0x64    4     report_data_len    Length of additional report data
0x68    8     reserved

Attestation Types

Type                  Witness Code  Purpose
----                  ------------  -------
Platform Attestation  0x05          TEE identity + measurement verification
Key Binding           0x06          Keys sealed to enclave measurement
Computation Proof     0x07          Proof that operations ran inside enclave
Data Provenance       0x08          Full chain: model -> TEE -> RVF file

ATTESTED Segment Flag

Any segment produced inside a TEE should set bit 10 (ATTESTED) in the segment header flags. This enables fast scanning to identify attested segments without parsing payloads.

QuoteVerifier Trait

The verification interface is pluggable:

pub trait QuoteVerifier {
    fn platform(&self) -> TeePlatform;
    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String>;
}

Implement this trait for your TEE platform to enable hardware-backed verification. The SoftwareTee variant allows testing without real hardware.
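For illustration, a testing-only verifier might look like the sketch below. The trait and enum are redeclared locally (mirroring the definitions in this section) so the snippet compiles on its own. `SoftwareVerifier` and its quote layout, a 32-byte measurement followed by the report data, are invented for this example and do not correspond to any real attestation format.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(u8)]
pub enum TeePlatform {
    Sgx = 0,
    SevSnp = 1,
    Tdx = 2,
    ArmCca = 3,
    SoftwareTee = 0xFE,
}

pub trait QuoteVerifier {
    fn platform(&self) -> TeePlatform;
    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String>;
}

/// Hypothetical verifier for tests. Assumes a made-up quote layout:
/// 32-byte measurement, then the report data, with no signature at all.
pub struct SoftwareVerifier;

impl QuoteVerifier for SoftwareVerifier {
    fn platform(&self) -> TeePlatform {
        TeePlatform::SoftwareTee
    }

    fn verify_quote(
        &self,
        quote: &[u8],
        report_data: &[u8],
        expected_measurement: &[u8; 32],
    ) -> Result<(), String> {
        if quote.len() < 32 {
            return Err("quote shorter than a measurement".to_string());
        }
        let (measurement, rest) = quote.split_at(32);
        if measurement != &expected_measurement[..] {
            return Err("measurement mismatch".to_string());
        }
        if rest != report_data {
            return Err("report data mismatch".to_string());
        }
        Ok(())
    }
}
```

A hardware-backed implementation would replace the byte comparisons with the platform vendor's quote verification, but the call site stays the same because callers only depend on the trait.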

Technical Reference: Computational Container (Self-Booting RVF)

Three-Tier Execution Model

RVF files can optionally carry executable compute alongside vector data:

Tier          Segment              Size           Environment                   Boot Time  Use Case
----          -------              ----           -----------                   ---------  --------
1: WASM       WASM_SEG (existing)  5.5 KB         Browser, edge, IoT            <1 ms      Portable queries everywhere
2: eBPF       EBPF_SEG (0x0F)      10-50 KB       Linux kernel (XDP, TC)        <20 ms     Sub-microsecond hot cache hits
3: Unikernel  KERNEL_SEG (0x0E)    200 KB - 2 MB  Firecracker, TEE, bare metal  <125 ms    Zero-dependency self-booting service

KernelHeader (128 bytes)

Field            Size  Description
-----            ----  -----------
kernel_magic     4     0x52564B4E ("RVKN")
header_version   2     Currently 1
kernel_arch      1     x86_64 (0), AArch64 (1), RISC-V (2), WASM (3)
kernel_type      1     HermitOS (0), Unikraft (1), Custom (2), TestStub (0xFE)
image_size       4     Uncompressed kernel size
compressed_size  4     Compressed (ZSTD) size
image_hash       32    SHAKE-256-256 of uncompressed image
api_port         2     HTTP API port (network byte order)
api_transport    1     HTTP (0), gRPC (1), virtio-vsock (2)
kernel_flags     8     Feature flags (read-only, metrics, TEE, etc.)
cmdline_len      2     Length of kernel command line

EbpfHeader (64 bytes)

Field          Size  Description
-----          ----  -----------
ebpf_magic     4     0x52564250 ("RVBP")
program_type   1     XDP (0), TC (1), Tracepoint (2), Socket (3)
attach_type    1     XdpIngress (0), TcIngress (1), etc.
max_dimension  4     Maximum vector dimension (eBPF verifier loop bound)
bytecode_size  4     Size of BPF ELF object
btf_size       4     Size of BTF section
map_count      4     Number of BPF maps

Embedding and Extracting

use rvf_runtime::RvfStore;
use rvf_types::kernel::{KernelArch, KernelType};
use rvf_types::ebpf::{EbpfProgramType, EbpfAttachType};

// `image`, `bytecode`, and `btf` are byte buffers prepared elsewhere.
let mut store = RvfStore::open("vectors.rvf")?;

// Embed a kernel
store.embed_kernel(KernelArch::X86_64, KernelType::HermitOs, &image, 8080)?;

// Embed an eBPF program
store.embed_ebpf(EbpfProgramType::Xdp, EbpfAttachType::XdpIngress, 384, &bytecode, &btf)?;

// Extract later
let (kernel_hdr, kernel_img) = store.extract_kernel()?.unwrap();
let (ebpf_hdr, ebpf_prog) = store.extract_ebpf()?.unwrap();

Forward Compatibility

Files with KERNEL_SEG or EBPF_SEG still work with older readers: unknown segment types are skipped per the RVF forward-compatibility rule. The computational capability is purely additive.
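The skip rule can be illustrated with a toy reader that only knows a few segment types. Everything here is hypothetical (the `Known` enum, the `walk` function, and the simplified input of type/length pairs), and the offset arithmetic assumes a 64-byte header followed by a payload padded to a 64-byte boundary with no signature footer.

```rust
/// Segment types this (hypothetical) older reader understands.
#[derive(Debug, PartialEq)]
pub enum Known {
    Vec,
    Index,
    Manifest,
}

pub fn classify(seg_type: u8) -> Option<Known> {
    match seg_type {
        0x01 => Some(Known::Vec),
        0x02 => Some(Known::Index),
        0x05 => Some(Known::Manifest),
        _ => None, // e.g. 0x0E KERNEL or 0x0F EBPF: not understood, so skipped
    }
}

/// Walk (seg_type, payload_length) pairs in file order and return the known
/// segments with their byte offsets. Unknown segments still advance the offset.
pub fn walk(segments: &[(u8, u64)]) -> Vec<(Known, u64)> {
    let mut offset = 0u64;
    let mut out = Vec::new();
    for &(seg_type, payload_length) in segments {
        if let Some(kind) = classify(seg_type) {
            out.push((kind, offset));
        }
        // 64-byte header plus the payload rounded up to a 64-byte boundary.
        offset += 64 + (payload_length + 63) / 64 * 64;
    }
    out
}
```

Because every header carries its own payload length, a reader can step over a segment it does not understand without parsing the payload.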

See ADR-030 for the full specification.

Technical Reference: DNA-Style Lineage Provenance

How Lineage Works

Every RVF file carries a 68-byte FileIdentity in its root manifest:

Field          Size  Description
-----          ----  -----------
file_id        16    Unique UUID for this file
parent_id      16    UUID of the parent file (all zeros for root)
parent_hash    32    SHAKE-256-256 of parent's manifest
lineage_depth  4     Generation count (0 for root)

Derivation Chain

Parent.rvf ──derive()──> Child.rvf ──derive()──> Grandchild.rvdna
  file_id: A               file_id: B               file_id: C
  parent_id: [0;16]         parent_id: A              parent_id: B
  parent_hash: [0;32]       parent_hash: hash(A)      parent_hash: hash(B)
  depth: 0                  depth: 1                  depth: 2
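The bookkeeping in the diagram can be sketched as a plain struct. This is a stand-alone illustration rather than the rvf-types definition: the method names are hypothetical, the parent's manifest hash is supplied by the caller instead of being computed, and the 68-byte serialization assumes the fields are laid out in table order with a little-endian depth.

```rust
/// The four FileIdentity fields from the table above (16 + 16 + 32 + 4 = 68 bytes).
#[derive(Clone, Debug, PartialEq)]
pub struct FileIdentity {
    pub file_id: [u8; 16],
    pub parent_id: [u8; 16],
    pub parent_hash: [u8; 32],
    pub lineage_depth: u32,
}

impl FileIdentity {
    /// A root file has an all-zero parent id and hash, and depth 0.
    pub fn root(file_id: [u8; 16]) -> Self {
        FileIdentity { file_id, parent_id: [0; 16], parent_hash: [0; 32], lineage_depth: 0 }
    }

    /// Identity for a child of `self`. The caller supplies the child's id and
    /// the hash of this file's manifest (hashing is out of scope here).
    pub fn derive(&self, child_id: [u8; 16], parent_manifest_hash: [u8; 32]) -> Self {
        FileIdentity {
            file_id: child_id,
            parent_id: self.file_id,
            parent_hash: parent_manifest_hash,
            lineage_depth: self.lineage_depth + 1,
        }
    }

    /// Serialize in table order (field order and endianness are assumptions).
    pub fn to_bytes(&self) -> [u8; 68] {
        let mut out = [0u8; 68];
        out[0..16].copy_from_slice(&self.file_id);
        out[16..32].copy_from_slice(&self.parent_id);
        out[32..64].copy_from_slice(&self.parent_hash);
        out[64..68].copy_from_slice(&self.lineage_depth.to_le_bytes());
        out
    }
}

/// Check that `child` names `parent` as its direct parent.
pub fn is_child_of(child: &FileIdentity, parent: &FileIdentity, parent_manifest_hash: &[u8; 32]) -> bool {
    child.parent_id == parent.file_id
        && &child.parent_hash == parent_manifest_hash
        && child.lineage_depth == parent.lineage_depth + 1
}
```

Verifying a whole chain then amounts to calling `is_child_of` on each adjacent pair, from the newest file back to the root.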

Derivation Types

Code  Type         Description
----  ----         -----------
0     Clone        Exact copy
1     Filter       Subset of parent's vectors
2     Merge        Multi-parent merge
3     Quantize     Re-quantized version
4     Reindex      Re-indexed with different parameters
5     Transform    Transformed embeddings
6     Snapshot     Point-in-time snapshot
0xFF  UserDefined  Application-specific derivation

Using the API

use rvf_runtime::RvfStore;
use rvf_types::DerivationType;

// `options` is a store-options value built elsewhere.
let parent = RvfStore::create("parent.rvf", options)?;

// Derive a filtered child
let child = parent.derive("child.rvf", DerivationType::Filter, None)?;
assert_eq!(child.lineage_depth(), 1);
assert_eq!(child.parent_id(), parent.file_id());

Domain Extensions

Extension  Domain Profile  Optimized For
---------  --------------  -------------
.rvf       Generic         General-purpose vectors
.rvdna     RVDNA           Genomic sequence embeddings
.rvtext    RVText          Language model embeddings
.rvgraph   RVGraph         Graph/network node embeddings
.rvvis     RVVision        Image/vision model embeddings

See ADR-029 for the full format specification.

Technical Reference: Crate Architecture

Crate Map

    +-----------------------------------------------------------+
    |                      Cognitive Layer                      |
    |          ruvllm | gnn | ruQu | attention | sona           |
    |          mincut | prime-radiant | nervous-system          |
    +-------+---------------------+---------------------+-------+
            |                     |                     |
    +-------v---------------------v---------------------v-------+
    |                     Application Layer                     |
    |           claude-flow | agentdb | agentic-flow            |
    |             ospipe | rvlite | sona | your-app             |
    +-------+---------------------+---------------------+-------+
            |                     |                     |
    +-------v---------------------v---------------------v-------+
    |                       RVF SDK Layer                       |
    |            rvf-runtime | rvf-index | rvf-quant            |
    |           rvf-manifest | rvf-crypto | rvf-wire            |
    +-------+---------------+--------------+--------------+-----+
            |               |              |              |
    +-------v-------+ +-----v------+ +-----v------+ +-----v-----+
    |  rvf-server   | |  rvf-node  | |  rvf-wasm  | |  rvf-cli  |
    |  HTTP + TCP   | |  N-API     | |  ~46 KB    | |  clap     |
    +---------------+ +------------+ +------------+ +-----------+

Crate Details

Crate         Lines  no_std   Purpose
-----         -----  ------   -------
rvf-types     3,184  Yes      Segment types, kernel/eBPF headers, lineage, enums
rvf-wire      2,011  Yes      Wire format read/write, hash validation
rvf-manifest  1,580  No       Two-level manifest with 4 KB root, FileIdentity codec
rvf-index     2,691  No       HNSW progressive indexing (Layer A/B/C)
rvf-quant     1,443  No       Scalar, product, and binary quantization
rvf-crypto    1,725  Partial  SHAKE-256, Ed25519, witness chains, attestation, lineage
rvf-runtime   3,607  No       Full store API, compaction, lineage, kernel/eBPF embed
rvf-import    980    No       JSON, CSV, NumPy (.npy) importers
rvf-wasm      1,616  Yes      WASM control plane: in-memory store, query, segment inspection
rvf-node      852    No       Node.js N-API bindings with lineage, kernel/eBPF, inspection
rvf-cli       665    No       Unified CLI: create, ingest, query, delete, status, inspect, compact, derive, serve
rvf-server    1,165  No       HTTP REST + TCP streaming server

Library Adapters

Adapter                   Purpose                     Key Feature
-------                   -------                     -----------
rvf-adapter-claude-flow   AI agent memory             WITNESS_SEG audit trails
rvf-adapter-agentdb       Agent vector database       Progressive HNSW indexing
rvf-adapter-ospipe        Observation-State pipeline  META_SEG for state vectors
rvf-adapter-agentic-flow  Swarm coordination          Inter-agent memory sharing
rvf-adapter-rvlite        Lightweight embedded store  Minimal API, edge-friendly
rvf-adapter-sona          Neural architecture         Experience replay + trajectories

Technical Reference: File Format Specification

File Extension

Extension    Usage
---------    -----
.rvf         Standard RuVector Format file
.rvf.cold.N  Cold shard N (multi-file mode)
.rvf.idx.N   Index shard N (multi-file mode)

MIME Type

application/x-ruvector-format

Magic Number

0x52564653 (ASCII: "RVFS")

Byte Order

All multi-byte integers are little-endian.

Alignment

All segments are 64-byte aligned (cache-line friendly). Payloads are padded to the next 64-byte boundary.
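The padding arithmetic is simple enough to show directly. The helper names below are hypothetical, and the sketch assumes a payload whose length is already a multiple of 64 receives no extra padding.

```rust
pub const ALIGN: u64 = 64;

/// Round `len` up to the next multiple of 64 (unchanged if already aligned).
pub fn padded_len(len: u64) -> u64 {
    (len + ALIGN - 1) / ALIGN * ALIGN
}

/// Number of padding bytes to append after a payload of `len` bytes.
pub fn padding_for(len: u64) -> u64 {
    padded_len(len) - len
}
```

Under these assumptions a 100-byte payload is followed by 28 padding bytes, so the next segment header starts on a 64-byte boundary.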

Root Manifest

The root manifest (Level 0) occupies the last 4,096 bytes of the most recent MANIFEST_SEG. This enables instant location via backward scan:

let (offset, header) = find_latest_manifest(&file_data)?;

The root manifest provides:

  • Segment directory (offsets to all segments)
  • Hotset pointers (entry points, top layer, centroids, quant dicts)
  • Epoch counter
  • Vector count and dimension
  • Profile identifiers

Domain Profiles

Profile   Code  Optimized For
-------   ----  -------------
Generic   0x00  General-purpose vectors
RVDNA     0x01  Genomic sequence embeddings
RVText    0x02  Language model embeddings
RVGraph   0x03  Graph/network node embeddings
RVVision  0x04  Image/vision model embeddings

Building from Source

Prerequisites

  • Rust 1.87+ via rustup (rustup update stable)
  • For WASM: rustup target add wasm32-unknown-unknown
  • For Node.js bindings: Node.js 18+ and npm

Build Examples

cd examples/rvf
cargo build

Build All RVF Crates

cd crates/rvf
cargo build --workspace

Run All Tests

cd crates/rvf
cargo test --workspace

Run Clippy

cd crates/rvf
cargo clippy --all-targets --workspace --exclude rvf-wasm

Build WASM Microkernel

cd crates/rvf
cargo build --target wasm32-unknown-unknown -p rvf-wasm --release
ls target/wasm32-unknown-unknown/release/rvf_wasm.wasm

Build Node.js Bindings

cd crates/rvf/rvf-node
npm install && npm run build

Run Benchmarks

cd crates/rvf
cargo bench --bench rvf_benchmarks

Project Structure

examples/rvf/
  Cargo.toml                  # Standalone workspace
  src/lib.rs                  # Shared utilities
  examples/
    # Core (6)
    basic_store.rs            # Store lifecycle, insert, query, persistence
    progressive_index.rs      # Three-layer HNSW, recall measurement
    quantization.rs           # Scalar, product, binary quantization + tiering
    wire_format.rs            # Raw segment I/O, hash validation, tail-scan
    crypto_signing.rs         # Ed25519 signing, witness chains, tamper detection
    filtered_search.rs        # Metadata-filtered vector search
    # Agentic AI (6)
    agent_memory.rs           # Persistent agent memory + witness audit
    swarm_knowledge.rs        # Multi-agent shared knowledge base
    reasoning_trace.rs        # Chain-of-thought with lineage derivation
    tool_cache.rs             # Tool call result cache with TTL + compaction
    agent_handoff.rs          # Transfer agent state between instances
    experience_replay.rs      # RL experience replay buffer
    # Practical Production (5)
    semantic_search.rs        # Document search engine (4 filter workflows)
    recommendation.rs         # Item recommendations (collaborative filtering)
    rag_pipeline.rs           # Retrieval-augmented generation pipeline
    embedding_cache.rs        # LRU cache with temperature tiering
    dedup_detector.rs         # Near-duplicate detection + compaction
    # Vertical Domains (4)
    genomic_pipeline.rs       # DNA k-mer search (.rvdna profile)
    financial_signals.rs      # Market signals with attestation
    medical_imaging.rs        # Radiology embedding search (.rvvis)
    legal_discovery.rs        # Legal document similarity (.rvtext)
    # Exotic Capabilities (5)
    self_booting.rs           # RVF with embedded unikernel
    ebpf_accelerator.rs       # eBPF hot-path acceleration
    hyperbolic_taxonomy.rs    # Hierarchy-aware search
    multimodal_fusion.rs      # Cross-modal text + image search
    sealed_engine.rs          # Full cognitive engine (capstone)
    # Runtime Targets + Postgres (5)
    browser_wasm.rs           # Browser-side WASM vector search
    edge_iot.rs               # IoT device with binary quantization
    serverless_function.rs    # Cold-start optimized for Lambda
    ruvllm_inference.rs       # LLM KV cache + LoRA via RVF
    postgres_bridge.rs        # PostgreSQL ↔ RVF export/import
    # Network & Security (4)
    network_sync.rs           # Peer-to-peer vector store sync
    tee_attestation.rs        # TEE attestation + sealed keys
    access_control.rs         # Role-based vector access control
    zero_knowledge.rs         # Zero-knowledge proof integration
    # Autonomous Agent (1)
    ruvbot.rs                 # Autonomous RVF-powered agent bot
    # POSIX & Systems (3)
    posix_fileops.rs          # POSIX file operations with RVF
    linux_microkernel.rs      # Linux microkernel distribution
    mcp_in_rvf.rs             # MCP server embedded in RVF
    # Network Operations (1)
    network_interfaces.rs     # Network OS telemetry (60 interfaces)

Learn More

Resource                  Description
--------                  -----------
RVF Format Specification  Full format documentation, architecture, and API reference
ADR-029                   Architecture decision record for the canonical format
ADR-030                   Computational container (KERNEL_SEG, EBPF_SEG) specification
ADR-031                   Example repository design (this collection of 40 examples)
Benchmarks                Performance benchmarks (HNSW build, quantization, wire I/O)
Integration Tests         E2E test suite (progressive recall, quantization, wire interop)

Contributing

git clone https://github.com/ruvnet/ruvector
cd ruvector/examples/rvf
cargo build && cargo run --example basic_store

All contributions must pass cargo clippy with zero warnings and maintain the existing test count (currently 543+).

License

Dual-licensed under MIT or Apache-2.0 at your option.


Built with Rust. One file — store it, send it, run it.