Build RAG applications without cloud dependencies. All processing happens client-side using WebAssembly and WebGPU.
- 🔒 Private: Data never leaves the browser
- ⚡ Fast: WebGPU acceleration with WASM fallback
- 💰 Free: Zero cloud costs, no API keys
- 📦 Complete: Vector DB + embeddings + LLM + RAG
- 🌐 Offline: Works without internet after initial load
```bash
npm install haven
```

```typescript
import Haven from 'haven';

// Create database
const db = new Haven({
  storage: { dbName: 'my-app' },
  index: { dimensions: 384, metric: 'cosine' },
  embedding: { model: 'Xenova/all-MiniLM-L6-v2', device: 'wasm' },
});
await db.initialize();

// Add documents
await db.insert({
  text: 'Haven is a privacy-first vector database for browsers',
  metadata: { category: 'intro' },
});

// Semantic search
const results = await db.search({
  text: 'What is Haven?',
  k: 5,
});
console.log(results);
```

- Vector Search: Store and search 100K+ documents with semantic understanding
- Local Embeddings: Transformers.js integration with WebGPU acceleration
- Private RAG: Complete retrieval-augmented generation pipeline
- Dual LLM Support: WebLLM (WebGPU) + Wllama (WASM) with automatic fallback
- MCP Integration: Works with Claude Desktop and AI agent ecosystems
- Easy Integration: Clean TypeScript API with full type safety
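The quickstart above configures `metric: 'cosine'`. As a reminder of what that metric scores, here is the standard cosine-similarity computation — an illustration of the math, not Haven's internal implementation:

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|), in [-1, 1] for nonzero vectors.
// Identical directions score 1, orthogonal vectors score 0.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Because the metric depends only on direction, documents are matched by semantic orientation of their embeddings rather than by vector magnitude.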
Privacy-Critical Applications
- Legal tech, healthcare, finance
- GDPR-compliant by design
- Attorney-client privilege protection
Offline-First Apps
- Browser extensions
- Electron applications
- Progressive web apps
Cost-Sensitive Projects
- Zero cloud infrastructure costs
- No API rate limits
- Unlimited usage
```typescript
// Semantic search with metadata filtering
const results = await db.search({
  text: 'machine learning concepts',
  k: 10,
  filter: { field: 'category', operator: 'eq', value: 'AI' },
});
```

```typescript
import { RAGPipelineManager, WllamaProvider } from 'haven';

// Setup RAG with a local LLM
const llm = new WllamaProvider({ model: '...' });
const rag = new RAGPipelineManager(db, llm, embedding);

// Ask questions with context
const result = await rag.query('What is machine learning?', {
  topK: 3,
  generateOptions: { maxTokens: 256, temperature: 0.7 },
});
console.log(result.answer);
console.log(result.sources); // Citations
```

```typescript
import { MCPServer } from 'haven';

// Expose as MCP tools for AI agents
const mcp = new MCPServer(db, rag);

// Use with Claude Desktop, ChatGPT, etc.
const tools = mcp.getTools();
```

- Quickstart Guide - Get up and running in 5 minutes
- API Reference - Complete API documentation
- Examples - Code examples and demos
- RAG Pipeline Tutorial - Build RAG applications
- MCP Integration Guide - Integrate with AI assistants
- Performance Tuning - Optimize for production
- Troubleshooting - Common issues and solutions
- Testing Guide - Testing strategies
Haven combines multiple technologies into a cohesive stack:
- Storage: IndexedDB for persistent vector storage
- Indexing: Voy (WASM) for fast k-d tree search
- Embeddings: Transformers.js with WebGPU/WASM
- LLMs: WebLLM (WebGPU) and Wllama (WASM)
- Protocol: MCP for AI agent integration
All components run entirely in the browser with zero server dependencies.
```mermaid
graph TB
    subgraph "Application Layer"
        APP[User Application]
        MCP[MCP Interface]
    end
    subgraph "API Layer"
        API[VectorDB API]
        RAG[RAG Pipeline Manager]
    end
    subgraph "Embedding Layer"
        TJS[Transformers.js]
        CACHE[Model Cache]
    end
    subgraph "LLM Layer"
        WLLAMA[wllama - WASM]
        WEBLLM[WebLLM - WebGPU]
    end
    subgraph "Index Layer"
        VOY[Voy WASM Engine]
        IDX[Index Manager]
    end
    subgraph "Storage Layer"
        IDB[IndexedDB]
        STORE[Storage Manager]
    end
    APP --> API
    MCP --> API
    API --> RAG
    API --> IDX
    RAG --> TJS
    RAG --> WLLAMA
    RAG --> WEBLLM
    TJS --> CACHE
    IDX --> VOY
    IDX --> STORE
    STORE --> IDB
    VOY --> IDB
```
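As a rough illustration of how a query moves through the index and storage layers, here is a toy in-memory sketch. The names below are stand-ins only: in the real stack, Transformers.js produces the query vector, Voy performs the nearest-neighbor search, and IndexedDB holds the documents.

```typescript
// Toy stand-ins for the layers above: a Map plays the Storage Manager /
// IndexedDB role, and a brute-force scan plays Voy's role.
type StoredDoc = { id: number; text: string; vector: number[] };

function squaredDistance(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + (x - b[i]) ** 2, 0);
}

class ToyIndex {
  private store = new Map<number, StoredDoc>(); // "Storage Layer"

  insert(doc: StoredDoc): void {
    this.store.set(doc.id, doc);
  }

  // "Index Layer": rank every stored vector against the query, return top k.
  search(query: number[], k: number): StoredDoc[] {
    return [...this.store.values()]
      .sort(
        (a, b) =>
          squaredDistance(a.vector, query) - squaredDistance(b.vector, query),
      )
      .slice(0, k);
  }
}
```

Unlike this sketch, the real index survives page reloads (vectors and documents are persisted to IndexedDB) and uses a k-d tree instead of a linear scan.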
| Operation | Latency | Throughput |
|---|---|---|
| Search (10K vectors) | <50ms | - |
| Insert (batch) | - | 2000+ docs/sec |
| Embedding generation | 50-200ms | - |
| RAG query (full) | 500ms-5s | - |
Benchmarks on Chrome 120, M1 MacBook Pro
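Numbers like these vary with hardware and browser, so it is worth measuring in your own environment. A small timing helper is enough (a sketch; `db.search` is the API shown in the quickstart):

```typescript
// Measure the wall-clock latency of an async operation in milliseconds.
async function timeMs<T>(
  op: () => Promise<T>,
): Promise<{ result: T; ms: number }> {
  const start = performance.now();
  const result = await op();
  return { result, ms: performance.now() - start };
}

// Example, inside an app that has initialized a Haven instance:
// const { ms } = await timeMs(() => db.search({ text: 'query', k: 5 }));
// console.log(`search latency: ${ms.toFixed(1)}ms`);
```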
Haven powers privacy-tier features at Lexemo, processing sensitive legal documents for EU law firms without cloud transmission.
```bash
# Install dependencies
npm install

# Run tests
npm test

# Run tests in watch mode
npm run test:watch

# Run integration tests (requires network)
npm run test:integration

# Build library
npm run build

# Type check
npm run type-check

# Run benchmarks
npm run benchmark
```

- Chrome/Edge: 90+ (WebGPU: 113+)
- Firefox: 88+ (WebGPU: Not yet supported)
- Safari: 14+ (WebGPU: Not yet supported)
Requirements:
- IndexedDB support
- WebAssembly support
- ES2020+ JavaScript
Optional:
- WebGPU for accelerated inference
- SharedArrayBuffer for multi-threading
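The required and optional capabilities above can be feature-detected at startup. The sketch below is illustrative, not Haven's API; `typeof` guards via `globalThis` keep it safe to run in any environment (browser, worker, or Node):

```typescript
// Feature-detect Haven's requirements and optional accelerators.
interface Capabilities {
  indexedDB: boolean;         // required: persistent storage
  wasm: boolean;              // required: Voy index + WASM fallback
  webgpu: boolean;            // optional: accelerated inference
  sharedArrayBuffer: boolean; // optional: multi-threading
}

function detectCapabilities(): Capabilities {
  const g = globalThis as Record<string, unknown>;
  return {
    indexedDB: typeof g.indexedDB !== 'undefined',
    wasm: typeof g.WebAssembly !== 'undefined',
    webgpu: typeof g.navigator !== 'undefined' && 'gpu' in (g.navigator as object),
    sharedArrayBuffer: typeof g.SharedArrayBuffer !== 'undefined',
  };
}

function havenSupport(c: Capabilities): 'full' | 'wasm-only' | 'unsupported' {
  if (!c.indexedDB || !c.wasm) return 'unsupported';
  return c.webgpu ? 'full' : 'wasm-only';
}
```

An app could use the result to pick the embedding backend — e.g. `device: 'webgpu'` when supported, falling back to the `device: 'wasm'` value shown in the quickstart (assuming `'webgpu'` is an accepted device value).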
- Python bindings (PyScript)
- React hooks package
- Hybrid search (dense + sparse)
- Multi-modal embeddings (CLIP)
- Quantization in browser
Contributions welcome! Please open an issue or PR.
MIT © 2024
Documentation • Examples • GitHub
Built with ❤️ for privacy-conscious developers
