Optimize Browser VectorDB for production workloads. This guide covers performance optimization techniques, benchmarking, and best practices.
Browser VectorDB performance depends on several factors:
- Dataset size: Number of vectors and dimensions
- Hardware: CPU, GPU, memory availability
- Configuration: Index type, cache settings, device selection
- Usage patterns: Batch vs individual operations, query frequency
WebGPU provides significant speedup for embedding generation:
```typescript
// Check WebGPU support
const hasWebGPU = 'gpu' in navigator;

const db = new VectorDB({
  storage: { dbName: 'my-app' },
  index: { indexType: 'kdtree', dimensions: 384, metric: 'cosine' },
  embedding: {
    model: 'Xenova/all-MiniLM-L6-v2',
    device: hasWebGPU ? 'webgpu' : 'wasm', // Use GPU if available
  },
});
```

Performance Impact:
- Embedding generation: 3-5x faster
- Batch operations: 5-10x faster
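Note that `'gpu' in navigator` only confirms the API is exposed; an adapter request can still return null on unsupported hardware or blocklisted drivers. A more thorough probe, sketched with the standard WebGPU API and assuming WebGPU type definitions (e.g. `@webgpu/types`) are available:

```typescript
// Probe for a usable adapter, not just the API surface.
async function detectWebGPU(): Promise<boolean> {
  if (!('gpu' in navigator)) return false;
  try {
    const adapter = await navigator.gpu.requestAdapter();
    return adapter !== null;
  } catch {
    return false; // the request can throw on some platforms
  }
}

const device = (await detectWebGPU()) ? 'webgpu' : 'wasm';
```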
Cache models and embeddings to avoid recomputation:
```typescript
const db = new VectorDB({
  storage: { dbName: 'my-app' },
  index: { indexType: 'kdtree', dimensions: 384, metric: 'cosine' },
  embedding: {
    model: 'Xenova/all-MiniLM-L6-v2',
    device: 'wasm',
    cache: true, // Enable model caching
  },
});
```

Performance Impact:
- First load: Downloads model (~23MB)
- Subsequent loads: <1 second from cache
Batch operations are significantly faster than individual operations:
```typescript
// ✅ Fast - Single transaction
await db.insertBatch(documents);

// ❌ Slow - Multiple transactions
for (const doc of documents) {
  await db.insert(doc);
}
```

Performance Impact:
- 10-50x faster for large batches
- Reduced IndexedDB overhead
Smaller dimensions = faster operations:
```typescript
// Fast: 384 dimensions (all-MiniLM-L6-v2)
embedding: { model: 'Xenova/all-MiniLM-L6-v2' } // 384d

// Slower: 768 dimensions (larger models)
embedding: { model: 'Xenova/bge-base-en-v1.5' } // 768d
```

Performance Impact:
- Search latency: 2x faster with 384d vs 768d
- Memory usage: 2x less with 384d vs 768d
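The memory figure follows directly from vector storage: a `Float32Array` uses 4 bytes per dimension, so halving the dimensions halves the raw vector footprint. A quick estimate (index structures and metadata add overhead on top):

```typescript
// Raw vector storage only; index structures and metadata add overhead.
const estimateMB = (count: number, dims: number) =>
  (count * dims * Float32Array.BYTES_PER_ELEMENT) / (1024 * 1024);

console.log(estimateMB(100_000, 384).toFixed(0)); // "146" - ~146MB at 384d
console.log(estimateMB(100_000, 768).toFixed(0)); // "293" - ~293MB at 768d
```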
Only retrieve what you need:
```typescript
// ✅ Good - Reasonable limit
const results = await db.search({ text: query, k: 10 });

// ❌ Bad - Unnecessary overhead
const results = await db.search({ text: query, k: 1000 });
```

Cap the stored dataset with a storage limit:

```typescript
storage: {
  dbName: 'my-app',
  maxVectors: 100000, // Set reasonable limit
}
```

Recommendations:
- Set `maxVectors` based on expected dataset size
- Monitor storage quota usage (see the sketch below)
- Export and archive old data periodically
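For the quota-monitoring recommendation, the standard Storage API reports usage against the origin's quota:

```typescript
// Report IndexedDB usage against the origin's storage quota.
if (navigator.storage?.estimate) {
  const { usage = 0, quota = 0 } = await navigator.storage.estimate();
  const toMB = (bytes: number) => (bytes / (1024 * 1024)).toFixed(1);
  console.log(`Storage: ${toMB(usage)}MB of ${toMB(quota)}MB used`);
  if (quota > 0 && usage / quota > 0.8) {
    console.warn('Over 80% of quota used - consider exporting old data');
  }
}
```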
```typescript
index: {
  indexType: 'kdtree',
  dimensions: 384,
  metric: 'cosine', // Fastest for normalized embeddings
}
```

Metric Performance:
- `cosine`: Fastest for normalized embeddings (recommended; see the sketch below)
- `dot`: Fast, but requires normalized vectors
- `euclidean`: Slower, use only if needed
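The reason `cosine` and `dot` behave the same on normalized embeddings is that cosine similarity reduces to a plain dot product for unit-length vectors. A small illustration (the helper functions are ours, not library API):

```typescript
// For unit-length vectors, cosine similarity equals the dot product,
// so the normalization terms can be skipped entirely.
function dot(a: Float32Array, b: Float32Array): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function cosine(a: Float32Array, b: Float32Array): number {
  const norm = (v: Float32Array) => Math.sqrt(dot(v, v));
  return dot(a, b) / (norm(a) * norm(b));
}

// If ‖a‖ = ‖b‖ = 1, then cosine(a, b) === dot(a, b).
```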
```typescript
embedding: {
  model: 'Xenova/all-MiniLM-L6-v2', // Small, fast model
  device: 'webgpu', // Use GPU if available
  cache: true, // Cache model
  quantized: false, // Set true to trade some quality for speed
}
```

Model Selection:
| Model | Dimensions | Size | Speed | Quality |
|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 23MB | Fast | Good |
| bge-small-en-v1.5 | 384 | 33MB | Fast | Better |
| bge-base-en-v1.5 | 768 | 109MB | Slower | Best |
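If it helps, the table's tradeoff can be encoded as a simple lookup; the model names and dimensions come from the table, while the selector itself is a hypothetical helper:

```typescript
// The table's tradeoff as a lookup (selector is a hypothetical helper).
const modelByPriority = {
  speed:    { model: 'Xenova/all-MiniLM-L6-v2',  dimensions: 384 },
  balanced: { model: 'Xenova/bge-small-en-v1.5', dimensions: 384 },
  quality:  { model: 'Xenova/bge-base-en-v1.5',  dimensions: 768 },
} as const;

const { model, dimensions } = modelByPriority.speed;
// Keep index.dimensions in sync with the chosen model's output size.
```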
Use LRU cache to limit memory usage:
```typescript
import { LRUCache } from '@vectordb/browser-vectordb';

const cache = new LRUCache<string, Float32Array>({
  max: 1000, // Maximum items
  maxSize: 100 * 1024 * 1024, // 100MB limit
  sizeCalculation: (v) => v.byteLength,
  dispose: (value, key) => {
    console.log(`Evicted ${key}`);
  },
});

// Use cache
cache.set('key1', vector);
const cached = cache.get('key1');
```

Monitor and manage memory usage:
```typescript
import { MemoryManager } from '@vectordb/browser-vectordb';

const memoryManager = new MemoryManager({
  maxMemoryMB: 500, // Maximum memory usage
  evictionThreshold: 0.9, // Evict at 90% usage
});

// Check memory usage
const usage = memoryManager.getMemoryUsage();
console.log(`Memory: ${usage.usedMB}MB / ${usage.totalMB}MB`);

// Evict if needed
if (memoryManager.checkMemoryPressure()) {
  await memoryManager.evictIfNeeded();
}
```

Load large datasets progressively:
```typescript
import { ProgressiveLoader } from '@vectordb/browser-vectordb';

const loader = new ProgressiveLoader(db);

// Load in chunks
for await (const chunk of loader.loadVectorsInChunks(1000)) {
  console.log(`Loaded ${chunk.length} vectors`);
  // Process chunk
}
```

Apply metadata filters during search, not after:
```typescript
// ✅ Good - Filter during search
const results = await db.search({
  text: query,
  k: 10,
  filter: {
    field: 'category',
    operator: 'eq',
    value: 'tech',
  },
});

// ❌ Bad - Filter after search
const allResults = await db.search({ text: query, k: 100 });
const filtered = allResults.filter(r => r.metadata.category === 'tech');
```

Performance Impact:
- 2-10x faster with early filtering
- Reduced memory usage
Cache frequently used query embeddings:
```typescript
const queryCache = new Map<string, Float32Array>();

async function search(query: string, k: number) {
  // Check cache
  let vector = queryCache.get(query);
  if (!vector) {
    // Generate and cache
    vector = await db.embedding.embed(query);
    queryCache.set(query, vector);
  }

  // Search with cached vector
  return await db.search({ vector, k });
}
```

Process multiple queries in batch:
```typescript
// Generate embeddings in batch
const queries = ['query 1', 'query 2', 'query 3'];
const vectors = await db.embedding.embedBatch(queries);

// Search in parallel
const results = await Promise.all(
  vectors.map(vector => db.search({ vector, k: 10 }))
);
```

Always use batch operations for multiple documents:
```typescript
// ✅ Fast - Batch insert
const documents = [...]; // 1000 documents
await db.insertBatch(documents);

// ❌ Slow - Individual inserts
for (const doc of documents) {
  await db.insert(doc);
}
```

Performance Comparison:
| Documents | Individual | Batch | Speedup |
|---|---|---|---|
| 100 | 5.2s | 0.3s | 17x |
| 1,000 | 52s | 2.1s | 25x |
| 10,000 | 520s | 18s | 29x |
Generate embeddings in advance for faster insertion:
```typescript
// Generate embeddings in batch
const texts = documents.map(d => d.text);
const vectors = await db.embedding.embedBatch(texts);

// Insert with pre-computed vectors
const records = documents.map((doc, i) => ({
  vector: vectors[i],
  metadata: doc.metadata,
}));
await db.insertBatch(records);
```

For bulk inserts, consider rebuilding the index after:
```typescript
// Insert without index updates (if supported)
await db.insertBatch(documents, { updateIndex: false });

// Rebuild index once
await db.rebuildIndex();
```

Balance context quality and performance:
```typescript
// ✅ Good - Balanced
const result = await rag.query(question, { topK: 3 });

// ❌ Too few - Missing context
const result = await rag.query(question, { topK: 1 });

// ❌ Too many - Slow, noisy
const result = await rag.query(question, { topK: 50 });
```

Recommendations:
- Start with `topK: 3-5`
- Increase if answers lack context
- Decrease if generation is slow
Use streaming for better perceived performance:
```typescript
const stream = rag.queryStream(question, { topK: 3 });

for await (const chunk of stream) {
  if (chunk.type === 'generation') {
    // Display immediately
    process.stdout.write(chunk.content);
  }
}
```

Cache answers for common questions:
```typescript
const ragCache = new Map<string, RAGResult>();

async function queryWithCache(question: string) {
  // Check cache
  if (ragCache.has(question)) {
    return ragCache.get(question);
  }

  // Query and cache
  const result = await rag.query(question);
  ragCache.set(question, result);
  return result;
}
```

Use the built-in benchmark suite:
```typescript
import { Benchmark } from '@vectordb/browser-vectordb';

const benchmark = new Benchmark(db);

// Search latency
const searchResults = await benchmark.runSearchBenchmark([
  1000, 5000, 10000, 50000
]);

searchResults.forEach(r => {
  console.log(`${r.size} vectors: ${r.avgLatency}ms (p95: ${r.p95Latency}ms)`);
});

// Insert throughput
const insertResult = await benchmark.runInsertBenchmark(10000);
console.log(`Insert throughput: ${insertResult.throughput} docs/sec`);

// Memory usage
const memoryResult = await benchmark.runMemoryBenchmark();
console.log(`Memory usage: ${memoryResult.usedMB}MB`);
```

Create custom benchmarks for your use case:
```typescript
async function benchmarkSearch(db: VectorDB, queries: string[], k: number) {
  const latencies: number[] = [];

  for (const query of queries) {
    const start = performance.now();
    await db.search({ text: query, k });
    const latency = performance.now() - start;
    latencies.push(latency);
  }

  const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
  // Sort numerically; the default sort compares numbers as strings
  const sorted = [...latencies].sort((a, b) => a - b);
  const p95 = sorted[Math.floor(sorted.length * 0.95)];

  return { avg, p95, min: Math.min(...latencies), max: Math.max(...latencies) };
}

const results = await benchmarkSearch(db, testQueries, 10);
console.log('Search performance:', results);
```

Monitor key performance metrics:
```typescript
class PerformanceMonitor {
  private metrics = {
    searchLatency: [] as number[],
    insertLatency: [] as number[],
    cacheHits: 0,
    cacheMisses: 0,
  };

  recordSearch(latency: number) {
    this.metrics.searchLatency.push(latency);
  }

  recordInsert(latency: number) {
    this.metrics.insertLatency.push(latency);
  }

  recordCacheHit() {
    this.metrics.cacheHits++;
  }

  recordCacheMiss() {
    this.metrics.cacheMisses++;
  }

  getStats() {
    const avgSearch = this.average(this.metrics.searchLatency);
    const avgInsert = this.average(this.metrics.insertLatency);
    const lookups = this.metrics.cacheHits + this.metrics.cacheMisses;
    // Avoid NaN before any cache lookups have been recorded
    const cacheHitRate = lookups > 0 ? this.metrics.cacheHits / lookups : 0;

    return {
      avgSearchLatency: avgSearch.toFixed(2) + 'ms',
      avgInsertLatency: avgInsert.toFixed(2) + 'ms',
      cacheHitRate: (cacheHitRate * 100).toFixed(1) + '%',
    };
  }

  private average(arr: number[]): number {
    if (arr.length === 0) return 0; // avoid division by zero
    return arr.reduce((a, b) => a + b, 0) / arr.length;
  }
}

const monitor = new PerformanceMonitor();

// Track operations
const start = performance.now();
await db.search({ text: query, k: 10 });
monitor.recordSearch(performance.now() - start);

// View stats
console.log(monitor.getStats());
```

Use browser DevTools for profiling:
- Performance Tab: Record and analyze operations
- Memory Tab: Monitor memory usage and detect leaks
- Network Tab: Check model download times
- Console: Log timing information
```typescript
// Add timing logs
console.time('search');
await db.search({ text: query, k: 10 });
console.timeEnd('search');

// Memory usage (performance.memory is non-standard, Chrome-only)
if (performance.memory) {
  console.log('Memory:', {
    used: (performance.memory.usedJSHeapSize / 1024 / 1024).toFixed(2) + 'MB',
    total: (performance.memory.totalJSHeapSize / 1024 / 1024).toFixed(2) + 'MB',
  });
}
```

Choose an appropriate model:

```typescript
// ✅ Good - Small, fast model for most use cases
embedding: { model: 'Xenova/all-MiniLM-L6-v2' }

// ⚠️ Use larger models only if quality is critical
embedding: { model: 'Xenova/bge-base-en-v1.5' }
```

Use sensible batch sizes:

```typescript
// ✅ Good - Reasonable batch size
const batchSize = 100;
for (let i = 0; i < documents.length; i += batchSize) {
  const batch = documents.slice(i, i + batchSize);
  await db.insertBatch(batch);
}

// ❌ Bad - Too large (memory issues)
await db.insertBatch(documents); // 100,000 documents

// ❌ Bad - Too small (slow)
for (const doc of documents) {
  await db.insert(doc);
}
```

Cache frequently accessed data:

```typescript
// ✅ Good - Cache frequently accessed data
const cache = new LRUCache({ max: 1000 });

async function getCachedEmbedding(text: string) {
  let vector = cache.get(text);
  if (!vector) {
    vector = await db.embedding.embed(text);
    cache.set(text, vector);
  }
  return vector;
}
```

Monitor and log performance:

```typescript
// ✅ Good - Track and log performance
const start = performance.now();
const results = await db.search({ text: query, k: 10 });
const latency = performance.now() - start;

if (latency > 100) {
  console.warn(`Slow search: ${latency}ms`);
}
```

Clean up resources when done:

```typescript
// ✅ Good - Dispose when done
try {
  await db.initialize();
  // Use database
} finally {
  await db.dispose();
}
```

Search latency targets:

| Dataset Size | Target Latency | Acceptable |
|---|---|---|
| 1,000 | <10ms | <50ms |
| 10,000 | <50ms | <100ms |
| 50,000 | <100ms | <200ms |
| 100,000 | <200ms | <500ms |
Insert throughput targets:

| Operation | Target | Acceptable |
|---|---|---|
| Single insert | >100/sec | >50/sec |
| Batch insert (100) | >1000/sec | >500/sec |
| Batch insert (1000) | >2000/sec | >1000/sec |
Memory usage targets:

| Dataset Size | Target | Maximum |
|---|---|---|
| 10,000 vectors | <50MB | <100MB |
| 50,000 vectors | <200MB | <500MB |
| 100,000 vectors | <400MB | <1GB |
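These budgets are easy to enforce in automated tests. A hedged sketch using the search-latency targets from the table above (the helper itself is ours, not part of the library):

```typescript
// Hypothetical helper: targets copied from the search-latency table above.
const latencyTargetsMs: Array<[maxVectors: number, targetMs: number]> = [
  [1_000, 10],
  [10_000, 50],
  [50_000, 100],
  [100_000, 200],
];

function withinSearchBudget(vectorCount: number, measuredMs: number): boolean {
  const entry = latencyTargetsMs.find(([max]) => vectorCount <= max);
  // Above 100k vectors the table gives no target; fall back to the
  // 500ms "acceptable" ceiling from its last row.
  return measuredMs <= (entry ? entry[1] : 500);
}

console.log(withinSearchBudget(10_000, 42)); // true - under the 50ms target
```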
If searches are slow:
- Check dataset size - consider reducing it
- Enable WebGPU if available
- Use metadata filters to reduce search space
- Reduce result count (k)

If memory usage is high:
- Check for memory pressure
- Reduce cache sizes
- Use progressive loading
- Implement LRU eviction
- Clear unused data
- Export and archive old data

If inserts are slow:
- Use batch operations
- Pre-compute embeddings
- Increase batch size
- Check storage quota
- Optimize IndexedDB transactions

If model loading is slow:
- Enable caching
- Use smaller models
- Check network connection
- Use CDN for models
- Pre-load models on app start (see the sketch below)
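For the pre-loading tip, one approach is to start initialization during app boot so the model download overlaps with the rest of startup. `initialize()` and the `cache: true` option appear earlier in this guide; the wiring here is our sketch:

```typescript
// Kick off initialization early, without blocking app startup.
const dbReady = (async () => {
  const db = new VectorDB({
    storage: { dbName: 'my-app' },
    index: { indexType: 'kdtree', dimensions: 384, metric: 'cosine' },
    embedding: { model: 'Xenova/all-MiniLM-L6-v2', device: 'wasm', cache: true },
  });
  await db.initialize(); // downloads on first run, loads from cache afterwards
  return db;
})();

// Later, wherever the database is first needed:
const db = await dbReady;
```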
- API Reference - Complete API documentation
- Benchmarking Guide - Detailed benchmarking
- Examples - Performance examples
- Troubleshooting - Common issues
Happy optimizing! 🚀