Vector Index Optimization Guide

Optimize your vector search performance by choosing the right index strategy for your dataset size and requirements.


Vector Index Strategy

Your choice of index determines your cluster's RAM requirements, operational cost, and retrieval latency.

| Index Type | Production Use Case | Size | Retrieval Speed |
| --- | --- | --- | --- |
| HNSW | High Performance (Production Recommended) | Optimized for large datasets | Fastest |
| Flat | Memory Saver | Best for small datasets only. Performs brute-force linear scans; uses zero graph memory. | Good |
| Dynamic | Balanced | Starts as Flat for efficiency, auto-upgrades to HNSW at a threshold (Default: 10,000). | Adaptive |

Dynamic Indexing requires ASYNC_INDEXING=true in your environment. The upgrade is one-way: once a shard converts to HNSW, it will not revert to Flat even if the object count drops.
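The one-way upgrade can be pictured with a toy model. This is a sketch of the behavior described above, not Weaviate's implementation: the class name `DynamicIndexModel` is hypothetical, and whether the upgrade fires exactly at the threshold or just past it is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DynamicIndexModel:
    """Toy model of one shard: Flat until a threshold is reached, then HNSW."""

    threshold: int = 10_000  # default threshold quoted in the table above
    object_count: int = 0
    index_type: str = "flat"

    def add(self, n: int) -> None:
        self.object_count += n
        # Assumption: the upgrade triggers once the count reaches the threshold.
        if self.index_type == "flat" and self.object_count >= self.threshold:
            self.index_type = "hnsw"

    def delete(self, n: int) -> None:
        # Deleting objects never downgrades the index: the switch is one-way.
        self.object_count = max(0, self.object_count - n)


shard = DynamicIndexModel()
shard.add(9_999)
print(shard.index_type)  # flat
shard.add(1)
print(shard.index_type)  # hnsw
shard.delete(5_000)
print(shard.index_type)  # hnsw (no revert after the count drops)
```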


Critical Environment Configuration

| Category | Variable | Optimization |
| --- | --- | --- |
| Persistence | PERSISTENCE_HNSW_MAX_LOG_SIZE | Set it close to your HNSW graph size (e.g. 1 GiB for a ~1 GiB graph) to speed up compaction. Note: This increases memory usage. |
| Deletions | TOMBSTONE_DELETION_CONCURRENCY | Default is already half your CPU cores. In large-core clusters, consider setting it lower to prevent cleanup from consuming too many resources. For small-core clusters with heavy deletions, increase it to speed up cleanup. |
| Deletions | TOMBSTONE_DELETION_MIN_PER_CYCLE | For very large indexes, set to 100000 (100k) to ensure cleanup happens before search speed degrades. |
| Deletions | TOMBSTONE_DELETION_MAX_PER_CYCLE | For very large indexes, set to 10000000 (10 million) to cap the number of tombstones deleted per cycle and prevent resource overconsumption. |
| Global Defaults | DEFAULT_QUANTIZATION | Set to RQ to ensure all new collections use 8-bit compression automatically. |

Tombstones are markers for deleted objects in the HNSW index. They get cleaned up periodically (controlled by cleanupIntervalSeconds).
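As a rough illustration of the deletion settings above, the helper below assembles the tombstone variables for a node. The function name `tombstone_env` and the `very_large_index` flag are hypothetical; the concurrency value only mirrors the documented default of half the CPU cores so that you have an explicit number to raise or lower.

```python
import os


def tombstone_env(cpu_cores: int, very_large_index: bool = False) -> dict[str, str]:
    """Build tombstone-cleanup environment values following the table above."""
    env = {
        # Mirrors the documented default: half the CPU cores (at least 1).
        "TOMBSTONE_DELETION_CONCURRENCY": str(max(1, cpu_cores // 2)),
    }
    if very_large_index:
        # Values suggested in the table for very large indexes.
        env["TOMBSTONE_DELETION_MIN_PER_CYCLE"] = "100000"
        env["TOMBSTONE_DELETION_MAX_PER_CYCLE"] = "10000000"
    return env


# Print the values in KEY=VALUE form, e.g. for pasting into an env file.
for key, value in tombstone_env(os.cpu_count() or 1, very_large_index=True).items():
    print(f"{key}={value}")
```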


HNSW Tuning

Tuning HNSW is a balance between graph density (Recall) and traversal speed (Latency). The following are starting points and should be validated per dataset.

✅ The Production DOs

  • Set ef: -1: Enables Dynamic ef. This allows Weaviate to auto-tune search depth based on your query limit.
    • For predictable recall requirements where you need static performance, set ef between 300–500 (test your specific dataset to find the optimal value).
  • Optimize maxConnections: Reduce from 64 to 32 as a good default. This provides significant RAM savings with minimal recall loss and often better QPS/recall performance. Increase it only if you need higher recall and can afford the extra memory.
  • Bulk Import Performance: Use Async Indexing (ASYNC_INDEXING=true) to prevent graph construction from blocking data ingestion.
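To make "auto-tune search depth based on your query limit" concrete, here is a minimal sketch of how a dynamic ef could be derived from the limit. It assumes the behavior described in Weaviate's HNSW documentation: ef is the limit times `dynamicEfFactor`, clamped between `dynamicEfMin` and `dynamicEfMax`, and never below the limit itself. The default values used here (100, 500, 8) are assumptions to verify against your Weaviate version, and the function name `effective_ef` is hypothetical.

```python
def effective_ef(limit: int, ef: int = -1, ef_min: int = 100,
                 ef_max: int = 500, ef_factor: int = 8) -> int:
    """Approximate the search-time ef for a given query limit."""
    if ef != -1:
        # Static ef: returned unchanged in this simplified sketch.
        return ef
    dynamic = limit * ef_factor
    # Clamp into [ef_min, ef_max].
    dynamic = max(ef_min, min(dynamic, ef_max))
    # ef must cover at least the number of requested results.
    return max(dynamic, limit)


for limit in (10, 50, 100, 1000):
    print(limit, effective_ef(limit))
# 10 100
# 50 400
# 100 500
# 1000 1000
```

Under these assumptions, small limits search at the floor, mid-sized limits scale with the factor, and large limits hit the ceiling, which is why a static ef mainly makes sense when recall must be identical regardless of the limit.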

❌ The Production DON'Ts

  • DON'T exceed ef: 512: Causes massive latency penalties for negligible recall gains.
  • DON'T disable lazy loading for multi-tenant deployments: Only use DISABLE_LAZY_LOAD_SHARDS=true for single-tenant collections.

High-Efficiency Compression

For production, Rotational Quantization (RQ-8) is the standard for compression.

  • RQ (Recommended): Provides 98-99% recall with a 4x reduction in vector RAM. It requires no training phase, so compression is instant.
  • PQ (Large Scale): Use only for massive datasets (>1M) where custom segment tuning is needed. Requires 10k–100k objects per shard for training before it activates.
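The 4x figure follows from storing each dimension in 1 byte instead of a 4-byte float32. The back-of-the-envelope sketch below uses an illustrative dataset (1M vectors, 768 dimensions, both assumptions) and ignores any per-vector quantization metadata and the HNSW graph itself, so treat the result as a lower bound on real memory use.

```python
def vector_ram_bytes(num_vectors: int, dimensions: int, bytes_per_dim: int) -> int:
    """Raw storage for the vectors alone (no graph, no metadata)."""
    return num_vectors * dimensions * bytes_per_dim


n, d = 1_000_000, 768  # illustrative dataset size, not taken from this guide

raw = vector_ram_bytes(n, d, 4)   # uncompressed float32
rq8 = vector_ram_bytes(n, d, 1)   # 8 bits per dimension

print(f"float32: {raw / 2**30:.2f} GiB")
print(f"8-bit:   {rq8 / 2**30:.2f} GiB")
print(f"reduction: {raw / rq8:.0f}x")  # reduction: 4x
```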

Availability & Restart Optimization

In production, rolling updates can cause search latency spikes if the new Pod hasn't finished loading its cache. Use these settings to ensure a Pod is only "Ready" once it's fully performant.

| Environment Variable | Recommended Value | Why it's Critical |
| --- | --- | --- |
| HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE | true | Forces Weaviate to wait until the vector cache is hot before marking the Pod as "Ready." |
| DISABLE_LAZY_LOAD_SHARDS | true | For multi-tenant deployments, keep it false to reduce startup time. For single-tenant deployments where you can afford longer startup, set it to true so that everything is loaded before the node is considered ready. |

Why this matters: A Pod that is "up" but hasn't loaded its HNSW graph serves slow queries, leading to massive latency spikes during restarts.


Developer Checklist

Environment & Infrastructure:

  • For Availability: HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE=true and DISABLE_LAZY_LOAD_SHARDS=true are active (single-tenant collections only).
  • Persistence: PERSISTENCE_HNSW_MAX_LOG_SIZE=1024MiB (or match HNSW graph size; adjust based on dataset).
  • Global Defaults: DEFAULT_QUANTIZATION=RQ (applies RQ compression to all new collections).

HNSW Configuration:

  • ef is set to -1 for dynamic optimization (or 300-500 for static predictable recall).
  • maxConnections=32 (reduced from 64 for modern high-dimensional vectors).
  • Never set ef > 512 (causes massive latency penalties).

Compression & Memory:

  • Compression: RQ is enabled for RAM efficiency (98-99% recall, 4x reduction).
  • Memory: Vector cache is sized to fit the "hot" portion of the dataset.

Deletions & Cleanup:

  • Deletions: TOMBSTONE_DELETION_CONCURRENCY at default (already half CPU cores) or lower for large clusters.
  • For Large Datasets, configure TOMBSTONE_DELETION_MIN_PER_CYCLE and TOMBSTONE_DELETION_MAX_PER_CYCLE.
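Parts of this checklist can be automated. The sketch below is one possible pre-deployment check, not an official tool: the function name `check_config` and its input shapes (an environment dict, an HNSW settings dict, and a multi-tenancy flag) are hypothetical, and it covers only the rules stated in this guide.

```python
def check_config(env: dict[str, str], hnsw: dict[str, int],
                 multi_tenant: bool) -> list[str]:
    """Return a list of findings; an empty list means no rule was violated."""
    findings = []

    # ef: -1 means dynamic; anything above 512 is flagged.
    if hnsw.get("ef", -1) > 512:
        findings.append("ef > 512: expect large latency penalties for negligible recall gains")

    # maxConnections is only checked when it is set explicitly.
    max_conn = hnsw.get("maxConnections")
    if max_conn is not None and max_conn > 32:
        findings.append("maxConnections > 32: confirm the extra memory buys recall you need")

    lazy_disabled = env.get("DISABLE_LAZY_LOAD_SHARDS", "false").lower() == "true"
    if multi_tenant and lazy_disabled:
        findings.append("DISABLE_LAZY_LOAD_SHARDS=true on a multi-tenant deployment")

    if env.get("HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE", "false").lower() != "true":
        findings.append("HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE is not true")

    return findings


issues = check_config(
    env={"HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE": "true", "DISABLE_LAZY_LOAD_SHARDS": "true"},
    hnsw={"ef": 600, "maxConnections": 32},
    multi_tenant=True,
)
for issue in issues:
    print("WARN:", issue)
```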