Optimize your vector search performance by choosing the right index strategy for your dataset size and requirements.
Your choice of index determines your cluster's RAM requirements, operational cost, and retrieval latency.
| Index Type | Production Use Case | Dataset Size | Retrieval Speed |
|---|---|---|---|
| HNSW | High performance (recommended for production) | Large datasets | Fastest |
| Flat | Memory saver | Small datasets only. Performs brute-force linear scans; uses zero graph memory. | Good |
| Dynamic | Balanced | Starts as Flat for efficiency and auto-upgrades to HNSW at a threshold (default: 10,000 objects). | Adaptive |
Dynamic indexing requires `ASYNC_INDEXING=true` in your environment. This is a one-way switch: once a shard converts to HNSW, it will not revert to Flat even if the object count drops.
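The following minimal Python sketch models the index type of a single shard under dynamic indexing, which makes the one-way behavior concrete. The class `DynamicShardIndex` and its `update` method are hypothetical. They only mirror the rules stated above: a default threshold of 10,000 objects, `ASYNC_INDEXING=true` as a requirement, and no downgrade. The sketch assumes the switch fires once the count exceeds the threshold. Whether Weaviate switches at exactly the threshold or just above it is not confirmed here.

```python
from dataclasses import dataclass

# Hypothetical model of the documented behavior; this is not Weaviate source code.
DEFAULT_THRESHOLD = 10_000  # default Flat -> HNSW threshold from the table above


@dataclass
class DynamicShardIndex:
    """Tracks which index type one shard would use under dynamic indexing."""

    threshold: int = DEFAULT_THRESHOLD
    async_indexing: bool = True  # stands in for ASYNC_INDEXING=true
    index_type: str = "flat"

    def __post_init__(self) -> None:
        # Dynamic indexing is only available when async indexing is enabled.
        if not self.async_indexing:
            raise ValueError("dynamic indexing requires ASYNC_INDEXING=true")

    def update(self, object_count: int) -> str:
        """Return the index type once the shard holds `object_count` objects."""
        # Assumption: the upgrade fires when the count exceeds the threshold.
        if self.index_type == "flat" and object_count > self.threshold:
            self.index_type = "hnsw"
        # No branch ever sets "flat" again: the switch is one-way.
        return self.index_type


if __name__ == "__main__":
    shard = DynamicShardIndex()
    for count in (5_000, 12_000, 3_000):
        print(count, shard.update(count))
```

Running the script prints `flat` for 5,000 objects and `hnsw` for both 12,000 and 3,000 objects. Shrinking back to 3,000 objects does not restore the Flat index.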
| Category | Variable | Optimization |
|---|---|---|
| Persistence | `PERSISTENCE_HNSW_MAX_LOG_SIZE` | Set it close to your HNSW graph size (e.g. 1 GiB for a ~1 GiB graph) to speed up compaction. Note: this increases memory usage. |
| Deletions | `TOMBSTONE_DELETION_CONCURRENCY` | The default is already half your CPU cores. On clusters with many cores, consider lowering it to prevent cleanup from consuming too many resources. On clusters with few cores and heavy deletions, increase it to speed up cleanup. |
| Deletions | `TOMBSTONE_DELETION_MIN_PER_CYCLE` | For very large indexes, set to 100000 (100k) to ensure cleanup happens before search speed degrades. |
| Deletions | `TOMBSTONE_DELETION_MAX_PER_CYCLE` | For very large indexes, set to 10000000 (10 million) to cap the number of tombstones deleted per cycle and prevent resource overconsumption. |
| Global Defaults | `DEFAULT_QUANTIZATION` | Set to `RQ` so that all new collections use 8-bit compression automatically. |
Tombstones are markers for deleted objects in the HNSW index. They get cleaned up periodically (controlled by cleanupIntervalSeconds).
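The following Python sketch illustrates how the deletion settings in the table above would shape a series of cleanup cycles. The half-the-cores default for `TOMBSTONE_DELETION_CONCURRENCY` comes from the table. The per-cycle model is an assumption: a cycle is skipped while fewer than `TOMBSTONE_DELETION_MIN_PER_CYCLE` tombstones are pending, and at most `TOMBSTONE_DELETION_MAX_PER_CYCLE` are removed per cycle. The function names are hypothetical, so check the exact semantics against the Weaviate environment-variable reference for your version.

```python
import os
from typing import List, Optional


def default_tombstone_concurrency(cpu_cores: Optional[int] = None) -> int:
    """Half the CPU cores (at least 1), matching the documented default."""
    cores = cpu_cores if cpu_cores is not None else (os.cpu_count() or 1)
    return max(1, cores // 2)


def tombstones_this_cycle(pending: int, min_per_cycle: int = 0,
                          max_per_cycle: Optional[int] = None) -> int:
    """How many tombstones one cleanup cycle would remove (modelled assumption)."""
    if pending < min_per_cycle:
        return 0  # assumed: skip the cycle until enough tombstones accumulate
    if max_per_cycle is None:
        return pending  # no cap: clear everything that is pending
    return min(pending, max_per_cycle)  # cap the work done in a single cycle


def simulate_cleanup(pending: int, min_per_cycle: int,
                     max_per_cycle: Optional[int]) -> List[int]:
    """Run cycles back to back and return how many tombstones each removed."""
    removed: List[int] = []
    while pending > 0:
        n = tombstones_this_cycle(pending, min_per_cycle, max_per_cycle)
        if n == 0:
            break  # remaining tombstones wait for a later cycle
        removed.append(n)
        pending -= n
    return removed


if __name__ == "__main__":
    print("default concurrency:", default_tombstone_concurrency())
    print(simulate_cleanup(20_050_000, 100_000, 10_000_000))
```

Under these assumptions, 20,050,000 pending tombstones with the values from the table are cleared in two capped cycles of 10 million each. The last 50,000 stay below the minimum and wait for a later cycle.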
Tuning HNSW is a balance between graph density (Recall) and traversal speed (Latency). The following are starting points and should be validated per dataset.
- Set `ef: -1`: this enables dynamic ef, which lets Weaviate auto-tune search depth based on your query limit.
  - For predictable recall requirements where you need static performance, set `ef` between 300 and 500 (test your specific dataset to find the optimal value).
- Optimize `maxConnections`: reduce it from 64 to 32 as a good default; increase it only if you need higher recall and can afford the extra memory. This provides significant RAM savings with minimal recall loss and often better QPS/recall performance.
- Bulk import performance: use async indexing (`ASYNC_INDEXING=true`) to prevent graph construction from blocking data ingestion.
- DON'T exceed `ef: 512`: it causes massive latency penalties for negligible recall gains.
- DON'T disable lazy loading for multi-tenant deployments: only use `DISABLE_LAZY_LOAD_SHARDS=true` for single-tenant collections.
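The following Python sketch shows how `ef: -1` could translate into an actual search depth. It assumes the dynamic value is `limit × dynamicEfFactor`, clamped between `dynamicEfMin` and `dynamicEfMax` and never smaller than `limit`. It uses assumed defaults of 8, 100, and 500; verify those names and defaults against the HNSW index reference for your Weaviate version. `search_time_ef` is a hypothetical helper.

```python
def search_time_ef(limit: int, ef: int = -1, factor: int = 8,
                   ef_min: int = 100, ef_max: int = 500) -> int:
    """Effective search-time ef for a query that asks for `limit` results."""
    if ef >= 1:
        # Static ef: used as-is, but never smaller than the number of results.
        return max(ef, limit)
    # Dynamic ef (ef = -1): scale with the limit, then clamp to [ef_min, ef_max].
    dynamic = min(max(limit * factor, ef_min), ef_max)
    # A candidate list shorter than `limit` would cut results off early.
    return max(dynamic, limit)


if __name__ == "__main__":
    for limit in (10, 25, 100, 1000):
        print(limit, search_time_ef(limit))
    print("static ef=300, limit=10:", search_time_ef(10, ef=300))
```

Under these assumptions, small limits are lifted to the floor of 100 and mid-range limits scale by the factor. Large limits are capped at 500 unless the limit itself is larger. The cap keeps the dynamic mode below the 512 ceiling warned about above.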
For production, Rotational Quantization (RQ-8) is the standard for compression.
- RQ (Recommended): Provides 98-99% recall with a 4x reduction in vector RAM. It requires no training phase, so compression is instant.
- PQ (Large Scale): Use only for massive datasets (>1M objects) where custom segment tuning is needed. It requires 10k–100k objects per shard for training before it activates.
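The following back-of-the-envelope Python sketch shows where the 4x figure comes from: float32 stores 4 bytes per dimension, while RQ-8 stores 1. It also shows how `maxConnections` adds graph memory on top. The graph term is a rough assumption: up to `2 × maxConnections` neighbor ids of 8 bytes each per node on the bottom layer, with upper layers and RQ's small per-vector overhead ignored. Treat the result as an order-of-magnitude estimate, not a sizing guarantee. All names are hypothetical.

```python
BYTES_FLOAT32 = 4  # one uncompressed vector dimension
BYTES_RQ8 = 1      # one dimension after 8-bit RQ (per-vector overhead ignored)
BYTES_NODE_ID = 8  # assumed size of one stored neighbor id


def vector_bytes(n_objects: int, dims: int, quantized: bool) -> int:
    """RAM for the vectors themselves, with or without RQ-8."""
    per_dim = BYTES_RQ8 if quantized else BYTES_FLOAT32
    return n_objects * dims * per_dim


def graph_bytes(n_objects: int, max_connections: int) -> int:
    """Rough assumption: bottom layer holds up to 2 * maxConnections ids per node."""
    return n_objects * 2 * max_connections * BYTES_NODE_ID


def estimate_gib(n_objects: int, dims: int, max_connections: int,
                 quantized: bool) -> float:
    """Vectors plus graph, in GiB."""
    total = vector_bytes(n_objects, dims, quantized) + graph_bytes(n_objects, max_connections)
    return total / 1024 ** 3


if __name__ == "__main__":
    n, dims = 1_000_000, 1536
    print(f"float32, maxConnections=64: {estimate_gib(n, dims, 64, False):.2f} GiB")
    print(f"RQ-8,    maxConnections=32: {estimate_gib(n, dims, 32, True):.2f} GiB")
```

For 1M 1536-dimensional vectors, the vector term alone falls from about 6.1 GB to about 1.5 GB with RQ-8. Halving `maxConnections` halves the (assumed) graph term.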
In production, rolling updates can cause search latency spikes if the new Pod hasn't finished loading its cache. Use these settings to ensure a Pod is only "Ready" once it's fully performant.
| Environment Variable | Recommended Value | Why it's Critical |
|---|---|---|
| `HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE` | `true` | Forces Weaviate to wait until the vector cache is hot before marking the Pod as "Ready." |
| `DISABLE_LAZY_LOAD_SHARDS` | `true` (single-tenant only) | For multi-tenant deployments, keep it `false` to reduce startup time. For single-tenant deployments where you can afford a longer startup, set it to `true` so that everything is loaded before the node is considered ready. |
Why this matters: a Pod that is "up" but has not yet loaded its HNSW graph causes massive latency spikes during restarts.
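The tenancy rule in the table above reduces to a small decision. The following Python sketch builds the two readiness variables for a deployment. It then renders them as the `name`/`value` list that a Kubernetes container `env` field expects. The helper names `readiness_env` and `as_k8s_env` are hypothetical. How you inject the values, for example through Helm chart values, depends on your setup.

```python
from typing import Dict, List


def readiness_env(multi_tenant: bool) -> Dict[str, str]:
    """Readiness-related settings following the recommendations above."""
    env = {
        # Only report Ready once the vector cache is hot.
        "HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE": "true",
    }
    # Multi-tenant: keep lazy loading (value "false") for faster startup.
    # Single-tenant: load every shard before the node is considered ready.
    env["DISABLE_LAZY_LOAD_SHARDS"] = "false" if multi_tenant else "true"
    return env


def as_k8s_env(env: Dict[str, str]) -> List[Dict[str, str]]:
    """Render settings as a Kubernetes container `env` list of name/value pairs."""
    return [{"name": key, "value": value} for key, value in sorted(env.items())]


if __name__ == "__main__":
    for entry in as_k8s_env(readiness_env(multi_tenant=False)):
        print(entry)
```

Values are kept as strings because Kubernetes environment variables are always strings.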
Environment & Infrastructure:
- Availability: `HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE=true` and `DISABLE_LAZY_LOAD_SHARDS=true` are active (single-tenant collections only).
- Persistence: `PERSISTENCE_HNSW_MAX_LOG_SIZE=1024MiB` (or match your HNSW graph size; adjust based on dataset).
- Global Defaults: `DEFAULT_QUANTIZATION=RQ` (applies RQ compression to all new collections).
HNSW Configuration:
- ef is set to -1 for dynamic optimization (or 300-500 for static predictable recall).
- maxConnections=32 (reduced from 64 for modern high-dimensional vectors).
- Never set ef > 512 (causes massive latency penalties).
Compression & Memory:
- Compression: RQ is enabled for RAM efficiency (98-99% recall, 4x reduction).
- Memory: Vector cache is sized to fit the "hot" portion of the dataset.
Deletions & Cleanup:
- Deletions: `TOMBSTONE_DELETION_CONCURRENCY` is left at its default (already half the CPU cores) or lowered for large clusters.
- For large datasets, `TOMBSTONE_DELETION_MIN_PER_CYCLE` and `TOMBSTONE_DELETION_MAX_PER_CYCLE` are configured.
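The checklist above can be automated as a pre-deployment review. In this Python sketch, `review_settings` is a hypothetical helper that takes your intended `ef`, `maxConnections`, and environment variables. It returns one finding per checklist item that is not met, so an empty list means the settings pass. The sketch accepts any `DEFAULT_QUANTIZATION` value starting with "rq" because the exact accepted spelling (for example `RQ` versus a bit-width suffix) is an assumption. It flags `maxConnections` above 32 even though higher values are valid when you need extra recall.

```python
from typing import Dict, List


def review_settings(ef: int, max_connections: int, env: Dict[str, str],
                    multi_tenant: bool = False) -> List[str]:
    """Return checklist items that the given settings do not satisfy."""
    findings: List[str] = []
    # HNSW configuration
    if ef > 512:
        findings.append("ef is above 512")
    elif ef != -1 and not 300 <= ef <= 500:
        findings.append("static ef is outside 300-500")
    if max_connections > 32:
        findings.append("maxConnections is above 32")
    # Availability
    if env.get("HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE") != "true":
        findings.append("HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE is not true")
    lazy_disabled = env.get("DISABLE_LAZY_LOAD_SHARDS") == "true"
    if multi_tenant and lazy_disabled:
        findings.append("lazy loading is disabled on a multi-tenant deployment")
    if not multi_tenant and not lazy_disabled:
        findings.append("DISABLE_LAZY_LOAD_SHARDS is not true")
    # Persistence and global defaults
    if "PERSISTENCE_HNSW_MAX_LOG_SIZE" not in env:
        findings.append("PERSISTENCE_HNSW_MAX_LOG_SIZE is not set")
    if not env.get("DEFAULT_QUANTIZATION", "").lower().startswith("rq"):
        findings.append("DEFAULT_QUANTIZATION is not RQ")
    return findings


if __name__ == "__main__":
    env = {
        "HNSW_STARTUP_WAIT_FOR_VECTOR_CACHE": "true",
        "DISABLE_LAZY_LOAD_SHARDS": "true",
        "PERSISTENCE_HNSW_MAX_LOG_SIZE": "1024MiB",
        "DEFAULT_QUANTIZATION": "RQ",
    }
    print(review_settings(ef=-1, max_connections=32, env=env) or "all checks pass")
```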