Commit 7bedece

needed to add info about sizing but realized to do that we needed a better basis for diskbbq in the docs at all; added that
1 parent 3af6444 commit 7bedece

File tree

2 files changed: +56 -6 lines changed


deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md

Lines changed: 48 additions & 3 deletions
@@ -46,7 +46,13 @@ Another option is to use [synthetic `_source`](elasticsearch://reference/elasti
 
 ## Ensure data nodes have enough memory [_ensure_data_nodes_have_enough_memory]
 
-{{es}} uses the [HNSW](https://arxiv.org/abs/1603.09320) algorithm for approximate kNN search. HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures. To check the size of the vector data, you can use the [Analyze index disk usage](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage) API.
+{{es}} uses either the [HNSW](https://arxiv.org/abs/1603.09320) algorithm or the [DiskBBQ](https://www.elastic.co/search-labs/blog/diskbbq-elasticsearch-introduction) algorithm for approximate kNN search.
+
+HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures.
+
+DiskBBQ is a clustering algorithm which can scale efficiently on a fraction of the total memory. You can start with enough RAM to hold the vector data and index structures, but you will likely be able to use significantly less than this and still maintain good performance. In testing, we find this to be between 1% and 5% of the index structure size (centroids and quantized vector data) per unique query, where unique queries access non-overlapping clusters.
+
+To check the size of the vector data, you can use the [Analyze index disk usage](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage) API.
 
 Here are estimates for different element types and quantization levels:
 
@@ -59,6 +65,8 @@ Here are estimates for different element types and quantization levels:
 
 If utilizing HNSW, the graph must also be in memory; to estimate the required bytes, use `num_vectors * 4 * HNSW.m`. The default value for `HNSW.m` is 16, so by default `num_vectors * 4 * 16`.
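To make the HNSW estimate concrete, here is a minimal sketch of the arithmetic; the vector counts are illustrative assumptions, not figures from the original:

```python
def hnsw_graph_bytes(num_vectors: int, m: int = 16) -> int:
    """Estimated bytes for the HNSW graph: num_vectors * 4 * HNSW.m."""
    return num_vectors * 4 * m

# With the default m=16, 1 million vectors need about 64 MB for the graph alone.
print(hnsw_graph_bytes(1_000_000))  # 64000000
```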

+If utilizing DiskBBQ, a fraction of the clusters and centroids will need to be in memory. When making this estimate, it makes more sense to include the index structure and the quantized vectors together, as the structures are interdependent. To estimate the total bytes, compute the cost of the centroids as `num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)`, plus the cost of the quantized vectors within the clusters as `num_vectors * ((num_dimensions / 8 + 14 + 2) * 2)`, where `num_clusters` is defined as `num_vectors / vectors_per_cluster`, by default `num_vectors / 384`.
+
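As a worked example of the DiskBBQ sizing formula above, here is a minimal sketch; the 1-million-vector, 768-dimension figures are illustrative assumptions, not from the original:

```python
def diskbbq_index_bytes(num_vectors: int, num_dimensions: int,
                        vectors_per_cluster: int = 384) -> int:
    """Estimated bytes for DiskBBQ centroids plus quantized vectors."""
    num_clusters = num_vectors / vectors_per_cluster
    # Centroid cost: num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)
    centroids = num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)
    # Quantized vectors within the clusters: num_vectors * ((num_dimensions / 8 + 14 + 2) * 2)
    quantized = num_vectors * ((num_dimensions / 8 + 14 + 2) * 2)
    return int(centroids + quantized)

# Illustrative: 1 million 768-dimensional vectors come to roughly 242 MB,
# of which only a few percent may need to be resident per unique query.
print(diskbbq_index_bytes(1_000_000, 768))
```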
 Note that the required RAM is for the filesystem cache, which is separate from the Java heap.
 
 The data nodes should also leave a buffer for other ways that RAM is needed. For example, your index might also include text fields and numerics, which also benefit from using the filesystem cache. It’s recommended to run benchmarks with your specific dataset to ensure there’s a sufficient amount of memory to give good search performance. You can find [here](https://elasticsearch-benchmarks.elastic.co/#tracks/so_vector) and [here](https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector) some examples of datasets and configurations that we use for our nightly benchmarks.
@@ -72,16 +80,53 @@ If the machine running {{es}} is restarted, the filesystem cache will be empty,
 Loading data into the filesystem cache eagerly on too many indices or too many files will make search *slower* if the filesystem cache is not large enough to hold all the data. Use with caution.
 ::::

-
 The following file extensions are used for approximate kNN search. Each extension is broken down by quantization type.
 
+* {applies_to}`stack: ga 9.3` `cenivf` for DiskBBQ to store centroids
+* {applies_to}`stack: ga 9.3` `clivf` for DiskBBQ to store clusters of quantized vectors
 * `vex` for the HNSW graph
 * `vec` for all non-quantized vector values. This includes all element types: `float`, `byte`, and `bit`.
 * `veq` for quantized vectors indexed with [`quantization`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization): `int4` or `int8`
 * `veb` for binary vectors indexed with [`quantization`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization): `bbq`
 * `vem`, `vemf`, `vemq`, and `vemb` for metadata, usually small and not a concern for preloading

-Generally, if you are using a quantized index, you should only preload the relevant quantized values and the HNSW graph. Preloading the raw vectors is not necessary and might be counterproductive.
+Generally, if you are using a quantized index, you should only preload the relevant quantized values and index structures such as the HNSW graph. Preloading the raw vectors is not necessary and might be counterproductive.
+
+Additional detail about the specific files can be gleaned from the [stats endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-stats), which displays information about the index and its fields. For example, for DiskBBQ you might see something like this:
+
+[source,console]
+----
+GET my_index/_stats?filter_path=indices.my_index.primaries.dense_vector
+----
+
+Example response:
+
+[source,console-result]
+----
+{
+  "indices": {
+    "my_index": {
+      "primaries": {
+        "dense_vector": {
+          "value_count": 3,
+          "off_heap": {
+            "total_size_bytes": 249,
+            "total_veb_size_bytes": 0,
+            "total_vec_size_bytes": 36,
+            "total_veq_size_bytes": 0,
+            "total_vex_size_bytes": 0,
+            "total_cenivf_size_bytes": 111,
+            "total_clivf_size_bytes": 102,
+            "fielddata": {
+              "my_vector": {
+                "cenivf_size_bytes": 111,
+                "clivf_size_bytes": 102,
+                "vec_size_bytes": 36
+              }
+            }
+          }
+        }
+      }
+    }
+  }
+}
+----
 
## Reduce the number of index segments [_reduce_the_number_of_index_segments]

solutions/search/vector/knn.md

Lines changed: 8 additions & 3 deletions
@@ -61,7 +61,7 @@ Approximate kNN offers low latency and good accuracy, while exact kNN guarantees
 ## Approximate kNN search [approximate-knn]
 
 ::::{warning}
-Approximate kNN search has specific resource requirements. All vector data must fit in the node’s page cache for efficient performance. Refer to the [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) for configuration tips.
+Approximate kNN search has specific resource requirements. For instance, with HNSW all vector data must fit in the node’s page cache for efficient performance. Refer to the [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) for configuration tips.
 ::::
 
To run an approximate kNN search:
@@ -132,9 +132,10 @@ Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_
 
 ### Indexing considerations for approximate kNN search [knn-indexing-considerations]
 
-For approximate kNN, {{es}} stores dense vector values per segment as an [HNSW graph](https://arxiv.org/abs/1603.09320). Building HNSW graphs is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) covers indexing performance, sizing, and configuration trade-offs that affect search performance.
-
-In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters:
+For approximate kNN, {{es}} stores dense vector values per segment either as an [HNSW graph](https://arxiv.org/abs/1603.09320) or as clusters using [DiskBBQ](https://www.elastic.co/search-labs/blog/diskbbq-elasticsearch-introduction). Building these approximate kNN structures is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) covers indexing performance, sizing, and configuration trade-offs that affect search performance.
+
+In addition to search-time parameters, HNSW and DiskBBQ expose index-time settings that balance build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters:
 
 ```console
 PUT image-index
@@ -156,6 +157,10 @@ PUT image-index
 }
 ```
 
+::::{note}
+Support for DiskBBQ was introduced in version 9.2.0.
+::::
+
 ### Tune approximate kNN for speed or accuracy [tune-approximate-knn-for-speed-accuracy]
 
 To gather results, the kNN API first finds a `num_candidates` number of approximate neighbors per shard, computes similarity to the query vector, selects the top `k` per shard, and merges them into the global top `k` nearest neighbors.
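The gather-and-merge flow described above can be sketched as follows; the shard contents and similarity function are illustrative stand-ins, not Elasticsearch internals:

```python
import heapq

def knn_merge(shards, query, k, num_candidates, similarity):
    """Sketch of per-shard candidate gathering followed by a global top-k merge."""
    per_shard_top = []
    for shard in shards:
        # Stand-in for the approximate (HNSW/DiskBBQ) search: take the
        # num_candidates vectors most similar to the query on this shard.
        candidates = heapq.nlargest(num_candidates, shard,
                                    key=lambda v: similarity(query, v))
        # Keep only the top k from this shard.
        per_shard_top.extend(candidates[:k])
    # Merge the shard results into the global top k nearest neighbors.
    return heapq.nlargest(k, per_shard_top, key=lambda v: similarity(query, v))

# Toy example: 1-D "vectors" across three shards, negative distance as similarity.
shards = [[1.0, 5.0, 9.0], [2.0, 8.0], [3.0, 7.0]]
similarity = lambda q, v: -abs(q - v)
print(knn_merge(shards, query=4.0, k=2, num_candidates=2, similarity=similarity))
```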
