DiskBBQ Updates #4037
base: main
Changes from 2 commits
7bedece
4583290
862a91d
51a3d9b
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -46,7 +46,13 @@ | |
|
|
||
| ## Ensure data nodes have enough memory [_ensure_data_nodes_have_enough_memory] | ||
|
|
||
| {{es}} uses the [HNSW](https://arxiv.org/abs/1603.09320) algorithm for approximate kNN search. HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures. To check the size of the vector data, you can use the [Analyze index disk usage](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage) API. | ||
| {{es}} uses either the [HNSW](https://arxiv.org/abs/1603.09320) algorithm or the [DiskBBQ](https://www.elastic.co/search-labs/blog/diskbbq-elasticsearch-introduction) algorithm for approximate kNN search. | ||
|
|
||
| HNSW is a graph-based algorithm which only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures. | ||
|
|
||
| DiskBBQ is a clustering algorithm which can scale efficiently on a fraction of the total memory. You can start with enough RAM to hold the vector data and index structures, but you will likely be able to use significantly less than this and still maintain good performance. In testing we find this to be between 1-5% of the index structure size (centroids and quantized vector data) per unique query, where unique queries access non-overlapping clusters. | ||
|
Check notice on line 53 in deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md
|
||
|
|
||
| To check the size of the vector data, you can use the [Analyze index disk usage](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-disk-usage) API. | ||
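As an illustration (the index name `my_index` is hypothetical), the disk usage API requires opting in to the expensive analysis:

[source,console]
----
POST my_index/_disk_usage?run_expensive_tasks=true
----

The response breaks down on-disk size per field and per file type, which is what the estimates below approximate.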
|
|
||
| Here are estimates for different element types and quantization levels: | ||
|
|
||
|
|
@@ -59,6 +65,8 @@ | |
|
|
||
| If utilizing HNSW, the graph must also be held in memory. To estimate the required bytes, use `num_vectors * 4 * HNSW.m`. The default value for `HNSW.m` is 16, so by default `num_vectors * 4 * 16`. | ||
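As a rough sketch of the formula above (the figures are hypothetical, not from any benchmark):

```python
def hnsw_graph_bytes(num_vectors: int, m: int = 16) -> int:
    """Estimate HNSW graph memory per the formula num_vectors * 4 * HNSW.m."""
    return num_vectors * 4 * m

# Hypothetical example: 1 million vectors with the default m of 16
print(f"{hnsw_graph_bytes(1_000_000) / 1024**2:.0f} MiB")  # roughly 61 MiB
```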
|
|
||
| If utilizing DiskBBQ, a fraction of the clusters and centroids will need to be in memory. When doing this estimation it makes more sense to include both the index structure and the quantized vectors together, as the structures are dependent. To estimate the total bytes, compute the cost of the centroids as `num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)`, plus the cost of the quantized vectors within the clusters as `num_vectors * ((num_dimensions/8 + 14 + 2) * 2)`, where `num_clusters` is defined as `num_vectors / vectors_per_cluster`, which by default is `num_vectors / 384`. | ||
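Putting the two terms together as a sketch (the 1 million vectors and 768 dimensions are hypothetical example figures):

```python
def diskbbq_bytes(num_vectors: int, num_dimensions: int,
                  vectors_per_cluster: int = 384) -> int:
    """Estimate DiskBBQ memory: centroids plus quantized vectors."""
    # num_clusters = num_vectors / vectors_per_cluster (default 384)
    num_clusters = num_vectors / vectors_per_cluster
    # Centroid cost: num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)
    centroids = num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)
    # Quantized vectors within the clusters: num_vectors * ((num_dimensions/8 + 14 + 2) * 2)
    quantized = num_vectors * ((num_dimensions / 8 + 14 + 2) * 2)
    return int(centroids + quantized)

# Hypothetical example: 1 million 768-dimensional vectors
print(f"{diskbbq_bytes(1_000_000, 768) / 1024**2:.0f} MiB")  # roughly 231 MiB
```

Per the 1-5% guidance above, only a fraction of this total may need to be resident per unique query.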
|
|
||
|
||
|
|
||
| Note that the required RAM is for the filesystem cache, which is separate from the Java heap. | ||
|
|
||
| The data nodes should also leave a buffer for other ways that RAM is needed. For example your index might also include text fields and numerics, which also benefit from using filesystem cache. It’s recommended to run benchmarks with your specific dataset to ensure there’s a sufficient amount of memory to give good search performance. You can find [here](https://elasticsearch-benchmarks.elastic.co/#tracks/so_vector) and [here](https://elasticsearch-benchmarks.elastic.co/#tracks/dense_vector) some examples of datasets and configurations that we use for our nightly benchmarks. | ||
|
|
@@ -72,16 +80,53 @@ | |
| Loading data into the filesystem cache eagerly on too many indices or too many files will make search *slower* if the filesystem cache is not large enough to hold all the data. Use with caution. | ||
| :::: | ||
|
|
||
|
|
||
| The following file extensions are used for approximate kNN search, broken down by quantization type: | ||
|
|
||
| * {applies_to}`stack: ga 9.3` `cenivf` for DiskBBQ to store centroids | ||
| * {applies_to}`stack: ga 9.3` `clivf` for DiskBBQ to store clusters of quantized vectors | ||
| * `vex` for the HNSW graph | ||
| * `vec` for all non-quantized vector values. This includes all element types: `float`, `byte`, and `bit`. | ||
| * `veq` for quantized vectors indexed with [`quantization`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization): `int4` or `int8` | ||
| * `veb` for binary vectors indexed with [`quantization`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization): `bbq` | ||
| * `vem`, `vemf`, `vemq`, and `vemb` for metadata, usually small and not a concern for preloading | ||
|
|
||
| Generally, if you are using a quantized index, you should only preload the relevant quantized values and the HNSW graph. Preloading the raw vectors is not necessary and might be counterproductive. | ||
| Generally, if you are using a quantized index, you should only preload the relevant quantized values and index structures such as the HNSW graph. Preloading the raw vectors is not necessary and might be counterproductive. | ||
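As a sketch (the index name and extension choices are illustrative, not prescriptive): for a BBQ-quantized HNSW index, this might mean preloading only the `veb` and `vex` files via the `index.store.preload` setting at index creation:

[source,console]
----
PUT my_index
{
  "settings": {
    "index.store.preload": ["veb", "vex"]
  }
}
----

For a DiskBBQ index, the analogous choice would be `cenivf` and `clivf` instead of `vex`.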
|
|
||
| Additional detail on the specific files can be gleaned by using the [stats endpoint](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-stats), which displays information about the index and its fields. For example, for DiskBBQ you might see something like this: | ||
|
|
||
|
|
||
| [source,console] | ||
| ---- | ||
| GET my_index/_stats?filter_path=indices.my_index.primaries.dense_vector | ||
|
|
||
| Example Response: | ||
| { | ||
| "indices": { | ||
| "my_index": { | ||
| "primaries": { | ||
| "dense_vector": { | ||
| "value_count": 3, | ||
| "off_heap": { | ||
| "total_size_bytes": 249, | ||
| "total_veb_size_bytes": 0, | ||
| "total_vec_size_bytes": 36, | ||
| "total_veq_size_bytes": 0, | ||
| "total_vex_size_bytes": 0, | ||
| "total_cenivf_size_bytes": 111, | ||
| "total_clivf_size_bytes": 102, | ||
| "fielddata": { | ||
| "my_vector": { | ||
| "cenivf_size_bytes": 111, | ||
| "clivf_size_bytes": 102, | ||
| "vec_size_bytes": 36 | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ---- | ||
|
|
||
|
|
||
| ## Reduce the number of index segments [_reduce_the_number_of_index_segments] | ||
|
|
||
Let's be careful here, obviously, going to disk is always slower than just reading things in memory. We need to clarify that the performance degrades more linearly than with HNSW, which degrades exponentially.
I think calling out these percentages is OK.
Makes sense, I'll reword to clarify.
Reworded this a bit, let me know what you think now.