
Conversation

@john-wagster
Contributor

Added content around DiskBBQ to the ANN docs. Specifically needed to do this so we could address sizing for DiskBBQ.

I have a somewhat related PR here: elastic/elasticsearch#138433, which should probably go in first.

…etter basis for diskbbq in the docs at all; added that
@github-actions

github-actions bot commented Nov 21, 2025

Vale Linting Results

Summary: 20 suggestions found

💡 Suggestions (20)
| File | Line | Rule | Message |
|------|------|------|---------|
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 49 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 51 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 53 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 53 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 53 | Elastic.FutureTense | 'will typically' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 53 | Elastic.FutureTense | 'will scale' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 53 | Elastic.FutureTense | 'will be' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 53 | Elastic.FutureTense | 'will be' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 68 | Elastic.FutureTense | 'will need' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 68 | Elastic.FutureTense | 'will be' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 93 | Elastic.Acronyms | 'HNSW' has no definition. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 95 | Elastic.FutureTense | 'will display' might be in future tense. Write in the present tense to describe the state of the product as it is now. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 95 | Elastic.WordChoice | Consider using 'refer to (if it's a document), view (if it's a UI element)' instead of 'see', unless the term is in the UI. |
| deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md | 97 | Elastic.Capitalization | '[source,console]' should use sentence-style capitalization. |
| solutions/search/vector/knn.md | 64 | Elastic.Acronyms | 'HNSW' has no definition. |
| solutions/search/vector/knn.md | 136 | Elastic.Acronyms | 'HNSW' has no definition. |
| solutions/search/vector/knn.md | 136 | Elastic.Semicolons | Use semicolons judiciously. |
| solutions/search/vector/knn.md | 136 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| solutions/search/vector/knn.md | 138 | Elastic.Wordiness | Consider using 'also' instead of 'In addition'. |
| solutions/search/vector/knn.md | 138 | Elastic.Acronyms | 'HNSW' has no definition. |


HNSW is a graph-based algorithm that only works efficiently when most vector data is held in memory. You should ensure that data nodes have at least enough RAM to hold the vector data and index structures.

DiskBBQ is a clustering algorithm that can scale efficiently on a fraction of the total memory. You can start with enough RAM to hold the vector data and index structures, but you can likely use significantly less than this and still maintain good performance. In testing, we find this to be between 1% and 5% of the index structure size (centroids and quantized vector data) per unique query, where unique queries access non-overlapping clusters.
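To make the working-set arithmetic concrete, here is a minimal sketch; the index size and query count are hypothetical example values, and the 1-5% range is the empirical figure quoted above, not a guaranteed formula:

```python
# Minimal sketch of the DiskBBQ working-set estimate described above.
# index_structure_bytes and unique_queries are hypothetical example values;
# the 1-5% range is the empirical figure from the docs text, not a formula.
index_structure_bytes = 10 * 1024**3    # centroids + quantized vectors, ~10 GiB
unique_queries = 20                     # queries touching non-overlapping clusters

low = 0.01 * index_structure_bytes * unique_queries
high = 0.05 * index_structure_bytes * unique_queries
print(f"estimated working set: {low / 2**30:.1f}-{high / 2**30:.1f} GiB")  # 2.0-10.0 GiB
```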
Member

Let's be careful here; obviously, going to disk is always slower than just reading things from memory. We need to clarify that the performance degrades more linearly than with HNSW, which degrades exponentially.

I think calling out these percentages is OK.

Contributor Author

Makes sense, I'll reword to clarify.

Contributor Author

Reworded this a bit; let me know what you think now.


If utilizing HNSW, the graph must also be in memory. To estimate the required bytes, use `num_vectors * 4 * HNSW.m`. The default value for `HNSW.m` is 16, so by default `num_vectors * 4 * 16`.

If utilizing DiskBBQ, a fraction of the clusters and centroids needs to be in memory. When doing this estimation, it makes more sense to include both the index structure and the quantized vectors together, as the structures are dependent. To estimate the total bytes, compute the cost of the centroids as `num_clusters * num_dimensions * 2 * 4 + num_clusters * (num_dimensions + 14)`, plus the cost of the quantized vectors within the clusters as `num_vectors * ((num_dimensions/8 + 14 + 2) * 2)`, where `num_clusters` is defined as `num_vectors / vectors_per_cluster`, which by default is `num_vectors / 384`.
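For a quick sense of scale on the HNSW side, here is a minimal sketch of the graph estimate above, assuming a hypothetical index of 10 million vectors with the default `m` of 16:

```python
# Minimal sketch of the HNSW graph memory estimate described above.
# num_vectors is a hypothetical example value; m defaults to 16.
num_vectors = 10_000_000
m = 16

graph_bytes = num_vectors * 4 * m       # num_vectors * 4 * HNSW.m
print(f"HNSW graph overhead: {graph_bytes / 2**30:.2f} GiB")  # ~0.60 GiB
```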
Member

This seems pretty complicated...

`num_clusters * num_dimensions * 2 * 4`: you need two floating-point representations of the clusters?

Contributor Author

That 2 is definitely not right. Removing that.

Went back and double checked what bytes are there.

`posting list w/o vectors = num_clusters * (centroid_bytes + dotProd(centroid,centroid) + clusterSize + encoding)`
`posting list w/o vectors = num_clusters * ((dims * 4) + 4 + 4 + 1)`

I don't think it's worth describing the additional bytes. That would just make this even more complicated.

Suggestions for simplifying it are welcome. It probably seems more complex because, due to SOAR, it includes both the "index structure" for the centroids and the vectors. Maybe I can reference the bbq quantized calculation from above and somewhat simplify the centroid figures. So something like this instead. Thoughts?

the cost of the centroids as `num_clusters * (num_dimensions * 4 + (num_dimensions + 14))`
plus the cost of the quantized vectors within the clusters as `bbq_quantized_vectors * 2`

The downside is that this ignores the doc ID (2-byte) cost per vector. I almost feel like mentioning it makes this more complicated rather than less, for instance if we break it up like this:

plus the cost of the quantized vectors within the clusters as `bbq_quantized_vectors * 2` + `num_vectors * 2 * 2`

I'll think about it some more.
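Pulling the thread's revised arithmetic together, here is a minimal sketch, assuming the corrected centroid cost (the stray factor of 2 removed), the simplified quantized-vector cost plus the 2-byte doc IDs, and hypothetical vector counts and dimensions:

```python
# Minimal sketch of the revised DiskBBQ sizing discussed in this thread.
# num_vectors and num_dimensions are hypothetical example values.
num_vectors = 10_000_000
num_dimensions = 768
vectors_per_cluster = 384                       # default
num_clusters = num_vectors // vectors_per_cluster

# Centroids: float32 centroid plus its quantized form (stray 2x removed).
centroid_bytes = num_clusters * (num_dimensions * 4 + (num_dimensions + 14))

# Quantized vectors, written twice because of SOAR, plus 2-byte doc IDs.
bbq_quantized_vectors = num_vectors * (num_dimensions // 8 + 14)
vector_bytes = bbq_quantized_vectors * 2 + num_vectors * 2 * 2

total_bytes = centroid_bytes + vector_bytes
print(f"estimated DiskBBQ structures: {total_bytes / 2**30:.2f} GiB")  # ~2.18 GiB
```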

