
Commit 7bb6a2f

ctindel and claude committed
Address all PR review feedback from @kderusso
Implements comprehensive improvements based on code review:

1. Added BBQ default behavior notes (384+ dims default to BBQ HNSW)
2. Enhanced quantization explanation with blog link and clearer description
3. Qualified BBQ recommendations for text embeddings specifically
4. Added BBQ 64 dimensions minimum requirement
5. Updated all examples to use the built-in E5 endpoint (.multilingual-e5-small-elasticsearch)
6. Clarified E5/ELSER automatic availability
7. Improved bbq_flat description (maximum accuracy at the expense of speed)
8. Improved bbq_disk description (simpler use cases, fewer vectors)
9. Added a reference to the full list of quantization options (int4_flat, etc.)
10. Added a default behavior note to dense-vector.md

All changes ensure users have accurate, complete information about BBQ quantization strategies and their appropriate use cases.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
1 parent 9c9edf5 commit 7bb6a2f

2 files changed: +14 -12 lines changed

solutions/search/semantic-search/semantic-search-semantic-text.md

Lines changed: 12 additions & 12 deletions
@@ -30,7 +30,7 @@ The mapping of the destination index - the index that contains the embeddings th
 You can run {{infer}} either using the [Elastic {{infer-cap}} Service](/explore-analyze/elastic-inference/eis.md) or on your own ML-nodes. The following examples show you both scenarios.
 
 ::::{tip}
-For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x.
+For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x. Note that new indices with 384 or more dimensions will default to BBQ HNSW automatically.
 ::::
 
 :::::::{tab-set}
@@ -117,10 +117,10 @@ To try the ELSER model on the Elastic Inference Service, explicitly set the `inf
 
 When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments.
 
-The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify quantization strategies like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
+The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify [quantization strategies](https://www.elastic.co/blog/vector-search-elasticsearch-rationale) like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. Quantization compresses high-dimensional vectors into more efficient representations, enabling faster searches and lower memory consumption. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
 ::::{tip}
-For most production use cases using `semantic_text` with dense vector embeddings, using BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. Choose from:
+For most production use cases using `semantic_text` with dense vector embeddings from text models (like E5, OpenAI, or Cohere), BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. BBQ requires a minimum of 64 dimensions and works best with text embeddings (it may not perform well with other types like image embeddings). Choose from:
 - `bbq_hnsw` - Best for most use cases (default for 384+ dimensions)
 - `bbq_flat` - Simpler option for smaller datasets
 - `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)
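The hunks that follow touch this request only in fragments. Assembled, the `bbq_hnsw` mapping being edited would read roughly as below; the outer `mappings` wrapper is assumed, since it falls outside the diff context. As a rough sense of the 32x claim: 1M float32 vectors at 384 dims take about 1.5 GB, while BBQ at roughly 1 bit per dimension needs on the order of 48 MB.

```console
PUT semantic-embeddings-optimized
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "bbq_hnsw"
          }
        }
      }
    }
  }
}
```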
@@ -135,7 +135,7 @@ PUT semantic-embeddings-optimized
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model", <1>
+        "inference_id": ".multilingual-e5-small-elasticsearch", <1>
         "index_options": {
           "dense_vector": {
             "type": "bbq_hnsw" <2>
@@ -147,10 +147,10 @@ PUT semantic-embeddings-optimized
   }
 }
 ```
 
-1. Reference to a text embedding inference endpoint (e.g., E5, OpenAI, or Cohere embeddings). You must create this endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
+1. Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint that is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
 2. Use Better Binary Quantization with HNSW indexing for optimal memory efficiency. This setting applies to the underlying `dense_vector` field that stores the embeddings.
 
-You can also use `bbq_flat` for simpler datasets or when you don't need the HNSW graph:
+You can also use `bbq_flat` for simpler datasets where you need maximum accuracy at the expense of speed:
 
 ```console
 PUT semantic-embeddings-flat
@@ -159,7 +159,7 @@ PUT semantic-embeddings-flat
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "bbq_flat" <1>
@@ -182,7 +182,7 @@ PUT semantic-embeddings-disk
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "bbq_disk" <1>
@@ -199,7 +199,7 @@ stack: ga 9.2
 serverless: unavailable
 ```
 
-1. Use DiskBBQ for disk-based vector storage with minimal memory requirements. Available in Elasticsearch 9.2+. This option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms.
+1. Use disk-optimized BBQ for simpler use cases with fewer vectors. This requires less compute resources during indexing. Available in Elasticsearch 9.2+, this option stores compressed vectors on disk, reducing RAM usage to as little as 100 MB while maintaining query latencies around 15ms.
 
 Other quantization options include `int8_hnsw` (8-bit integer quantization) and `int4_hnsw` (4-bit integer quantization):
 
@@ -210,7 +210,7 @@ PUT semantic-embeddings-int8
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "int8_hnsw" <1>
@@ -222,7 +222,7 @@ PUT semantic-embeddings-int8
   }
 }
 ```
 
-1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction.
+1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction. For the full list of other available quantization options (including `int4_flat` and others), refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
 
 For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`:
 
@@ -233,7 +233,7 @@ PUT semantic-embeddings-custom
     "properties": {
       "content": {
         "type": "semantic_text",
-        "inference_id": "my-e5-model",
+        "inference_id": ".multilingual-e5-small-elasticsearch",
         "index_options": {
           "dense_vector": {
             "type": "bbq_hnsw",

solutions/search/vector/dense-vector.md

Lines changed: 2 additions & 0 deletions
@@ -45,6 +45,8 @@ For more information about how the profile affects virtual compute unit (VCU) al
 
 Better Binary Quantization (BBQ) is an advanced vector quantization technique for `dense_vector` fields. It compresses embeddings into compact binary form, enabling faster similarity search and reducing memory usage. This improves both search relevance and cost efficiency, especially when used with HNSW (Hierarchical Navigable Small World).
 
+New indices with 384 or more dimensions will default to BBQ HNSW automatically for optimal performance and memory efficiency.
+
 Learn more about how BBQ works, supported algorithms, and configuration examples in the [Better Binary Quantization (BBQ) documentation](https://www.elastic.co/docs/reference/elasticsearch/index-settings/bbq).
 
 ::::{tip}
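For context on what opting in explicitly looks like on a plain `dense_vector` field (a minimal sketch, not part of this commit; the index and field names are placeholders):

```console
PUT my-bbq-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 384,
        "index_options": {
          "type": "bbq_hnsw"
        }
      }
    }
  }
}
```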
