Add semantic_text index_options examples for BBQ quantization #3854
| Original file line number | Diff line number | Diff line change | ||||||||
|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -29,6 +29,10 @@ The mapping of the destination index - the index that contains the embeddings th | |||||||||
|
|
||||||||||
| You can run {{infer}} either using the [Elastic {{infer-cap}} Service](/explore-analyze/elastic-inference/eis.md) or on your own ML-nodes. The following examples show you both scenarios. | ||||||||||
|
|
||||||||||
| ::::{tip} | ||||||||||
| For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x. | ||||||||||
|
||||||||||
| :::: | ||||||||||
|
|
||||||||||
| :::::::{tab-set} | ||||||||||
|
|
||||||||||
| ::::::{tab-item} Using EIS on Serverless | ||||||||||
|
|
@@ -107,10 +111,151 @@ PUT semantic-embeddings | |||||||||
|
|
||||||||||
| ::::::: | ||||||||||
|
|
||||||||||
| To try the ELSER model on the Elastic Inference Service, explicitly set the `inference_id` to `.elser-2-elastic`. For instructions, refer to [Using `semantic_text` with ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#using-elser-on-eis). | ||||||||||
| To try the ELSER model on the Elastic Inference Service, explicitly set the `inference_id` to `.elser-2-elastic`. For instructions, refer to [Using `semantic_text` with ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#using-elser-on-eis). | ||||||||||
|
|
||||||||||
| ### Optimizing vector storage with `index_options` [semantic-text-index-options] | ||||||||||
|
|
||||||||||
| When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments. | ||||||||||
|
|
||||||||||
| The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify quantization strategies like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options). | ||||||||||
|
||||||||||
|
|
||||||||||
| ::::{tip} | ||||||||||
|
Contributor
Let's refactor this by removing the TIP block and using a subheading instead.
||||||||||
| For most production use cases with dense vector embeddings, using BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. Choose from: | ||||||||||
|
||||||||||
| - `bbq_hnsw` - Best for most use cases (default for 384+ dimensions) | ||||||||||
| - `bbq_flat` - Simpler option for smaller datasets | ||||||||||
| - `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+) | ||||||||||
|
|
||||||||||
| :::: | ||||||||||
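The "up to 32x" figure in the tip follows from the bit widths involved. The sketch below assumes the original embeddings are stored as 32-bit floats and that BBQ keeps roughly one bit per dimension; the 384-dimension vector is only an illustrative choice.

$$
\underbrace{384 \times 32\ \text{bits}}_{\text{float32}} = 1536\ \text{bytes},
\qquad
\underbrace{384 \times 1\ \text{bit}}_{\text{binary}} = 48\ \text{bytes},
\qquad
\frac{1536}{48} = 32
$$

Quantized indices typically also store a small amount of per-vector correction data, so the reduction observed in practice is likely to sit somewhat below this upper bound, which is why the claim is phrased as "up to".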
|
|
||||||||||
| Here's an example using `semantic_text` with a text embedding inference endpoint and BBQ quantization: | ||||||||||
|
Contributor
Let's add a heading to each example. This improves the scannability of the page, making it easier to digest and navigate. Keep the headings as short as possible and start each one with a verb.
||||||||||
|
|
||||||||||
| ```console | ||||||||||
| PUT semantic-embeddings-optimized | ||||||||||
| { | ||||||||||
| "mappings": { | ||||||||||
| "properties": { | ||||||||||
| "content": { | ||||||||||
| "type": "semantic_text", | ||||||||||
| "inference_id": "my-e5-model", <1> | ||||||||||
|
||||||||||
| "index_options": { | ||||||||||
| "dense_vector": { | ||||||||||
| "type": "bbq_hnsw" <2> | ||||||||||
| } | ||||||||||
| } | ||||||||||
| } | ||||||||||
| } | ||||||||||
| } | ||||||||||
| } | ||||||||||
| ``` | ||||||||||
|
|
||||||||||
| 1. Reference to a text embedding inference endpoint (e.g., E5, OpenAI, or Cohere embeddings). You must create this endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put). | ||||||||||
|
||||||||||
| 2. Use Better Binary Quantization with HNSW indexing for optimal memory efficiency. This setting applies to the underlying `dense_vector` field that stores the embeddings. | ||||||||||
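Annotation 1 says the endpoint must exist before the mapping is created. A minimal sketch of that step is shown here, assuming the built-in `elasticsearch` service and the `.multilingual-e5-small` model. Both are illustrative choices rather than something this PR prescribes, and the allocation and thread counts are placeholders. Only the endpoint name `my-e5-model` comes from the example above.

```console
PUT _inference/text_embedding/my-e5-model
{
  "service": "elasticsearch",
  "service_settings": {
    "model_id": ".multilingual-e5-small",
    "num_allocations": 1,
    "num_threads": 1
  }
}
```

Any endpoint with the `text_embedding` task type should work the same way, because `index_options` applies to the dense vectors the endpoint produces, not to the service behind it.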
|
|
||||||||||
| You can also use `bbq_flat` for simpler datasets or when you don't need the HNSW graph: | ||||||||||
|
||||||||||
| You can also use `bbq_flat` for simpler datasets or when you don't need the HNSW graph: | |
| You can also use `bbq_flat` for simpler datasets where you need maximum accuracy at the expense of speed: |
✅ Resolved - Updated description at line 153: "You can also use bbq_flat for simpler datasets where you need maximum accuracy at the expense of speed:"
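For reference, a `bbq_flat` mapping would presumably mirror the `bbq_hnsw` example above, with only the `type` value changed. The sketch below reuses the `my-e5-model` endpoint from that example. The index name `semantic-embeddings-flat` is hypothetical.

```console
PUT semantic-embeddings-flat
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-e5-model",
        "index_options": {
          "dense_vector": {
            "type": "bbq_flat"
          }
        }
      }
    }
  }
}
```

With `bbq_flat`, no HNSW graph is built and queries scan the quantized vectors directly, which is why it is usually suggested only for smaller datasets.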
This is a section-level `applies_to`, which should be placed immediately after a heading.
Where it sits right now, it's hard to tell which example it refers to.
It should be placed under the new heading for the example it's tagging.
If you accepted my previous suggestion, then this should be deleted.
| ```{applies_to} | |
| stack: ga 9.2 | |
| serverless: unavailable | |
| ``` |
If you add the subheadings as Liam suggests below, you can disregard this suggestion, since you can add this tag to the new section title.
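The `stack: ga 9.2` tag matches the version the option list gives for `bbq_disk`, so the example it belongs to is presumably the disk-based one. A minimal sketch of such a mapping follows, assuming it takes the same shape as the `bbq_hnsw` example. The index name `semantic-embeddings-disk` is hypothetical, and the endpoint is the `my-e5-model` placeholder used earlier.

```console
PUT semantic-embeddings-disk
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": "my-e5-model",
        "index_options": {
          "dense_vector": {
            "type": "bbq_disk"
          }
        }
      }
    }
  }
}
```

Placing the `applies_to` tag directly under a dedicated heading for this example, as the reviewers suggest, would make the version constraint unambiguous.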
Let's add a subheading here describing the example
This isn't correct. Please stop using AI without reading through the output to make sure it's correct before committing.
HNSW graphs are directional.
This page now has two {note}s on top of each other which isn't ideal, and again we should use admonitions sparingly.
Also this message probably belongs at the top of this section. Consider moving it and seeing if it can be plain prose in the introduction, to avoid stacking admonitions.
| The `index_options` parameter is only applicable when using inference endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation. | |
| The `index_options` parameter is only applicable when using {{infer}} endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation. |
Not sure if we need this tip as we already have the Optimizing vector storage with index_options subsection which is prominent.
We also need to be sparing in the use of admonitions because you really can't use more than a few in a single doc. They can make a page hard to maintain over time because of that budget. Often a large TIP can simply become a new subheading.