5 changes: 4 additions & 1 deletion .gitignore
@@ -12,4 +12,7 @@ AGENTS.md
.github/instructions/**.instructions.md
CLAUDE.md
GEMINI.md
.cursor
.claude
.claude-flow
.hive-mind
Contributor:

I don't even know what .claude-flow and .hive-mind are 😄

149 changes: 147 additions & 2 deletions solutions/search/semantic-search/semantic-search-semantic-text.md
@@ -29,6 +29,10 @@ The mapping of the destination index - the index that contains the embeddings th

You can run {{infer}} either using the [Elastic {{infer-cap}} Service](/explore-analyze/elastic-inference/eis.md) or on your own ML nodes. The following examples show you both scenarios.

::::{tip}
Contributor:

Not sure if we need this tip, as we already have the Optimizing vector storage with `index_options` subsection, which is prominent.

We also need to be sparing in the use of admonitions because you really can't use more than a few in a single doc. They can make a page hard to maintain over time because of that budget. Often a large TIP can simply become a new subheading.

For production deployments with dense vector embeddings, consider optimizing storage and performance using [`index_options`](#semantic-text-index-options). This allows you to configure quantization strategies like BBQ (Better Binary Quantization) that can reduce memory usage by up to 32x. Note that with newer indices, `semantic_text` will default to BBQ HNSW automatically.
::::

:::::::{tab-set}

::::::{tab-item} Using EIS on Serverless
@@ -107,10 +111,151 @@ PUT semantic-embeddings

:::::::

To try the ELSER model on the Elastic Inference Service, explicitly set the `inference_id` to `.elser-2-elastic`. For instructions, refer to [Using `semantic_text` with ELSER on EIS](https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/semantic-text#using-elser-on-eis).
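As a minimal sketch (the index and field names here are illustrative, not from the reference docs), the mapping would look like this:

```console
PUT elser-eis-embeddings
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".elser-2-elastic" <1>
      }
    }
  }
}
```

1. The preconfigured ELSER endpoint on the Elastic Inference Service.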

### Optimizing vector storage with `index_options` [semantic-text-index-options]

When using `semantic_text` with dense vector embeddings (such as E5 or other text embedding models), you can optimize storage and search performance by configuring `index_options` on the underlying `dense_vector` field. This is particularly useful for large-scale deployments.

The `index_options` parameter controls how vectors are indexed and stored. For dense vector embeddings, you can specify [quantization strategies](https://www.elastic.co/blog/vector-search-elasticsearch-rationale) like Better Binary Quantization (BBQ) that significantly reduce memory footprint while maintaining search quality. Quantization compresses high-dimensional vectors into more efficient representations, enabling faster searches and lower memory consumption. For details on available options and their trade-offs, refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).

::::{tip}
Contributor:

Let's refactor this by removing the TIP block and just using a subheading

For most production use cases using `semantic_text` with dense vector embeddings from text models (like E5, OpenAI, or Cohere), BBQ is recommended as it provides up to 32x memory reduction with minimal accuracy loss. BBQ requires a minimum of 64 dimensions and works best with text embeddings (it may not perform well with other types like image embeddings). Choose from:
- `bbq_hnsw` - Best for most use cases (default for 384+ dimensions)
- `bbq_flat` - Simpler option for smaller datasets
- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)
Contributor:

Suggested change
- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)
- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements ({{es}} 9.2+)

Contributor:

Suggested change
- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements (Elasticsearch 9.2+)
- `bbq_disk` - Disk-based storage for very large datasets with minimal memory requirements {applies_to}`stack: 9.2`

::::

Here's an example using `semantic_text` with a text embedding inference endpoint and BBQ quantization:
Contributor:

Suggested change
Here's an example using `semantic_text` with a text embedding inference endpoint and BBQ quantization:
Here's an example using `semantic_text` with a text embedding {{infer}} endpoint and BBQ quantization:

Contributor:

Let's add a heading to each example; this improves the scannability of the page, making it easier to digest and navigate.

Keep the headings as short as possible and start each sentence with a verb.


```console
PUT semantic-embeddings-optimized
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch", <1>
"index_options": {
"dense_vector": {
"type": "bbq_hnsw" <2>
}
}
}
}
}
}
```

1. Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint that is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
Contributor:

Suggested change
1. Reference to a text embedding inference endpoint. This example uses the built-in E5 endpoint that is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).
1. Reference to a text embedding {{infer}} endpoint. This example uses the built-in E5 endpoint that is automatically available. For custom models, you must create the endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put).

2. Use Better Binary Quantization with HNSW indexing for optimal memory efficiency. This setting applies to the underlying `dense_vector` field that stores the embeddings.

You can also use `bbq_flat` for smaller datasets where you need maximum accuracy at the expense of speed:
Contributor:

Let's add a subheading here describing the example


```console
PUT semantic-embeddings-flat
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_flat" <1>
}
}
}
}
}
}
```

1. Use flat BBQ for simpler use cases with fewer vectors. It builds no HNSW graph, so it requires fewer compute resources during indexing.

For very large datasets where RAM is constrained, use `bbq_disk` (DiskBBQ) to minimize memory usage:
Contributor:

Let's add a subheading here describing the example

Contributor @szabosteve commented on Nov 13, 2025:

The placement of the applies_to tag below is a bit unusual – after the code block, before the annotation note – I understand that if we put it above the example, it would seem to cover the rest of the examples. For this reason, I'd suggest using inline applies_to tags here.

Suggested change
For very large datasets where RAM is constrained, use `bbq_disk` (DiskBBQ) to minimize memory usage:
{applies_to}`serverless: unavailable` {applies_to}`stack: ga 9.2` For very large datasets where RAM is constrained, use `bbq_disk` (DiskBBQ) to minimize memory usage:

Contributor:

If you add the subheadings as Liam suggests below, you can disregard this suggestion, since you can add the section-level applies to tag starting in line 197 to the new section title.


```console
PUT semantic-embeddings-disk
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_disk" <1>
}
}
}
}
}
}
```

```{applies_to}
stack: ga 9.2
serverless: unavailable
```

Contributor:

This is a section-level applies_to which should be used immediately after a heading.

Where it's situated right now makes it hard to tell what example it's referring to.

It should be placed under the new heading for the example it's tagging.

Comment on lines +197 to +201
Contributor:

If you accepted my previous suggestion, then this should be deleted.

Suggested change
```{applies_to}
stack: ga 9.2
serverless: unavailable
```

Contributor:

If you add the subheadings as Liam suggests below, you can disregard this suggestion, since you can add this tag to the new section title.

1. Use DiskBBQ when RAM is limited. Available in Elasticsearch 9.2+, this option keeps vectors in compressed form on disk and only loads/decompresses small portions on-demand during queries. Unlike standard HNSW indexes (which rely on filesystem cache to load vectors into memory for fast search), DiskBBQ dramatically reduces RAM requirements by avoiding the need to cache vectors in memory. This enables vector search on much larger datasets with minimal memory, though queries will be slower compared to in-memory approaches.
Contributor:

Suggested change
1. Use DiskBBQ when RAM is limited. Available in Elasticsearch 9.2+, this option keeps vectors in compressed form on disk and only loads/decompresses small portions on-demand during queries. Unlike standard HNSW indexes (which rely on filesystem cache to load vectors into memory for fast search), DiskBBQ dramatically reduces RAM requirements by avoiding the need to cache vectors in memory. This enables vector search on much larger datasets with minimal memory, though queries will be slower compared to in-memory approaches.
1. Use DiskBBQ when RAM is limited. Available in {{es}} 9.2+, this option keeps vectors in compressed form on disk and only loads/decompresses small portions on-demand during queries. Unlike standard HNSW indexes (which rely on filesystem cache to load vectors into memory for fast search), DiskBBQ dramatically reduces RAM requirements by avoiding the need to cache vectors in memory. This enables vector search on much larger datasets with minimal memory, though queries will be slower compared to in-memory approaches.


Other quantization options include `int8_hnsw` (8-bit integer quantization) and `int4_hnsw` (4-bit integer quantization):
Contributor:

Let's add a subheading here describing the example


```console
PUT semantic-embeddings-int8
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "int8_hnsw" <1>
}
}
}
}
}
}
```

1. Use 8-bit integer quantization for 4x memory reduction with high accuracy retention. For 4-bit quantization, use `"type": "int4_hnsw"` instead, which provides 8x memory reduction. For the full list of other available quantization options (including `int4_flat` and others), refer to the [`dense_vector` `index_options` documentation](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options).
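As an illustrative variant (hypothetical index name, same built-in E5 endpoint assumed), switching to 4-bit quantization only changes the `type` value:

```console
PUT semantic-embeddings-int4
{
  "mappings": {
    "properties": {
      "content": {
        "type": "semantic_text",
        "inference_id": ".multilingual-e5-small-elasticsearch",
        "index_options": {
          "dense_vector": {
            "type": "int4_hnsw" <1>
          }
        }
      }
    }
  }
}
```

1. 4-bit integer quantization, trading some accuracy retention for roughly 8x memory reduction.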

For HNSW-specific tuning parameters like `m` and `ef_construction`, you can include them in the `index_options`:
Contributor:

Let's add a subheading here describing the example


```console
PUT semantic-embeddings-custom
{
"mappings": {
"properties": {
"content": {
"type": "semantic_text",
"inference_id": ".multilingual-e5-small-elasticsearch",
"index_options": {
"dense_vector": {
"type": "bbq_hnsw",
"m": 32, <1>
"ef_construction": 200 <2>
}
}
}
}
}
}
```

1. Number of bidirectional links per node in the HNSW graph. Higher values improve recall but increase memory usage. Default is 16.
Member:

This isn't correct. Please stop using AI without reading through to make sure it's correct before committing.

HNSW graphs are directional.

2. Number of candidates considered during graph construction. Higher values improve index quality but slow down indexing. Default is 100.

::::{note}
Contributor:

This page now has two {note}s on top of each other which isn't ideal, and again we should use admonitions sparingly.

Also this message probably belongs at the top of this section. Consider moving it and seeing if it can be plain prose in the introduction, to avoid stacking admonitions.

The `index_options` parameter is only applicable when using inference endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation.
Contributor:

Suggested change
The `index_options` parameter is only applicable when using inference endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation.
The `index_options` parameter is only applicable when using {{infer}} endpoints that produce dense vector embeddings (like E5, OpenAI embeddings, Cohere embeddings, etc.). It does not apply to sparse vector models like ELSER, which use a different internal representation.

::::

::::{note}
If you’re using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you’ll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
If you're using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you'll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.

::::
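As an illustrative sketch of the mapping update mentioned in the note above (the index and field names are hypothetical), adding a `semantic_text` field to an existing crawler or connector index might look like this:

```console
PUT my-crawler-index/_mapping
{
  "properties": {
    "body_semantic": {
      "type": "semantic_text",
      "inference_id": ".elser-2-elasticsearch" <1>
    }
  }
}
```

1. Any existing {{infer}} endpoint works here; this sketch assumes the built-in ELSER endpoint.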

6 changes: 6 additions & 0 deletions solutions/search/vector/dense-vector.md
@@ -45,4 +45,10 @@ For more information about how the profile affects virtual compute unit (VCU) al

Better Binary Quantization (BBQ) is an advanced vector quantization technique for `dense_vector` fields. It compresses embeddings into compact binary form, enabling faster similarity search and reducing memory usage. This improves both search relevance and cost efficiency, especially when used with HNSW (Hierarchical Navigable Small World).

New indices with 384 or more dimensions will default to BBQ HNSW automatically for optimal performance and memory efficiency.

Learn more about how BBQ works, supported algorithms, and configuration examples in the [Better Binary Quantization (BBQ) documentation](https://www.elastic.co/docs/reference/elasticsearch/index-settings/bbq).
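For example, a minimal sketch (hypothetical index name) that sets BBQ HNSW explicitly on a `dense_vector` field:

```console
PUT bbq-example
{
  "mappings": {
    "properties": {
      "embedding": {
        "type": "dense_vector",
        "dims": 384, <1>
        "index_options": {
          "type": "bbq_hnsw" <2>
        }
      }
    }
  }
}
```

1. BBQ requires a minimum of 64 dimensions; 384 matches common text embedding models.
2. Set explicitly here for clarity; new indices with 384 or more dimensions default to `bbq_hnsw` anyway.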

::::{tip}
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md), you can configure BBQ and other quantization options through the `index_options` parameter. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples of using `bbq_hnsw`, `int8_hnsw`, and other quantization strategies with semantic text fields.
::::
6 changes: 5 additions & 1 deletion solutions/search/vector/knn.md
Expand Up @@ -134,7 +134,11 @@ Support for approximate kNN search was added in version 8.0. Before 8.0, `dense_

For approximate kNN, {{es}} stores dense vector values per segment as an [HNSW graph](https://arxiv.org/abs/1603.09320). Building HNSW graphs is compute-intensive, so indexing vectors can take time; you may need to increase client request timeouts for index and bulk operations. The [approximate kNN tuning guide](/deploy-manage/production-guidance/optimize-performance/approximate-knn-search.md) covers indexing performance, sizing, and configuration trade-offs that affect search performance.

In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters:
In addition to search-time parameters, HNSW exposes index-time settings that balance graph build cost, search speed, and accuracy. When defining your `dense_vector` mapping, use [`index_options`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-index-options) to set these parameters.

::::{tip}
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` directly on the field. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples.
Contributor:

Suggested change
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` directly on the field. See [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples.
When using the [`semantic_text` field type](../semantic-search/semantic-search-semantic-text.md) with dense vector embeddings, you can also configure `index_options` directly on the field. Refer to [Optimizing vector storage with `index_options`](../semantic-search/semantic-search-semantic-text.md#semantic-text-index-options) for examples.

::::

```console
PUT image-index
```