Merged
22 commits
3ecd9c7
Consolidates the Troubleshooting section
kosabogi Oct 28, 2025
6b49e51
Merge branch 'main' into semantic-blend-troubleshooting-section
kosabogi Oct 28, 2025
7dae585
Merge branch 'main' into semantic-blend-troubleshooting-section
kosabogi Oct 29, 2025
fb44113
Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
kosabogi Oct 30, 2025
96d7183
Test applies switch tabs
kosabogi Oct 30, 2025
fd7eb83
Adds version specific examples
kosabogi Oct 31, 2025
3767283
Syntax fix
kosabogi Oct 31, 2025
d26917b
Fixes syntax
kosabogi Oct 31, 2025
7fbdb71
Merge branch 'main' into semantic-blend-troubleshooting-section
kosabogi Oct 31, 2025
fc06c3f
Adds separate tabs for each versions
kosabogi Oct 31, 2025
4db8fb2
Adds only two version tahs
kosabogi Oct 31, 2025
e2a1224
Deletes tabs, uses section level applies-to instead
kosabogi Oct 31, 2025
cabbfee
Merge branch 'main' into semantic-blend-troubleshooting-section
kosabogi Oct 31, 2025
4ff57b9
Removes response block, adds intros and warning
kosabogi Nov 3, 2025
5be7634
Reorganizes content, adds applies_to where necessary
kosabogi Nov 4, 2025
60b518d
Merge branch 'main' into semantic-blend-troubleshooting-section
kosabogi Nov 4, 2025
037bbf7
Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
kosabogi Nov 5, 2025
576d820
Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
kosabogi Nov 5, 2025
457e7b8
Adds additional information, changes admonition types
kosabogi Nov 5, 2025
511f019
Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
kosabogi Nov 6, 2025
a07dcb1
Shorten tab titles
kosabogi Nov 6, 2025
6618e0d
Merge branch 'main' into semantic-blend-troubleshooting-section
kosabogi Nov 6, 2025
156 changes: 116 additions & 40 deletions docs/reference/elasticsearch/mapping-reference/semantic-text.md
@@ -48,7 +48,7 @@ the field mappings.

:::::::{tab-set}

::::::{tab-item} Using the default ELSER on EIS endpoint on Serverless
::::::{tab-item} Default ELSER on EIS endpoint on {{serverless-short}}

```{applies_to}
serverless: ga
@@ -72,7 +72,7 @@ PUT my-index-000001

::::::

::::::{tab-item} Using the preconfigured ELSER on EIS endpoint in Cloud
::::::{tab-item} Preconfigured ELSER on EIS endpoint in Cloud

```{applies_to}
stack: ga 9.2
@@ -98,7 +98,7 @@ PUT my-index-000001

::::::

::::::{tab-item} Using the default ELSER endpoint
::::::{tab-item} Default ELSER endpoint

If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:

@@ -544,9 +544,14 @@ stack: ga 9.2
serverless: ga
```

:::{important}
Starting with {{es}} 9.2, the recommended method for retrieving embeddings has changed from that used in previous versions.
For instructions on retrieving embeddings in versions earlier than 9.2, refer to [Returning semantic field embeddings using `fields`](#return-embeddings-fields).
:::

By default, the embeddings generated for `semantic_text` fields are stored internally and **not included in `_source`** when retrieving documents.

To include the full inference fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`.
To include the full {{infer}} fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`.
This works with the
[Get](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-get),
[Search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search),
@@ -565,18 +570,23 @@ POST my-index/_search
}
}
```
% TEST[skip:Requires inference endpoint]
% TEST[skip:Requires {{infer}} endpoint]

The embeddings will appear under `_inference_fields` in `_source`.

**Use cases**

Including embeddings in `_source` is useful when you want to:

* Reindex documents into another index **with the same `inference_id`** without re-running inference.
* Export or migrate documents while preserving their embeddings.
* Inspect or debug the raw embeddings generated for your content.

### Example: Reindex while preserving embeddings
```{applies_to}
stack: ga 9.2
serverless: ga
```

```console
POST _reindex
@@ -592,7 +602,7 @@ POST _reindex
}
}
```
% TEST[skip:Requires inference endpoint]
% TEST[skip:Requires {{infer}} endpoint]

1. Sends the source documents with their stored embeddings to the destination index.

@@ -602,16 +612,110 @@ the documents will **fail the reindex task**.
Matching `inference_id` values are required to reuse the existing embeddings.
::::

This allows documents to be re-indexed without triggering inference again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**.
This allows documents to be re-indexed without triggering {{infer}} again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**.

::::{note}
**For versions prior to 9.2.0**
### Example: Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields]
```{applies_to}
stack: ga 9.2
serverless: ga
```

To verify that your embeddings look correct, you can retrieve the {{infer}} data that `semantic_text` normally hides from search results.

Older versions do not support the `exclude_vectors` option to retrieve the embeddings of the semantic text fields.
To return the `_inference_fields`, use the `fields` option in a search request instead:
To retrieve the stored embeddings in {{es}} 9.2 and later, set the `exclude_vectors` parameter to `false` in the `_source` field. This ensures that the vector data, which is excluded by default, is included in the search response.

```console
POST test-index/_search
{
  "_source": {
    "exclude_vectors": false
  },
  "query": {
    "match": {
      "my_semantic_field": "Which country is Paris in?"
    }
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]

This will return verbose chunked embeddings content that is used to perform
semantic search for `semantic_text` fields:

```console-response
{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": { "value": 1, "relation": "eq" },
    "max_score": 16.532316,
    "hits": [
      {
        "_index": "test-index",
        "_id": "1",
        "_score": 16.532316,
        "_source": {
          "my_semantic_field": "Paris is the capital of France.",
          "_inference_fields": {
            "my_semantic_field": {
              "inference": {
                "inference_id": ".elser-2-elasticsearch", <1>
                "model_settings": { <2>
                  "service": "elasticsearch",
                  "task_type": "sparse_embedding"
                },
                "chunks": {
                  "my_semantic_field": [
                    {
                      "start_offset": 0,
                      "end_offset": 31,
                      "embeddings": { <3>
                        "airport": 0.12011719,
                        "brussels": 0.032836914,
                        "capital": 2.1328125,
                        "capitals": 0.6386719,
                        "capitol": 1.2890625,
                        "cities": 0.78125,
                        "city": 1.265625,
                        "continent": 0.26953125,
                        "country": 0.59765625,
                        ...
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    ]
  }
}
```
% TEST[skip:Requires {{infer}} endpoint]
1. The {{infer}} endpoint used to generate embeddings.
2. Lists details about the model used to generate embeddings, such as the service name and task type.
3. The embeddings generated for this chunk.
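
When inspecting a response like the one above, it can help to rank each chunk's tokens by weight. The following Python sketch is illustrative only and is not part of any {{es}} client: the helper name `top_tokens` is hypothetical, and it assumes a sparse embedding (a token-to-weight map) and a search hit shaped like the example response.

```python
from typing import Any


def top_tokens(hit: dict[str, Any], field: str, n: int = 3) -> list[list[tuple[str, float]]]:
    """Return the n highest-weighted tokens for each chunk of `field` in one search hit.

    Assumes the hit was returned with `_inference_fields` included in `_source`
    and that the embeddings are sparse (token -> weight), as in the example response.
    """
    inference = hit["_source"]["_inference_fields"][field]["inference"]
    result = []
    for chunk in inference["chunks"][field]:
        weights = chunk["embeddings"]
        # Sort tokens from highest to lowest weight and keep the first n.
        ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
        result.append(ranked[:n])
    return result
```

Dense embeddings are not token-to-weight maps, so this helper would need a different inspection step for them.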

## Returning semantic field embeddings using `fields` [return-embeddings-fields]

Contributor

Is there a reason not to add an applies_to tag that lists 9.0 and 9.1 and marks it unavailable from 9.2?

Contributor Author

Yes - I didn’t want to mark this as unavailable from 9.2, since it’s not technically unavailable, it’s just not recommended to use.
I thought specifically listing 9.0 and 9.1 might be a bit misleading, as it could imply that this parameter is only available starting from 9.0. But I’m open to adding the 9.0 and 9.1 tags here, I was thinking about that myself too. Do you think it would be better?

:::{important}
This method for returning semantic field embeddings is recommended only for {{es}} versions earlier than 9.2.
For version 9.2 and later, use the [`exclude_vectors`](#troubleshooting-semantic-text-fields) parameter instead.
:::

To retrieve stored embeddings, use the `fields` parameter with `_inference_fields`. This lets you include the vector data that is not shown by default in the response.
The `fields` parameter only works with the `_search` endpoint.

```console
POST my-index/_search
{
"query": {
"match": {
@@ -623,11 +727,7 @@ POST test-index/_search
]
}
```
% TEST[skip:Requires inference endpoint]

This returns the chunked embeddings used for semantic search under `_inference_fields` in `_source`.
Note that the `fields` option is **not** available for the Reindex API.
::::
% TEST[skip:Requires {{infer}} endpoint]
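
The choice between the two retrieval methods can be captured in a small helper. This Python sketch is an illustration, not an official client API: the function name `embeddings_search_body` and the `(major, minor)` version tuple are assumptions, and it only builds the `_search` request body; it does not send it.

```python
from typing import Any


def embeddings_search_body(field: str, query_text: str, es_version: tuple[int, int]) -> dict[str, Any]:
    """Build a _search body that returns stored semantic_text embeddings.

    es_version is a hypothetical (major, minor) tuple supplied by the caller.
    """
    body: dict[str, Any] = {"query": {"match": {field: query_text}}}
    if es_version >= (9, 2):
        # 9.2 and later: include vectors in _source.
        body["_source"] = {"exclude_vectors": False}
    else:
        # Earlier versions: request _inference_fields through `fields`.
        body["fields"] = ["_inference_fields"]
    return body
```

For example, `embeddings_search_body("my_semantic_field", "Which country is Paris in?", (9, 1))` produces a body that uses `fields`, while passing `(9, 2)` produces one that sets `_source.exclude_vectors` to `false`.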

## Customizing `semantic_text` indexing [custom-indexing]

@@ -741,30 +841,6 @@ You can query `semantic_text` fields using the following query types:

- [Semantic query](/reference/query-languages/query-dsl/query-dsl-semantic-query.md): We don't recommend this legacy query type for _new_ projects, because the alternatives in this list enable more flexibility and customization. The `semantic` query remains available to support existing implementations.


## Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields]

If you want to verify that your embeddings look correct, you can view the
inference data that `semantic_text` typically hides using `fields`.

```console
POST test-index/_search
{
  "query": {
    "match": {
      "my_semantic_field": "Which country is Paris in?"
    }
  },
  "fields": [
    "_inference_fields"
  ]
}
```
% TEST[skip:Requires inference endpoint]

This will return verbose chunked embeddings content that is used to perform
semantic search for `semantic_text` fields.

### Document count discrepancy in `_cat/indices`

When an index contains a `semantic_text` field, the `docs.count` value returned by the [`_cat/indices`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices) API may be higher than the number of documents you indexed.