Commit 7f375f2

kosabogi, Mikep86, and szabosteve authored and committed
Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section (#137233)
* Consolidates the Troubleshooting section
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
  Co-authored-by: Mike Pellegrini <[email protected]>
* Test applies switch tabs
* Adds version specific examples
* Syntax fix
* Fixes syntax
* Adds separate tabs for each version
* Adds only two version tabs
* Deletes tabs, uses section level applies-to instead
* Removes response block, adds intros and warning
* Reorganizes content, adds applies_to where necessary
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
  Co-authored-by: Mike Pellegrini <[email protected]>
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
  Co-authored-by: Mike Pellegrini <[email protected]>
* Adds additional information, changes admonition types
* Update docs/reference/elasticsearch/mapping-reference/semantic-text.md
  Co-authored-by: István Zoltán Szabó <[email protected]>
* Shorten tab titles

---------

Co-authored-by: Mike Pellegrini <[email protected]>
Co-authored-by: István Zoltán Szabó <[email protected]>
1 parent 7cb5687 commit 7f375f2

1 file changed: +116 -40 lines changed

docs/reference/elasticsearch/mapping-reference/semantic-text.md

@@ -48,7 +48,7 @@ the field mappings.

 :::::::{tab-set}

-::::::{tab-item} Using the default ELSER on EIS endpoint on Serverless
+::::::{tab-item} Default ELSER on EIS endpoint on {{serverless-short}}

 ```{applies_to}
 serverless: ga
@@ -72,7 +72,7 @@ PUT my-index-000001

 ::::::

-::::::{tab-item} Using the preconfigured ELSER on EIS endpoint in Cloud
+::::::{tab-item} Preconfigured ELSER on EIS endpoint in Cloud

 ```{applies_to}
 stack: ga 9.2
@@ -98,7 +98,7 @@ PUT my-index-000001

 ::::::

-::::::{tab-item} Using the default ELSER endpoint
+::::::{tab-item} Default ELSER endpoint

 If you use the preconfigured `.elser-2-elasticsearch` endpoint, you can set up `semantic_text` with the following API request:

@@ -544,9 +544,14 @@ stack: ga 9.2
 serverless: ga
 ```

+:::{important}
+Starting with {{es}} 9.2, the recommended method for retrieving embeddings has changed from that used in previous versions.
+For instructions on retrieving embeddings in versions earlier than 9.2, refer to [Returning semantic field embeddings using `fields`](#return-embeddings-fields).
+:::
+
 By default, the embeddings generated for `semantic_text` fields are stored internally and **not included in `_source`** when retrieving documents.

-To include the full inference fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`.
+To include the full {{infer}} fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`.
 This works with the
 [Get](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-get),
 [Search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search),
@@ -565,18 +570,23 @@ POST my-index/_search
   }
 }
 ```
-% TEST[skip:Requires inference endpoint]
+% TEST[skip:Requires {{infer}} endpoint]

 The embeddings will appear under `_inference_fields` in `_source`.

 **Use cases**
+
 Including embeddings in `_source` is useful when you want to:

 * Reindex documents into another index **with the same `inference_id`** without re-running inference.
 * Export or migrate documents while preserving their embeddings.
 * Inspect or debug the raw embeddings generated for your content.

 ### Example: Reindex while preserving embeddings
+```{applies_to}
+stack: ga 9.2
+serverless: ga
+```

 ```console
 POST _reindex
@@ -592,7 +602,7 @@ POST _reindex
   }
 }
 ```
-% TEST[skip:Requires inference endpoint]
+% TEST[skip:Requires {{infer}} endpoint]

 1. Sends the source documents with their stored embeddings to the destination index.

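The reindex example in this diff depends on two conditions. First, the source documents must be read with their embeddings (`_source.exclude_vectors: false`). Second, as the warning in the next hunk states, the destination `semantic_text` field must use the same `inference_id`, or the reindex task fails. The minimal Python sketch below checks the second condition on two mapping dictionaries before it builds a `_reindex` body. It makes two assumptions: the request body elided from the diff places `exclude_vectors` under `source._source`, and both mappings declare `inference_id` explicitly. The helper names, index names, and endpoint ID are hypothetical.

```python
from typing import Optional


def semantic_inference_id(mappings: dict, field: str) -> Optional[str]:
    """Return the inference_id declared on a semantic_text field, else None.

    Assumption: `mappings` has the shape {"properties": {field: {...}}} and
    the inference_id is declared explicitly (not left to a default).
    """
    spec = mappings.get("properties", {}).get(field, {})
    if spec.get("type") != "semantic_text":
        return None
    return spec.get("inference_id")


def build_reindex_body(src_index: str, dest_index: str,
                       src_mappings: dict, dest_mappings: dict,
                       field: str) -> dict:
    """Build a _reindex body that carries stored embeddings over, or raise.

    Stored embeddings can only be reused when both semantic_text fields share
    an inference_id; otherwise the reindex task would fail.
    """
    src_id = semantic_inference_id(src_mappings, field)
    dest_id = semantic_inference_id(dest_mappings, field)
    if src_id is None or src_id != dest_id:
        raise ValueError(
            f"{field!r}: source inference_id {src_id!r} does not match "
            f"destination inference_id {dest_id!r}"
        )
    return {
        # Assumption: exclude_vectors is accepted under source._source,
        # so source documents are read with their embeddings included.
        "source": {"index": src_index, "_source": {"exclude_vectors": False}},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    mapping = {"properties": {"content": {"type": "semantic_text",
                                          "inference_id": "my-elser-endpoint"}}}
    print(build_reindex_body("my-index-src", "my-index-dest",
                             mapping, mapping, "content"))
```

Failing early on the client side mirrors the warning in the diff. The check surfaces a mismatched `inference_id` before a reindex task starts, not after documents fail.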
@@ -602,16 +612,110 @@ the documents will **fail the reindex task**.
 Matching `inference_id` values are required to reuse the existing embeddings.
 ::::

-This allows documents to be re-indexed without triggering inference again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**.
+This allows documents to be re-indexed without triggering {{infer}} again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**.

-::::{note}
-**For versions prior to 9.2.0**
+### Example: Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields]
+```{applies_to}
+stack: ga 9.2
+serverless: ga
+```
+
+To verify that your embeddings look correct, you can retrieve the {{infer}} data that `semantic_text` normally hides from search results.

-Older versions do not support the `exclude_vectors` option to retrieve the embeddings of the semantic text fields.
-To return the `_inference_fields`, use the `fields` option in a search request instead:
+To retrieve the stored embeddings in {{es}} 9.2 and later, set the `exclude_vectors` parameter to `false` in the `_source` field. This ensures that the vector data, which is excluded by default, is included in the search response.

 ```console
 POST test-index/_search
+{
+  "_source": {
+    "exclude_vectors": false
+  },
+  "query": {
+    "match": {
+      "my_semantic_field": "Which country is Paris in?"
+    }
+  }
+}
+```
+% TEST[skip:Requires {{infer}} endpoint]
+
+This will return verbose chunked embeddings content that is used to perform
+semantic search for `semantic_text` fields:
+
+```console-response
+{
+  "took": 18,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": { "value": 1, "relation": "eq" },
+    "max_score": 16.532316,
+    "hits": [
+      {
+        "_index": "test-index",
+        "_id": "1",
+        "_score": 16.532316,
+        "_source": {
+          "my_semantic_field": "Paris is the capital of France.",
+          "_inference_fields": {
+            "my_semantic_field": {
+              "inference": {
+                "inference_id": ".elser-2-elasticsearch", <1>
+                "model_settings": { <2>
+                  "service": "elasticsearch",
+                  "task_type": "sparse_embedding"
+                },
+                "chunks": {
+                  "my_semantic_field": [
+                    {
+                      "start_offset": 0,
+                      "end_offset": 31,
+                      "embeddings": { <3>
+                        "airport": 0.12011719,
+                        "brussels": 0.032836914,
+                        "capital": 2.1328125,
+                        "capitals": 0.6386719,
+                        "capitol": 1.2890625,
+                        "cities": 0.78125,
+                        "city": 1.265625,
+                        "continent": 0.26953125,
+                        "country": 0.59765625,
+                        ...
+                      }
+                    }
+                  ]
+                }
+              }
+            }
+          }
+        }
+      }
+    ]
+  }
+}
+```
+% TEST[skip:Requires {{infer}} endpoint]
+1. The {{infer}} endpoint used to generate embeddings.
+2. Lists details about the model used to generate embeddings, such as the service name and task type.
+3. The embeddings generated for this chunk.
+
+## Returning semantic field embeddings using `fields` [return-embeddings-fields]
+
+:::{important}
+This method for returning semantic field embeddings is recommended only for {{es}} versions earlier than 9.2.
+For version 9.2 and later, use the [`exclude_vectors`](#troubleshooting-semantic-text-fields) parameter instead.
+:::
+
+To retrieve stored embeddings, use the `fields` parameter with `_inference_fields`. This lets you include the vector data that is not shown by default in the response.
+The `fields` parameter only works with the `_search` endpoint.
+
+```console
+POST my-index/_search
 {
   "query": {
     "match": {
@@ -623,11 +727,7 @@ POST test-index/_search
   ]
 }
 ```
-% TEST[skip:Requires inference endpoint]
-
-This returns the chunked embeddings used for semantic search under `_inference_fields` in `_source`.
-Note that the `fields` option is **not** available for the Reindex API.
-::::
+% TEST[skip:Requires {{infer}} endpoint]

 ## Customizing `semantic_text` indexing [custom-indexing]

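The `console-response` example that this commit adds shows how verbose the `_inference_fields` block is. The Python sketch below walks one search hit of that shape and reports, for each chunk, the chunk text and its highest-weighted tokens. It is only a rough troubleshooting aid and rests on assumptions taken from the sample response: sparse `embeddings` are a token-to-weight object, and `start_offset`/`end_offset` are character offsets into a single-valued field. Dense embeddings are reported with no tokens. The function name is hypothetical. `SAMPLE_HIT` is a trimmed copy of the sample hit in the diff that keeps only some of its tokens.

```python
def summarize_chunks(hit: dict, field: str, top_k: int = 3) -> list:
    """Summarize the chunked embeddings stored for `field` in one search hit."""
    source = hit["_source"]
    inference = source["_inference_fields"][field]["inference"]
    summaries = []
    for chunk in inference["chunks"][field]:
        # Assumption: offsets are character offsets into the field's text.
        start, end = chunk["start_offset"], chunk["end_offset"]
        embeddings = chunk["embeddings"]
        if isinstance(embeddings, dict):
            # Sparse embeddings (for example ELSER): token -> weight.
            # Keep the top_k heaviest tokens.
            top = sorted(embeddings.items(), key=lambda kv: kv[1],
                         reverse=True)[:top_k]
        else:
            # Dense embeddings are plain vectors with no tokens to rank.
            top = []
        summaries.append({
            "inference_id": inference["inference_id"],
            "task_type": inference["model_settings"]["task_type"],
            "text": source[field][start:end],
            "top_tokens": top,
        })
    return summaries


# Trimmed copy of the sample hit from the diff (only some tokens kept).
SAMPLE_HIT = {
    "_source": {
        "my_semantic_field": "Paris is the capital of France.",
        "_inference_fields": {
            "my_semantic_field": {
                "inference": {
                    "inference_id": ".elser-2-elasticsearch",
                    "model_settings": {"service": "elasticsearch",
                                       "task_type": "sparse_embedding"},
                    "chunks": {
                        "my_semantic_field": [{
                            "start_offset": 0,
                            "end_offset": 31,
                            "embeddings": {"airport": 0.12011719,
                                           "capital": 2.1328125,
                                           "capitol": 1.2890625,
                                           "cities": 0.78125,
                                           "city": 1.265625,
                                           "country": 0.59765625},
                        }]
                    },
                }
            }
        },
    }
}

if __name__ == "__main__":
    for summary in summarize_chunks(SAMPLE_HIT, "my_semantic_field"):
        print(summary)
```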
@@ -741,30 +841,6 @@ You can query `semantic_text` fields using the following query types:

 - [Semantic query](/reference/query-languages/query-dsl/query-dsl-semantic-query.md): We don't recommend this legacy query type for _new_ projects, because the alternatives in this list enable more flexibility and customization. The `semantic` query remains available to support existing implementations.

-
-## Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields]
-
-If you want to verify that your embeddings look correct, you can view the
-inference data that `semantic_text` typically hides using `fields`.
-
-```console
-POST test-index/_search
-{
-  "query": {
-    "match": {
-      "my_semantic_field": "Which country is Paris in?"
-    }
-  },
-  "fields": [
-    "_inference_fields"
-  ]
-}
-```
-% TEST[skip:Requires inference endpoint]
-
-This will return verbose chunked embeddings content that is used to perform
-semantic search for `semantic_text` fields.
-
 ### Document count discrepancy in `_cat/indices`

 When an index contains a `semantic_text` field, the `docs.count` value returned by the [`_cat/indices`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices) API may be higher than the number of documents you indexed.
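This commit makes the way you retrieve embeddings depend on version. On Elasticsearch 9.2 and later, and on Serverless (per the `applies_to` tags), you set `_source.exclude_vectors` to `false`. On earlier versions, you pass `fields: ["_inference_fields"]` to the Search API. The Python sketch below picks between the two Search API request bodies shown in the diff. The function names, the `(major, minor)` version tuple, and the `serverless` flag are hypothetical helpers, not part of any Elasticsearch client. The sketch only builds the JSON body and does not send it.

```python
import json
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a version string such as "9.1.4" into (9, 1)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def embedding_search_body(field: str, text: str,
                          version: Tuple[int, int],
                          serverless: bool = False) -> dict:
    """Build a Search API body that returns semantic_text embeddings.

    Hypothetical helper: `version` is the cluster's (major, minor) version.
    """
    body = {"query": {"match": {field: text}}}
    if serverless or version >= (9, 2):
        # 9.2+ and Serverless: vectors are excluded from _source by default;
        # exclude_vectors=false includes them again.
        body["_source"] = {"exclude_vectors": False}
    else:
        # Before 9.2: exclude_vectors is not supported, so request the hidden
        # _inference_fields through `fields` (works with _search only).
        body["fields"] = ["_inference_fields"]
    return body


if __name__ == "__main__":
    for v in ("9.1.4", "9.2.0"):
        body = embedding_search_body("my_semantic_field",
                                     "Which country is Paris in?",
                                     parse_version(v))
        print(v, json.dumps(body))
```

The version check is kept separate from body construction so the same body builder works whether the version comes from a config value or from the cluster's info response.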
