Consolidates troubleshooting content into the "Returning semantic field embeddings in _source" section #137233
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -449,7 +449,7 @@ serverless: ga | |
|
|
||
| By default, the embeddings generated for `semantic_text` fields are stored internally and **not included in `_source`** when retrieving documents. | ||
|
|
||
| To include the full inference fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`. | ||
| To include the full {{infer}} fields, including their embeddings, in `_source`, set the `_source.exclude_vectors` option to `false`. | ||
| This works with the | ||
| [Get](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-get), | ||
| [Search](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search), | ||
|
|
@@ -468,11 +468,12 @@ POST my-index/_search | |
| } | ||
| } | ||
| ``` | ||
| % TEST[skip:Requires inference endpoint] | ||
| % TEST[skip:Requires {{infer}} endpoint] | ||
|
|
||
| The embeddings will appear under `_inference_fields` in `_source`. | ||
|
|
||
| **Use cases** | ||
|
|
||
| Including embeddings in `_source` is useful when you want to: | ||
|
|
||
| * Reindex documents into another index **with the same `inference_id`** without re-running inference. | ||
|
|
@@ -495,7 +496,7 @@ POST _reindex | |
| } | ||
| } | ||
| ``` | ||
| % TEST[skip:Requires inference endpoint] | ||
| % TEST[skip:Requires {{infer}} endpoint] | ||
|
|
||
| 1. Sends the source documents with their stored embeddings to the destination index. | ||
|
|
||
|
|
@@ -505,7 +506,7 @@ the documents will **fail the reindex task**. | |
| Matching `inference_id` values are required to reuse the existing embeddings. | ||
| :::: | ||
|
|
||
| This allows documents to be re-indexed without triggering inference again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**. | ||
| This allows documents to be re-indexed without triggering {{infer}} again, **as long as the target `semantic_text` field uses the same `inference_id` as the source**. | ||
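Because a mismatch fails the reindex task, it can help to compare the `inference_id` values of both indices before starting. The following is a minimal Python sketch, not part of the documented API: it assumes you have already fetched `GET <index>/_mapping` for each index, that `semantic_text` fields are declared at the top level of `properties`, and that the mapping response lists `inference_id` explicitly. The index name `my-index-copy`, the endpoint name `my-other-endpoint`, and both helper functions are hypothetical.

```python
def semantic_text_inference_ids(mapping_response, index_name):
    """Return {field_name: inference_id} for top-level semantic_text fields."""
    properties = mapping_response[index_name]["mappings"].get("properties", {})
    return {
        name: spec.get("inference_id")
        for name, spec in properties.items()
        if spec.get("type") == "semantic_text"
    }


def mismatched_fields(source_ids, dest_ids):
    """Return fields present in both mappings whose inference_id differs."""
    shared = source_ids.keys() & dest_ids.keys()
    return sorted(name for name in shared if source_ids[name] != dest_ids[name])


# Hypothetical mapping responses, shaped like the output of GET <index>/_mapping.
source_mapping = {
    "my-index": {
        "mappings": {
            "properties": {
                "my_semantic_field": {
                    "type": "semantic_text",
                    "inference_id": ".elser-2-elasticsearch",
                }
            }
        }
    }
}
dest_mapping = {
    "my-index-copy": {
        "mappings": {
            "properties": {
                "my_semantic_field": {
                    "type": "semantic_text",
                    "inference_id": "my-other-endpoint",  # hypothetical endpoint
                }
            }
        }
    }
}

source_ids = semantic_text_inference_ids(source_mapping, "my-index")
dest_ids = semantic_text_inference_ids(dest_mapping, "my-index-copy")
print(mismatched_fields(source_ids, dest_ids))
```

An empty list suggests the stored embeddings can be reused. Any field name in the list points at a field whose documents would fail the reindex task.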
|
|
||
| ::::{note} | ||
| **For versions prior to 9.2.0** | ||
|
|
@@ -514,7 +515,7 @@ Older versions do not support the `exclude_vectors` option to retrieve the embed | |
| To return the `_inference_fields`, use the `fields` option in a search request instead: | ||
|
|
||
| ```console | ||
| POST test-index/_search | ||
| POST my-index/_search | ||
| { | ||
| "query": { | ||
| "match": { | ||
|
|
@@ -526,12 +527,112 @@ POST test-index/_search | |
| ] | ||
| } | ||
| ``` | ||
| % TEST[skip:Requires inference endpoint] | ||
| % TEST[skip:Requires {{infer}} endpoint] | ||
|
|
||
| This returns the chunked embeddings used for semantic search under `_inference_fields` in `_source`. | ||
| Note that the `fields` option is **not** available for the Reindex API. | ||
| :::: | ||
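The version split described in the note can be expressed as a small helper that picks the right search option. This is only a sketch: `build_search_body` is a hypothetical function, the version is passed in as a tuple that you obtain yourself, and the body form `"_source": {"exclude_vectors": false}` is an assumption based on the option name used in this section.

```python
def build_search_body(stack_version, field, text):
    """Build a search body that returns _inference_fields for the given version.

    stack_version is a (major, minor, patch) tuple, for example (9, 2, 0).
    """
    body = {"query": {"match": {field: text}}}
    if stack_version >= (9, 2, 0):
        # Assumed body form of the `_source.exclude_vectors` option.
        body["_source"] = {"exclude_vectors": False}
    else:
        # Older versions: request the inference fields through `fields`.
        body["fields"] = ["_inference_fields"]
    return body


print(build_search_body((9, 1, 0), "my_semantic_field", "Which country is Paris in?"))
print(build_search_body((9, 2, 0), "my_semantic_field", "Which country is Paris in?"))
```

The helper only covers search requests, since the `fields` option is not available for the Reindex API.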
|
|
||
| ### Example: Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields] | ||
|
|
||
| ::::{applies-switch} | ||
| :::{applies-item} { "stack": "preview 9.0" } | ||
| Content for 9.0 version | ||
| ::: | ||
| :::{applies-item} { "stack": "ga 9.1" } | ||
| Content for 9.1 version | ||
| ::: | ||
| :::: | ||
|
|
||
| ::::{applies-switch} | ||
| :::{applies-item} stack: preview 9.0 | ||
| Other content for 9.0 version | ||
| ::: | ||
| :::{applies-item} stack: ga 9.1 | ||
| Other content for 9.1 version | ||
| ::: | ||
| :::: | ||
|
|
||
| If you want to verify that your embeddings look correct, use the `fields` option to view the | ||
| {{infer}} data that `semantic_text` typically hides. | ||
|
|
||
| ```console | ||
| POST my-index/_search | ||
| { | ||
| "query": { | ||
| "match": { | ||
| "my_semantic_field": "Which country is Paris in?" | ||
| } | ||
| }, | ||
| "fields": [ | ||
| "_inference_fields" | ||
| ] | ||
|
||
| } | ||
| ``` | ||
| % TEST[skip:Requires {{infer}} endpoint] | ||
|
|
||
| This returns the verbose chunked embeddings that are used to perform | ||
| semantic search for `semantic_text` fields. | ||
|
|
||
| ```console-response | ||
| { | ||
| "took": 179, | ||
| "timed_out": false, | ||
| "_shards": { | ||
| "total": 1, | ||
| "successful": 1, | ||
| "skipped": 0, | ||
| "failed": 0 | ||
| }, | ||
| "hits": { | ||
| "total": { "value": 1, "relation": "eq" }, | ||
| "max_score": 16.532316, | ||
| "hits": [ | ||
| { | ||
| "_index": "my-index", | ||
| "_id": "1", | ||
| "_score": 16.532316, | ||
| "_source": { | ||
| "my_semantic_field": "Paris is the capital of France.", | ||
| "_inference_fields": { | ||
| "my_semantic_field": { | ||
| "inference": { | ||
| "inference_id": ".elser-2-elasticsearch", <1> | ||
| "model_settings": { <2> | ||
| "service": "elasticsearch", | ||
| "task_type": "sparse_embedding" | ||
| }, | ||
| "chunks": { | ||
| "my_semantic_field": [ | ||
| { | ||
| "start_offset": 0, | ||
| "end_offset": 31, | ||
| "embeddings": { <3> | ||
| "paris": 2.5234375, | ||
| "france": 2.0, | ||
| "capital": 2.1328125, | ||
| "city": 1.265625, | ||
| "country": 0.59765625, | ||
| ... | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ] | ||
| } | ||
| } | ||
| ``` | ||
| % TEST[skip:Requires {{infer}} endpoint] | ||
| 1. The {{infer}} endpoint used to generate embeddings. | ||
| 2. Lists details about the model used to generate embeddings, such as the service name and task type. | ||
| 3. The embeddings generated for this chunk. | ||
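When checking many documents, reading the raw response by eye gets tedious. The sketch below shows one way to summarize a single hit in Python. It assumes the response shape shown above, a source field that holds a single string, and sparse embeddings stored as a token-to-weight object. `summarize_chunks` is a hypothetical helper, and the sample hit keeps only the five tokens visible in the truncated example.

```python
def summarize_chunks(hit, field, top_n=3):
    """Summarize the inference data stored for one semantic_text field."""
    source = hit["_source"]
    inference = source["_inference_fields"][field]["inference"]
    chunks = []
    for chunk in inference["chunks"].get(field, []):
        embeddings = chunk["embeddings"]
        summary = {
            # Assumes the field value is one string, so offsets index into it.
            "text": source[field][chunk["start_offset"]:chunk["end_offset"]],
            "size": len(embeddings),
        }
        if isinstance(embeddings, dict):
            # Sparse embeddings: list the highest-weighted tokens first.
            ranked = sorted(embeddings, key=embeddings.get, reverse=True)
            summary["top_tokens"] = ranked[:top_n]
        chunks.append(summary)
    return {"inference_id": inference["inference_id"], "chunks": chunks}


# Sample hit copied from the response above (embeddings truncated there too).
sample_hit = {
    "_id": "1",
    "_source": {
        "my_semantic_field": "Paris is the capital of France.",
        "_inference_fields": {
            "my_semantic_field": {
                "inference": {
                    "inference_id": ".elser-2-elasticsearch",
                    "model_settings": {
                        "service": "elasticsearch",
                        "task_type": "sparse_embedding",
                    },
                    "chunks": {
                        "my_semantic_field": [
                            {
                                "start_offset": 0,
                                "end_offset": 31,
                                "embeddings": {
                                    "paris": 2.5234375,
                                    "france": 2.0,
                                    "capital": 2.1328125,
                                    "city": 1.265625,
                                    "country": 0.59765625,
                                },
                            }
                        ]
                    },
                }
            }
        },
    },
}

print(summarize_chunks(sample_hit, "my_semantic_field"))
```

A chunk with an empty text slice or a `size` of zero is a hint that the document was not processed by the {{infer}} endpoint as expected.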
|
|
||
|
|
||
|
||
| ## Customizing `semantic_text` indexing [custom-indexing] | ||
|
|
||
| `semantic_text` uses defaults for indexing data based on the {{infer}} endpoint | ||
|
|
@@ -656,30 +757,6 @@ You can query `semantic_text` fields using the following query types: | |
|
|
||
| - [Semantic query](/reference/query-languages/query-dsl/query-dsl-semantic-query.md): We don't recommend this legacy query type for _new_ projects, because the alternatives in this list enable more flexibility and customization. The `semantic` query remains available to support existing implementations. | ||
|
|
||
|
|
||
| ## Troubleshooting semantic_text fields [troubleshooting-semantic-text-fields] | ||
|
|
||
| If you want to verify that your embeddings look correct, you can view the | ||
| inference data that `semantic_text` typically hides using `fields`. | ||
|
|
||
| ```console | ||
| POST test-index/_search | ||
| { | ||
| "query": { | ||
| "match": { | ||
| "my_semantic_field": "Which country is Paris in?" | ||
| } | ||
| }, | ||
| "fields": [ | ||
| "_inference_fields" | ||
| ] | ||
| } | ||
| ``` | ||
| % TEST[skip:Requires inference endpoint] | ||
|
|
||
| This will return verbose chunked embeddings content that is used to perform | ||
| semantic search for `semantic_text` fields. | ||
|
|
||
| ### Document count discrepancy in `_cat/indices` | ||
|
|
||
| When an index contains a `semantic_text` field, the `docs.count` value returned by the [`_cat/indices`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cat-indices) API may be higher than the number of documents you indexed. | ||
|
|
||