[DOCS] Simplifies semantic_text tutorial by removing copy_to field #112864
Merged
6 commits
8f7e348 [DOCS] Highlights that copy_to is not required for using semantic_text. (szabosteve)
dffb38a [DOCS] Adds note about possible timeout to the ELSER tab of the infer… (szabosteve)
2fc17a7 [DOCS] Simplifies wording. (szabosteve)
e243890 [DOCS] Further edits. (szabosteve)
6d39fe9 [DOCS] Simplifies tutorial. (szabosteve)
a58c9fa [DOCS] Adds link to hybrid search example. (szabosteve)
@@ -59,24 +59,18 @@ If using the Python client, you can set the `timeout` parameter to a higher valu
 [[semantic-text-index-mapping]]
 ==== Create the index mapping

-The mapping of the destination index - the index that contains the embeddings
-that the inference endpoint will generate based on your input text - must be created. The
-destination index must have a field with the <<semantic-text,`semantic_text`>>
-field type to index the output of the used inference endpoint.
+The mapping of the destination index - the index that contains the embeddings that the inference endpoint will generate based on your input text - must be created.
+The destination index must have a field with the <<semantic-text,`semantic_text`>> field type to index the output of the used inference endpoint.

 [source,console]
 ------------------------------------------------------------
 PUT semantic-embeddings
 {
   "mappings": {
     "properties": {
-      "semantic_text": { <1>
+      "content": { <1>
         "type": "semantic_text", <2>
         "inference_id": "my-elser-endpoint" <3>
-      },
-      "content": { <4>
-        "type": "text",
-        "copy_to": "semantic_text" <5>
       }
     }
   }
@@ -88,9 +82,6 @@ PUT semantic-embeddings
 <3> The `inference_id` is the inference endpoint you created in the previous step.
 It will be used to generate the embeddings based on the input text.
 Every time you ingest data into the related `semantic_text` field, this endpoint will be used for creating the vector representation of the text.
-<4> The field to store the text reindexed from a source index in the <<semantic-text-reindex-data,Reindex the data>> step.
-<5> The textual data stored in the `content` field will be copied to `semantic_text` and processed by the {infer} endpoint.
-The `semantic_text` field will store the embeddings generated based on the input data.


 [discrete]
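
As a rough illustration of the simplified mapping in the hunks above, the following Python sketch builds the request body as a plain dictionary. The helper name `build_semantic_mapping` is hypothetical and not part of any client library; the field name `content` and the endpoint name `my-elser-endpoint` are taken from the tutorial. Sending the body to a cluster is left out on purpose.

```python
import json


def build_semantic_mapping(field: str = "content",
                           inference_id: str = "my-elser-endpoint") -> dict:
    """Return a mapping body with a single semantic_text field.

    Hypothetical helper: no separate text field and no copy_to are
    defined, mirroring the mapping after this change.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                }
            }
        }
    }


if __name__ == "__main__":
    # Body that would accompany: PUT semantic-embeddings
    print(json.dumps(build_semantic_mapping(), indent=2))
```

The point of the sketch is that only one field definition remains: the field that receives the text is also the field that holds the embeddings.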
@@ -116,13 +107,9 @@ you can see an index named `test-data` with 182469 documents.
 [[semantic-text-reindex-data]]
 ==== Reindex the data

-Create the embeddings from the text by reindexing the data from the `test-data`
-index to the `semantic-embeddings` index. The data in the `content` field will
-be reindexed into the `content` field of the destination index.
-The `content` field data will be copied to the `semantic_text` field as a result of the `copy_to`
-parameter set in the index mapping creation step. The copied data will be
-processed by the {infer} endpoint associated with the `semantic_text` semantic text
-field.
+Create the embeddings from the text by reindexing the data from the `test-data` index to the `semantic-embeddings` index.
+The data in the `content` field will be reindexed into the `content` semantic text field of the destination index.
+The reindexed data will be processed by the {infer} endpoint associated with the `content` semantic text field.

 [source,console]
 ------------------------------------------------------------
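
The reindex request itself is outside the lines shown in this diff, so the following is only a sketch of what such a body might look like, written as a Python dictionary. The helper name `build_reindex_body` is hypothetical, and the batch size of 10 is an assumed, illustrative value rather than something this diff specifies; the index names come from the hunk above.

```python
def build_reindex_body(source_index: str = "test-data",
                       dest_index: str = "semantic-embeddings",
                       batch_size: int = 10) -> dict:
    """Return a body for the _reindex API (hypothetical helper).

    batch_size is an assumption: it sets source.size, the number of
    documents fetched per batch. Because the source and destination
    both use a field named content, no field renaming is needed.
    """
    return {
        "source": {"index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }


if __name__ == "__main__":
    # Body that would accompany: POST _reindex
    print(build_reindex_body())
```

With the `copy_to` indirection gone, the `content` text arrives directly in the `content` semantic text field, where the {infer} endpoint processes it.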
@@ -164,18 +151,17 @@ POST _tasks/<task_id>/_cancel
 [[semantic-text-semantic-search]]
 ==== Semantic search

-After the data set has been enriched with the embeddings, you can query the data
-using semantic search. Provide the `semantic_text` field name and the query text
-in a `semantic` query type. The {infer} endpoint used to generate the embeddings
-for the `semantic_text` field will be used to process the query text.
+After the data set has been enriched with the embeddings, you can query the data using semantic search.
+Provide the `semantic_text` field name and the query text in a `semantic` query type.
+The {infer} endpoint used to generate the embeddings for the `semantic_text` field will be used to process the query text.

 [source,console]
 ------------------------------------------------------------
 GET semantic-embeddings/_search
 {
   "query": {
     "semantic": {
-      "field": "semantic_text", <1>
+      "field": "content", <1>
       "query": "How to avoid muscle soreness while running?" <2>
     }
   }
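
The search body from the hunk above can be sketched in Python in the same way. The helper name `build_semantic_query` is hypothetical; the `semantic` query type, the `field` and `query` keys, and the example query text are taken from the tutorial.

```python
def build_semantic_query(query_text: str, field: str = "content") -> dict:
    """Return a search body using the semantic query type.

    Hypothetical helper: field must name a semantic_text field, which
    after this change is content rather than a separate field.
    """
    return {
        "query": {
            "semantic": {
                "field": field,
                "query": query_text,
            }
        }
    }


if __name__ == "__main__":
    # Body that would accompany: GET semantic-embeddings/_search
    print(build_semantic_query("How to avoid muscle soreness while running?"))
```

Only the field name changes relative to the old tutorial; the query text is still processed by the {infer} endpoint configured on that field.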
@@ -196,7 +182,7 @@ query from the `semantic-embedding` index:
 "_id": "6DdEuo8B0vYIvzmhoEtt",
 "_score": 24.972616,
 "_source": {
-"semantic_text": {
+"content": {
 "inference": {
 "inference_id": "my-elser-endpoint",
 "model_settings": {
@@ -213,15 +199,15 @@ query from the `semantic-embedding` index:
 }
 },
 "id": 1713868,
-"content": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement."
+"text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement."
 }
 },
 {
 "_index": "semantic-embeddings",
 "_id": "-zdEuo8B0vYIvzmhplLX",
 "_score": 22.143118,
 "_source": {
-"semantic_text": {
+"content": {
 "inference": {
 "inference_id": "my-elser-endpoint",
 "model_settings": {
@@ -238,15 +224,15 @@ query from the `semantic-embedding` index:
 }
 },
 "id": 3389244,
-"content": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum."
+"text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum."
Review comment: same here
 }
 },
 {
 "_index": "semantic-embeddings",
 "_id": "77JEuo8BdmhTuQdXtQWt",
 "_score": 21.506052,
 "_source": {
-"semantic_text": {
+"content": {
 "inference": {
 "inference_id": "my-elser-endpoint",
 "model_settings": {
@@ -263,11 +249,16 @@ query from the `semantic-embedding` index:
 }
 },
 "id": 363742,
-"content": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore."
+"text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore."
Review comment: same here
 }
 },
 (...)
 ]
 ------------------------------------------------------------
 // NOTCONSOLE

+[discrete]
+[[semantic-text-further-examples]]
+==== Further examples
+
+If you want to use `semantic_text` in hybrid search, refer to https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb[this notebook] for a step-by-step guide.
Review comment: This `text` field should be contained within the `content` object, like so: