Commit 91a81e3

[DOCS] Simplifies semantic_text tutorial by removing copy_to field (#112864) (#112877)
1 parent d72d1e6 commit 91a81e3


3 files changed (+30 -30 lines changed)


docs/reference/search/search-your-data/semantic-search-inference.asciidoc

Lines changed: 1 addition & 0 deletions
@@ -38,6 +38,7 @@ Create an {infer} endpoint by using the <<put-inference-api>>:
 
 include::{es-ref-dir}/tab-widgets/inference-api/infer-api-task-widget.asciidoc[]
 
+
 [discrete]
 [[infer-service-mappings]]
 ==== Create the index mapping

docs/reference/search/search-your-data/semantic-search-semantic-text.asciidoc

Lines changed: 21 additions & 30 deletions
@@ -59,24 +59,18 @@ If using the Python client, you can set the `timeout` parameter to a higher valu
 [[semantic-text-index-mapping]]
 ==== Create the index mapping
 
-The mapping of the destination index - the index that contains the embeddings
-that the inference endpoint will generate based on your input text - must be created. The
-destination index must have a field with the <<semantic-text,`semantic_text`>>
-field type to index the output of the used inference endpoint.
+The mapping of the destination index - the index that contains the embeddings that the inference endpoint will generate based on your input text - must be created.
+The destination index must have a field with the <<semantic-text,`semantic_text`>> field type to index the output of the used inference endpoint.
 
 [source,console]
 ------------------------------------------------------------
 PUT semantic-embeddings
 {
   "mappings": {
     "properties": {
-      "semantic_text": { <1>
+      "content": { <1>
         "type": "semantic_text", <2>
         "inference_id": "my-elser-endpoint" <3>
-      },
-      "content": { <4>
-        "type": "text",
-        "copy_to": "semantic_text" <5>
       }
     }
   }
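After this change the mapping binds the inference endpoint directly to `content`, so there is no separate `text` field and no `copy_to`. The sketch below builds that request body as a Python dict; the helper name `semantic_text_mapping` is hypothetical, while the index, field, and endpoint names come from the hunk.

```python
import json


def semantic_text_mapping(field: str, inference_id: str) -> dict:
    """Build a `PUT <index>` body that maps `field` directly as semantic_text.

    Mirrors the post-commit mapping: no separate `text` field and no
    `copy_to`; the inference endpoint is bound to the field itself.
    """
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "semantic_text",
                    "inference_id": inference_id,
                }
            }
        }
    }


if __name__ == "__main__":
    # Print the body you would send with `PUT semantic-embeddings`.
    print(json.dumps(semantic_text_mapping("content", "my-elser-endpoint"), indent=2))
```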
@@ -88,9 +82,6 @@ PUT semantic-embeddings
 <3> The `inference_id` is the inference endpoint you created in the previous step.
 It will be used to generate the embeddings based on the input text.
 Every time you ingest data into the related `semantic_text` field, this endpoint will be used for creating the vector representation of the text.
-<4> The field to store the text reindexed from a source index in the <<semantic-text-reindex-data,Reindex the data>> step.
-<5> The textual data stored in the `content` field will be copied to `semantic_text` and processed by the {infer} endpoint.
-The `semantic_text` field will store the embeddings generated based on the input data.
 
 
 [discrete]
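Callout `<3>` says the endpoint runs every time data is ingested into the `semantic_text` field. Because the field is now `content` itself, documents can carry raw text under that name with no intermediate copy. The tutorial ingests through reindexing; as an alternative path that the tutorial does not use, here is a sketch of a `_bulk` NDJSON body. The helper `bulk_body` and the sample texts are hypothetical.

```python
import json


def bulk_body(index: str, texts: list[str], field: str = "content") -> str:
    """Serialize texts into an NDJSON body for the `_bulk` API.

    Each document is an `index` action line followed by its source line,
    and the whole body must end with a newline.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    lines = []
    for text in texts:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({field: text}))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(bulk_body("semantic-embeddings", ["First passage.", "Second passage."]), end="")
```

The resulting string would be sent with `POST _bulk` and a `Content-Type: application/x-ndjson` header.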
@@ -116,13 +107,9 @@ you can see an index named `test-data` with 182469 documents.
 [[semantic-text-reindex-data]]
 ==== Reindex the data
 
-Create the embeddings from the text by reindexing the data from the `test-data`
-index to the `semantic-embeddings` index. The data in the `content` field will
-be reindexed into the `content` field of the destination index.
-The `content` field data will be copied to the `semantic_text` field as a result of the `copy_to`
-parameter set in the index mapping creation step. The copied data will be
-processed by the {infer} endpoint associated with the `semantic_text` semantic text
-field.
+Create the embeddings from the text by reindexing the data from the `test-data` index to the `semantic-embeddings` index.
+The data in the `content` field will be reindexed into the `content` semantic text field of the destination index.
+The reindexed data will be processed by the {infer} endpoint associated with the `content` semantic text field.
 
 [source,console]
 ------------------------------------------------------------
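The reindex request itself sits outside this hunk. Because source and destination both use the field name `content`, a plain Reindex API call needs no script or field renaming. The sketch assumes an asynchronous call (`wait_for_completion=false`, which returns a task ID and fits the `POST _tasks/<task_id>/_cancel` call in the next hunk's context) and an illustrative batch size of 10. The helper `reindex_request` is hypothetical.

```python
import json


def reindex_request(source: str, dest: str, batch_size: int = 10) -> tuple[str, dict]:
    """Return the path and body for an asynchronous Reindex API call.

    Both indices use the field name `content`, so no script or field
    renaming is needed. `source.size` sets the batch size per scroll
    request; the default here is an illustrative choice, not a requirement.
    """
    path = "/_reindex?wait_for_completion=false"
    body = {
        "source": {"index": source, "size": batch_size},
        "dest": {"index": dest},
    }
    return path, body


if __name__ == "__main__":
    path, body = reindex_request("test-data", "semantic-embeddings")
    print("POST", path)
    print(json.dumps(body, indent=2))
```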
@@ -164,18 +151,17 @@ POST _tasks/<task_id>/_cancel
 [[semantic-text-semantic-search]]
 ==== Semantic search
 
-After the data set has been enriched with the embeddings, you can query the data
-using semantic search. Provide the `semantic_text` field name and the query text
-in a `semantic` query type. The {infer} endpoint used to generate the embeddings
-for the `semantic_text` field will be used to process the query text.
+After the data set has been enriched with the embeddings, you can query the data using semantic search.
+Provide the `semantic_text` field name and the query text in a `semantic` query type.
+The {infer} endpoint used to generate the embeddings for the `semantic_text` field will be used to process the query text.
 
 [source,console]
 ------------------------------------------------------------
 GET semantic-embeddings/_search
 {
   "query": {
     "semantic": {
-      "field": "semantic_text", <1>
+      "field": "content", <1>
       "query": "How to avoid muscle soreness while running?" <2>
     }
   }
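The only change to the query is the target field, `semantic_text` becoming `content`. A client still sending the old field name would now hit a field that is not mapped. One way to catch that is to check the mapping before building the `semantic` query body. The helper `semantic_query` and its mapping check are hypothetical; only the query shape comes from the hunk.

```python
import json


def semantic_query(properties: dict, field: str, text: str) -> dict:
    """Build a `semantic` query body after checking the field's mapping.

    `properties` is the `mappings.properties` object of the index. The
    guard catches queries that target a field not mapped as semantic_text,
    such as the pre-commit `semantic_text` field name.
    """
    mapped = properties.get(field, {})
    if mapped.get("type") != "semantic_text":
        raise ValueError(f"{field!r} is not mapped as semantic_text")
    return {"query": {"semantic": {"field": field, "query": text}}}


if __name__ == "__main__":
    props = {"content": {"type": "semantic_text", "inference_id": "my-elser-endpoint"}}
    body = semantic_query(props, "content", "How to avoid muscle soreness while running?")
    print(json.dumps(body, indent=2))
```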
@@ -196,7 +182,7 @@ query from the `semantic-embedding` index:
     "_id": "6DdEuo8B0vYIvzmhoEtt",
     "_score": 24.972616,
     "_source": {
-      "semantic_text": {
+      "content": {
         "inference": {
           "inference_id": "my-elser-endpoint",
           "model_settings": {
@@ -213,15 +199,15 @@ query from the `semantic-embedding` index:
         }
       },
       "id": 1713868,
-      "content": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement."
+      "text": "There are a few foods and food groups that will help to fight inflammation and delayed onset muscle soreness (both things that are inevitable after a long, hard workout) when you incorporate them into your postworkout eats, whether immediately after your run or at a meal later in the day. Advertisement. Advertisement."
     }
   },
   {
     "_index": "semantic-embeddings",
     "_id": "-zdEuo8B0vYIvzmhplLX",
     "_score": 22.143118,
     "_source": {
-      "semantic_text": {
+      "content": {
         "inference": {
           "inference_id": "my-elser-endpoint",
           "model_settings": {
@@ -238,15 +224,15 @@ query from the `semantic-embedding` index:
         }
       },
       "id": 3389244,
-      "content": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum."
+      "text": "During Your Workout. There are a few things you can do during your workout to help prevent muscle injury and soreness. According to personal trainer and writer for Iron Magazine, Marc David, doing warm-ups and cool-downs between sets can help keep muscle soreness to a minimum."
     }
   },
   {
     "_index": "semantic-embeddings",
     "_id": "77JEuo8BdmhTuQdXtQWt",
     "_score": 21.506052,
     "_source": {
-      "semantic_text": {
+      "content": {
         "inference": {
           "inference_id": "my-elser-endpoint",
           "model_settings": {
@@ -263,11 +249,16 @@ query from the `semantic-embedding` index:
         }
       },
       "id": 363742,
-      "content": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore."
+      "text": "This is especially important if the soreness is due to a weightlifting routine. For this time period, do not exert more than around 50% of the level of effort (weight, distance and speed) that caused the muscle groups to be sore."
     }
   },
   (...)
 ]
 ------------------------------------------------------------
 // NOTCONSOLE
 
+[discrete]
+[[semantic-text-further-examples]]
+==== Further examples
+
+If you want to use `semantic_text` in hybrid search, refer to https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/09-semantic-text.ipynb[this notebook] for a step-by-step guide.
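The response hunks rename keys inside `_source`: the inference payload moves from `semantic_text` to `content`, and the plain passage moves from `content` to `text`. Client code that reads `_source` by the old key names would break. The sketch reads only keys the hunks leave unchanged (`_id`, `_score`, `_source.id`). The sample is a trimmed copy of the values shown in the diff, wrapped in the usual `hits.hits` envelope of a search response. The helper `summarize_hits` is hypothetical.

```python
# Trimmed excerpt of the response shown in the diff: only `_id`, `_score`
# and `_source.id` are kept; the inference payload and passage text are omitted.
SAMPLE_RESPONSE = {
    "hits": {
        "hits": [
            {"_index": "semantic-embeddings", "_id": "6DdEuo8B0vYIvzmhoEtt",
             "_score": 24.972616, "_source": {"id": 1713868}},
            {"_index": "semantic-embeddings", "_id": "-zdEuo8B0vYIvzmhplLX",
             "_score": 22.143118, "_source": {"id": 3389244}},
            {"_index": "semantic-embeddings", "_id": "77JEuo8BdmhTuQdXtQWt",
             "_score": 21.506052, "_source": {"id": 363742}},
        ]
    }
}


def summarize_hits(response: dict, n: int = 3) -> list[tuple[str, float, object]]:
    """Return (_id, _score, _source.id) for the first n hits.

    Without an explicit sort, hits arrive ordered by descending `_score`,
    so no re-sorting is done here.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [(h["_id"], h["_score"], h.get("_source", {}).get("id")) for h in hits[:n]]


if __name__ == "__main__":
    for row in summarize_hits(SAMPLE_RESPONSE):
        print(row)
```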

docs/reference/tab-widgets/inference-api/infer-api-task.asciidoc

Lines changed: 8 additions & 0 deletions
@@ -50,6 +50,14 @@ is the unique identifier of the {infer} endpoint is `elser_embeddings`.
 You don't need to download and deploy the ELSER model upfront, the API request
 above will download the model if it's not downloaded yet and then deploy it.
 
+[NOTE]
+====
+You might see a 502 bad gateway error in the response when using the {kib} Console.
+This error usually just reflects a timeout, while the model downloads in the background.
+You can check the download progress in the {ml-app} UI.
+If using the Python client, you can set the `timeout` parameter to a higher value.
+====
+
 // end::elser[]
 
 // tag::hugging-face[]
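The new NOTE explains that the first endpoint-creation call can time out while ELSER downloads, and it recommends a longer client timeout. The sketch below swaps the official Python client for the standard library's `urllib` and shows the same idea: send `PUT _inference/sparse_embedding/<id>` with a generous socket timeout. The endpoint ID `elser_embeddings` comes from the hunk context. The base URL, the `service_settings` values, and the helper names are assumptions.

```python
import json
import urllib.request

# Hypothetical cluster address; add authentication for a real deployment.
ES_URL = "http://localhost:9200"


def elser_endpoint_request(inference_id: str, base_url: str = ES_URL) -> urllib.request.Request:
    """Build a PUT request that creates an ELSER sparse_embedding endpoint.

    The service_settings values are illustrative assumptions.
    """
    body = {
        "service": "elser",
        "service_settings": {"num_allocations": 1, "num_threads": 1},
    }
    return urllib.request.Request(
        f"{base_url}/_inference/sparse_embedding/{inference_id}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_endpoint(inference_id: str, timeout_s: float = 300.0) -> dict:
    """Send the request with a generous socket timeout.

    The first call may download and deploy the model; a short client
    timeout can give up before the response arrives, even though the
    deployment keeps going in the background.
    """
    req = elser_endpoint_request(inference_id)
    # `timeout` applies to each blocking socket operation, not the whole call.
    with urllib.request.urlopen(req, timeout=timeout_s) as resp:
        return json.load(resp)
```

Against a running cluster you would call `create_endpoint("elser_embeddings")`. If it still times out, the NOTE's advice applies: check the download progress in the {ml-app} UI instead of resending the request.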
