Commit d30a678

Rename the text field in the auto-generated embeddings example (#11554)
Signed-off-by: Fanit Kolchina <[email protected]>
1 parent 56b4709 commit d30a678

File tree

1 file changed: +13 −13 lines changed

_vector-search/getting-started/auto-generated-embeddings.md

Lines changed: 13 additions & 13 deletions
@@ -101,7 +101,7 @@ You'll need the model ID in order to use this model for several of the following
 
 ### Step 2: Create an ingest pipeline
 
-First, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. You'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`text`) and the name of the field in which to record embeddings (`passage_embedding`):
+First, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. You'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`passage`) and the name of the field in which to record embeddings (`passage_embedding`):
 
 ```json
 PUT /_ingest/pipeline/nlp-ingest-pipeline
@@ -112,7 +112,7 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline
       "text_embedding": {
         "model_id": "aVeif4oB5Vm0Tdw8zYO2",
         "field_map": {
-          "text": "passage_embedding"
+          "passage": "passage_embedding"
         }
       }
     }
@@ -123,7 +123,7 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline
 
 ### Step 3: Create a vector index
 
-Now you'll create a vector index by setting `index.knn` to `true`. In the index, the field named `text` contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/mappings/supported-field-types/knn-vector/) field named `passage_embedding` contains the vector embedding of the text. The vector field `dimension` must match the dimensionality of the model you configured in Step 2. Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step:
+Now you'll create a vector index by setting `index.knn` to `true`. In the index, the field named `passage` contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/mappings/supported-field-types/knn-vector/) field named `passage_embedding` contains the vector embedding of the text. The vector field `dimension` must match the dimensionality of the model you configured in Step 2. Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step:
 
 
 ```json
@@ -140,7 +140,7 @@ PUT /my-nlp-index
         "dimension": 768,
         "space_type": "l2"
       },
-      "text": {
+      "passage": {
         "type": "text"
       }
     }
@@ -153,28 +153,28 @@ Setting up a vector index allows you to later perform a vector search on the `pa
 
 ### Step 4: Ingest documents into the index
 
-In this step, you'll ingest several sample documents into the index. The sample data is taken from the [Flickr image dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset). Each document contains a `text` field corresponding to the image description and an `id` field corresponding to the image ID:
+In this step, you'll ingest several sample documents into the index. The sample data is taken from the [Flickr image dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset). Each document contains a `passage` field corresponding to the image description and an `id` field corresponding to the image ID:
 
 ```json
 PUT /my-nlp-index/_doc/1
 {
-  "text": "A man who is riding a wild horse in the rodeo is very near to falling off ."
+  "passage": "A man who is riding a wild horse in the rodeo is very near to falling off ."
 }
 ```
 {% include copy-curl.html %}
 
 ```json
 PUT /my-nlp-index/_doc/2
 {
-  "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
+  "passage": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
 }
 ```
 {% include copy-curl.html %}
 
 ```json
 PUT /my-nlp-index/_doc/3
 {
-  "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
+  "passage": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
 }
 ```
 {% include copy-curl.html %}
@@ -228,23 +228,23 @@ The response contains the matching documents:
         "_id": "1",
         "_score": 0.015851952,
         "_source": {
-          "text": "A man who is riding a wild horse in the rodeo is very near to falling off ."
+          "passage": "A man who is riding a wild horse in the rodeo is very near to falling off ."
         }
       },
       {
         "_index": "my-nlp-index",
         "_id": "2",
         "_score": 0.015177963,
         "_source": {
-          "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
+          "passage": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
         }
       },
       {
         "_index": "my-nlp-index",
         "_id": "3",
         "_score": 0.011347729,
         "_source": {
-          "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
+          "passage": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
         }
       }
     ]
@@ -264,14 +264,14 @@ To register and deploy a model, select the built-in workflow template for the mo
 
 ### Step 2: Configure a workflow
 
-Create and provision a semantic search workflow. You must provide the model ID for the model deployed in the previous step. Review your selected workflow template [defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/semantic-search-defaults.json) to determine whether you need to update any of the parameters. For example, if the model dimensionality is different from the default (`1024`), specify the dimensionality of your model in the `output_dimension` parameter. Change the workflow template default text field from `passage_text` to `text` in order to match the manual example:
+Create and provision a semantic search workflow. You must provide the model ID for the model deployed in the previous step. Review your selected workflow template [defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/semantic-search-defaults.json) to determine whether you need to update any of the parameters. For example, if the model dimensionality is different from the default (`1024`), specify the dimensionality of your model in the `output_dimension` parameter. Change the workflow template default text field from `passage_text` to `passage` in order to match the manual example:
 
 ```json
 POST /_plugins/_flow_framework/workflow?use_case=semantic_search&provision=true
 {
   "create_ingest_pipeline.model_id" : "aVeif4oB5Vm0Tdw8zYO2",
   "text_embedding.field_map.output.dimension": "768",
-  "text_embedding.field_map.input": "text"
+  "text_embedding.field_map.input": "passage"
 }
 ```
 {% include copy-curl.html %}
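
For context on the hunks above: after this rename, documents carry their text in `passage` while the `neural` query still targets the `passage_embedding` vector field, which the renamed `field_map` populates at ingest time. A sketch of such a search request in the style of the page being edited (the `model_id` is the placeholder value used throughout this diff, and `query_text` is an arbitrary sample string):

```json
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": ["passage_embedding"]
  },
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "k": 5
      }
    }
  }
}
```

Excluding `passage_embedding` from `_source` keeps the bulky vector out of each hit, so the response shows only the `passage` text and the relevance score.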
