Changed file: `_vector-search/getting-started/auto-generated-embeddings.md` (13 additions and 13 deletions)
@@ -101,7 +101,7 @@ You'll need the model ID in order to use this model for several of the following
 
 ### Step 2: Create an ingest pipeline
 
-First, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. You'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`text`) and the name of the field in which to record embeddings (`passage_embedding`):
+First, you need to create an [ingest pipeline]({{site.url}}{{site.baseurl}}/api-reference/ingest-apis/index/) that contains one processor: a task that transforms document fields before documents are ingested into an index. You'll set up a `text_embedding` processor that creates vector embeddings from text. You'll need the `model_id` of the model you set up in the previous section and a `field_map`, which specifies the name of the field from which to take the text (`passage`) and the name of the field in which to record embeddings (`passage_embedding`):
 
 ```json
 PUT /_ingest/pipeline/nlp-ingest-pipeline
@@ -112,7 +112,7 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline
       "text_embedding": {
         "model_id": "aVeif4oB5Vm0Tdw8zYO2",
         "field_map": {
-          "text": "passage_embedding"
+          "passage": "passage_embedding"
         }
       }
     }
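For reference, the full ingest pipeline request after this change would read as follows. This is a sketch assembled from the two hunks above; the `description` value is illustrative, since the diff does not show that line:

```json
PUT /_ingest/pipeline/nlp-ingest-pipeline
{
  "description": "An NLP ingest pipeline",
  "processors": [
    {
      "text_embedding": {
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "field_map": {
          "passage": "passage_embedding"
        }
      }
    }
  ]
}
```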
@@ -123,7 +123,7 @@ PUT /_ingest/pipeline/nlp-ingest-pipeline
 
 ### Step 3: Create a vector index
 
-Now you'll create a vector index by setting `index.knn` to `true`. In the index, the field named `text` contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/mappings/supported-field-types/knn-vector/) field named `passage_embedding` contains the vector embedding of the text. The vector field `dimension` must match the dimensionality of the model you configured in Step 2. Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step:
+Now you'll create a vector index by setting `index.knn` to `true`. In the index, the field named `passage` contains an image description, and a [`knn_vector`]({{site.url}}{{site.baseurl}}/mappings/supported-field-types/knn-vector/) field named `passage_embedding` contains the vector embedding of the text. The vector field `dimension` must match the dimensionality of the model you configured in Step 2. Additionally, set the default ingest pipeline to the `nlp-ingest-pipeline` you created in the previous step:
 
 
 ```json
@@ -140,7 +140,7 @@ PUT /my-nlp-index
         "dimension": 768,
         "space_type": "l2"
       },
-      "text": {
+      "passage": {
         "type": "text"
       }
     }
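Putting the hunk above in context, the complete index creation request after this change would look roughly like the following. The `settings` block and the `id` field are assumptions based on the surrounding prose (`index.knn` set to `true`, default pipeline set to `nlp-ingest-pipeline`), since the hunk only shows the mapping fragment:

```json
PUT /my-nlp-index
{
  "settings": {
    "index.knn": true,
    "default_pipeline": "nlp-ingest-pipeline"
  },
  "mappings": {
    "properties": {
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 768,
        "space_type": "l2"
      },
      "passage": {
        "type": "text"
      }
    }
  }
}
```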
@@ -153,28 +153,28 @@ Setting up a vector index allows you to later perform a vector search on the `pa
 
 ### Step 4: Ingest documents into the index
 
-In this step, you'll ingest several sample documents into the index. The sample data is taken from the [Flickr image dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset). Each document contains a `text` field corresponding to the image description and an `id` field corresponding to the image ID:
+In this step, you'll ingest several sample documents into the index. The sample data is taken from the [Flickr image dataset](https://www.kaggle.com/datasets/hsankesara/flickr-image-dataset). Each document contains a `passage` field corresponding to the image description and an `id` field corresponding to the image ID:
 
 ```json
 PUT /my-nlp-index/_doc/1
 {
-  "text": "A man who is riding a wild horse in the rodeo is very near to falling off ."
+  "passage": "A man who is riding a wild horse in the rodeo is very near to falling off ."
 }
 ```
 {% include copy-curl.html %}
 
 ```json
 PUT /my-nlp-index/_doc/2
 {
-  "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
+  "passage": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
 }
 ```
 {% include copy-curl.html %}
 
 ```json
 PUT /my-nlp-index/_doc/3
 {
-  "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
+  "passage": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
 }
 ```
 {% include copy-curl.html %}
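The three single-document requests in this hunk could equivalently be sent as one `_bulk` request. This is not part of the changed tutorial text, just an equivalent sketch of the same ingestion using the renamed `passage` field:

```json
POST /_bulk
{ "index": { "_index": "my-nlp-index", "_id": "1" } }
{ "passage": "A man who is riding a wild horse in the rodeo is very near to falling off ." }
{ "index": { "_index": "my-nlp-index", "_id": "2" } }
{ "passage": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ." }
{ "index": { "_index": "my-nlp-index", "_id": "3" } }
{ "passage": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ." }
```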
@@ -228,23 +228,23 @@ The response contains the matching documents:
         "_id": "1",
         "_score": 0.015851952,
         "_source": {
-          "text": "A man who is riding a wild horse in the rodeo is very near to falling off ."
+          "passage": "A man who is riding a wild horse in the rodeo is very near to falling off ."
         }
       },
       {
         "_index": "my-nlp-index",
         "_id": "2",
         "_score": 0.015177963,
         "_source": {
-          "text": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
+          "passage": "A rodeo cowboy , wearing a cowboy hat , is being thrown off of a wild white horse ."
         }
       },
       {
         "_index": "my-nlp-index",
         "_id": "3",
         "_score": 0.011347729,
         "_source": {
-          "text": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
+          "passage": "People line the stands which advertise Freemont 's orthopedics , a cowboy rides a light brown bucking bronco ."
         }
       }
     ]
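The hunk above shows only the search response, not the query that produced it. A query against the renamed field would use OpenSearch's `neural` query clause, roughly as follows; the `query_text` value and `k` are illustrative, and the `model_id` is the one registered earlier in the tutorial:

```json
GET /my-nlp-index/_search
{
  "_source": {
    "excludes": [
      "passage_embedding"
    ]
  },
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "wild west",
        "model_id": "aVeif4oB5Vm0Tdw8zYO2",
        "k": 5
      }
    }
  }
}
```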
@@ -264,14 +264,14 @@ To register and deploy a model, select the built-in workflow template for the mo
 
 ### Step 2: Configure a workflow
 
-Create and provision a semantic search workflow. You must provide the model ID for the model deployed in the previous step. Review your selected workflow template [defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/semantic-search-defaults.json) to determine whether you need to update any of the parameters. For example, if the model dimensionality is different from the default (`1024`), specify the dimensionality of your model in the `output_dimension` parameter. Change the workflow template default text field from `passage_text` to `text` in order to match the manual example:
+Create and provision a semantic search workflow. You must provide the model ID for the model deployed in the previous step. Review your selected workflow template [defaults](https://github.com/opensearch-project/flow-framework/blob/2.13/src/main/resources/defaults/semantic-search-defaults.json) to determine whether you need to update any of the parameters. For example, if the model dimensionality is different from the default (`1024`), specify the dimensionality of your model in the `output_dimension` parameter. Change the workflow template default text field from `passage_text` to `passage` in order to match the manual example:
 
 ```json
 POST /_plugins/_flow_framework/workflow?use_case=semantic_search&provision=true