File: `solutions/search/semantic-search/semantic-search-semantic-text.md`
You must create the mapping of the destination index, that is, the index that will contain the embeddings that the {{infer}} endpoint generates from your input text. The destination index must have a field with the [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field type to index the output of the {{infer}} endpoint you use.
::::{tab-set}

:::{tab-item} Using EIS on Serverless

```{applies_to}
serverless: ga
```

```console
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": { <1>
        "type": "semantic_text" <2>
      }
    }
  }
}
```

1. The name of the field to contain the generated embeddings.
2. The field to contain the embeddings is a `semantic_text` field. Since no `inference_id` is provided, the default endpoint `.elser-2-elastic` for the `elasticsearch` service is used. This {{infer}} endpoint uses the [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md).

:::
:::{tab-item} Using EIS in Cloud

```{applies_to}
stack: ga
deployment:
  self: unavailable
```

```console
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": { <1>
        "type": "semantic_text", <2>
        "inference_id": ".elser-2-elastic" <3>
      }
    }
  }
}
```

1. The name of the field to contain the generated embeddings.
2. The field to contain the embeddings is a `semantic_text` field.
3. The `.elser-2-elastic` preconfigured {{infer}} endpoint for the `elasticsearch` service is used. This {{infer}} endpoint uses the [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md).

:::
:::{tab-item} Using ML-nodes

```console
PUT semantic-embeddings
{
  "mappings": {
    "properties": {
      "content": { <1>
        "type": "semantic_text", <2>
        "inference_id": ".elser-2-elasticsearch" <3>
      }
    }
  }
}
```

1. The name of the field to contain the generated embeddings.
2. The field to contain the embeddings is a `semantic_text` field.
3. The `.elser-2-elasticsearch` preconfigured {{infer}} endpoint for the `elasticsearch` service is used. To use a different {{infer}} service, you must create an {{infer}} endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put) and then specify it in the `semantic_text` field mapping using the `inference_id` parameter.

:::
::::
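
To confirm that the field was mapped as intended, you can optionally retrieve the index mapping (this check is an addition to the steps above, not part of the original tutorial):

```console
GET semantic-embeddings/_mapping
```

The response should show `content` with the `semantic_text` field type.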

::::{note}
If you’re using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you’ll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
::::
## Load data [semantic-text-load-data]

Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS MARCO Passage Ranking data set.
Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
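
If you prefer an API-based route over the {{ml-app}} UI, two-column records like these could also be indexed with the Bulk API. A minimal hand-written sketch, with illustrative values rather than actual data set contents:

```console
POST test-data/_bulk
{ "index": { "_id": "1" } }
{ "id": 1, "content": "Example passage text from the data set." }
{ "index": { "_id": "2" } }
{ "id": 2, "content": "Another example passage." }
```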
## Reindex the data [semantic-text-reindex-data]
Create the embeddings from the text by reindexing the data from the `test-data` index to the `semantic-embeddings` index. The data in the `content` field will be reindexed into the `content` semantic text field of the destination index. The reindexed data will be processed by the {{infer}} endpoint associated with the `content` semantic text field.

::::{note}
This step uses the reindex API to simulate data ingestion.
::::

```console
POST _reindex?wait_for_completion=false
{
  "source": {
    "index": "test-data",
    "size": 10 <1>
  },
  "dest": {
    "index": "semantic-embeddings"
  }
}
```

1. The default batch size for reindexing is 1000. Reducing `size` to a smaller number makes the updates of the reindexing process quicker, which enables you to follow the progress closely and detect errors early.

The call returns a task ID to monitor the progress:

```console
GET _tasks/<task_id>
```

Reindexing large datasets can take a long time. You can test this workflow using only a subset of the data set.

After the data has been indexed with the embeddings, you can query the data using semantic search. Choose between [Query DSL](/explore-analyze/query-filter/languages/querydsl.md) or [{{esql}}](elasticsearch://reference/query-languages/esql.md) syntax to execute the query.
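
With Query DSL, such a search can be sketched as a `semantic` query against the `content` field (the query text below is only an example, not part of the data set):

```console
GET semantic-embeddings/_search
{
  "query": {
    "semantic": {
      "field": "content",
      "query": "How to avoid muscle soreness while running?"
    }
  }
}
```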