
Commit 328e74f

[SEARCH] Adds EIS scenario to semantic search tutorial.

1 parent 3e46ddc · commit 328e74f

1 file changed: +58 −8 lines


solutions/search/semantic-search/semantic-search-semantic-text.md

Lines changed: 58 additions & 8 deletions
@@ -27,7 +27,14 @@ This tutorial uses the `elasticsearch` service for demonstration, which is creat
 
 The mapping of the destination index - the index that contains the embeddings that the inference endpoint will generate based on your input text - must be created. The destination index must have a field with the [`semantic_text`](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field type to index the output of the used inference endpoint.
 
-```console
+::::{tab-set}
+
+:::{tab-item} Using EIS on Serverless
+
+```{applies_to}
+serverless: ga
+```
+
 PUT semantic-embeddings
 {
   "mappings": {
@@ -38,17 +45,64 @@ PUT semantic-embeddings
     }
   }
 }
+:::
+
+1. The name of the field to contain the generated embeddings.
+2. The field to contain the embeddings is a `semantic_text` field. Since no `inference_id` is provided, the default endpoint `.elser-2-elastic` for the `elasticsearch` service is used. This {{infer}} endpoint uses the [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md).
+
+:::{tab-item} Using EIS in Cloud
+
+```{applies_to}
+stack: ga
+deployment:
+  self: unavailable
 ```
 
+PUT semantic-embeddings
+{
+  "mappings": {
+    "properties": {
+      "content": { <1>
+        "type": "semantic_text", <2>
+        "inference_id": ".elser-2-elastic" <3>
+      }
+    }
+  }
+}
+:::
+
 1. The name of the field to contain the generated embeddings.
-2. The field to contain the embeddings is a `semantic_text` field. Since no `inference_id` is provided, the default endpoint `.elser-2-elasticsearch` for the `elasticsearch` service is used. To use a different {{infer}} service, you must create an {{infer}} endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put) and then specify it in the `semantic_text` field mapping using the `inference_id` parameter.
+2. The field to contain the embeddings is a `semantic_text` field.
+3. The `.elser-2-elastic` preconfigured {{infer}} endpoint for the `elasticsearch` service is used. This {{infer}} endpoint uses the [Elastic {{infer-cap}} Service (EIS)](/explore-analyze/elastic-inference/eis.md).
 
-::::{note}
-If you’re using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you’ll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
+:::{tab-item} Using ML-nodes
+
+```console
+PUT semantic-embeddings
+{
+  "mappings": {
+    "properties": {
+      "content": { <1>
+        "type": "semantic_text", <2>
+        "inference_id": ".elser-2-elasticsearch" <3>
+      }
+    }
+  }
+}
+```
+
+1. The name of the field to contain the generated embeddings.
+2. The field to contain the embeddings is a `semantic_text` field.
+3. The `.elser-2-elasticsearch` preconfigured {{infer}} endpoint for the `elasticsearch` service is used. To use a different {{infer}} service, you must create an {{infer}} endpoint first using the [Create {{infer}} API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-inference-put) and then specify it in the `semantic_text` field mapping using the `inference_id` parameter.
+
+:::
 
 ::::
 
+::::{note}
+If you’re using web crawlers or connectors to generate indices, you have to [update the index mappings](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-indices-put-mapping) for these indices to include the `semantic_text` field. Once the mapping is updated, you’ll need to run a full web crawl or a full connector sync. This ensures that all existing documents are reprocessed and updated with the new semantic embeddings, enabling semantic search on the updated data.
 
+::::
 
 ## Load data [semantic-text-load-data]
 
@@ -58,7 +112,6 @@ Use the `msmarco-passagetest2019-top1000` data set, which is a subset of the MS
 
 Download the file and upload it to your cluster using the [Data Visualizer](../../../manage-data/ingest/upload-data-files.md) in the {{ml-app}} UI. After your data is analyzed, click **Override settings**. Under **Edit field names**, assign `id` to the first column and `content` to the second. Click **Apply**, then **Import**. Name the index `test-data`, and click **Import**. After the upload is complete, you will see an index named `test-data` with 182,469 documents.
 
-
 ## Reindex the data [semantic-text-reindex-data]
 
 Create the embeddings from the text by reindexing the data from the `test-data` index to the `semantic-embeddings` index. The data in the `content` field will be reindexed into the `content` semantic text field of the destination index. The reindexed data will be processed by the {{infer}} endpoint associated with the `content` semantic text field.
@@ -68,7 +121,6 @@ This step uses the reindex API to simulate data ingestion. If you are working wi
 
 ::::
 
-
 ```console
 POST _reindex?wait_for_completion=false
 {
@@ -84,7 +136,6 @@ POST _reindex?wait_for_completion=false
 
 1. The default batch size for reindexing is 1000. Reducing size to a smaller number makes the update of the reindexing process quicker which enables you to follow the progress closely and detect errors early.
 
-
 The call returns a task ID to monitor the progress:
 
 ```console
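The callout above notes that the default reindex batch size is 1000 and that a smaller `size` gives faster feedback. As a hedged sketch (plain Python dicts, no live cluster; the `size` value of 10 is illustrative, not taken from the diff), the reindex request body looks like:

```python
# Sketch: the _reindex request body from the tutorial, expressed as a
# Python dict. A "size" smaller than the default batch of 1000 surfaces
# progress and errors sooner. The value 10 is illustrative; actually
# sending the request would require a live Elasticsearch cluster, so this
# only builds the body.
DEFAULT_BATCH = 1000

reindex_body = {
    "source": {
        "index": "test-data",   # index created in the "Load data" step
        "size": 10,             # smaller batch for quicker progress updates
    },
    "dest": {"index": "semantic-embeddings"},
}
```

With `wait_for_completion=false`, the API returns a task ID that can be checked with `GET _tasks/<task_id>` or cancelled with `POST _tasks/<task_id>/_cancel`, as shown in the surrounding hunks.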
@@ -97,7 +148,6 @@ Reindexing large datasets can take a long time. You can test this workflow using
 POST _tasks/<task_id>/_cancel
 ```
 
-
 ## Semantic search [semantic-text-semantic-search]
 
 After the data has been indexed with the embeddings, you can query the data using semantic search. Choose between [Query DSL](/explore-analyze/query-filter/languages/querydsl.md) or [{{esql}}](elasticsearch://reference/query-languages/esql.md) syntax to execute the query.
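The closing context lines mention querying with Query DSL or {{esql}}. A minimal sketch of a Query DSL `semantic` query body against the `content` field (the query string is a placeholder; executing the search would need the populated `semantic-embeddings` index from earlier in the tutorial):

```python
# Sketch: a Query DSL "semantic" query targeting the semantic_text field,
# built as a plain dict. The query text is a placeholder; running the
# search requires a cluster with the semantic-embeddings index.
semantic_query = {
    "query": {
        "semantic": {
            "field": "content",
            "query": "placeholder: your natural-language question here",
        }
    }
}
```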
