solutions/search/serverless-elasticsearch-get-started-semantic.md
Lines changed: 40 additions & 17 deletions
Original file line number
Diff line number
Diff line change
@@ -20,7 +20,7 @@ As you get started on vector search, keep in mind there are two forms of vector
20
20
TBD: Which type is implemented when you use semantic_text field?
21
21
-->
22
22
23
-
Elastic also offers an out-of-the-box Learned Sparse Encoder model for semantic search.
23
+
Elastic also offers an out-of-the-box Learned Sparse Encoder model ([ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser)) for semantic search.
24
24
This model outperforms on a variety of data sets, such as financial data, weather records, and question-answer pairs, among others.
25
25
The model is built to provide great relevance across domains, without the need for additional fine tuning.
26
26
@@ -33,21 +33,32 @@ In addition, Elastic also supports dense vectors to implement similarity search
33
33
The advantage of [AI-powered search](/solutions/search/ai-search/ai-search.md) is that these technologies enable you to use intuitive language in your search queries.
34
34
For example, if you want to search for workplace guidelines on a second income, you could search for "side hustle", which is not a term you're likely to see in a formal HR document.
35
35
36
-
To try it out, [create an {{es-serverless}} project](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-get-started-create-project) that is optimized for vectors.
36
+
## Prerequisites
37
37
38
-
% TBD: It seems like semantic search fields exist in all, so what is the value of this option?
38
+
To try out semantic search, [create an {{es-serverless}} project](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-get-started-create-project) that is optimized for vectors.
39
+
40
+
<!--
41
+
TBD: It seems like semantic search fields exist in all, so what is the value of this option?
42
+
TBD: Can all roles perform these steps?
43
+
-->
39
44
40
45
## Add data
41
46
42
47
% TBD: What type of data is ideal for semantic search?
43
48
44
49
There are some simple data sets that you can use for learning purposes.
45
50
For example, if you follow the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow), you can choose the semantic search option.
46
-
Follow the instructions to install an {{es}} client and define field mappings.
51
+
Follow the instructions to install an {{es}} client and copy the code examples.
47
52
Alternatively, try out the API requests in the [Console](/explore-analyze/query-filter/tools/console.md):
48
53
54
+
:::::{stepper}
55
+
56
+
::::{step} Define a semantic text field
57
+
58
+
The following example creates a mapping for a single [semantic_text](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field:
59
+
49
60
```console
50
-
PUT /my-index/_mapping
61
+
PUT /semantic-index/_mapping
51
62
{
52
63
"properties": {
53
64
"text": {
@@ -57,10 +68,11 @@ PUT /my-index/_mapping
57
68
}
58
69
```
59
70
60
-
By default, the [semantic_text](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field type provides vector search capabilities using the ELSER model.
61
-
% TBD: Confirm "Elser model" vs ".elser-2-elasticsearch, a preconfigured endpoint for the elasticsearch service".
71
+
::::
72
+
73
+
::::{step} Add documents
62
74
63
-
Next, use the Elasticsearch bulk API to ingest an array of documents into the index.
75
+
You can use the Elasticsearch bulk API to ingest an array of documents.
64
76
The initial bulk ingestion request could take longer than the default request timeout.
65
77
If the following request times out, allow time for the machine learning model loading to complete (typically 1-5 minutes) then retry it:
66
78
@@ -70,25 +82,36 @@ TBD: Describe where to look for the downloaded model in Trained Models?
70
82
71
83
```console
72
84
POST /_bulk?pretty
73
-
{ "index": { "_index": "my-index" } }
85
+
{ "index": { "_index": "semantic-index" } }
74
86
{"text":"Yellowstone National Park is one of the largest national parks in the United States. It ranges from Wyoming to Montana and Idaho, and contains an area of 2,219,791 acres across three different states. It is most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site."}
75
-
{ "index": { "_index": "my-index" } }
87
+
{ "index": { "_index": "semantic-index" } }
76
88
{"text":"Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous cliff is El Capitan, a 3,000-foot monolith along its tallest face."}
77
-
{ "index": { "_index": "my-index" } }
89
+
{ "index": { "_index": "semantic-index" } }
78
90
{"text":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."}
79
91
```
92
+
::::
93
+
:::::
80
94
81
95
What just happened? The content was transformed into a sparse vector inside the `text` field.
82
96
This transformation involves two main steps.
83
97
First, the content is divided into smaller, manageable chunks to ensure that meaningful segments can be more effectively processed and searched. Next, each chunk of text is transformed into a sparse vector representation using text expansion techniques.
84
-
This step leverages ELSER (Elastic Search Engine for Relevance) to convert the text into a format that captures the semantic meaning, enabling more accurate and relevant search results.
98
+
By default, `semantic_text` fields leverage ELSER to convert the text into a format that captures the semantic meaning.
TBD: Confirm "Elser model" vs ".elser-2-elasticsearch, a preconfigured endpoint for the elasticsearch service".
104
+
TBD: Show how this data looks in Discover, do you see the text or just the vectors?
105
+
TBD: Include the Elastic Open Web Crawler variation too?
106
+
-->
85
107
86
108
## Test a semantic search query
87
109
88
-
Now try a semantic query:
110
+
Try running some queries to check the accuracy and relevance of the search results.
111
+
For example, use some keywords that don't exist in the documents:
89
112
90
113
```console
91
-
GET my-index/_search
114
+
GET semantic-index/_search
92
115
{
93
116
"query": {
94
117
"semantic": {
@@ -101,7 +124,6 @@ GET my-index/_search
101
124
102
125
The query is translated automatically into a vector representation and runs against the contents of the semantic text field.
103
126
The search results are sorted by relevance score, which measures how well each document matches the query.
104
-
For example:
105
127
106
128
```json
107
129
{
@@ -131,7 +153,9 @@ For example:
131
153
...
132
154
```
133
155
134
-
% TBD: Provide more information about how to interpret and filter the search results.
156
+
<!--
157
+
TBD: Provide more information about how to interpret and filter the search results.
158
+
-->
135
159
136
160
## Next steps
137
161
@@ -143,4 +167,3 @@ For example, if you have both a `text` field and a `semantic_text` field, you ca
143
167
A [hybrid search](/solutions/search/hybrid-semantic-text.md) provides comprehensive search capabilities to find relevant information based on both the raw text and its underlying meaning.
144
168
145
169
To learn about more options, such as vector and keyword search, check out [](/solutions/search/search-approaches.md).
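
Reviewer note on the "Add documents" step: the `_bulk` request body shown in this diff is newline-delimited JSON, where each document is preceded by an action line and the body ends with a newline. The following is a minimal Python sketch of how such a body could be assembled, not part of the changed file. The helper name `build_bulk_body` and the shortened sample texts are assumptions made for illustration; only the `semantic-index` index name and the `text` field come from the diff.

```python
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON string suitable as a _bulk request body.

    Hypothetical helper for illustration: each document is preceded by
    an {"index": {...}} action line, and the body ends with a newline.
    """
    lines = []
    for doc in docs:
        # Action line: tells the bulk API which index receives the document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(doc))
    # The bulk API expects the final line to be newline-terminated.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Shortened sample documents (first sentences of the texts in the diff).
    sample_docs = [
        {"text": "Yellowstone National Park is one of the largest national parks in the United States."},
        {"text": "Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California."},
    ]
    print(build_bulk_body("semantic-index", sample_docs), end="")
```

The resulting string could be pasted after `POST /_bulk` in the Console or sent by any HTTP client. How the request is authenticated and sent is left out here because it depends on the project setup.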