navigation_title: Get started with semantic search
2
+
navigation_title: Semantic search
3
3
applies_to:
4
4
serverless:
5
5
products:
@@ -8,10 +8,11 @@ products:
8
8
# Build an AI-powered search experience in {{es-serverless}}
9
9
10
10
<!--
11
-
As you ramp up on Elastic, you’ll use the Elasticsearch Relevance Engine™ (ESRE), designed to power AI search applications. With ESRE, you can take advantage of a suite of developer tools including Elastic’s textual search, vector database, and our proprietary transformer model for semantic search.
11
+
As you ramp up on Elastic, you'll use the Elasticsearch Relevance Engine™ (ESRE), designed to power AI search applications. With ESRE, you can take advantage of a suite of developer tools including Elastic's textual search, vector database, and our proprietary transformer model for semantic search.
12
12
-->
13
13
14
-
Elastic offers a variety of search techniques, starting with BM25, the industry standard for textual search. It provides precise matching for specific searches, matching exact keywords, and it improves with tuning.
14
+
Elastic offers a variety of search techniques, starting with BM25, the industry standard for textual search.
15
+
It provides precise matching for specific searches, matching exact keywords, and it improves with tuning.
15
16
16
17
<!--
17
18
As you get started on vector search, keep in mind there are two forms of vector search: “dense” (aka, kNN vector search) and “sparse."
@@ -23,163 +24,121 @@ This model outperforms on a variety of data sets, such as financial data, weathe
23
24
The model is built to provide great relevance across domains, without the need for additional fine tuning.
24
25
25
26
<!--
26
-
Check out this interactive demo to see how search results are more relevant when you test Elastic's Learned Sparse Encoder model against Elastic’s textual BM25 algorithm.
27
+
Check out this interactive demo to see how search results are more relevant when you test Elastic's Learned Sparse Encoder model against Elastic's textual BM25 algorithm.
27
28
28
29
In addition, Elastic also supports dense vectors to implement similarity search on unstructured data beyond text, such as videos, images, and audio.
29
30
-->
30
31
31
-
The advantage of semantic search and vector search is that these technologies allow you to use intuitive language in your search queries.
32
+
The advantage of [AI-powered search](/solutions/search/ai-search/ai-search.md) is that it enables you to use intuitive language in your search queries.
32
33
For example, if you want to search for workplace guidelines on a second income, you could search for "side hustle", which is not a term you're likely to see in a formal HR document.
33
34
34
-
## Setup
35
+
To try it out, [create an {{es-serverless}} project](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-get-started-create-project) that is optimized for vectors.
35
36
36
-
In this guide, we'll demonstrate how to ingest data with the Elastic web crawler then try out some queries.
37
-
If you want to play along, follow these setup steps. Otherwise, you can jump ahead to the query examples.
37
+
% TBD: It seems like semantic search fields exist in all, so what is the value of this option?
38
38
39
-
1. Optional: Create an {{ecloud}} project, in particular an {{es-serverless}} project that is optimized for vectors.
40
-
If you want to perform the steps in this guide you'll need:
39
+
## Add data
41
40
42
-
* The {{es}} URL: the endpoint to which you will send your data
43
-
* The API Key: the easiest of the authentication methods
44
-
% TBD: Is this mandatory? Clarify value.
41
+
% TBD: What type of data is ideal for semantic search?
45
42
46
-
1. Optional: Add data.
47
-
In this guide, the data is derived from a live website, the [{{es}} Labs](https://www.elastic.co/search-labs).
48
-
49
-
1. Let’s create an {{es}} index named `elasticsearch-labs-blog`.
50
-
1. Create mappings for a text field and a semantic text field.
51
-
<!--
52
-
Now before writing data to the index, let’s do a small configuration to ensure you have semantic search right from the start. Click on Mappings and + Add field, create a text field called `body`, this is where the crawler will put the content of the web pages it reads. Next, add a semantic text type field, let’s give it a very creative name: `semantic_text`.
53
-
54
-
By using both a text field and semantic_text field, the process combines the strengths of traditional keyword search and advanced semantic search. This hybrid search provides comprehensive search capabilities, ensuring that users can find relevant information based on both the raw text and its underlying meaning.
55
-
-->
56
-
1. Ingest data with the Elastic Open Web Crawler.
57
-
<!--
58
-
You will need Docker to use the Open Web Crawler. Here is a simple config file, it tells the crawler to read the https://www.elastic.co/search-labs blog and write it to the `elasticsearch-labs-blog` index at `elasticsearch.host` using the `elasticsearch.api_key`... then create a `docker-compose.yml` file. Start the service with `docker-compose up -d` then start the crawling process with `docker-compose exec -it crawler bin/crawler crawl /app/config/crawler-config-blog.yml`. After a few minutes you should have the whole {{es}} labs blogs indexed.
59
-
60
-
What just happened? The blog content was indexed to the `body` field, then this content was transformed into a sparse vector inside the `semantic_text` field. This transformation involved two main steps. First, the content was divided into smaller, manageable chunks to ensure that the text is broken down into meaningful segments, which can be more effectively processed and searched. Next, each chunk of text was transformed into a sparse vector representation using text expansion techniques. This step leverages ELSER (Elastic Search Engine for Relevance) to convert the text into a format that captures the semantic meaning, enabling more accurate and relevant search results.
61
-
-->
62
-
63
-
## Test a keyword query
64
-
65
-
% TBD Move below the semantic search example? Assumes people are already familiar with old search options?
66
-
67
-
Now it’s time to search for the information you’re looking for.
68
-
If you’re a developer who’s implementing search for a web application, you can use the Console/Dev Tools to test and refine search results from your indexed data.
69
-
70
-
Let’s start with a simple `multi_match query`, which will match the text against the “title” and “body” fields.
71
-
Since this is a classic lexical search (not semantic yet) the results of a query like “how to implement multilingual search” will match the words you are providing.
43
+
There are some simple data sets that you can use for learning purposes.
44
+
For example, if you follow the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow), you can choose the semantic search option.
45
+
Follow the instructions to install an {{es}} client and define field mappings, or try out the API requests in the [Console](/explore-analyze/query-filter/tools/console.md):
72
46
73
47
```console
74
-
GET elasticsearch-labs-blog/_search
48
+
PUT /my-index/_mapping
75
49
{
76
-
"_source": ["title"],
77
-
"query": {
78
-
"multi_match": {
79
-
"query": "how to implement multilingual search",
80
-
"fields": ["title","body"]
50
+
"properties": {
51
+
"text": {
52
+
"type": "semantic_text"
81
53
}
82
54
}
83
55
}
84
56
```
85
57
86
-
In this case the top 5 matches are good, but not great.
58
+
By default, the [semantic_text](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) field type provides vector search capabilities using the ELSER model.
59
+
% TBD: Confirm "Elser model" vs ".elser-2-elasticsearch, a preconfigured endpoint for the elasticsearch service".
87
60
88
-
```txt
89
-
"Multilingual vector search with the E5 embedding model"
90
-
"Scalar quantization optimized for vector databases"
91
-
"How to migrate your Ruby app from OpenSearch to Elasticsearch"
92
-
"How to search languages with compound words"
93
-
"How To"
94
-
```
61
+
Next, use the Elasticsearch bulk API to ingest an array of documents into the index.
62
+
The initial bulk ingestion request could take longer than the default request timeout.
63
+
If the following request times out, wait for the machine learning model to finish loading (typically 1-5 minutes), then retry it:
95
64
96
-
## Test a semantic search query
97
-
98
-
Now try the same but with a semantic query, it will translate the text “how to implement multilingual search?” automatically into a vector representation and perform the query against the `semantic_text` field.
65
+
<!--
66
+
TBD: Describe where to look for the downloaded model in Trained Models?
67
+
-->
99
68
100
69
```console
101
-
GET elasticsearch-labs-blog/_search
102
-
{
103
-
"_source": ["title"],
104
-
"query": {
105
-
"semantic": {
106
-
"field": "semantic_text",
107
-
"query": "how to implement multilingual search?"
108
-
}
109
-
}
110
-
}
70
+
POST /_bulk?pretty
71
+
{ "index": { "_index": "my-index" } }
72
+
{"text":"Yellowstone National Park is one of the largest national parks in the United States. It extends from Wyoming into Montana and Idaho, and covers 2,219,791 acres across the three states. It's most famous for hosting the geyser Old Faithful and is centered on the Yellowstone Caldera, the largest super volcano on the American continent. Yellowstone is host to hundreds of species of animal, many of which are endangered or threatened. Most notably, it contains free-ranging herds of bison and elk, alongside bears, cougars and wolves. The national park receives over 4.5 million visitors annually and is a UNESCO World Heritage Site."}
73
+
{ "index": { "_index": "my-index" } }
74
+
{"text":"Yosemite National Park is a United States National Park, covering over 750,000 acres of land in California. A UNESCO World Heritage Site, the park is best known for its granite cliffs, waterfalls and giant sequoia trees. Yosemite hosts over four million visitors in most years, with a peak of five million visitors in 2016. The park is home to a diverse range of wildlife, including mule deer, black bears, and the endangered Sierra Nevada bighorn sheep. The park has 1,200 square miles of wilderness, and is a popular destination for rock climbers, with over 3,000 feet of vertical granite to climb. Its most famous cliff is El Capitan, a 3,000-foot monolith along its tallest face."}
75
+
{ "index": { "_index": "my-index" } }
76
+
{"text":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."}
111
77
```
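
If you send the bulk request from a script instead of the Console, the request body has to be newline-delimited JSON: one action line followed by one source line per document, with a trailing newline. The following is a minimal Python sketch of how such a body could be assembled. The helper name `build_bulk_body` and the shortened sample documents are illustrative assumptions, not part of this guide, and the sketch only builds the body; it does not send it.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for the _bulk API (hypothetical helper).

    Each document becomes two lines: an "index" action line and the
    document source itself.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(doc))
    # The bulk API expects the final line to end with a newline character.
    return "\n".join(lines) + "\n"


# Shortened sample documents, assumed for illustration only.
docs = [
    {"text": "Yellowstone National Park is one of the largest national parks in the United States."},
    {"text": "Yosemite National Park is a United States National Park in California."},
]

body = build_bulk_body("my-index", docs)
print(body)
```

Keeping each document on a single line matters here, because a line break inside a document would be read as the start of the next bulk line.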
112
78
113
-
The 5 top results you get back from this semantic search look much better.
79
+
What just happened? The content was transformed into a sparse vector inside the `text` field.
80
+
This transformation involves two main steps.
81
+
First, the content is divided into smaller, manageable chunks to ensure that meaningful segments can be more effectively processed and searched. Next, each chunk of text is transformed into a sparse vector representation using text expansion techniques.
82
+
This step uses ELSER (Elastic Learned Sparse EncodeR) to convert the text into a format that captures the semantic meaning, enabling more accurate and relevant search results.
114
83
115
-
```txt
116
-
"Multilingual vector search with the E5 embedding model"
117
-
"How to search languages with compound words"
118
-
"Dataset translation with LangChain, Python & Vector Database for multilingual insights"
119
-
"Building multilingual RAG with Elastic and Mistral"
120
-
"Agentic RAG with Elasticsearch & Langchain"
121
-
```
122
-
123
-
## Test a hybrid search query
124
-
125
-
A more advanced example using Reciprocal Rank Fusion (RRF) is a technique used in hybrid retrieval systems to improve the relevance of search results.
126
-
It combines different retrieval methods, such as lexical (traditional) search and semantic search, to enhance the overall search performance.
127
-
128
-
By leveraging RRF, the query ensures that the final list of documents is a balanced mix of the top results from both retrieval methods, thereby improving the overall relevance and diversity of the search results.
129
-
This fusion technique mitigates the limitations of individual retrieval methods, providing a more comprehensive and accurate set of results.
84
+
## Test a semantic search query
130
85
131
-
% TBD: Are there reasons for not using hybrid search? e.g. additional storage space, speed
86
+
Now try a semantic query:
132
87
133
88
```console
134
-
GET elasticsearch-labs-blog/_search
89
+
GET my-index/_search
135
90
{
136
-
"_source": [
137
-
"title"
138
-
],
139
-
"retriever": {
140
-
"rrf": {
141
-
"retrievers": [
142
-
{
143
-
"standard": {
144
-
"query": {
145
-
"multi_match": {
146
-
"fields": ["title","body"],
147
-
"query": "how to implement multilingual search"
148
-
}
149
-
}
150
-
}
151
-
},
152
-
{
153
-
"standard": {
154
-
"query": {
155
-
"semantic": {
156
-
"field": "semantic_text",
157
-
"query": "how to implement multilingual search"
158
-
}
159
-
}
160
-
}
161
-
}
162
-
]
163
-
}
164
-
}
91
+
"query": {
92
+
"semantic": {
93
+
"field": "text",
94
+
"query": "best parks for rappelling"
95
+
}
96
+
}
165
97
}
166
98
```
167
99
168
-
The top 5 hits with hybrid search contain very good results, all highly relevant to how you can implement a multilingual search with Elasticsearch:
100
+
The query is translated automatically into a vector representation and runs against the contents of the semantic text field.
101
+
The search results are sorted by relevance score, which measures how well each document matches the query.
102
+
For example:
169
103
170
-
```txt
171
-
"Multilingual vector search with the E5 embedding model"
172
-
"How to search languages with compound words"
173
-
"Dataset translation with LangChain, Python & Vector Database for multilingual insights"
174
-
"Building multilingual RAG with Elastic and Mistral"
175
-
"Evaluating scalar quantization in Elasticsearch"
104
+
```json
105
+
{
106
+
"took": 249,
107
+
"timed_out": false,
108
+
"_shards": {
109
+
"total": 3,
110
+
"successful": 3,
111
+
"skipped": 0,
112
+
"failed": 0
113
+
},
114
+
"hits": {
115
+
"total": {
116
+
"value": 6,
117
+
"relation": "eq"
118
+
},
119
+
"max_score": 12.118624,
120
+
"hits": [
121
+
{
122
+
"_index": "my-index",
123
+
"_id": "0lGtpJcB7hfWuB0FGC06",
124
+
"_score": 12.118624,
125
+
"_source": {
126
+
"text": "Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."
127
+
}
128
+
},
129
+
...
176
130
```
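
To read a response like this in application code, you could walk `hits.hits` and pull out the score, ID, and a short preview of each document. The following Python sketch assumes the response has already been parsed into a dictionary. The helper name `summarize_hits` and the abbreviated sample response are illustrative assumptions; the sample keeps only the ID and score of the first hit shown above and shortens its text.

```python
import json


def summarize_hits(response, preview_chars=60):
    """Return (score, id, text preview) tuples in the order the hits arrive.

    Hypothetical helper: hits are already sorted by relevance score,
    so no extra sorting is done here.
    """
    rows = []
    for hit in response.get("hits", {}).get("hits", []):
        text = hit.get("_source", {}).get("text", "")
        rows.append((hit["_score"], hit["_id"], text[:preview_chars]))
    return rows


# Abbreviated sample response, assumed for illustration only.
sample = json.loads("""
{
  "hits": {
    "max_score": 12.118624,
    "hits": [
      {
        "_id": "0lGtpJcB7hfWuB0FGC06",
        "_score": 12.118624,
        "_source": {
          "text": "Rocky Mountain National Park is one of the most popular national parks in the United States."
        }
      }
    ]
  }
}
""")

rows = summarize_hits(sample)
for score, doc_id, preview in rows:
    print(f"{score:.3f}  {doc_id}  {preview}")
```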
177
131
132
+
% TBD: Provide more information about how to interpret and filter the search results.
133
+
178
134
## Next steps
179
135
180
-
Thanks for taking the time to set up semantic search for your data with {{ecloud}}.
136
+
Thanks for taking the time to try out semantic search in {{es-serverless}}.
137
+
For another semantic search example, check out [](/solutions/search/semantic-search/semantic-search-semantic-text.md).
138
+
139
+
If you want to extend this example, try an index with more fields.
140
+
For example, if you have both a `text` field and a `semantic_text` field, you can combine the strengths of traditional keyword search and advanced semantic search.
141
+
A [hybrid search](/solutions/search/hybrid-semantic-text.md) provides comprehensive search capabilities to find relevant information based on both the raw text and its underlying meaning.
142
+
143
+
To learn about more options, such as vector and keyword search, check out [](/solutions/search/search-approaches.md).
181
144
182
-
<!--
183
-
Ready to get started? Spin up a free 14-day trial on Elastic Cloud or try out these 15 minute hands-on learning on Search AI 101.
184
-
TBD: Link to other quickstarts or to the deeper semantic search options.