@@ -13,14 +13,12 @@ _Semantic search_ is a type of AI-powered search that enables you to use natural
 It returns results that match the meaning of a query, as opposed to literal keyword matches.
 For example, if you want to search for workplace guidelines on a second income, you could search for "side hustle", which is not a term you're likely to see in a formal HR document.
 
-Semantic search uses {{es}} vector database and vector search technology.
-Each _vector_ (or _vector embedding_) is an array of numbers that each represent a different characteristic of the text, such as sentiment, context, and syntactics.
-These numeric representations make comparison with other vectors very efficient.
+Semantic search uses {{es}} [vector database](https://www.elastic.co/what-is/vector-database) and [vector search](https://www.elastic.co/what-is/vector-search) technology.
+Each _vector_ (or _vector embedding_) is an array of numbers that represent different characteristics of the text, such as sentiment, context, and syntax.
+These numeric representations make vector comparisons very efficient.
 
-In this guide, you'll learn how to perform semantic search on a small set of sample data.
-You'll create vectors and store them in {{es}}.
-Then you'll run a query, which will be transformed into vectors and compared to the stored data.
-By playing with a simple use case, you'll take the first steps toward understanding whether this type of search is relevant to your own data.
+In this quickstart guide, you'll create vectors for a small set of sample data, store them in {{es}}, and then run a semantic query.
+By playing with a simple use case, you'll take the first steps toward understanding whether semantic search is applicable to your own data.
 
 ## Prerequisites
 
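To make the point about efficient vector comparisons concrete, here is a minimal Python sketch of scoring with sparse vectors. It assumes each vector is a dictionary that maps tokens to weights (the general shape of the token/weight pairs ELSER produces) and uses a dot product over shared tokens as the similarity. The `sparse_dot` helper, the tokens, and the weights are hypothetical, and this is not how {{es}} stores or scores vectors internally.

```python
from typing import Dict

# Hypothetical representation: token -> weight. Tokens absent from the dict have weight 0.
SparseVector = Dict[str, float]


def sparse_dot(query: SparseVector, doc: SparseVector) -> float:
    """Dot product of two sparse vectors: only tokens present in both contribute."""
    # Iterate over the smaller vector so the cost depends on the fewer non-zero entries.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(weight * doc[token] for token, weight in query.items() if token in doc)


# Hypothetical token weights, for illustration only (not real ELSER output).
query_vec = {"side": 0.8, "hustle": 1.2, "job": 0.9, "income": 0.7}
docs = {
    "hr-policy": {"employment": 1.1, "job": 1.0, "income": 0.9, "second": 0.6},
    "park-guide": {"park": 1.3, "hiking": 1.0, "elk": 0.8},
}

# Rank documents from most to least similar to the query.
ranked = sorted(docs, key=lambda name: sparse_dot(query_vec, docs[name]), reverse=True)
print(ranked)  # → ['hr-policy', 'park-guide']
```

Because only tokens present in both vectors contribute to the score, the cost of a comparison depends on the number of non-zero entries rather than on the size of the whole vocabulary, which is one reason sparse representations are cheap to compare.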
@@ -36,12 +34,12 @@ TBD: What is the impact of this "optimized for vectors" option?
 ## Create a vector database
 
 When you create vectors (or _vectorize_ your data), you convert complex and nuanced documents into multidimensional numerical representations.
-You can choose from many different vector embedding models. Some are extremely hardware efficient and can be run with less computational power. Others have a greater “understanding” of the context and can answer questions and lead a threaded conversation.
-These examples use the default Learned Sparse Encoder ([ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md)) model, which provides great relevance across domains without the need for additional fine tuning.
+You can choose from many different vector embedding models. Some are extremely hardware efficient and can be run with less computational power. Others have a greater understanding of context and can answer questions or lead a threaded conversation.
+The examples in this guide use the default Learned Sparse Encoder ([ELSER](/explore-analyze/machine-learning/nlp/ml-nlp-elser.md)) model, which provides great relevance across domains without the need for additional fine-tuning.
 
-The way that you store and index vectors has a significant impact on the performance and accuracy of search results.
+The way that you store vectors has a significant impact on the performance and accuracy of search results.
 They must be stored in specialized data structures designed to ensure efficient similarity search and speedy vector distance calculations.
-These examples store the vectors in `semantic_text` fields, which provide sensible defaults and automation.
+This guide uses the [semantic text field type](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md), which provides sensible defaults and automation.
 
 Try vectorizing a small set of documents.
 You can follow the guided index workflow:
@@ -70,7 +68,6 @@ PUT /semantic-index/_mapping
 When you use `semantic_text` fields, the type of vector is determined by the vector embedding model.
 In this case, the default ELSER model will be used to create sparse vectors.
 
-For more details about `semantic_text` fields, refer to [](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md).
 For a deeper dive, check out [Mapping embeddings to Elasticsearch field types: semantic_text, dense_vector, sparse_vector](https://www.elastic.co/search-labs/blog/mapping-embeddings-to-elasticsearch-field-types).
 ::::
 
@@ -88,22 +85,24 @@ POST /_bulk?pretty
 {"content":"Rocky Mountain National Park is one of the most popular national parks in the United States. It receives over 4.5 million visitors annually, and is known for its mountainous terrain, including Longs Peak, which is the highest peak in the park. The park is home to a variety of wildlife, including elk, mule deer, moose, and bighorn sheep. The park is also home to a variety of ecosystems, including montane, subalpine, and alpine tundra. The park is a popular destination for hiking, camping, and wildlife viewing, and is a UNESCO World Heritage Site."}
 ```
 
-The bulk ingestion request might take longer than the default request timeout.
-If it times out, wait for the machine learning model loading to complete (typically 1-5 minutes) then retry it.
+The bulk ingestion might take longer than the default request timeout.
+If it times out, wait for the ELSER model to load (typically 1-5 minutes), then retry it.
 ::::
 :::::
 
-What just happened? The content was transformed into a sparse vector, which involves two main steps.
-First, the content is divided into smaller, manageable chunks to ensure that meaningful segments can be more effectively processed and searched. Then each chunk of text is transformed into a sparse vector representation using text expansion techniques.
+What just happened? The content was transformed into sparse vectors, a process that involves two main steps.
+First, the content was divided into smaller, manageable chunks to ensure that meaningful segments can be more effectively processed and searched.
+Then each chunk of text was transformed into a sparse vector representation using text expansion techniques.
 With a few vectors stored in {{es}}, semantic search can now occur.
 
 ## Explore the data
 
 To familiarize yourself with this data set, open [Discover](/explore-analyze/discover.md) from the navigation menu or by using the [global search field](/explore-analyze/find-and-organize/find-apps-and-objects.md).
 
-In **Discover**, you can click the expand icon to show details about any documents in the table.
+In **Discover**, you can click the expand icon to show details about documents in the table.
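The first of the two steps described above, dividing content into chunks, can be sketched in Python as a fixed-size word window with some overlap between consecutive chunks. This is an assumption for illustration only: the `chunk_words` helper, the chunk size, and the overlap are hypothetical and are not the chunking settings that `semantic_text` fields use. In the real workflow, each resulting chunk would then be passed to the ELSER model to produce its sparse vector.

```python
from typing import List


def chunk_words(text: str, max_words: int, overlap: int) -> List[str]:
    """Hypothetical chunker: split text into chunks of at most max_words words,
    where each chunk repeats the last `overlap` words of the previous one."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    chunks: List[str] = []
    step = max_words - overlap  # how far the window advances each time
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks


# Illustrative values only; a real deployment would use much larger chunks.
sample = "The park is home to a variety of wildlife including elk mule deer moose and bighorn sheep"
for chunk in chunk_words(sample, max_words=8, overlap=2):
    print(chunk)
```

The last `overlap` words of each chunk are repeated at the start of the next one, so text near a chunk boundary keeps some of its surrounding context.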
@@ -112,101 +111,19 @@ In **Discover**, you can click the expand icon .
 
-<!--
-TBD: When you view these documents in Discover they're shown as having "text" field type instead of "semantic_text" is this right?
--->
-
 ## Test semantic search
 
-<!--
-TO-DO: Talk about the pipeline where vectors are required for both the data and search query
-% encodes details of searchable information into vectors and then compares vectors to determine which are most similar.
-When you run a query, the search engine transforms the query into embeddings, which are numerical representations of data and related contexts. They are stored in vectors. The kNN algorithm, or k-nearest neighbor algorithm, then matches vectors of existing documents (a semantic search concerns text) to the query vectors. The semantic search then generates results and ranks them based on conceptual relevance.
--->
-
 {{es}} provides a variety of query languages for interacting with your data.
 For an overview of their features and use cases, check out [](/explore-analyze/query-filter/languages.md).
 
-You can search data that is stored in `semantic_text` fields by using a specific subset of queries, including `knn`, `match`, `semantic`, and `sparse_vector`. Refer to [Semantic text field type](elasticsearch://reference/elasticsearch/mapping-reference/semantic-text.md) for the complete list.
-
-Let's try out two types of queries in two different languages.
-
-:::::{stepper}
-
-::::{step} Run a semantic query with Query DSL
-
-Open the **{{index-manage-app}}** page from the navigation menu or return to the [guided index flow](/solutions/search/serverless-elasticsearch-get-started.md#elasticsearch-follow-guided-index-flow) to find code examples for searching the sample data.
-Try running some queries to check the accuracy and relevance of the search results.
-
-For example, click **Run in Console** and use some seach terms that you did not see when you explored the documents:
-
-```console
-POST /semantic-index/_search
-{
-  "retriever": {
-    "standard": {
-      "query": {
-        "semantic": {
-          "field": "content",
-          "query": "best park for rappelling"
-        }
-      }
-    }
-  }
-}
-```
-
-This is a [semantic](elasticsearch://reference/query-languages/query-dsl/query-dsl-semantic-query.md) query that is expressed in [Query Domain Specific Language](/explore-analyze/query-filter/languages/querydsl.md) (DSL), which is the primary query language for {{es}}.
-
-The query is translated automatically into a vector representation and runs against the contents of the semantic text field.
-The search results are sorted by a relevance score, which measures how well each document matches the query.
-
-```json
-{
-  "took": 22,
-  "timed_out": false,
-  "_shards": {
-    "total": 3,
-    "successful": 3,
-    "skipped": 0,
-    "failed": 0
-  },
-  "hits": {
-    "total": {
-      "value": 3,
-      "relation": "eq"
-    },
-    "max_score": 11.389743,
-    "hits": [
-      {
-        "_index": "semantic-index",
-        "_id": "Pp0MtJcBZjjo1YKoXkWH",
-        "_score": 11.389743,
-        "_source": {
-          "content": "Rocky Mountain National Park ..."
-        ...
-}
-```
-
-In this example, the document related to Rocky Mountain National park has the highest score.
-::::
-::::{step} Run a match query in ES|QL
-
-Another way to try out semantic search is by using the [match](elasticsearch://reference/query-languages/esql/functions-operators/search-functions.md#esql-match) search function in the [Elasticsearch Query Language](/explore-analyze/query-filter/languages/esql.md) (ES|QL).
+You can search data that is stored in `semantic_text` fields by using a specific subset of queries, including `knn`, `match`, `semantic`, and `sparse_vector`.
+The query is translated automatically into the appropriate vector representation to run against the contents of the semantic text field.
+The search results include a relevance score, which measures how well each document matches the query.
 
+Let's test a semantic search query in [Elasticsearch Query Language](/explore-analyze/query-filter/languages/esql.md) (ES|QL).
 Go to **Discover** and select **Try ES|QL** from the application menu bar.
 Think of some queries that are relevant to the documents you explored, such as finding the biggest park or the best for rappelling.
+For example, copy the following query:
 
 ```esql
 FROM semantic-index METADATA _score <1>
@@ -217,21 +134,25 @@ FROM semantic-index METADATA _score <1>
 ```
 
 1. The FROM source command returns a table of data. Each row in the table represents a document. The `METADATA` clause provides access to the query relevance score, which is a [metadata field](elasticsearch://reference/query-languages/esql/esql-metadata-fields.md).
-2. A simplified syntax for the `MATCH` search function, this command performs a semantic query on the specified field.
+2. This command uses a simplified syntax for the [match](elasticsearch://reference/query-languages/esql/functions-operators/search-functions.md#esql-match) search function to perform a semantic query on the specified field.
 3. The KEEP processing command affects the columns and their order in the results table.
 4. The results are sorted in descending order based on the `_score`.
-5. The maximum number of rows to return.
+5. This optional command defines the maximum number of rows to return.
 
-In this example, the first row in the table is the document that had the highest relevance score for the query.
+After you click **▶Run**, the results appear in a table.
+In this example, the first row in the table is the document related to Yellowstone National Park, which had the highest relevance score for the query.
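The numbered callouts describe a processing pipeline. As a rough analogy only, here is a hypothetical Python function that applies the KEEP, SORT, and LIMIT steps, in that order, to an in-memory table. It does not reproduce FROM or the match search function: the `keep_sort_limit` helper, the rows, and their `_score` values are invented for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def keep_sort_limit(rows: List[Row], columns: List[str], limit: int) -> List[Row]:
    """Hypothetical analogy of `KEEP columns | SORT _score DESC | LIMIT limit`."""
    kept = [{col: row[col] for col in columns} for row in rows]  # KEEP: choose and order columns
    kept.sort(key=lambda row: row["_score"], reverse=True)  # SORT _score DESC
    return kept[:limit]  # LIMIT: cap the number of rows


# Invented rows; in ES|QL the _score values would come from the semantic match.
rows = [
    {"_id": "1", "content": "Rocky Mountain National Park ...", "_score": 2.1},
    {"_id": "2", "content": "Yellowstone National Park ...", "_score": 5.4},
    {"_id": "3", "content": "Another park ...", "_score": 3.7},
]
for row in keep_sort_limit(rows, ["content", "_score"], limit=2):
    print(row)
```

Applying KEEP first means `_score` must be among the kept columns for the sort to work, which mirrors why the query keeps that column before sorting on it.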