File: solutions/search/vector/bring-own-vectors.md
Lines changed: 17 additions & 29 deletions
Original file line number
Diff line number
Diff line change
@@ -10,15 +10,13 @@ products:
10
10
description: An introduction to vectors and knn search in Elasticsearch.
11
11
---
12
12
13
-
# Bring your own dense vectors [bring-your-own-vectors]
13
+
# Bring your own dense vectors to {{es}} [bring-your-own-vectors]
14
14
15
-
{{es}} enables you store and search mathematical representations of your content called _embeddings_ or _vectors_, which help machines understand and process your data more effectively.
16
-
There are two types of representation (_dense_ and _sparse_), which are suited to different types of queries and use cases (for example, finding similar images and content or storing expanded terms and weights).
15
+
{{es}} enables you to store and search mathematical representations of your content - _embeddings_ or _vectors_ - which power AI-driven relevance. There are two types of vector representation - _dense_ and _sparse_ - suited to different queries and use cases (for example, finding similar images and content or storing expanded terms and weights).
17
16
18
-
In this introduction to [vector search](/solutions/search/vector.md), you'll store and search for dense vectors.
19
-
You'll also learn the syntax for searching these documents using a [k-nearest neighbour](/solutions/search/vector/knn.md) (kNN) query.
17
+
In this introduction to [vector search](/solutions/search/vector.md), you’ll store and search for dense vectors in {{es}}. You’ll also learn the syntax for querying these documents with a [k-nearest neighbour](/solutions/search/vector/knn.md) (kNN) query.
20
18
21
-
## Prerequisites
19
+
## Prerequisites for vector search
22
20
23
21
- If you're using {{es-serverless}}, create a project with the general purpose configuration. To add the sample data, you must have a `developer` or `admin` predefined role or an equivalent custom role.
24
22
- If you're using {{ech}} or a self-managed cluster, start {{es}} and {{kib}}. The simplest method to complete the steps in this guide is to log in with a user that has the `superuser` built-in role.
@@ -27,11 +25,9 @@ To learn about role-based access control, check out [](/deploy-manage/users-role
27
25
28
26
## Create a vector database
29
27
30
-
When you create vectors (or _vectorize_ your data), you convert complex and nuanced content (such as text, videos, images, or audio) into multidimensional numerical representations.
31
-
They must be stored in specialized data structures designed to ensure efficient similarity search and speedy vector distance calculations.
28
+
When you create vectors (or _vectorize_ your data), you convert complex content (text, images, audio, video) into multidimensional numeric representations. These vectors are stored in specialized data structures that enable efficient similarity search and fast kNN distance calculations.
32
29
33
-
In this quide, you'll use documents that already have dense vector embeddings.
34
-
To deploy a vector embedding model in {{es}} and generate vectors while ingesting and searching your data, refer to the links in [Learn more](#bring-your-own-vectors-learn-more).
30
+
In this guide, you’ll use documents that already include dense vector embeddings. To deploy a vector embedding model in {{es}} and generate vectors during ingest and search, refer to the links in [Learn more](#bring-your-own-vectors-learn-more).
35
31
36
32
::::{tip}
37
33
This is an advanced use case that uses the `dense_vector` field type. Refer to [](/solutions/search/semantic-search.md) for an overview of your options for semantic search with {{es}}.
@@ -47,7 +43,7 @@ Each document in our simple data set will have:
47
43
* An embedding of that review: stored in a `review_vector` field, which is defined as a [`dense_vector`](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md) data type.
48
44
49
45
:::{tip}
50
-
The `dense_vector` type automatically uses `int8_hnsw` quantization by default to reduce the memory footprint required when searching float vectors. Learn more about balancing performance and accuracy in [Dense vector quantization](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization).
46
+
The `dense_vector` type automatically uses `int8_hnsw` quantization by default to reduce the memory footprint when searching float vectors. Learn how to balance performance and accuracy in [Dense vector quantization](elasticsearch://reference/elasticsearch/mapping-reference/dense-vector.md#dense-vector-quantization).
51
47
:::
52
48
53
49
The following API request defines the `review_text` and `review_vector` fields:
@@ -75,9 +71,7 @@ PUT /amazon-reviews
75
71
2. The `index` parameter is set to `true` to enable the use of the `knn` query.
76
72
3. The `similarity` parameter defines the similarity function used to compare the query vector to the document vectors. `cosine` is the default similarity function for `dense_vector` fields in {{es}}.
77
73
78
-
Here we're using an 8-dimensional embedding for readability.
79
-
The vectors that neural network models work with can have several hundreds or even thousands of dimensions that represent a point in a multi-dimensional space.
80
-
Each vector dimension represents a _feature_ or a characteristic of the unstructured data.
74
+
Here we’re using an 8-dimensional embedding for readability. The vectors that neural network models work with can have several hundreds or even thousands of dimensions that represent a point in a multi-dimensional space. Each dimension represents a feature or characteristic of the unstructured data.
81
75
::::
82
76
::::{step} Add documents with embeddings
83
77
@@ -113,9 +107,7 @@ POST /_bulk
113
107
114
108
## Test vector search [bring-your-own-vectors-search-documents]
115
109
116
-
Now you can query these document vectors using a [`knn` retriever](elasticsearch://reference/elasticsearch/rest-apis/retrievers.md#knn-retriever).
117
-
`knn` is a type of vector search, which finds the `k` most similar documents to a query vector.
118
-
Here we're using a raw vector for the query text for demonstration purposes:
110
+
Now you can query these document vectors using a [`knn` retriever](elasticsearch://reference/elasticsearch/rest-apis/retrievers.md#knn-retriever). `knn` is a type of vector similarity search that finds the `k` most similar documents to a query vector. Here we're using a raw vector for the query text for demonstration purposes:
119
111
120
112
```console
121
113
POST /amazon-reviews/_search
@@ -131,31 +123,27 @@ POST /amazon-reviews/_search
131
123
}
132
124
```
133
125
134
-
1. A raw vector serves as the query text in this example. In a real-world scenario, you'll need to generate vectors for queries using an embedding model.
126
+
1. A raw vector serves as the query text in this example. In a real-world scenario, you'll generate query vectors using an embedding model.
135
127
2. The `k` parameter specifies the number of results to return.
136
128
3. The `num_candidates` parameter is optional. It limits the number of candidates returned by the search node. This can improve performance and reduce costs.
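
To make the idea behind the `knn` search concrete, here is a minimal Python sketch of an exhaustive (brute-force) k-nearest-neighbour ranking with cosine similarity. It is not how {{es}} runs the query: the `dense_vector` field is searched through an approximate HNSW index, and `num_candidates` has no counterpart in this sketch. The function names, the document IDs, and the 2-dimensional vectors are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def knn(query_vector, doc_vectors, k):
    """Return the k (doc_id, score) pairs most similar to query_vector.

    This scores every document (exhaustive search), which is only
    practical for small data sets.
    """
    scored = [
        (doc_id, cosine_similarity(query_vector, vector))
        for doc_id, vector in doc_vectors.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three documents with 2-dimensional embeddings.
doc_vectors = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}

top = knn([1.0, 0.1], doc_vectors, k=2)
for doc_id, score in top:
    print(doc_id, round(score, 3))
```

With `k=2`, the two documents whose vectors point in the direction closest to the query vector are returned, which mirrors the role of the `k` parameter in the request above.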
137
129
138
-
## Next steps
130
+
## Next steps: implementing vector search
139
131
140
-
If you want to try a similar set of steps from an {{es}} client, check out the guided index workflow:
132
+
If you want to try a similar workflow from an {{es}} client, use the guided index workflow:
141
133
142
-
- If you're using Elasticsearch Serverless, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**.
143
-
- If you're using {{ech}} or a self-managed cluster, go to **Elasticsearch > Home** and click **Create API index**. Select the vector search workflow.
134
+
* If you're using {{es}} Serverless, go to **{{es}} > Home**, select the vector search workflow, and **Create a vector optimized index**.
135
+
* If you're using {{ech}} or a self-managed cluster, go to **{{es}} > Home** and click **Create API index**. Select the vector search workflow.
144
136
145
137
When you finish your tests and no longer need the sample data set, delete the index:
146
138
147
139
```console
148
140
DELETE /amazon-reviews
149
141
```
150
142
151
-
## Learn more [bring-your-own-vectors-learn-more]
143
+
## Learn more about vector search [bring-your-own-vectors-learn-more]
152
144
153
-
In these simple examples, we're sending a raw vector for the query text.
154
-
In a real-world scenario you won't know the query text ahead of time.
155
-
You'll need to generate query vectors, on the fly, using the same embedding model that generated the document vectors.
156
-
For this you'll need to deploy a text embedding model in {{es}} and use the [`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters).
157
-
Alternatively, you can generate vectors client-side and send them directly with the search request.
145
+
In these simple examples, we send a raw vector for the query text. In a real-world scenario, you won’t know the query text ahead of time. You’ll generate query vectors on the fly using the same embedding model that produced the document vectors. For this, deploy a text embedding model in {{es}} and use the [`query_vector_builder` parameter](elasticsearch://reference/query-languages/query-dsl/query-dsl-knn-query.md#knn-query-top-level-parameters). Alternatively, you can generate vectors client-side and send them directly with the search request.
158
146
159
147
For an example of using pipelines to generate text embeddings, check out [](/solutions/search/vector/dense-versus-sparse-ingest-pipelines.md).
160
148
161
-
To learn about more search options, such as semantic, full-text, and hybrid, go to [](/solutions/search/search-approaches.md).
149
+
To learn more about the search options in {{es}}, such as semantic, full-text, and hybrid, refer to [](/solutions/search/search-approaches.md).