This repository was archived by the owner on Aug 16, 2022. It is now read-only.
@@ -13,10 +13,10 @@ Short for its associated *k-nearest neighbors* algorithm, the KNN plugin lets yo

## Get started

-To use the KNN plugin, you must create an index with the `index.knn` setting and add one or more fields of the `knn_vector` data type. Additionally, you can specify the `index.knn.space_type` with `l2` or `cosinesimil`, respectively, to use either Euclidean distance or cosine similarity for calculations. By default, `index.knn.space_type` is set to `l2`. Here is an example that creates an index with two knn_vector fields and uses cosine similarity:
+To use the KNN query type, you must create an index with the `index.knn` setting and add one or more fields of the `knn_vector` data type. Additionally, you can specify the `index.knn.space_type` with `l2` or `cosinesimil` to use, respectively, either Euclidean distance or cosine similarity for calculations. By default, `index.knn.space_type` is `l2`. Here is an example that creates an index with two knn_vector fields and uses cosine similarity:

```json
-PUT my-index
+PUT my-knn-index-1
{
  "settings": {
    "index": {
@@ -48,31 +48,31 @@ After you create the index, add some data to it:
Then you can search the data using the `knn` query type:

```json
-GET my-index/_search
+GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
@@ -88,10 +88,13 @@ GET my-index/_search
In this case, `k` is the number of neighbors you want the query to return, but you must also include the `size` option. Otherwise, you get `k` results for each shard (and each segment) rather than `k` results for the entire query. The plugin supports a maximum `k` value of 10,000.
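
The interaction between `k` and `size` can be sketched in plain Python. This is a simplified model and not the plugin's implementation: the shard layout, the scores, and the helper names `shard_top_k` and `search` are all hypothetical, and `size=None` simply means "do not truncate" so that the per-shard behavior is visible.

```python
import heapq


def shard_top_k(shard_docs, k):
    """Return the k highest-scoring (score, doc_id) pairs from one shard."""
    return heapq.nlargest(k, shard_docs)


def search(shards, k, size=None):
    """Merge the per-shard top-k hits, then keep at most `size` of them."""
    merged = []
    for shard_docs in shards:
        merged.extend(shard_top_k(shard_docs, k))
    merged.sort(reverse=True)  # best score first
    return merged if size is None else merged[:size]


# Hypothetical (score, doc_id) pairs spread over two shards.
shards = [
    [(0.9, "a"), (0.4, "b"), (0.2, "c")],
    [(0.8, "d"), (0.7, "e"), (0.1, "f")],
]

all_hits = search(shards, k=2)          # k hits from each shard
top_hits = search(shards, k=2, size=2)  # trimmed to the overall best two
```

With two shards and `k=2`, the untrimmed list holds four hits, while passing `size=2` keeps only the two best across both shards.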

-If you mix the `knn` query with other clauses, you might receive fewer than `k` results. In this example, the `post_filter` clause reduces the number of results from 2 to 1:
+
+## Mixing queries
+
+If you mix the `knn` query with filters or other queries, you might receive fewer than `k` results. In this example, `post_filter` reduces the number of results from 2 to 1:

```json
-GET my-index/_search
+GET my-knn-index-1/_search
{
  "size": 2,
  "query": {
@@ -112,3 +115,98 @@ GET my-index/_search
  }
}
```
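
To see why a filter applied after the neighbor search can leave fewer than `k` hits, here is a minimal Python sketch. It is not the plugin's code: the documents, the `color` field, and the helper names `knn` and `knn_then_post_filter` are made up for illustration, and the search is a brute-force Euclidean scan.

```python
import math


def knn(docs, query, k):
    """Return the k documents closest to `query` by Euclidean distance."""
    return sorted(docs, key=lambda d: math.dist(d["vector"], query))[:k]


def knn_then_post_filter(docs, query, k, predicate):
    """Find k neighbors first, then drop the ones that fail the filter."""
    return [d for d in knn(docs, query, k) if predicate(d)]


# Hypothetical documents; field names and values are illustrative only.
docs = [
    {"id": 1, "vector": [1.0, 1.0], "color": "RED"},
    {"id": 2, "vector": [2.0, 2.0], "color": "BLUE"},
    {"id": 3, "vector": [8.0, 8.0], "color": "BLUE"},
]

hits = knn_then_post_filter(
    docs, [1.0, 1.0], k=2, predicate=lambda d: d["color"] == "BLUE"
)
```

The two nearest neighbors are documents 1 and 2, but only document 2 passes the filter, so the search returns one hit even though a second matching document (3) exists farther away.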
+
+
+## Custom scoring
+
+The [previous example](#mixing-queries) shows a search that returns fewer than `k` results. If you want to avoid this situation, KNN's custom scoring option lets you essentially invert the order of events.
+
+First, add another index:
+
+```json
+PUT my-knn-index-2
+{
+  "settings": {
+    "index.knn": true
+  },
+  "mappings": {
+    "properties": {
+      "my_vector": {
+        "type": "knn_vector",
+        "dimension": 2
+      },
+      "color": {
+        "type": "keyword"
+      }
+    }
+  }
+}
+```
+
+If you *only* want to use KNN's custom scoring, you can omit `"index.knn": true`, but you lose the ability to perform standard KNN queries on the index. The benefit of this approach is faster indexing speed and lower memory usage.
+Finally, use the `script_score` query to pre-filter your documents before identifying nearest neighbors:
+
+```json
+GET my-knn-index-2/_search
+{
+  "size": 2,
+  "query": {
+    "script_score": {
+      "query": {
+        "bool": {
+          "filter": {
+            "term": {
+              "color": "BLUE"
+            }
+          }
+        }
+      },
+      "script": {
+        "lang": "knn",
+        "source": "knn_score",
+        "params": {
+          "field": "my_vector",
+          "vector": [9.9, 9.9],
+          "space_type": "l2"
+        }
+      }
+    }
+  }
+}
+```
+
+All options are required.
+
+- `lang` is the script type. This value is usually `painless`, but here you must specify `knn`.
+- `source` is the name of the stored script, `knn_score`.
+- `field` is the field that contains your vector data.
+- `vector` is the point you want to find the nearest neighbors for.
+- `space_type` is either `l2` or `cosinesimil`.
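
The "filter first, then score" order that the options above describe can be sketched in Python. This is an illustration under assumptions, not the plugin's scoring code: the documents and the helper `filter_then_knn` are hypothetical, the search is an exact brute-force scan, and candidates are ranked directly by distance or similarity rather than by whatever score value the plugin reports.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def filter_then_knn(docs, predicate, field, vector, space_type, size):
    """Pre-filter the documents, then rank only the survivors."""
    candidates = [d for d in docs if predicate(d)]
    if space_type == "l2":
        # Smaller Euclidean distance means a closer neighbor.
        ranked = sorted(candidates, key=lambda d: math.dist(d[field], vector))
    elif space_type == "cosinesimil":
        # Larger cosine similarity means a closer neighbor.
        ranked = sorted(
            candidates,
            key=lambda d: cosine_similarity(d[field], vector),
            reverse=True,
        )
    else:
        raise ValueError(f"unsupported space_type: {space_type}")
    return ranked[:size]


# Hypothetical documents shaped like the my-knn-index-2 mapping.
docs = [
    {"id": 1, "my_vector": [9.0, 9.0], "color": "RED"},
    {"id": 2, "my_vector": [5.0, 5.0], "color": "BLUE"},
    {"id": 3, "my_vector": [1.0, 3.0], "color": "BLUE"},
    {"id": 4, "my_vector": [8.0, 7.0], "color": "BLUE"},
]


def is_blue(doc):
    return doc["color"] == "BLUE"


l2_hits = filter_then_knn(docs, is_blue, "my_vector", [9.9, 9.9], "l2", 2)
cos_hits = filter_then_knn(docs, is_blue, "my_vector", [9.9, 9.9], "cosinesimil", 2)
```

Document 1 is the closest point overall, but it is removed before scoring, so both calls still return two `BLUE` documents. The two space types can order the same candidates differently: here `l2` ranks document 4 first, while `cosinesimil` ranks document 2 first.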
+
+
+## Performance considerations
+
+The standard KNN query and custom scoring option have performance tradeoffs. You should test both using a representative set of documents to see if the search results and latencies match your expectations.
+
+In general, larger `k` values benefit from the standard KNN query. If you have a smaller `k` value and expect the initial pre-filter to reduce the number of documents to the thousands (not millions), custom scoring can work well.
docs/knn/settings.md (5 additions & 1 deletion)
@@ -5,7 +5,7 @@ parent: KNN
nav_order: 10
---

-# KNN Settings and Statistics
+# KNN Settings and statistics

The KNN plugin adds several new index settings, cluster settings, and statistics.
@@ -60,3 +60,7 @@ Statistic | Description
`graphMemoryUsage` | Current cache size (total size of all graphs in memory) in kilobytes.
`missCount` | The number of cache misses. A cache miss occurs when a user queries a graph and it has not yet been loaded into memory.
`loadExceptionCount` | The number of times an exception occurred when trying to load a graph into the cache.
+`script_compilations` | The number of times the KNN script has been compiled. This value should usually be 1 or 0, but if the cache containing the compiled scripts is filled, the KNN script might be recompiled.
+`script_compilation_errors` | The number of errors during script compilation.
+`script_query_requests` | The number of query requests that use [the KNN script](../#custom-scoring).
+`script_query_errors` | The number of errors during script queries.