docs/reference/mapping/types/dense-vector.asciidoc
Lines changed: 3 additions & 1 deletion
@@ -121,11 +121,13 @@ The three following quantization strategies are supported:
 * `bbq` - experimental:[] Better binary quantization which reduces each dimension to a single bit precision. This reduces the memory footprint by 96% (or 32x) at a larger cost of accuracy. Generally, oversampling during query time and reranking can help mitigate the accuracy loss.


-When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See <<dense-vector-knn-search-reranking, oversampling and rescoring>> for more information.
+When using a quantized format, you may want to oversample and rescore the results to improve accuracy. See <<dense-vector-knn-search-rescoring, oversampling and rescoring>> for more information.

 To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
 index type is `int8_hnsw`.

+Quantized vectors can use <<dense-vector-knn-search-rescoring,oversampling and rescoring>> to improve accuracy on approximate kNN search results.
+
 NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
 This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing the quantized and raw vectors.
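
The percentages in the note above follow from the number of bits kept per dimension. The sketch below reproduces that arithmetic under two simplifying assumptions that are not stated in the source: raw vectors are 32-bit floats, and any per-vector metadata stored alongside the quantized values is ignored. The function names are hypothetical.

```python
# Bits stored per dimension. Raw float vectors are assumed to use 32 bits.
RAW_BITS = 32
QUANTIZED_BITS = {"int8": 8, "int4": 4, "bbq": 1}


def disk_overhead(fmt: str) -> float:
    """Extra disk used by the quantized copy, as a fraction of the raw vector size."""
    return QUANTIZED_BITS[fmt] / RAW_BITS


def memory_reduction(fmt: str) -> float:
    """Fraction of memory saved when only the quantized values are kept in memory."""
    return 1 - QUANTIZED_BITS[fmt] / RAW_BITS


for fmt in QUANTIZED_BITS:
    print(f"{fmt}: +{disk_overhead(fmt):.1%} disk, {memory_reduction(fmt):.1%} less memory")
```

Real indices may deviate slightly from these figures because of the ignored metadata.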
docs/reference/rest-api/common-parms.asciidoc
Lines changed: 24 additions & 0 deletions
@@ -1356,3 +1356,27 @@ tag::rrf-filter[]
 Applies the specified <<query-dsl-bool-query, boolean query filter>> to all of the specified sub-retrievers,
 according to each retriever's specifications.
 end::rrf-filter[]
+
+tag::knn-rescore-vector[]
+
+`rescore_vector`::
++
+--
+(Optional, object) Functionality in preview:[]. Apply oversampling and rescoring to quantized vectors.
+
+NOTE: Rescoring only makes sense for quantized vectors; when <<dense-vector-quantization,quantization>> is not used, the original vectors are used for scoring.
+The `rescore_vector` option is ignored for non-quantized `dense_vector` fields.
+
+`oversample`::
+(Required, float)
++
+Applies the specified oversample factor to `k` on the approximate kNN search.
+The approximate kNN search will:
+
+* Retrieve `num_candidates` candidates per shard.
+* From these candidates, the top `k * oversample` candidates per shard will be rescored using the original vectors.
+* The top `k` rescored candidates will be returned.
+
+See <<dense-vector-knn-search-rescoring,oversampling and rescoring quantized vectors>> for details.
The parameters `query_vector` and `query_vector_builder` cannot be used together.
@@ -576,15 +578,15 @@ This example demonstrates how to deploy the {ml-docs}/ml-nlp-rerank.html[Elastic

 Follow these steps:

-. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
+. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
 +
 [source,console]
 ----
 PUT _inference/rerank/my-elastic-rerank
 {
   "service": "elasticsearch",
   "service_settings": {
-    "model_id": ".rerank-v1",
+    "model_id": ".rerank-v1",
     "num_threads": 1,
     "adaptive_allocations": { <1>
       "enabled": true,
@@ -595,7 +597,7 @@ PUT _inference/rerank/my-elastic-rerank
 }
 ----
 // TEST[skip:uses ML]
-<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with the minimum of 1 and the maximum of 10 allocations.
+<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with the minimum of 1 and the maximum of 10 allocations.
 the global top `k` matches across shards. You cannot set the
 `search_type` explicitly when running kNN search.

+
 [discrete]
-[[exact-knn]]
-=== Exact kNN
+[[dense-vector-knn-search-rescoring]]
+==== Oversampling and rescoring for quantized vectors

-To run an exact kNN search, use a `script_score` query with a vector function.
+When using <<dense-vector-quantization,quantized vectors>> for kNN search, you can optionally rescore results to balance performance and accuracy. This involves two steps:

-. Explicitly map one or more `dense_vector` fields. If you don't intend to use
-the field for approximate kNN, set the `index` mapping option to `false`. This
-can significantly improve indexing speed.
-+
-[source,console]
-----
-PUT product-index
-{
-  "mappings": {
-    "properties": {
-      "product-vector": {
-        "type": "dense_vector",
-        "dims": 5,
-        "index": false
-      },
-      "price": {
-        "type": "long"
-      }
-    }
-  }
-}
-----
+* *Oversampling*: Retrieve more candidates per shard.
+* *Rescoring*: Use the original vector values to recalculate the score of the oversampled candidates.
+
+As the non-quantized, original vectors are used to calculate the final score on the top results, rescoring combines:
+
+* The performance and memory gains of approximate retrieval using quantized vectors for retrieving the top candidates.
+* The accuracy of using the original vectors for rescoring the top candidates.
+
+All forms of quantization result in some accuracy loss, and the loss grows as the quantization level increases.
+Generally, we have found that:
+
+* `int8` requires minimal rescoring, if any.
+* `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
+* `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that 3x-5x oversampling is generally sufficient. However, for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
+
+You can use the `rescore_vector` preview:[] option to automatically perform reranking.
+When a rescore `oversample` parameter is specified, the approximate kNN search will:
+
+* Retrieve `num_candidates` candidates per shard.
+* From these candidates, the top `k * oversample` candidates per shard will be rescored using the original vectors.
+* The top `k` rescored candidates will be returned.
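
The three steps above can be sketched for a single hypothetical shard in plain Python. This is a toy model, not how Elasticsearch implements the search: a brute-force scan stands in for the approximate HNSW pass, sign-only quantization stands in for the real quantizers, and rounding `k * oversample` up is an assumption of this sketch. All function names are hypothetical.

```python
import math
import random


def dot(a, b):
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def quantize(vec):
    """Toy 1-bit quantization: keep only the sign of each dimension."""
    return [1.0 if x >= 0 else -1.0 for x in vec]


def knn_with_rescore(query, vectors, k, num_candidates, oversample):
    """Return the indices of the top k vectors for one hypothetical shard."""
    q_query = quantize(query)
    # Step 1: approximate pass over quantized vectors, keep num_candidates.
    approx = sorted(
        range(len(vectors)),
        key=lambda i: dot(q_query, quantize(vectors[i])),
        reverse=True,
    )[:num_candidates]
    # Step 2: rescore the top k * oversample candidates with the original vectors.
    # Rounding up is an assumption of this sketch.
    to_rescore = approx[: math.ceil(k * oversample)]
    rescored = sorted(to_rescore, key=lambda i: dot(query, vectors[i]), reverse=True)
    # Step 3: return the top k rescored candidates.
    return rescored[:k]


random.seed(0)
vectors = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(50)]
query = [random.uniform(-1, 1) for _ in range(8)]
top = knn_with_rescore(query, vectors, k=5, num_candidates=20, oversample=2.0)
print(top)
```

Raising `oversample` widens the set that is rescored with the original vectors, which trades extra compute for a better chance of recovering the true nearest neighbors.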
+
+Here is an example of using the `rescore_vector` option with the `oversample` parameter:
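
As a rough illustration in Python, a search body using these parameters might be assembled as follows. The helper `knn_search_body`, the field name `image-vector`, and the query vector are hypothetical; only the `rescore_vector` and `oversample` names come from the text above, and sending the body to the search API is left out.

```python
import json


def knn_search_body(field, query_vector, k, num_candidates, oversample):
    """Build a kNN search body that asks for oversampling and rescoring."""
    return {
        "knn": {
            "field": field,
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
            # Rescore the top k * oversample candidates per shard.
            "rescore_vector": {"oversample": oversample},
        }
    }


# Hypothetical field name and query vector, for illustration only.
body = knn_search_body("image-vector", [-5, 9, -12], k=10, num_candidates=100, oversample=2.0)
print(json.dumps(body, indent=2))
```

With these values, up to 20 candidates per shard would be rescored before the top 10 are returned.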

-. Use the <<search-search,search API>> to run a `script_score` query containing
-a <<vector-functions,vector function>>.
-+
-TIP: To limit the number of matched documents passed to the vector function, we
-recommend you specify a filter query in the `script_score.query` parameter. If
-needed, you can use a <<query-dsl-match-all-query,`match_all` query>> in this
-parameter to match all documents. However, matching all documents can
-All forms of quantization will result in some accuracy loss and as the quantization level increases the accuracy loss will also increase.
-Generally, we have found that:
-- `int8` requires minimal if any rescoring
-- `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
-- `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that between 3x-5x oversampling is generally sufficient. But for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
+The following sections provide additional ways of rescoring:
+====== Use the `rescore` section for top-level kNN search
+
+You can use this option when you don't want to rescore on each shard, but on the top results from all shards.

-There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.
+Use the <<rescore, rescore section>> in the `_search` request to rescore the top results from a kNN search.

 Here is an example using the top level `knn` search with oversampling and using `rescore` to rerank the results:

@@ -1210,8 +1187,16 @@ gathering 20 nearest neighbors according to quantized scoring and rescoring with
 <5> The weight of the original query, here we simply throw away the original score
 <6> The weight of the rescore query, here we only use the rescore query
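
The effect of the two weights described in callouts <5> and <6> can be shown with a small calculation. This sketch assumes the scores are combined as a weighted sum and that the weights are 0 for the original query and 1 for the rescore query, which is one way to get the behavior the callouts describe. The function name and the sample scores are hypothetical.

```python
def combined_score(original, rescore, query_weight=1.0, rescore_query_weight=1.0):
    """Weighted sum of the original score and the rescore score."""
    return query_weight * original + rescore_query_weight * rescore


# Hypothetical scores: a quantized (approximate) score and a raw-vector score.
approximate, exact = 0.42, 0.91

# Weight 0 discards the original score, so only the rescore score remains.
final = combined_score(approximate, exact, query_weight=0.0, rescore_query_weight=1.0)
print(final)
```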

-The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query >>. Generally, this means that there will be more rescoring per shard, but this
-can increase overall recall at the cost of compute.