[Docs] kNN vector rescoring for quantized vectors #118425
@@ -781,7 +781,7 @@ What if you wanted to filter by some top-level document metadata? You can do thi

NOTE: `filter` will always be over the top-level document metadata. This means you cannot filter based on `nested`
field metadata.
field metadata.

[source,console]
----
@@ -1068,102 +1068,80 @@ NOTE: Approximate kNN search always uses the
the global top `k` matches across shards. You cannot set the
`search_type` explicitly when running kNN search.

[discrete]
[[exact-knn]]
=== Exact kNN
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors

To run an exact kNN search, use a `script_score` query with a vector function.
When using <<dense-vector-quantization,quantized vectors>> for kNN search, you can optionally rescore results to balance performance and accuracy by doing:

. Explicitly map one or more `dense_vector` fields. If you don't intend to use
the field for approximate kNN, set the `index` mapping option to `false`. This
can significantly improve indexing speed.
+
[source,console]
----
PUT product-index
{
  "mappings": {
    "properties": {
      "product-vector": {
        "type": "dense_vector",
        "dims": 5,
        "index": false
      },
      "price": {
        "type": "long"
      }
    }
  }
}
----
* *Oversampling*: Retrieve more candidates per shard.
* *Rescoring*: Use the original vector values to recalculate the score on the oversampled candidates.

. Index your data.
+
[source,console]
----
POST product-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599 }
{ "index": { "_id": "2" } }
{ "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799 }
{ "index": { "_id": "3" } }
{ "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099 }
...
----
//TEST[continued]
//TEST[s/\.\.\.//]
As the non-quantized, original vectors are used to calculate the final score on the top results, rescoring combines:

* The performance and memory gains of approximate retrieval using quantized vectors for retrieving the top candidates.
* The accuracy of using the original vectors for rescoring the top candidates.

All forms of quantization result in some accuracy loss, and the loss grows as the quantization level increases.
Generally, we have found that:

* `int8` requires minimal, if any, rescoring
* `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
* `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that between 3x-5x oversampling is generally sufficient. But for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
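As a rough illustration of the guidance above, the sketch below maps each quantization scheme to a suggested oversampling factor and computes how many candidates a shard would retrieve before rescoring. The concrete factor values are assumptions drawn from the ranges quoted in the text, not official defaults:

```python
# Illustrative sketch only. The factors below are assumptions taken from the
# ranges in the guidance above (e.g. the low end of "3x-5x" for bbq); they are
# not Elasticsearch defaults.
SUGGESTED_FACTORS = {
    "int8": 1.0,  # minimal, if any, rescoring
    "int4": 1.5,  # 1.5x-2x recovers most of the accuracy loss
    "bbq": 3.0,   # 3x-5x is generally sufficient
}

def candidates_per_shard(num_candidates: int, quantization: str) -> int:
    """Candidates each shard retrieves (quantized scoring) before rescoring."""
    return int(num_candidates * SUGGESTED_FACTORS[quantization])

print(candidates_per_shard(100, "bbq"))  # oversampled candidate count
```

Higher factors increase recall at the cost of more original-vector comparisons per shard.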

There are three main ways to oversample and rescore:

* <<dense-vector-knn-search-reranking-rescore-parameter>>
* <<dense-vector-knn-search-reranking-rescore-section>>
* <<dense-vector-knn-search-reranking-script-score>>

[discrete]
[[dense-vector-knn-search-reranking-rescore-parameter]]
===== Use the `rescore_vector` option to rescore per shard

preview:[]

You can use the `rescore_vector` option to automatically perform reranking.
When a rescore `num_candidates_factor` parameter is specified, the approximate kNN search will retrieve the top `num_candidates * num_candidates_factor` candidates per shard.
It will then use the original vectors to rescore them, and return the top `k` results.

Here is an example of using the `rescore_vector` option with the `num_candidates_factor` parameter:

. Use the <<search-search,search API>> to run a `script_score` query containing
a <<vector-functions,vector function>>.
+
TIP: To limit the number of matched documents passed to the vector function, we
recommend you specify a filter query in the `script_score.query` parameter. If
needed, you can use a <<query-dsl-match-all-query,`match_all` query>> in this
parameter to match all documents. However, matching all documents can
significantly increase search latency.
+
[source,console]
----
POST product-index/_search
POST image-index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "range" : {
              "price" : {
                "gte": 1000
              }
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
        "params": {
          "queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
        }
      }
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 10,
    "num_candidates": 100,
    "rescore_vector": {
      "num_candidates_factor": 2.0
    }
  },
  "fields": [ "title", "file-type" ]
}
----
//TEST[continued]
// TEST[s/"k": 10/"k": 3/]
// TEST[s/"num_candidates": 100/"num_candidates": 3/]

[discrete]
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors
This example will:

* Search using approximate kNN with `num_candidates` set to 200 (`num_candidates` * `num_candidates_factor`).
* Rescore the top 200 candidates per shard using the original, non-quantized vectors.
* Merge the rescored candidates from all shards, and return the top 10 (`k`) results.
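The three steps above can be sketched as a toy simulation. This is not the actual Elasticsearch implementation; all names are made up, and rounding vectors stands in for real quantization:

```python
import heapq

# Toy sketch of oversample-then-rescore (NOT the Elasticsearch implementation;
# helper names are invented and rounding stands in for quantization).

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quantized_score(query, vec):
    # Stand-in for quantized scoring: compare coarsened (rounded) vectors.
    return dot([round(x) for x in query], [round(x) for x in vec])

def search_shard(query, shard_docs, num_candidates, factor):
    # 1) Approximate retrieval: top num_candidates * factor by quantized score.
    n = int(num_candidates * factor)
    approx = heapq.nlargest(n, shard_docs,
                            key=lambda d: quantized_score(query, d[1]))
    # 2) Rescore those candidates with the original float vectors.
    return [(dot(query, vec), doc_id) for doc_id, vec in approx]

def knn_search(query, shards, k, num_candidates, factor):
    rescored = [hit for shard in shards
                for hit in search_shard(query, shard, num_candidates, factor)]
    # 3) Merge rescored candidates from all shards; keep the global top k.
    return [doc_id for _, doc_id in heapq.nlargest(k, rescored)]
```

For example, with two shards holding `("a", [0.9, 0.1])`, `("b", [0.2, 0.9])` and `("c", [1.0, 0.0])`, searching for `[1.0, 0.0]` rescores each shard's oversampled candidates and merges them globally.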

All forms of quantization will result in some accuracy loss and as the quantization level increases the accuracy loss will also increase.
Generally, we have found that:
- `int8` requires minimal if any rescoring
- `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
- `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that between 3x-5x oversampling is generally sufficient. But for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.

There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.
[discrete]
[[dense-vector-knn-search-reranking-rescore-section]]
===== Use the `rescore` section for top-level kNN search

You can use the <<rescore, rescore section>> in the `_search` request to rescore the top results from a kNN search.

Here is an example using the top level `knn` search with oversampling and using `rescore` to rerank the results:
Here is an example using the top level `knn` search with oversampling and using `rescore_vector` to rerank the results:

[source,console]
--------------------------------------------------
@@ -1210,8 +1188,13 @@ gathering 20 nearest neighbors according to quantized scoring and rescoring with
<5> The weight of the original query; here we simply discard the original score
<6> The weight of the rescore query; here we only use the rescore query

The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>. Generally, this means that there will be more rescoring per shard, but this
can increase overall recall at the cost of compute.

[discrete]
[[dense-vector-knn-search-reranking-script-score]]
===== Use a `script_score` query to rescore per shard

You can rescore per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>.
Generally, this means that there will be more rescoring per shard, but this can increase overall recall at the cost of compute.
Reviewer comment: let's not bother calling out script_score. I say remove this section and focus on "rescore_vector"
[source,console]
--------------------------------------------------

@@ -1243,3 +1226,87 @@ POST /my-index/_search
<3> The number of candidates to use for the initial approximate `knn` search. This will search using the quantized vectors
and return the top 20 candidates per shard, which are then rescored
<4> The script to score the results. Script score will interact directly with the originally provided float32 vector.

[discrete]
[[exact-knn]]
=== Exact kNN

To run an exact kNN search, use a `script_score` query with a vector function.

. Explicitly map one or more `dense_vector` fields. If you don't intend to use
the field for approximate kNN, set the `index` mapping option to `false`. This
can significantly improve indexing speed.
+
[source,console]
----
PUT product-index
{
  "mappings": {
    "properties": {
      "product-vector": {
        "type": "dense_vector",
        "dims": 5,
        "index": false
      },
      "price": {
        "type": "long"
      }
    }
  }
}
----

. Index your data.
+
[source,console]
----
POST product-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599 }
{ "index": { "_id": "2" } }
{ "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799 }
{ "index": { "_id": "3" } }
{ "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099 }
...
----
//TEST[continued]
//TEST[s/\.\.\.//]

. Use the <<search-search,search API>> to run a `script_score` query containing
a <<vector-functions,vector function>>.
+
TIP: To limit the number of matched documents passed to the vector function, we
recommend you specify a filter query in the `script_score.query` parameter. If
needed, you can use a <<query-dsl-match-all-query,`match_all` query>> in this
parameter to match all documents. However, matching all documents can
significantly increase search latency.
+
[source,console]
----
POST product-index/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "range" : {
              "price" : {
                "gte": 1000
              }
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
        "params": {
          "queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
        }
      }
    }
  }
}
----
//TEST[continued]
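For intuition, here is a hedged sketch (plain Python, not the Painless implementation) of what `cosineSimilarity(params.queryVector, 'product-vector') + 1.0` computes: cosine similarity shifted by 1.0, since {es} scores must be non-negative. The `doc2` vector is taken from the bulk-indexing example above:

```python
import math

# Illustrative sketch, NOT the Painless engine: cosine similarity of the
# query vector against a stored vector, shifted by +1.0 so the score
# stays in the non-negative range [0, 2].

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def script_score(query_vector, doc_vector):
    return cosine_similarity(query_vector, doc_vector) + 1.0

query = [-0.5, 90.0, -10, 14.8, -156.0]
doc2 = [-0.5, 100.0, -13.0, 14.8, -156.0]  # _id 2 from the bulk example
print(round(script_score(query, doc2), 3))
```

Identical vectors score 2.0; opposite vectors score 0.0, which is why the `+ 1.0` shift is needed.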
Reviewer comment: we also need to put it in `search-api-knn` in `search.asciidoc`

Author reply: Thanks for the catch - done in 9c00564