Merged
Changes from 23 commits (26 commits total)
0844cf8
Use a FunctionScoreQuery to replace scores using a VectorSimilarity b…
carlosdelest Nov 12, 2024
5453003
Add rescore vector builder to KnnSearchBuilder
carlosdelest Nov 14, 2024
6a39c4a
Fix refactoring, spotless
carlosdelest Nov 14, 2024
c96a1dc
Check oversampling is not used for quantized types
carlosdelest Nov 18, 2024
37be056
Use KnnRescoreVectorQuery to perform rescoring and limiting the numbe…
carlosdelest Nov 20, 2024
1f2c846
Small name refactoring, fix adjusting parameters
carlosdelest Nov 21, 2024
e2ba897
Minor documentation / style fixes
carlosdelest Dec 2, 2024
f4de859
Properly implement advanceExact()
carlosdelest Dec 2, 2024
eff2be1
Fix VectorSimilarityFloatValueSource implementation for advanceExact
carlosdelest Dec 2, 2024
1e7632c
Correctly implement profiling. Rename ProfilingQuery to QueryProfiler…
carlosdelest Dec 4, 2024
c9af11b
Bytes can't be quantized - remove all infra for byte vectors in resco…
carlosdelest Dec 4, 2024
28d929f
Simplify logic for RescoreKnnVectorQuery now that k is not modifiable
carlosdelest Dec 9, 2024
2bd7699
Vector similarity needs to wrap the new rescoring query and not the o…
carlosdelest Dec 10, 2024
cc92e2e
Add docs
carlosdelest Dec 3, 2024
ec30b45
WIP - Add more docs
carlosdelest Dec 3, 2024
318c6c5
Rewording, adapting to final names in PR
carlosdelest Dec 11, 2024
8ade227
Refactor common params
carlosdelest Dec 11, 2024
4890cd9
Fix rebasing
carlosdelest Dec 11, 2024
6b00373
Fix rebasing
carlosdelest Dec 11, 2024
8785024
Replaced too much
carlosdelest Dec 11, 2024
ff7f906
Rescoring options moved to additional section
carlosdelest Dec 12, 2024
9c00564
Added rescore_vector to Search API
carlosdelest Dec 12, 2024
c3ee2c3
Change rescore_vector for rescore
carlosdelest Jan 9, 2025
3c60c01
Merge branch 'refs/heads/main' into feature/knn-vector-rescore-query-…
carlosdelest Jan 10, 2025
7b5ac90
Fix naming, use oversample parameter instead of num_candidates_factor
carlosdelest Jan 10, 2025
538d250
Merge branch 'main' into feature/knn-vector-rescore-query-docs
carlosdelest Jan 13, 2025
2 changes: 2 additions & 0 deletions docs/reference/mapping/types/dense-vector.asciidoc
@@ -127,6 +127,8 @@ When using a quantized format, you may want to oversample and rescore the result
To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
index type is `int8_hnsw`.

Quantized vectors can use <<dense-vector-knn-search-reranking,oversampling and rescoring>> to improve accuracy on approximate kNN search results.

NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing the quantized and raw vectors.

3 changes: 3 additions & 0 deletions docs/reference/query-dsl/knn-query.asciidoc
@@ -134,6 +134,9 @@ documents are then scored according to <<dense-vector-similarity, `similarity`>>
and the provided `boost` is applied.
--

include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-rescore-vector]


`boost`::
+
--
20 changes: 20 additions & 0 deletions docs/reference/rest-api/common-parms.asciidoc
@@ -1346,3 +1346,23 @@ tag::rrf-filter[]
Applies the specified <<query-dsl-bool-query, boolean query filter>> to all of the specified sub-retrievers,
according to each retriever's specifications.
end::rrf-filter[]

tag::knn-rescore-vector[]

`rescore_vector`::
+
--
(Optional, object) preview:[] Applies oversampling and rescoring to quantized vectors.

NOTE: Rescoring only makes sense for quantized vectors; when <<dense-vector-quantization,quantization>> is not used, the original vectors are already used for scoring.
The rescore option is ignored for non-quantized `dense_vector` fields.

`num_candidates_factor`::
(Required, float)
+
Applies the specified oversample factor to the number of candidates in the approximate kNN search.
The approximate kNN search will retrieve `num_candidates * num_candidates_factor` candidates per shard, and then use the original vectors to rescore them.

See <<dense-vector-knn-search-reranking,oversampling and rescoring quantized vectors>> for details.
--
end::knn-rescore-vector[]
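The `num_candidates * num_candidates_factor` arithmetic the parameter description refers to can be sketched as follows (a minimal illustration only; the helper name is made up and Elasticsearch performs this internally):

```python
def oversampled_candidates(num_candidates: int, num_candidates_factor: float) -> int:
    """Per-shard candidate pool size before rescoring with the original vectors.

    Hypothetical helper illustrating the oversampling arithmetic only.
    """
    return int(num_candidates * num_candidates_factor)


# num_candidates=100 with a 2.0 factor: 200 candidates per shard are
# retrieved using quantized vectors, then rescored with the raw floats.
print(oversampled_candidates(100, 2.0))  # 200
```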
8 changes: 5 additions & 3 deletions docs/reference/search/retriever.asciidoc
@@ -224,6 +224,8 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-filter]
+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-similarity]

include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-rescore-vector]
Member:
we also need to put it in search-api-knn in search.asciidoc

Member Author:
Thanks for the catch - done in 9c00564


===== Restrictions

The parameters `query_vector` and `query_vector_builder` cannot be used together.
@@ -446,15 +448,15 @@ This example demonstrates how to deploy the Elastic Rerank model and use it to

Follow these steps:

. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
+
[source,console]
----
PUT _inference/rerank/my-elastic-rerank
{
"service": "elasticsearch",
"service_settings": {
"model_id": ".rerank-v1",
"num_threads": 1,
"adaptive_allocations": { <1>
"enabled": true,
Expand All @@ -465,7 +467,7 @@ PUT _inference/rerank/my-elastic-rerank
}
----
// TEST[skip:uses ML]
<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with the minimum of 1 and the maximum of 10 allocations.
+
. Define a `text_similarity_rerank` retriever:
+
227 changes: 146 additions & 81 deletions docs/reference/search/search-your-data/knn-search.asciidoc
Expand Up @@ -781,7 +781,7 @@ What if you wanted to filter by some top-level document metadata? You can do thi


NOTE: `filter` will always be over the top-level document metadata. This means you cannot filter based on `nested`
field metadata.

[source,console]
----
@@ -1068,100 +1068,73 @@ NOTE: Approximate kNN search always uses the
the global top `k` matches across shards. You cannot set the
`search_type` explicitly when running kNN search.


[discrete]
[[exact-knn]]
=== Exact kNN
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors

To run an exact kNN search, use a `script_score` query with a vector function.
When using <<dense-vector-quantization,quantized vectors>> for kNN search, you can optionally rescore results to balance performance and accuracy by:

. Explicitly map one or more `dense_vector` fields. If you don't intend to use
the field for approximate kNN, set the `index` mapping option to `false`. This
can significantly improve indexing speed.
+
[source,console]
----
PUT product-index
{
"mappings": {
"properties": {
"product-vector": {
"type": "dense_vector",
"dims": 5,
"index": false
},
"price": {
"type": "long"
}
}
}
}
----
* *Oversampling*: Retrieve more candidates per shard.
* *Rescoring*: Use the original vector values for re-calculating the score on the oversampled candidates.

. Index your data.
+
[source,console]
----
POST product-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599 }
{ "index": { "_id": "2" } }
{ "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799 }
{ "index": { "_id": "3" } }
{ "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099 }
...
----
//TEST[continued]
//TEST[s/\.\.\.//]
Because the original, non-quantized vectors are used to calculate the final score of the top results, rescoring combines:

* The performance and memory gains of approximate retrieval using quantized vectors for retrieving the top candidates.
* The accuracy of using the original vectors for rescoring the top candidates.

All forms of quantization result in some accuracy loss, and the loss grows as the quantization level increases.
Generally, we have found that:

* `int8` requires minimal, if any, rescoring
* `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
* `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that 3x-5x oversampling is generally sufficient. But for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.

You can use the `rescore_vector` preview:[] option to automatically perform rescoring.
When the rescore `num_candidates_factor` parameter is specified, the approximate kNN search will retrieve the top `num_candidates * num_candidates_factor` candidates per shard.
It will then use the original vectors to rescore them, and return the top `k` results.

Here is an example of using the `rescore_vector` option with the `num_candidates_factor` parameter:

. Use the <<search-search,search API>> to run a `script_score` query containing
a <<vector-functions,vector function>>.
+
TIP: To limit the number of matched documents passed to the vector function, we
recommend you specify a filter query in the `script_score.query` parameter. If
needed, you can use a <<query-dsl-match-all-query,`match_all` query>> in this
parameter to match all documents. However, matching all documents can
significantly increase search latency.
+
[source,console]
----
POST product-index/_search
POST image-index/_search
{
"query": {
"script_score": {
"query" : {
"bool" : {
"filter" : {
"range" : {
"price" : {
"gte": 1000
}
}
}
}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
"params": {
"queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
}
}
"knn": {
"field": "image-vector",
"query_vector": [-5, 9, -12],
"k": 10,
"num_candidates": 100,
"rescore_vector": {
"num_candidates_factor": 2.0
}
}
},
"fields": [ "title", "file-type" ]
}
----
//TEST[continued]
// TEST[s/"k": 10/"k": 3/]
// TEST[s/"num_candidates": 100/"num_candidates": 3/]

This example will:

* Search using approximate kNN with an effective `num_candidates` of 200 (`num_candidates * num_candidates_factor`).
* Rescore the top 200 candidates per shard using the original, non-quantized vectors.
* Merge the rescored candidates from all shards, and return the top 10 (`k`) results.
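The retrieve-oversample-rescore flow above can be simulated for a single shard in plain Python (an illustrative sketch under assumed dot-product scoring; the function and data layout are made up, not Elasticsearch internals):

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def knn_with_rescore(query, quantized, originals, k, num_candidates, factor):
    """Simulate one shard: retrieve with quantized vectors, rescore with originals.

    `quantized` and `originals` map doc id -> vector; `quantized` stands in
    for the lossy index-time representation.
    """
    pool_size = int(num_candidates * factor)
    # Approximate phase: cheap scores over the quantized vectors.
    pool = heapq.nlargest(pool_size, quantized,
                          key=lambda doc: dot(query, quantized[doc]))
    # Rescore phase: exact scores with the original float vectors, keep top k.
    return heapq.nlargest(k, pool, key=lambda doc: dot(query, originals[doc]))
```

Quantization error can reorder close neighbors; the rescore phase restores the exact ordering within the oversampled pool, which is why a modest oversampling factor recovers most of the lost accuracy.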

[discrete]
[[dense-vector-knn-search-reranking]]
==== Oversampling and rescoring for quantized vectors
[[dense-vector-knn-search-reranking-rescore-additional]]
===== Additional rescoring techniques

All forms of quantization will result in some accuracy loss and as the quantization level increases the accuracy loss will also increase.
Generally, we have found that:
- `int8` requires minimal if any rescoring
- `int4` requires some rescoring for higher accuracy and larger recall scenarios. Generally, oversampling by 1.5x-2x recovers most of the accuracy loss.
- `bbq` requires rescoring except on exceptionally large indices or models specifically designed for quantization. We have found that between 3x-5x oversampling is generally sufficient. But for fewer dimensions or vectors that do not quantize well, higher oversampling may be required.
The following sections provide additional ways of rescoring:

[discrete]
[[dense-vector-knn-search-reranking-rescore-section]]
====== Use the `rescore` section for top-level kNN search

You can use this option when you don't want to rescore on each shard, but rather on the top results from all shards.

There are two main ways to oversample and rescore. The first is to utilize the <<rescore, rescore section>> in the `_search` request.
Use the <<rescore, rescore section>> in the `_search` request to rescore the top results from a kNN search.

Here is an example using the top level `knn` search with oversampling and using `rescore` to rerank the results:

@@ -1210,8 +1183,16 @@ gathering 20 nearest neighbors according to quantized scoring and rescoring with
<5> The weight of the original query, here we simply throw away the original score
<6> The weight of the rescore query, here we only use the rescore query
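The weight combination those two callouts describe can be sketched as follows (a hedged illustration of the weighted-sum behavior of the rescore section; the function is hypothetical):

```python
def combined_score(original_score, rescore_score,
                   query_weight=0.0, rescore_query_weight=1.0):
    """Weighted sum of the original (quantized) score and the rescore score.

    With query_weight=0 the original score is discarded and only the rescore
    query's score, computed from the original float vectors, is used.
    """
    return query_weight * original_score + rescore_query_weight * rescore_score


print(combined_score(0.73, 0.91))  # 0.91: only the rescore score survives
```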

The second way is to score per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query >>. Generally, this means that there will be more rescoring per shard, but this
can increase overall recall at the cost of compute.

[discrete]
[[dense-vector-knn-search-reranking-script-score]]
====== Use a `script_score` query to rescore per shard

You can use this option when you want to rescore on each shard and want more fine-grained control on the rescoring
than the `rescore_vector` option provides.

Use rescore per shard with the <<query-dsl-knn-query, knn query>> and <<query-dsl-script-score-query, script_score query>>.
Generally, this means that there will be more rescoring per shard, but this can increase overall recall at the cost of compute.

[source,console]
--------------------------------------------------
@@ -1243,3 +1224,87 @@ POST /my-index/_search
<3> The number of candidates to use for the initial approximate `knn` search. This will search using the quantized vectors
and return the top 20 candidates per shard to then be scored
<4> The script to score the results. Script score will interact directly with the originally provided float32 vector.


[discrete]
[[exact-knn]]
=== Exact kNN

To run an exact kNN search, use a `script_score` query with a vector function.

. Explicitly map one or more `dense_vector` fields. If you don't intend to use
the field for approximate kNN, set the `index` mapping option to `false`. This
can significantly improve indexing speed.
+
[source,console]
----
PUT product-index
{
"mappings": {
"properties": {
"product-vector": {
"type": "dense_vector",
"dims": 5,
"index": false
},
"price": {
"type": "long"
}
}
}
}
----

. Index your data.
+
[source,console]
----
POST product-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "product-vector": [230.0, 300.33, -34.8988, 15.555, -200.0], "price": 1599 }
{ "index": { "_id": "2" } }
{ "product-vector": [-0.5, 100.0, -13.0, 14.8, -156.0], "price": 799 }
{ "index": { "_id": "3" } }
{ "product-vector": [0.5, 111.3, -13.0, 14.8, -156.0], "price": 1099 }
...
----
//TEST[continued]
//TEST[s/\.\.\.//]

. Use the <<search-search,search API>> to run a `script_score` query containing
a <<vector-functions,vector function>>.
+
TIP: To limit the number of matched documents passed to the vector function, we
recommend you specify a filter query in the `script_score.query` parameter. If
needed, you can use a <<query-dsl-match-all-query,`match_all` query>> in this
parameter to match all documents. However, matching all documents can
significantly increase search latency.
+
[source,console]
----
POST product-index/_search
{
"query": {
"script_score": {
"query" : {
"bool" : {
"filter" : {
"range" : {
"price" : {
"gte": 1000
}
}
}
}
},
"script": {
"source": "cosineSimilarity(params.queryVector, 'product-vector') + 1.0",
"params": {
"queryVector": [-0.5, 90.0, -10, 14.8, -156.0]
}
}
}
}
}
----
//TEST[continued]
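The Painless expression `cosineSimilarity(params.queryVector, 'product-vector') + 1.0` can be mirrored in Python to see why the resulting scores land in `[0, 2]` (an illustrative sketch, not Elasticsearch code):

```python
import math


def cosine_similarity_plus_one(query, doc):
    """Cosine similarity shifted by +1 so the score is never negative,
    mirroring the script_score source above (sketch only)."""
    num = sum(q * d for q, d in zip(query, doc))
    denom = (math.sqrt(sum(q * q for q in query))
             * math.sqrt(sum(d * d for d in doc)))
    return num / denom + 1.0


query = [-0.5, 90.0, -10, 14.8, -156.0]
doc_2 = [-0.5, 100.0, -13.0, 14.8, -156.0]  # vector of _id 2 in the bulk example
score = cosine_similarity_plus_one(query, doc_2)  # near 2.0: almost parallel vectors
```

The `+ 1.0` shift matters because Elasticsearch requires non-negative scores, while raw cosine similarity ranges from -1 to 1.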
2 changes: 2 additions & 0 deletions docs/reference/search/search.asciidoc
@@ -528,6 +528,8 @@ not both. Refer to <<knn-semantic-search>> to learn more.
(Optional, float)
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-similarity]

include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-rescore-vector]

====

[[search-api-min-score]]