Add docs

carlosdelest · carlosdelest · commit fb366e0ca5bf · 2024-12-11T09:27:27.000+01:00
diff --git a/docs/reference/mapping/types/dense-vector.asciidoc b/docs/reference/mapping/types/dense-vector.asciidoc
@@ -127,6 +127,8 @@ When using a quantized format, you may want to oversample and rescore the result
 To use a quantized index, you can set your index type to `int8_hnsw`, `int4_hnsw`, or `bbq_hnsw`. When indexing `float` vectors, the current default
 index type is `int8_hnsw`.
 
+Quantized vectors can use <<knn-quantized-vector-rescoring,rescoring>> to improve accuracy on approximate kNN search results.
+
 NOTE: Quantization will continue to keep the raw float vector values on disk for reranking, reindexing, and quantization improvements over the lifetime of the data.
 This means disk usage will increase by ~25% for `int8`, ~12.5% for `int4`, and ~3.1% for `bbq` due to the overhead of storing the quantized and raw vectors.
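
The approximate percentages above come from comparing the bits each quantized format stores per dimension with the 32 bits of a raw `float` value. The minimal Python sketch below reproduces them. It assumes 8, 4, and 1 bits per dimension for `int8`, `int4`, and `bbq`. It ignores any per-vector correction values or HNSW graph data the formats may also store.

```python
# Bits stored per dimension for each quantized index type.
# Sketch assumption: per-vector correction terms and graph data are ignored.
BITS_PER_DIMENSION = {"int8": 8, "int4": 4, "bbq": 1}
FLOAT32_BITS = 32


def extra_disk_fraction(index_type):
    # Size of the quantized copy relative to the raw float32 vectors kept on disk.
    return BITS_PER_DIMENSION[index_type] / FLOAT32_BITS


for name in ("int8", "int4", "bbq"):
    # Prints int8: ~25.0%, int4: ~12.5%, bbq: ~3.1%
    print(f"{name}: ~{extra_disk_fraction(name):.1%}")
```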
 
diff --git a/docs/reference/query-dsl/knn-query.asciidoc b/docs/reference/query-dsl/knn-query.asciidoc
@@ -134,6 +134,13 @@ documents are then scored according to <<dense-vector-similarity, `similarity`>>
 and the provided `boost` is applied.
 --
 
+`rescore`::
++
+--
+(Optional, object) Rescoring to apply to quantized vectors.
+include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-rescore]
+--
+
 `boost`::
 +
 --
diff --git a/docs/reference/rest-api/common-parms.asciidoc b/docs/reference/rest-api/common-parms.asciidoc
@@ -1346,3 +1346,19 @@ tag::rrf-filter[]
 Applies the specified <<query-dsl-bool-query, boolean query filter>> to all of the specified sub-retrievers,
 according to each retriever's specifications.
 end::rrf-filter[]
+
+tag::knn-rescore[]
+
+NOTE: Rescoring only makes sense for quantized vectors. When <<dense-vector-quantization,quantization>> is not used, the original vectors are already used for scoring,
+so the `rescore` option is ignored for non-quantized `dense_vector` fields.
+
+`oversample`::
+(Required, float)
++
+Applies the specified oversample factor to the approximate kNN search.
+The approximate kNN search retrieves the top `k * oversample` candidates per shard,
+and then rescores them using the original vectors.
+The top `k` rescored candidates are returned as results.
+
+See <<knn-quantized-vector-rescoring,rescoring quantized vectors>> for details.
+end::knn-rescore[]
diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc
@@ -224,6 +224,13 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-filter]
 +
 include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-similarity]
 
+`rescore`::
++
+--
+(Optional, object) Rescoring to apply to quantized vectors.
+include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-rescore]
+--
+
 ===== Restrictions
 
 The parameters `query_vector` and `query_vector_builder` cannot be used together.
@@ -446,15 +453,15 @@ This examples demonstrates how to deploy the Elastic Rerank model and use it to
 
 Follow these steps:
 
-. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>. 
+. Create an inference endpoint for the `rerank` task using the <<put-inference-api, Create {infer} API>>.
 +
 [source,console]
 ----
 PUT _inference/rerank/my-elastic-rerank
 {
   "service": "elasticsearch",
   "service_settings": {
-    "model_id": ".rerank-v1", 
+    "model_id": ".rerank-v1",
     "num_threads": 1,
     "adaptive_allocations": { <1>
       "enabled": true,
@@ -465,7 +472,7 @@ PUT _inference/rerank/my-elastic-rerank
 }
 ----
 // TEST[skip:uses ML]
-<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with the minimum of 1 and the maximum of 10 allocations. 
+<1> {ml-docs}/ml-nlp-auto-scale.html#nlp-model-adaptive-allocations[Adaptive allocations] will be enabled with the minimum of 1 and the maximum of 10 allocations.
 +
 . Define a `text_similarity_rerank` retriever:
 +
diff --git a/docs/reference/search/search-your-data/knn-search.asciidoc b/docs/reference/search/search-your-data/knn-search.asciidoc
@@ -1012,6 +1012,55 @@ Now the result will contain the nearest found paragraph when searching.
 // TESTRESPONSE[s/"took": 4/"took" : "$body.took"/]
 
 
+[discrete]
+[[knn-quantized-vector-rescoring]]
+==== Rescoring results for quantized vectors
+
+When using <<dense-vector-quantization,quantized vectors>> for kNN search, you can optionally rescore results to balance performance and accuracy.
+Rescoring works by retrieving more results per shard using approximate kNN, and then using the original vector values to rescore these results.
+Because the original, non-quantized vectors are used to calculate the final score of the top results, rescoring combines:
+
+- The performance and memory gains of approximate retrieval using quantized vectors to find the top candidates.
+- The accuracy of using the original vectors to rescore the top candidates.
+
+Rescoring is not as accurate as an <<exact-knn,exact kNN search>>, because the approximate kNN search may miss some of the true top results.
+However, the candidates that are rescored get the same scores and relative ordering that an exact kNN search would give them.
+
+You can use the `rescore` option to specify an `oversample` parameter.
+When `oversample` is specified, the approximate kNN search will retrieve the top `k * oversample` candidates per shard.
+It will then use the original vectors to rescore them, and return the top `k` results.
+
+`oversample` does not affect `num_candidates`, other than ensuring that at least `k * oversample` candidates are considered per shard.
+
+Here is an example of using the `rescore` option with the `oversample` parameter:
+
+[source,console]
+----
+POST image-index/_search
+{
+  "knn": {
+    "field": "image-vector",
+    "query_vector": [-5, 9, -12],
+    "k": 10,
+    "num_candidates": 100,
+    "rescore": {
+      "oversample": 2.0
+    }
+  },
+  "fields": [ "title", "file-type" ]
+}
+----
+// TEST[continued]
+// TEST[s/"k": 10/"k": 3/]
+// TEST[s/"num_candidates": 100/"num_candidates": 3/]
+
+This example will effectively:
+
+- Search using approximate kNN with `num_candidates` set to 100.
+- Rescore the top 20 (`k * oversample`) candidates per shard using the original vectors.
+- Return the top 10 (`k`) results from the rescored candidates.
+
+NOTE: Rescoring only makes sense for quantized vectors. When <<dense-vector-quantization,quantization>> is not used, the original vectors are already used for scoring,
+so the `rescore` option is ignored for non-quantized `dense_vector` fields.
+
 [discrete]
 [[knn-indexing-considerations]]
 ==== Indexing considerations
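
The `k * oversample` rescoring flow described in the knn-search changes can be sketched in Python as follows. This is a hypothetical illustration, not Elasticsearch's implementation, and it makes several assumptions:

- `quantize` is a crude stand-in that snaps components to multiples of 0.5 rather than real `int8`/`int4`/`bbq` quantization.
- Dot product stands in for the field's configured `similarity`.
- The approximate search is simulated by scoring every quantized vector on a single shard.
- `k * oversample` is rounded up to a whole number of candidates.

```python
import math


def quantize(vector, step=0.5):
    # Crude stand-in for real quantization: snap each component
    # to the nearest multiple of `step`.
    return [round(x / step) * step for x in vector]


def dot(a, b):
    # Dot product used as the similarity for this sketch.
    return sum(x * y for x, y in zip(a, b))


def oversample_rescore(query, docs, k, oversample):
    """Hypothetical per-shard sketch: retrieve top k * oversample candidates
    using quantized vectors, rescore them with the raw vectors, return top k."""
    # Assumption: k * oversample is rounded up to a whole number.
    num_rescore = math.ceil(k * oversample)

    # Step 1: rank documents by their quantized score and keep the best
    # `num_rescore` (stands in for the approximate HNSW search).
    approx = sorted(
        docs, key=lambda doc_id: dot(query, quantize(docs[doc_id])), reverse=True
    )
    candidates = approx[:num_rescore]

    # Step 2: rescore only those candidates with the original float vectors.
    rescored = sorted(
        ((doc_id, dot(query, docs[doc_id])) for doc_id in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Step 3: return the top k rescored candidates.
    return rescored[:k]


if __name__ == "__main__":
    query = [1.0, 1.0]
    docs = {"x": [0.74, 0.74], "y": [0.76, 0.26], "z": [0.1, 0.1]}
    # Quantized scores rank y (1.5) above x (1.0); raw scores rank x (1.48) above y (1.02).
    print([d for d, _ in oversample_rescore(query, docs, k=1, oversample=1.0)])  # prints ['y']
    print([d for d, _ in oversample_rescore(query, docs, k=1, oversample=2.0)])  # prints ['x']
```

With `oversample` set to 1.0, only the top quantized candidate `y` is kept. With 2.0, `x` is also rescored using its original vector and overtakes `y`. This is the accuracy gain that oversampling and rescoring aim for.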