elastic · kderusso · Nov 6, 2024 · Oct 25, 2024 · Oct 29, 2024 · Oct 29, 2024
diff --git a/docs/reference/search/retriever.asciidoc b/docs/reference/search/retriever.asciidoc
@@ -1,14 +1,12 @@
 [[retriever]]
 === Retriever
 
-A retriever is a specification to describe top documents returned from a
-search. A retriever replaces other elements of the <<search-search, search API>>
+A retriever is a specification to describe top documents returned from a search.
+A retriever replaces other elements of the <<search-search, search API>>
 that also return top documents such as <<query-dsl, `query`>> and
-<<search-api-knn, `knn`>>. A retriever may have child retrievers where a
-retriever with two or more children is considered a compound retriever. This
-allows for complex behavior to be depicted in a tree-like structure, called
-the retriever tree, to better clarify the order of operations that occur
-during a search.
+<<search-api-knn, `knn`>>.
+A retriever may have child retrievers where a retriever with two or more children is considered a compound retriever.
+This allows for complex behavior to be depicted in a tree-like structure, called the retriever tree, to better clarify the order of operations that occur during a search.
 
 [TIP]
 ====
@@ -29,6 +27,9 @@ A <<rrf-retriever, retriever>> that produces top documents from <<rrf, reciproca
 `text_similarity_reranker`::
 A <<text-similarity-reranker-retriever, retriever>> that enhances search results by re-ranking documents based on semantic similarity to a specified inference text, using a machine learning model.
 
+`rule`::
+A <<rule-retriever, retriever>> that applies contextual <<query-rules>> to pin or exclude documents for specific queries.
+
 [[standard-retriever]]
 ==== Standard Retriever
 
@@ -44,8 +45,7 @@ Defines a query to retrieve a set of top documents.
 `filter`::
 (Optional, <<query-dsl, query object or list of query objects>>)
 +
-Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever
-where all documents must match this query but do not contribute to the score.
+Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever where all documents must match this query but do not contribute to the score.
 
 `search_after`::
 (Optional, <<search-after, search after object>>)
@@ -56,14 +56,13 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]
 
 `sort`::
 +
-(Optional, <<sort-search-results, sort object>>)
-A sort object that that specifies the order of matching documents.
+(Optional, <<sort-search-results, sort object>>) A sort object that specifies the order of matching documents.
 
 `min_score`::
 (Optional, `float`)
 +
-Minimum <<relevance-scores, `_score`>> for matching documents. Documents with a
-lower `_score` are not included in the top documents.
+Minimum <<relevance-scores, `_score`>> for matching documents.
+Documents with a lower `_score` are not included in the top documents.
 
 `collapse`::
 (Optional, <<collapse-search-results, collapse object>>)
@@ -72,8 +71,7 @@ Collapses the top documents by a specified key into a single top document per ke
 
 ===== Restrictions
 
-When a retriever tree contains a compound retriever (a retriever with two or more child
-retrievers) the <<search-after, search after>> parameter is not supported.
+When a retriever tree contains a compound retriever (a retriever with two or more child retrievers), the <<search-after, search after>> parameter is not supported.
 
 [discrete]
 [[standard-retriever-example]]
@@ -105,12 +103,39 @@ POST /restaurants/_bulk?refresh
 {"region": "Austria", "year": "2020", "vector": [10, 22, 79]}
 {"index":{}}
 {"region": "France", "year": "2020", "vector": [10, 22, 80]}
+
+PUT /movies
+
+PUT _query_rules/my-ruleset
+{
+    "rules": [
+        {
+            "rule_id": "my-rule1",
+            "type": "pinned",
+            "criteria": [
+                {
+                    "type": "exact",
+                    "metadata": "query_string",
+                    "values": [ "pugs" ]
+                }
+            ],
+            "actions": {
+                "ids": [
+                    "id1"
+                ]
+            }
+        }
+    ]
+}
+
 ----
 // TESTSETUP
 
 [source,console]
 --------------------------------------------------
 DELETE /restaurants
+
+DELETE /movies
 --------------------------------------------------
 // TEARDOWN
 ////
@@ -143,11 +168,13 @@ GET /restaurants/_search
   }
 }
 ----
+
 <1> Opens the `retriever` object.
 <2> The `standard` retriever is used for defining traditional {es} queries.
 <3> The entry point for defining the search query.
 <4> The `bool` object allows for combining multiple query clauses logically.
-<5> The `should` array indicates conditions under which a document will match. Documents matching these conditions will increase their relevancy score.
+<5> The `should` array indicates conditions under which a document will match.
+Documents matching these conditions will increase their relevancy score.
 <6> The `match` object finds documents where the `region` field contains the word "Austria."
 <7> The `filter` array provides filtering conditions that must be met but do not contribute to the relevancy score.
 <8> The `term` object is used for exact matches, in this case, filtering documents by the `year` field.
@@ -178,8 +205,8 @@ Defines a <<knn-semantic-search, model>> to build a query vector.
 `k`::
 (Required, integer)
 +
-Number of nearest neighbors to return as top hits. This value must be fewer than
-or equal to `num_candidates`.
+Number of nearest neighbors to return as top hits.
+This value must be less than or equal to `num_candidates`.
 
 `num_candidates`::
 (Required, integer)
@@ -222,16 +249,15 @@ GET /restaurants/_search
 <1> Configuration for k-nearest neighbor (knn) search, which is based on vector similarity.
 <2> Specifies the field name that contains the vectors.
 <3> The query vector against which document vectors are compared in the `knn` search.
-<4> The number of nearest neighbors to return as top hits. This value must be fewer than or equal to `num_candidates`.
+<4> The number of nearest neighbors to return as top hits.
+This value must be less than or equal to `num_candidates`.
 <5> The size of the initial candidate set from which the final `k` nearest neighbors are selected.
 
 [[rrf-retriever]]
 ==== RRF Retriever
 
-An <<rrf, RRF>> retriever returns top documents based on the RRF formula,
-equally weighting two or more child retrievers.
-Reciprocal rank fusion (RRF) is a method for combining multiple result
-sets with different relevance indicators into a single result set.
+An <<rrf, RRF>> retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
+Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.
 
 ===== Parameters
 
@@ -357,7 +383,8 @@ Refer to <<semantic-reranking>> for a high level overview of semantic re-ranking
 ===== Prerequisites
 
 To use `text_similarity_reranker` you must first set up a `rerank` task using the <<put-inference-api, Create {infer} API>>.
-The `rerank` task should be set up with a machine learning model that can compute text similarity. Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
+The `rerank` task should be set up with a machine learning model that can compute text similarity.
+Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
 
 Currently you can:
 
@@ -368,6 +395,7 @@ Currently you can:
 ** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide.
 
-
+[discrete]
+[[text-similarity-reranker-retriever-parameters]]
 ===== Parameters
+
 `retriever`::
 (Required, <<retriever, retriever>>)
 +
@@ -376,7 +404,8 @@ The child retriever that generates the initial set of top documents to be re-ran
 `field`::
 (Required, `string`)
 +
-The document field to be used for text similarity comparisons. This field should contain the text that will be evaluated against the `inferenceText`.
+The document field to be used for text similarity comparisons.
+This field should contain the text that will be evaluated against the `inference_text`.
 
 `inference_id`::
 (Required, `string`)
@@ -391,25 +420,28 @@ The text snippet used as the basis for similarity comparison.
 `rank_window_size`::
 (Optional, `int`)
 +
-The number of top documents to consider in the re-ranking process. Defaults to `10`.
+The number of top documents to consider in the re-ranking process.
+Defaults to `10`.
 
 `min_score`::
 (Optional, `float`)
 +
-Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used.
+Sets a minimum threshold score for including documents in the re-ranked results.
+Documents with similarity scores below this threshold will be excluded.
+Note that score calculations vary depending on the model used.
 
 `filter`::
 (Optional, <<query-dsl, query object or list of query objects>>)
 +
 Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child  <<retriever, retriever>>.
-If the child retriever already specifies any filters, then this top-level filter is applied in conjuction
-with the filter defined in the child retriever.
+If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.
 
 [discrete]
 [[text-similarity-reranker-retriever-example-cohere]]
 ==== Example: Cohere Rerank
 
-This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminate the need to generate and store embeddings for all indexed documents.
+This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API.
+This approach eliminates the need to generate and store embeddings for all indexed documents.
 This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type.
 
 [source,console]
@@ -459,7 +491,9 @@ Follow these steps to load the model and create a semantic re-ranker.
 python -m pip install eland[pytorch]
 ----
 +
-. Upload the model to {es} using Eland. This example assumes you have an Elastic Cloud deployment and an API key. Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options.
+. Upload the model to {es} using Eland.
+This example assumes you have an Elastic Cloud deployment and an API key.
+Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options.
 +
 [source,sh]
 ----
@@ -517,14 +551,137 @@ POST movies/_search
 This retriever uses a standard `match` query to search the `movie` index for films tagged with the genre "drama".
 It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to {es}.
 
+[[rule-retriever]]
+==== Query Rules Retriever
+
+The `rule` retriever offers users fine-grained control over the search results by applying contextual <<query-rules>> to pin or exclude documents for specific queries.
+This retriever performs similar functionality to the <<query-dsl-rule-query>> but works out of the box with other retrievers including reranking retrievers like <<text-similarity-reranker-retriever, text_similarity_reranker>> and <<rrf-retriever, rrf>>.
+
+===== Prerequisites
+
+To use the `rule` retriever you must first create one or more query rulesets using the <<query-rules-apis, query rules management APIs>>.
+
+===== Parameters
+
+`retriever`::
+(Required, <<retriever, retriever>>)
++
+The child retriever that returns the results to which the query rules are applied.
+This can be a standalone retriever such as the <<standard-retriever, standard>> or <<knn-retriever, knn>> retriever, or it can be a compound retriever.
+
+`ruleset_ids`::
+(Required, `array`)
++
+An array of one or more unique <<query-rules-apis, query ruleset>> IDs with query-based rules to match and apply as applicable.
+Rulesets and their associated rules are evaluated in the order in which they are specified in the query and ruleset.
+The maximum number of rulesets to specify is 10.
+
+`match_criteria`::
+(Required, `object`)
++
+Defines the match criteria to apply to rules in the given query ruleset.
+Match criteria should match the keys defined in the `criteria.metadata` field of the rule.
+
+`rank_window_size`::
+(Optional, `int`)
++
+The number of top documents to return from the `rule` retriever.
+Defaults to `10`.
+
+[discrete]
+[[rule-retriever-example]]
+==== Example: Rule retriever
+
+This example shows the rule retriever executed without any additional retrievers.
+It runs the query defined by the `retriever` and applies the rules from `my-ruleset` on top of the returned results.
+
+[source,console]
+----
+GET movies/_search
+{
+  "retriever": {
+    "rule": {
+      "match_criteria": {
+        "query_string": "harry potter"
+      },
+      "ruleset_ids": [
+        "my-ruleset"
+      ],
+      "retriever": {
+        "standard": {
+          "query": {
+            "query_string": {
+              "query": "harry potter"
+            }
+          }
+        }
+      }
+    }
+  }
+}
+----
+
+[discrete]
+[[rule-retriever-example-rrf]]
+==== Example: Rule retriever combined with RRF
+
+This example shows how to combine the `rule` retriever with other rerank retrievers such as <<rrf-retriever, rrf>> or <<text-similarity-reranker-retriever, text_similarity_reranker>>.
+
+[WARNING]
+====
+The `rule` retriever will apply rules to any documents returned from its defined `retriever` or any of its sub-retrievers.
+This means that for the best results, the `rule` retriever should be the outermost defined retriever.
+Nesting a `rule` retriever as a sub-retriever under a reranker such as `rrf` or `text_similarity_reranker` may not produce the expected results.
+====
+
+[source,console]
+----
+GET movies/_search
+{
+  "retriever": {
+    "rule": {
+      "match_criteria": {
+        "query_string": "harry potter"
+      },
+      "ruleset_ids": [
+        "my-ruleset"
+      ],
+      "retriever": {
+        "rrf": {
+          "retrievers": [
+            {
+              "standard": {
+                "query": {
+                  "query_string": {
+                    "query": "sorcerer's stone"
+                  }
+                }
+              }
+            },
+            {
+              "standard": {
+                "query": {
+                  "query_string": {
+                    "query": "chamber of secrets"
+                  }
+                }
+              }
+            }
+          ]
+        }
+      }
+    }
+  }
+}
+----
+
 ==== Using `from` and `size` with a retriever tree
 
 The <<search-from-param, `from`>> and <<search-size-param, `size`>>
 parameters are provided globally as part of the general
-<<search-search, search API>>. They are applied to all retrievers in a
-retriever tree unless a specific retriever overrides the `size` parameter
-using a different parameter such as `rank_window_size`. Though, the final
-search hits are always limited to `size`.
+<<search-search, search API>>.
+They are applied to all retrievers in a retriever tree unless a specific retriever overrides the `size` parameter using a different parameter such as `rank_window_size`.
+However, the final search hits are always limited to `size`.
 
 ==== Using aggregations with a retriever tree
 
@@ -534,12 +691,12 @@ clauses in a <<query-dsl-bool-query, boolean query>>.
 
 ==== Restrictions on search parameters when specifying a retriever
 
-When a retriever is specified as part of a search the following elements are not allowed
-at the top-level and instead are only allowed as elements of specific retrievers:
+When a retriever is specified as part of a search, the following elements are not allowed at the top level and instead are only allowed as elements of specific retrievers:
 
 * <<request-body-search-query, `query`>>
 * <<search-api-knn, `knn`>>
 * <<search-after, `search_after`>>
 * <<request-body-search-terminate-after, `terminate_after`>>
 * <<search-sort-param, `sort`>>
 * <<rescore, `rescore`>>
+
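
The `rule` retriever parameters documented in this change (`retriever`, `ruleset_ids`, `match_criteria`, `rank_window_size`) can be assembled programmatically. The following is a minimal sketch, not part of this PR or of any Elasticsearch client: `build_rule_retriever` is a hypothetical helper name, and its client-side checks only mirror the limits stated in the parameter descriptions (one to ten unique ruleset IDs). The server remains the authority on validation.

```python
import json
from typing import Any, Dict, List, Optional


def build_rule_retriever(
    child_retriever: Dict[str, Any],
    ruleset_ids: List[str],
    match_criteria: Dict[str, Any],
    rank_window_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Wrap `child_retriever` in a `rule` retriever and return a search body.

    Hypothetical helper for illustration only.
    """
    # Assumption: mirror the documented limit of 1 to 10 ruleset IDs.
    if not 1 <= len(ruleset_ids) <= 10:
        raise ValueError("ruleset_ids must contain between 1 and 10 IDs")
    # Assumption: the docs describe the IDs as unique, so reject duplicates.
    if len(set(ruleset_ids)) != len(ruleset_ids):
        raise ValueError("ruleset_ids must be unique")

    rule: Dict[str, Any] = {
        "match_criteria": match_criteria,
        "ruleset_ids": list(ruleset_ids),
        "retriever": child_retriever,
    }
    # Only send rank_window_size when set, so the documented default applies.
    if rank_window_size is not None:
        rule["rank_window_size"] = rank_window_size

    # Keep `rule` outermost, as the WARNING in the RRF example recommends.
    return {"retriever": {"rule": rule}}


# Reproduces the body of the "Example: Rule retriever" request.
body = build_rule_retriever(
    child_retriever={
        "standard": {"query": {"query_string": {"query": "harry potter"}}}
    },
    ruleset_ids=["my-ruleset"],
    match_criteria={"query_string": "harry potter"},
)
print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent as the JSON body of `GET movies/_search`. Passing an `rrf` retriever as `child_retriever` yields the combined form shown in the RRF example.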