Commit 9c4ac53

Add initial query rules retriever docs
1 parent 5e98251 commit 9c4ac53

2 files changed

+197
-64
lines changed

docs/reference/search/retriever.asciidoc

Lines changed: 167 additions & 37 deletions
@@ -1,14 +1,12 @@
11
[[retriever]]
22
=== Retriever
33

4-
A retriever is a specification to describe top documents returned from a
5-
search. A retriever replaces other elements of the <<search-search, search API>>
4+
A retriever is a specification to describe top documents returned from a search.
5+
A retriever replaces other elements of the <<search-search, search API>>
66
that also return top documents such as <<query-dsl, `query`>> and
7-
<<search-api-knn, `knn`>>. A retriever may have child retrievers where a
8-
retriever with two or more children is considered a compound retriever. This
9-
allows for complex behavior to be depicted in a tree-like structure, called
10-
the retriever tree, to better clarify the order of operations that occur
11-
during a search.
7+
<<search-api-knn, `knn`>>.
8+
A retriever may have child retrievers where a retriever with two or more children is considered a compound retriever.
9+
This allows for complex behavior to be depicted in a tree-like structure, called the retriever tree, to better clarify the order of operations that occur during a search.
1210

1311
[TIP]
1412
====
@@ -29,6 +27,9 @@ A <<rrf-retriever, retriever>> that produces top documents from <<rrf, reciproca
2927
`text_similarity_reranker`::
3028
A <<text-similarity-reranker-retriever, retriever>> that enhances search results by re-ranking documents based on semantic similarity to a specified inference text, using a machine learning model.
3129

30+
`rule`::
31+
A <<rule-retriever, retriever>> that applies contextual <<query-rules>> to pin or exclude documents for specific queries.
32+
3233
[[standard-retriever]]
3334
==== Standard Retriever
3435

@@ -44,8 +45,7 @@ Defines a query to retrieve a set of top documents.
4445
`filter`::
4546
(Optional, <<query-dsl, query object or list of query objects>>)
4647
+
47-
Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever
48-
where all documents must match this query but do not contribute to the score.
48+
Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever where all documents must match this query but do not contribute to the score.
4949

5050
`search_after`::
5151
(Optional, <<search-after, search after object>>)
@@ -56,14 +56,13 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=terminate_after]
5656

5757
`sort`::
5858
+
59-
(Optional, <<sort-search-results, sort object>>)
60-
A sort object that that specifies the order of matching documents.
59+
(Optional, <<sort-search-results, sort object>>) A sort object that specifies the order of matching documents.
6160

6261
`min_score`::
6362
(Optional, `float`)
6463
+
65-
Minimum <<relevance-scores, `_score`>> for matching documents. Documents with a
66-
lower `_score` are not included in the top documents.
64+
Minimum <<relevance-scores, `_score`>> for matching documents.
65+
Documents with a lower `_score` are not included in the top documents.
6766

6867
`collapse`::
6968
(Optional, <<collapse-search-results, collapse object>>)
@@ -72,8 +71,7 @@ Collapses the top documents by a specified key into a single top document per ke
7271

7372
===== Restrictions
7473

75-
When a retriever tree contains a compound retriever (a retriever with two or more child
76-
retrievers) the <<search-after, search after>> parameter is not supported.
74+
When a retriever tree contains a compound retriever (a retriever with two or more child retrievers) the <<search-after, search after>> parameter is not supported.
7775

7876
[discrete]
7977
[[standard-retriever-example]]
@@ -143,11 +141,13 @@ GET /restaurants/_search
143141
}
144142
}
145143
----
144+
146145
<1> Opens the `retriever` object.
147146
<2> The `standard` retriever is used for defining traditional {es} queries.
148147
<3> The entry point for defining the search query.
149148
<4> The `bool` object allows for combining multiple query clauses logically.
150-
<5> The `should` array indicates conditions under which a document will match. Documents matching these conditions will increase their relevancy score.
149+
<5> The `should` array indicates conditions under which a document will match.
150+
Documents matching these conditions will increase their relevancy score.
151151
<6> The `match` object finds documents where the `region` field contains the word "Austria."
152152
<7> The `filter` array provides filtering conditions that must be met but do not contribute to the relevancy score.
153153
<8> The `term` object is used for exact matches, in this case, filtering documents by the `year` field.
@@ -178,8 +178,8 @@ Defines a <<knn-semantic-search, model>> to build a query vector.
178178
`k`::
179179
(Required, integer)
180180
+
181-
Number of nearest neighbors to return as top hits. This value must be fewer than
182-
or equal to `num_candidates`.
181+
Number of nearest neighbors to return as top hits.
182+
This value must be less than or equal to `num_candidates`.
183183

184184
`num_candidates`::
185185
(Required, integer)
@@ -222,16 +222,15 @@ GET /restaurants/_search
222222
<1> Configuration for k-nearest neighbor (knn) search, which is based on vector similarity.
223223
<2> Specifies the field name that contains the vectors.
224224
<3> The query vector against which document vectors are compared in the `knn` search.
225-
<4> The number of nearest neighbors to return as top hits. This value must be fewer than or equal to `num_candidates`.
225+
<4> The number of nearest neighbors to return as top hits.
226+
This value must be less than or equal to `num_candidates`.
226227
<5> The size of the initial candidate set from which the final `k` nearest neighbors are selected.
227228

228229
[[rrf-retriever]]
229230
==== RRF Retriever
230231

231-
An <<rrf, RRF>> retriever returns top documents based on the RRF formula,
232-
equally weighting two or more child retrievers.
233-
Reciprocal rank fusion (RRF) is a method for combining multiple result
234-
sets with different relevance indicators into a single result set.
232+
An <<rrf, RRF>> retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
233+
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.
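
As a rough illustration of the fusion step, the following Python sketch scores each document as the sum of `1 / (rank_constant + rank)` over the child result lists in which it appears, then sorts by that sum. The function name `rrf_fuse`, the sample document IDs, and the `rank_constant` value of 60 are assumptions made for this example; it is not the {es} implementation.

```python
from collections import defaultdict


def rrf_fuse(result_lists, rank_constant=60):
    """Combine ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for results in result_lists:
        # Ranks are 1-based: the top document of each list has rank 1.
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from two child retrievers.
lexical = ["doc_a", "doc_b", "doc_c"]
vector = ["doc_c", "doc_a", "doc_d"]
fused = rrf_fuse([lexical, vector])
print(fused)
```

Every child list contributes with the same weight, which matches the "equally weighting" description above. Documents that rank well in several lists rise to the top.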
235234

236235
===== Parameters
237236

@@ -357,7 +356,8 @@ Refer to <<semantic-reranking>> for a high level overview of semantic re-ranking
357356
===== Prerequisites
358357

359358
To use `text_similarity_reranker` you must first set up a `rerank` task using the <<put-inference-api, Create {infer} API>>.
360-
The `rerank` task should be set up with a machine learning model that can compute text similarity. Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
359+
The `rerank` task should be set up with a machine learning model that can compute text similarity.
360+
Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}.
361361

362362
Currently you can:
363363

@@ -368,6 +368,7 @@ Currently you can:
368368
** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide.
369369

370370
===== Parameters
371+
371372
`retriever`::
372373
(Required, <<retriever, retriever>>)
373374
+
@@ -376,7 +377,8 @@ The child retriever that generates the initial set of top documents to be re-ran
376377
`field`::
377378
(Required, `string`)
378379
+
379-
The document field to be used for text similarity comparisons. This field should contain the text that will be evaluated against the `inferenceText`.
380+
The document field to be used for text similarity comparisons.
381+
This field should contain the text that will be evaluated against the `inference_text`.
380382

381383
`inference_id`::
382384
(Required, `string`)
@@ -391,25 +393,28 @@ The text snippet used as the basis for similarity comparison.
391393
`rank_window_size`::
392394
(Optional, `int`)
393395
+
394-
The number of top documents to consider in the re-ranking process. Defaults to `10`.
396+
The number of top documents to consider in the re-ranking process.
397+
Defaults to `10`.
395398

396399
`min_score`::
397400
(Optional, `float`)
398401
+
399-
Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used.
402+
Sets a minimum threshold score for including documents in the re-ranked results.
403+
Documents with similarity scores below this threshold will be excluded.
404+
Note that score calculations vary depending on the model used.
400405

401406
`filter`::
402407
(Optional, <<query-dsl, query object or list of query objects>>)
403408
+
404409
Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child <<retriever, retriever>>.
405-
If the child retriever already specifies any filters, then this top-level filter is applied in conjuction
406-
with the filter defined in the child retriever.
410+
If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever.
407411

408412
[discrete]
409413
[[text-similarity-reranker-retriever-example-cohere]]
410414
==== Example: Cohere Rerank
411415

412-
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminate the need to generate and store embeddings for all indexed documents.
416+
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API.
417+
This approach eliminates the need to generate and store embeddings for all indexed documents.
413418
This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type.
414419

415420
[source,console]
@@ -459,7 +464,9 @@ Follow these steps to load the model and create a semantic re-ranker.
459464
python -m pip install eland[pytorch]
460465
----
461466
+
462-
. Upload the model to {es} using Eland. This example assumes you have an Elastic Cloud deployment and an API key. Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options.
467+
. Upload the model to {es} using Eland.
468+
This example assumes you have an Elastic Cloud deployment and an API key.
469+
Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options.
463470
+
464471
[source,sh]
465472
----
@@ -517,14 +524,137 @@ POST movies/_search
517524
This retriever uses a standard `match` query to search the `movies` index for films tagged with the genre "drama".
518525
It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to {es}.
519526

527+
[[rule-retriever]]
528+
==== Query Rules Retriever
529+
530+
The `rule` retriever offers users fine-grained control over the search results by applying contextual <<query-rules>> to pin or exclude documents for specific queries.
531+
This retriever provides functionality similar to the <<query-dsl-rule-query>> but works out of the box with other retrievers, including reranking retrievers such as <<text-similarity-reranker-retriever, text_similarity_reranker>> and <<rrf-retriever, rrf>>.
532+
533+
===== Prerequisites
534+
535+
To use the `rule` retriever you must first create one or more query rulesets using the <<query-rules-apis, query rules management APIs>>.
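
For orientation, the sketch below builds the kind of request a ruleset creation call might use, expressed as a Python dictionary. The ruleset ID `my-ruleset` comes from the examples on this page. The rule ID, document IDs, and the exact field names (`rule_id`, `type`, `criteria`, `actions`) are assumptions based on the general shape of the query rules API, so check them against the query rules API reference before relying on them.

```python
import json

# Hypothetical ruleset: pin two documents when the query string contains "pugs".
ruleset_id = "my-ruleset"
ruleset_body = {
    "rules": [
        {
            "rule_id": "pin-pug-docs",  # assumed rule name
            "type": "pinned",
            "criteria": [
                # `metadata` names the key that `match_criteria` must supply.
                {"type": "contains", "metadata": "query_string", "values": ["pugs"]}
            ],
            "actions": {"ids": ["pug-doc-1", "pug-doc-2"]},  # assumed document IDs
        }
    ]
}

# Request line and JSON payload in the shape a console request would use.
request_line = f"PUT _query_rules/{ruleset_id}"
payload = json.dumps(ruleset_body, indent=2)
print(request_line)
print(payload)
```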
536+
537+
===== Parameters
538+
539+
`retriever`::
540+
(Required, <<retriever, retriever>>)
541+
+
542+
The child retriever that returns the results to which query rules are applied.
543+
This can be a standalone retriever such as the <<standard-retriever, standard>> or <<knn-retriever, knn>> retriever, or it can be a compound retriever.
544+
545+
`ruleset_ids`::
546+
(Required, `array`)
547+
+
548+
An array of one or more unique <<query-rules-apis, query ruleset>> IDs with query-based rules to match and apply as applicable.
549+
Rulesets and their associated rules are evaluated in the order in which they are specified in the query and ruleset.
550+
The maximum number of rulesets to specify is 10.
551+
552+
`match_criteria`::
553+
(Required, `object`)
554+
+
555+
Defines the match criteria to apply to rules in the given query ruleset.
556+
Match criteria should match the keys defined in the `criteria.metadata` field of the rule.
557+
558+
`rank_window_size`::
559+
(Optional, `int`)
560+
+
561+
The number of top documents to return from the `rule` retriever.
562+
Defaults to `10`.
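
The parameters above can be assembled programmatically. The following Python sketch builds a search body for the `rule` retriever and enforces the documented constraints on `ruleset_ids` (at least one ID, at most 10, all unique) on the client side. The helper name `build_rule_retriever` is hypothetical and is not part of any {es} client.

```python
MAX_RULESETS = 10  # limit stated in the `ruleset_ids` description


def build_rule_retriever(child, ruleset_ids, match_criteria, rank_window_size=None):
    """Return a search body that wraps `child` in a `rule` retriever."""
    if not 1 <= len(ruleset_ids) <= MAX_RULESETS:
        raise ValueError(f"expected 1 to {MAX_RULESETS} ruleset IDs")
    if len(set(ruleset_ids)) != len(ruleset_ids):
        raise ValueError("ruleset IDs must be unique")
    rule = {
        "retriever": child,
        "ruleset_ids": list(ruleset_ids),
        "match_criteria": dict(match_criteria),
    }
    # Only send the optional parameter when the caller sets it.
    if rank_window_size is not None:
        rule["rank_window_size"] = rank_window_size
    return {"retriever": {"rule": rule}}


child = {"standard": {"query": {"query_string": {"query": "pugs"}}}}
body = build_rule_retriever(child, ["my-ruleset"], {"query_string": "pugs"})
print(body)
```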
563+
564+
[discrete]
565+
[[rule-retriever-example]]
566+
==== Example: Rule retriever
567+
568+
This example shows the rule retriever executed without any additional retrievers.
569+
It runs the query defined by the `retriever` and applies the rules from `my-ruleset` on top of the returned results.
570+
571+
[source,console]
----
GET my-index/_search
{
  "retriever": {
    "rule": {
      "match_criteria": {
        "query_string": "pugs"
      },
      "ruleset_ids": [
        "my-ruleset"
      ],
      "retriever": {
        "standard": {
          "query": {
            "query_string": {
              "query": "pugs"
            }
          }
        }
      }
    }
  }
}
----
596+
597+
[discrete]
598+
[[rule-retriever-example-rrf]]
599+
==== Example: Rule retriever combined with RRF
600+
601+
This example shows how to combine the `rule` retriever with other reranking retrievers such as <<rrf-retriever, rrf>> or <<text-similarity-reranker-retriever, text_similarity_reranker>>.
602+
603+
[WARNING]
604+
====
605+
The `rule` retriever will apply rules to any documents returned from its defined `retriever` or any of its sub-retrievers.
606+
This means that for the best results, the `rule` retriever should be the outermost defined retriever.
607+
Nesting a `rule` retriever as a sub-retriever under a reranker such as `rrf` or `text_similarity_reranker` may not produce the expected results.
608+
====
609+
610+
[source,console]
611+
----
612+
GET my-index/_search
613+
{
614+
"retriever": {
615+
"rule": {
616+
"match_criteria": {
617+
"query_string": "pugs"
618+
},
619+
"ruleset_ids": [
620+
"my-ruleset"
621+
],
622+
"retriever": {
623+
"rrf": {
624+
"retrievers": [
625+
{
626+
"standard": {
627+
"query": {
628+
"query_string": {
629+
"query": "beagles"
630+
}
631+
}
632+
}
633+
},
634+
{
635+
"standard": {
636+
"query": {
637+
"query_string": {
638+
"query": "chihuahuas"
639+
}
640+
}
641+
}
642+
}
643+
]
644+
}
645+
}
646+
}
647+
}
648+
}
649+
----
650+
520651
==== Using `from` and `size` with a retriever tree
521652

522653
The <<search-from-param, `from`>> and <<search-size-param, `size`>>
523654
parameters are provided globally as part of the general
524-
<<search-search, search API>>. They are applied to all retrievers in a
525-
retriever tree unless a specific retriever overrides the `size` parameter
526-
using a different parameter such as `rank_window_size`. Though, the final
527-
search hits are always limited to `size`.
655+
<<search-search, search API>>.
656+
They are applied to all retrievers in a retriever tree unless a specific retriever overrides the `size` parameter using a different parameter such as `rank_window_size`.
657+
However, the final search hits are always limited to `size`.
528658

529659
==== Using aggregations with a retriever tree
530660

@@ -534,12 +664,12 @@ clauses in a <<query-dsl-bool-query, boolean query>>.
534664

535665
==== Restrictions on search parameters when specifying a retriever
536666

537-
When a retriever is specified as part of a search the following elements are not allowed
538-
at the top-level and instead are only allowed as elements of specific retrievers:
667+
When a retriever is specified as part of a search the following elements are not allowed at the top-level and instead are only allowed as elements of specific retrievers:
539668

540669
* <<request-body-search-query, `query`>>
541670
* <<search-api-knn, `knn`>>
542671
* <<search-after, `search_after`>>
543672
* <<request-body-search-terminate-after, `terminate_after`>>
544673
* <<search-sort-param, `sort`>>
545674
* <<rescore, `rescore`>>
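
A client can check this restriction before sending a request. The Python sketch below reports which of the listed elements appear at the top level of a search body that also specifies a `retriever`. The function name `disallowed_top_level_keys` is hypothetical, and the check only mirrors the list above; it does not replace server-side validation.

```python
# Elements from the list above that may only appear inside specific retrievers.
RETRIEVER_ONLY_KEYS = {
    "query", "knn", "search_after", "terminate_after", "sort", "rescore",
}


def disallowed_top_level_keys(search_body):
    """Return the restricted top-level keys used alongside a retriever."""
    if "retriever" not in search_body:
        return []  # the restriction only applies when a retriever is specified
    return sorted(RETRIEVER_ONLY_KEYS.intersection(search_body))


request = {
    "retriever": {"standard": {"query": {"match_all": {}}}},
    "sort": ["_doc"],
    "size": 5,
}
print(disallowed_top_level_keys(request))
```

Parameters such as `size` remain valid at the top level, so the helper does not flag them.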
675+
