Add documentation for query rules retriever #115696
Original file line number | Diff line number | Diff line change | ||||||
---|---|---|---|---|---|---|---|---|
@@ -1,14 +1,12 @@ | ||||||||
[[retriever]] | ||||||||
=== Retriever | ||||||||
|
||||||||
A retriever is a specification to describe top documents returned from a | ||||||||
search. A retriever replaces other elements of the <<search-search, search API>> | ||||||||
A retriever is a specification to describe top documents returned from a search. | ||||||||
A retriever replaces other elements of the <<search-search, search API>> | ||||||||
that also return top documents such as <<query-dsl, `query`>> and | ||||||||
<<search-api-knn, `knn`>>. A retriever may have child retrievers where a | ||||||||
retriever with two or more children is considered a compound retriever. This | ||||||||
allows for complex behavior to be depicted in a tree-like structure, called | ||||||||
the retriever tree, to better clarify the order of operations that occur | ||||||||
during a search. | ||||||||
<<search-api-knn, `knn`>>. | ||||||||
A retriever may have child retrievers where a retriever with two or more children is considered a compound retriever. | ||||||||
This allows for complex behavior to be depicted in a tree-like structure, called the retriever tree, to better clarify the order of operations that occur during a search. | ||||||||
|
||||||||
[TIP] | ||||||||
==== | ||||||||
|
@@ -29,6 +27,9 @@ A <<rrf-retriever, retriever>> that produces top documents from <<rrf, reciproca | |||||||
`text_similarity_reranker`:: | ||||||||
A <<text-similarity-reranker-retriever, retriever>> that enhances search results by re-ranking documents based on semantic similarity to a specified inference text, using a machine learning model. | ||||||||
|
||||||||
`rule`:: | ||||||||
A <<rule-retriever, retriever>> that applies contextual <<query-rules>> to pin or exclude documents for specific queries. | ||||||||
|
||||||||
[[standard-retriever]] | ||||||||
==== Standard Retriever | ||||||||
|
||||||||
|
@@ -44,8 +45,7 @@ Defines a query to retrieve a set of top documents. | |||||||
`filter`:: | ||||||||
(Optional, <<query-dsl, query object or list of query objects>>) | ||||||||
+ | ||||||||
Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever | ||||||||
where all documents must match this query but do not contribute to the score. | ||||||||
Applies a <<query-dsl-bool-query, boolean query filter>> to this retriever where all documents must match this query but do not contribute to the score. | ||||||||
|
||||||||
|
||||||||
`search_after`:: | ||||||||
(Optional, <<search-after, search after object>>) | ||||||||
|
@@ -56,14 +56,13 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=terminate_after] | |||||||
|
||||||||
`sort`:: | ||||||||
+ | ||||||||
(Optional, <<sort-search-results, sort object>>) | ||||||||
A sort object that specifies the order of matching documents. | ||||||||
(Optional, <<sort-search-results, sort object>>) A sort object that specifies the order of matching documents. | ||||||||
|
||||||||
|
||||||||
`min_score`:: | ||||||||
(Optional, `float`) | ||||||||
+ | ||||||||
Minimum <<relevance-scores, `_score`>> for matching documents. Documents with a | ||||||||
lower `_score` are not included in the top documents. | ||||||||
Minimum <<relevance-scores, `_score`>> for matching documents. | ||||||||
Documents with a lower `_score` are not included in the top documents. | ||||||||
|
||||||||
`collapse`:: | ||||||||
(Optional, <<collapse-search-results, collapse object>>) | ||||||||
|
@@ -72,8 +71,7 @@ Collapses the top documents by a specified key into a single top document per ke | |||||||
|
||||||||
===== Restrictions | ||||||||
|
||||||||
When a retriever tree contains a compound retriever (a retriever with two or more child | ||||||||
retrievers) the <<search-after, search after>> parameter is not supported. | ||||||||
When a retriever tree contains a compound retriever (a retriever with two or more child retrievers) the <<search-after, search after>> parameter is not supported. | ||||||||
|
||||||||
[discrete] | ||||||||
[[standard-retriever-example]] | ||||||||
|
@@ -105,12 +103,39 @@ POST /restaurants/_bulk?refresh | |||||||
{"region": "Austria", "year": "2020", "vector": [10, 22, 79]} | ||||||||
{"index":{}} | ||||||||
{"region": "France", "year": "2020", "vector": [10, 22, 80]} | ||||||||
|
||||||||
PUT /movies | ||||||||
|
||||||||
PUT _query_rules/my-ruleset | ||||||||
{ | ||||||||
"rules": [ | ||||||||
{ | ||||||||
"rule_id": "my-rule1", | ||||||||
"type": "pinned", | ||||||||
"criteria": [ | ||||||||
{ | ||||||||
"type": "exact", | ||||||||
"metadata": "query_string", | ||||||||
"values": [ "pugs" ] | ||||||||
} | ||||||||
], | ||||||||
"actions": { | ||||||||
"ids": [ | ||||||||
"id1" | ||||||||
] | ||||||||
} | ||||||||
} | ||||||||
] | ||||||||
} | ||||||||
|
||||||||
---- | ||||||||
// TESTSETUP | ||||||||
|
||||||||
[source,console] | ||||||||
-------------------------------------------------- | ||||||||
DELETE /restaurants | ||||||||
|
||||||||
DELETE /movies | ||||||||
-------------------------------------------------- | ||||||||
// TEARDOWN | ||||||||
//// | ||||||||
|
@@ -143,11 +168,13 @@ GET /restaurants/_search | |||||||
} | ||||||||
} | ||||||||
---- | ||||||||
|
||||||||
<1> Opens the `retriever` object. | ||||||||
<2> The `standard` retriever is used for defining traditional {es} queries. | ||||||||
<3> The entry point for defining the search query. | ||||||||
<4> The `bool` object allows for combining multiple query clauses logically. | ||||||||
<5> The `should` array indicates conditions under which a document will match. Documents matching these conditions will increase their relevancy score. | ||||||||
<5> The `should` array indicates conditions under which a document will match. | ||||||||
Documents matching these conditions will increase their relevancy score. | ||||||||
|
||||||||
<6> The `match` object finds documents where the `region` field contains the word "Austria." | ||||||||
<7> The `filter` array provides filtering conditions that must be met but do not contribute to the relevancy score. | ||||||||
<8> The `term` object is used for exact matches, in this case, filtering documents by the `year` field. | ||||||||
|
@@ -178,8 +205,8 @@ Defines a <<knn-semantic-search, model>> to build a query vector. | |||||||
`k`:: | ||||||||
(Required, integer) | ||||||||
+ | ||||||||
Number of nearest neighbors to return as top hits. This value must be fewer than | ||||||||
or equal to `num_candidates`. | ||||||||
Number of nearest neighbors to return as top hits. | ||||||||
This value must be fewer than or equal to `num_candidates`. | ||||||||
|
||||||||
`num_candidates`:: | ||||||||
(Required, integer) | ||||||||
|
@@ -222,16 +249,15 @@ GET /restaurants/_search | |||||||
<1> Configuration for k-nearest neighbor (knn) search, which is based on vector similarity. | ||||||||
<2> Specifies the field name that contains the vectors. | ||||||||
<3> The query vector against which document vectors are compared in the `knn` search. | ||||||||
<4> The number of nearest neighbors to return as top hits. This value must be fewer than or equal to `num_candidates`. | ||||||||
<4> The number of nearest neighbors to return as top hits. | ||||||||
This value must be fewer than or equal to `num_candidates`. | ||||||||
<5> The size of the initial candidate set from which the final `k` nearest neighbors are selected. | ||||||||
|
||||||||
[[rrf-retriever]] | ||||||||
==== RRF Retriever | ||||||||
|
||||||||
An <<rrf, RRF>> retriever returns top documents based on the RRF formula, | ||||||||
equally weighting two or more child retrievers. | ||||||||
Reciprocal rank fusion (RRF) is a method for combining multiple result | ||||||||
sets with different relevance indicators into a single result set. | ||||||||
An <<rrf, RRF>> retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers. | ||||||||
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set. | ||||||||
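The fusion step can be illustrated with a minimal sketch. It assumes the commonly cited RRF formula, where each document scores the sum of `1 / (rank_constant + rank)` over the child result lists, with a `rank_constant` of 60. The function name `rrf_fuse` and the sample document IDs are hypothetical, and this is not the {es} implementation.

```python
from collections import defaultdict


def rrf_fuse(result_lists, rank_constant=60):
    """Fuse ranked lists of document IDs with reciprocal rank fusion.

    Hypothetical helper for illustration only. Every child list is
    weighted equally, as described for the RRF retriever.
    """
    scores = defaultdict(float)
    for ranked_ids in result_lists:
        # Ranks are 1-based: the top document of each list has rank 1.
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Two child retrievers returning overlapping top documents (made-up IDs).
lexical_hits = ["a", "b", "c"]
vector_hits = ["c", "a", "d"]
fused = rrf_fuse([lexical_hits, vector_hits])
print(fused)
```

Documents that appear near the top of both lists accumulate the largest fused scores, which is why `a` and `c` end up ahead of documents found by only one child retriever.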
|
||||||||
===== Parameters | ||||||||
|
||||||||
|
@@ -357,7 +383,8 @@ Refer to <<semantic-reranking>> for a high level overview of semantic re-ranking | |||||||
===== Prerequisites | ||||||||
|
||||||||
To use `text_similarity_reranker` you must first set up a `rerank` task using the <<put-inference-api, Create {infer} API>>. | ||||||||
The `rerank` task should be set up with a machine learning model that can compute text similarity. Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}. | ||||||||
The `rerank` task should be set up with a machine learning model that can compute text similarity. | ||||||||
Refer to {ml-docs}/ml-nlp-model-ref.html#ml-nlp-model-ref-text-similarity[the Elastic NLP model reference] for a list of third-party text similarity models supported by {es}. | ||||||||
|
||||||||
Currently you can: | ||||||||
|
||||||||
|
@@ -368,6 +395,7 @@ Currently you can: | |||||||
** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide. | ||||||||
|
||||||||
I think the other suggestion is the important one, so committed that first to see 👁️ |
||||||||
===== Parameters | ||||||||
|
||||||||
`retriever`:: | ||||||||
(Required, <<retriever, retriever>>) | ||||||||
+ | ||||||||
|
@@ -376,7 +404,8 @@ The child retriever that generates the initial set of top documents to be re-ran | |||||||
`field`:: | ||||||||
(Required, `string`) | ||||||||
+ | ||||||||
The document field to be used for text similarity comparisons. This field should contain the text that will be evaluated against the `inference_text`. | ||||||||
The document field to be used for text similarity comparisons. | ||||||||
This field should contain the text that will be evaluated against the `inference_text`. | ||||||||
|
||||||||
`inference_id`:: | ||||||||
(Required, `string`) | ||||||||
|
@@ -391,25 +420,28 @@ The text snippet used as the basis for similarity comparison. | |||||||
`rank_window_size`:: | ||||||||
(Optional, `int`) | ||||||||
+ | ||||||||
The number of top documents to consider in the re-ranking process. Defaults to `10`. | ||||||||
The number of top documents to consider in the re-ranking process. | ||||||||
Defaults to `10`. | ||||||||
|
||||||||
`min_score`:: | ||||||||
(Optional, `float`) | ||||||||
+ | ||||||||
Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used. | ||||||||
Sets a minimum threshold score for including documents in the re-ranked results. | ||||||||
Documents with similarity scores below this threshold will be excluded. | ||||||||
Note that score calculations vary depending on the model used. | ||||||||
|
||||||||
`filter`:: | ||||||||
(Optional, <<query-dsl, query object or list of query objects>>) | ||||||||
+ | ||||||||
Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child <<retriever, retriever>>. | ||||||||
If the child retriever already specifies any filters, then this top-level filter is applied in conjunction | ||||||||
with the filter defined in the child retriever. | ||||||||
If the child retriever already specifies any filters, then this top-level filter is applied in conjunction with the filter defined in the child retriever. | ||||||||
|
||||||||
[discrete] | ||||||||
[[text-similarity-reranker-retriever-example-cohere]] | ||||||||
==== Example: Cohere Rerank | ||||||||
|
||||||||
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. This approach eliminates the need to generate and store embeddings for all indexed documents. | ||||||||
This example enables out-of-the-box semantic search by re-ranking top documents using the Cohere Rerank API. | ||||||||
This approach eliminates the need to generate and store embeddings for all indexed documents. | ||||||||
|
||||||||
This requires a <<infer-service-cohere,Cohere Rerank inference endpoint>> using the `rerank` task type. | ||||||||
|
||||||||
[source,console] | ||||||||
|
@@ -459,7 +491,9 @@ Follow these steps to load the model and create a semantic re-ranker. | |||||||
python -m pip install eland[pytorch] | ||||||||
---- | ||||||||
+ | ||||||||
. Upload the model to {es} using Eland. This example assumes you have an Elastic Cloud deployment and an API key. Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options. | ||||||||
. Upload the model to {es} using Eland. | ||||||||
This example assumes you have an Elastic Cloud deployment and an API key. | ||||||||
Refer to the https://www.elastic.co/guide/en/elasticsearch/client/eland/current/machine-learning.html#ml-nlp-pytorch-auth[Eland documentation] for more authentication options. | ||||||||
+ | ||||||||
[source,sh] | ||||||||
---- | ||||||||
|
@@ -517,14 +551,137 @@ POST movies/_search | |||||||
This retriever uses a standard `match` query to search the `movies` index for films tagged with the genre "drama". | ||||||||
It then re-ranks the results based on semantic similarity to the text in the `inference_text` parameter, using the model we uploaded to {es}. | ||||||||
|
||||||||
[[rule-retriever]] | ||||||||
|
||||||||
==== Query Rules Retriever | ||||||||
|
||||||||
The `rule` retriever offers users fine-grained control over the search results by applying contextual <<query-rules>> to pin or exclude documents for specific queries. | ||||||||
|
||||||||
This retriever provides functionality similar to the <<query-dsl-rule-query>>, but works out of the box with other retrievers, including reranking retrievers such as <<text-similarity-reranker-retriever, text_similarity_reranker>> and <<rrf-retriever, rrf>>. | ||||||||
|
||||||||
|
||||||||
===== Prerequisites | ||||||||
|
||||||||
To use the `rule` retriever you must first create one or more query rulesets using the <<query-rules-apis, query rules management APIs>>. | ||||||||
|
||||||||
===== Parameters | ||||||||
|
||||||||
|
||||||||
`retriever`:: | ||||||||
(Required, <<retriever, retriever>>) | ||||||||
+ | ||||||||
The child retriever that returns the results to which query rules are applied. | ||||||||
|
||||||||
This can be a standalone retriever such as the <<standard-retriever, standard>> or <<knn-retriever, knn>> retriever, or it can be a compound retriever. | ||||||||
|
||||||||
`ruleset_ids`:: | ||||||||
(Required, `array`) | ||||||||
+ | ||||||||
An array of one or more unique <<query-rules-apis, query ruleset>> IDs with query-based rules to match and apply as applicable. | ||||||||
|
||||||||
Rulesets and their associated rules are evaluated in the order in which they are specified in the query and ruleset. | ||||||||
The maximum number of rulesets to specify is 10. | ||||||||
|
||||||||
`match_criteria`:: | ||||||||
(Required, `object`) | ||||||||
+ | ||||||||
Defines the match criteria to apply to rules in the given query ruleset. | ||||||||
|
||||||||
Match criteria should match the keys defined in the `criteria.metadata` field of the rule. | ||||||||
|
||||||||
`rank_window_size`:: | ||||||||
(Optional, `int`) | ||||||||
+ | ||||||||
The number of top documents to return from the `rule` retriever. | ||||||||
Defaults to `10`. | ||||||||
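As a rough illustration of how `match_criteria` keys line up with the `criteria.metadata` field of each rule, the following sketch evaluates the `my-ruleset` definition from the test setup in plain Python. It covers only `pinned` rules with `exact` criteria and assumes that every criterion of a rule must match for the rule to apply. The helper name `matching_pinned_ids` is hypothetical and this is not how {es} evaluates rules internally.

```python
# Ruleset body copied from the `PUT _query_rules/my-ruleset` setup request.
my_ruleset = {
    "rules": [
        {
            "rule_id": "my-rule1",
            "type": "pinned",
            "criteria": [
                {"type": "exact", "metadata": "query_string", "values": ["pugs"]}
            ],
            "actions": {"ids": ["id1"]},
        }
    ]
}


def matching_pinned_ids(ruleset, match_criteria):
    """Return the document IDs pinned by rules whose criteria all match.

    Hypothetical, simplified evaluator: only `pinned` rules and `exact`
    criteria are handled; other rule or criteria types never match here.
    """
    pinned = []
    for rule in ruleset["rules"]:
        if rule["type"] != "pinned":
            continue
        if all(
            criterion["type"] == "exact"
            # The match_criteria key must equal the rule's `metadata` name.
            and match_criteria.get(criterion["metadata"]) in criterion["values"]
            for criterion in rule["criteria"]
        ):
            pinned.extend(rule["actions"]["ids"])
    return pinned


print(matching_pinned_ids(my_ruleset, {"query_string": "pugs"}))
print(matching_pinned_ids(my_ruleset, {"query_string": "harry potter"}))
```

With this ruleset, only a `query_string` value of `pugs` triggers the pinned document. A `match_criteria` object that uses a different key, or a different value, leaves the child retriever's results unchanged.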
|
||||||||
[discrete] | ||||||||
[[rule-retriever-example]] | ||||||||
==== Example: Rule retriever | ||||||||
|
||||||||
This example shows the rule retriever executed without any additional retrievers. | ||||||||
It runs the query defined by the `retriever` and applies the rules from `my-ruleset` on top of the returned results. | ||||||||
|
||||||||
[source,console] | ||||||||
---- | ||||||||
GET movies/_search | ||||||||
{ | ||||||||
"retriever": { | ||||||||
"rule": { | ||||||||
"match_criteria": { | ||||||||
"query_string": "harry potter" | ||||||||
}, | ||||||||
"ruleset_ids": [ | ||||||||
"my-ruleset" | ||||||||
], | ||||||||
"retriever": { | ||||||||
"standard": { | ||||||||
"query": { | ||||||||
"query_string": { | ||||||||
"query": "harry potter" | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
---- | ||||||||
|
||||||||
[discrete] | ||||||||
[[rule-retriever-example-rrf]] | ||||||||
==== Example: Rule retriever combined with RRF | ||||||||
|
||||||||
This example shows how to combine the `rule` retriever with other rerank retrievers such as <<rrf-retriever, rrf>> or <<text-similarity-reranker-retriever, text_similarity_reranker>>. | ||||||||
|
||||||||
[WARNING] | ||||||||
==== | ||||||||
The `rule` retriever will apply rules to any documents returned from its defined `retriever` or any of its sub-retrievers. | ||||||||
This means that for the best results, the `rule` retriever should be the outermost defined retriever. | ||||||||
Perhaps we could hammer this home with an annotation on the example? |
||||||||
Nesting a `rule` retriever as a sub-retriever under a reranker such as `rrf` or `text_similarity_reranker` may not produce the expected results. | ||||||||
==== | ||||||||
|
||||||||
[source,console] | ||||||||
---- | ||||||||
GET movies/_search | ||||||||
{ | ||||||||
"retriever": { | ||||||||
"rule": { | ||||||||
"match_criteria": { | ||||||||
"query_string": "harry potter" | ||||||||
}, | ||||||||
"ruleset_ids": [ | ||||||||
"my-ruleset" | ||||||||
], | ||||||||
"retriever": { | ||||||||
"rrf": { | ||||||||
"retrievers": [ | ||||||||
{ | ||||||||
"standard": { | ||||||||
"query": { | ||||||||
"query_string": { | ||||||||
"query": "sorcerer's stone" | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
}, | ||||||||
{ | ||||||||
"standard": { | ||||||||
"query": { | ||||||||
"query_string": { | ||||||||
"query": "chamber of secrets" | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
] | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
} | ||||||||
---- | ||||||||
|
||||||||
==== Using `from` and `size` with a retriever tree | ||||||||
|
||||||||
The <<search-from-param, `from`>> and <<search-size-param, `size`>> | ||||||||
parameters are provided globally as part of the general | ||||||||
<<search-search, search API>>. They are applied to all retrievers in a | ||||||||
retriever tree unless a specific retriever overrides the `size` parameter | ||||||||
using a different parameter such as `rank_window_size`. Though, the final | ||||||||
search hits are always limited to `size`. | ||||||||
<<search-search, search API>>. | ||||||||
They are applied to all retrievers in a retriever tree unless a specific retriever overrides the `size` parameter using a different parameter such as `rank_window_size`. | ||||||||
|
||||||||
However, the final search hits are always limited to `size`. | ||||||||
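The interaction between `rank_window_size`, `from`, and `size` can be pictured with a small sketch. It is a simplified model, assuming that a retriever first truncates its ranked results to `rank_window_size` and that `from` and `size` then page through that window. The helper name `page_of_hits` and the document IDs are hypothetical.

```python
def page_of_hits(ranked_ids, from_=0, size=10, rank_window_size=None):
    """Return one page of hits from an already ranked list of document IDs.

    Hypothetical model for illustration: `rank_window_size` caps how many
    ranked documents are considered, and `from_`/`size` page through them.
    """
    if rank_window_size is None:
        window = ranked_ids
    else:
        window = ranked_ids[:rank_window_size]
    # The page can never contain more than `size` hits.
    return window[from_:from_ + size]


ranked = [f"doc{i}" for i in range(1, 21)]  # 20 made-up ranked documents

first_page = page_of_hits(ranked, from_=0, size=3, rank_window_size=10)
last_page = page_of_hits(ranked, from_=8, size=5, rank_window_size=10)
print(first_page)
print(last_page)
```

In this model, a page that starts near the end of the window comes back shorter than `size`, because documents beyond `rank_window_size` are never considered.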
|
||||||||
==== Using aggregations with a retriever tree | ||||||||
|
||||||||
|
@@ -534,12 +691,12 @@ clauses in a <<query-dsl-bool-query, boolean query>>. | |||||||
|
||||||||
==== Restrictions on search parameters when specifying a retriever | ||||||||
|
||||||||
When a retriever is specified as part of a search the following elements are not allowed | ||||||||
at the top-level and instead are only allowed as elements of specific retrievers: | ||||||||
When a retriever is specified as part of a search, the following elements are not allowed at the top level and instead are only allowed as elements of specific retrievers: | ||||||||
|
||||||||
|
||||||||
* <<request-body-search-query, `query`>> | ||||||||
* <<search-api-knn, `knn`>> | ||||||||
* <<search-after, `search_after`>> | ||||||||
* <<request-body-search-terminate-after, `terminate_after`>> | ||||||||
* <<search-sort-param, `sort`>> | ||||||||
* <<rescore, `rescore`>> | ||||||||
|