Commit 7563a72

Updating retriever documentation to better explain how filters are applied (#112201)
1 parent b685a43 commit 7563a72

7 files changed: +53 -34 lines changed

docs/reference/rest-api/common-parms.asciidoc

Lines changed: 12 additions & 4 deletions
@@ -1327,13 +1327,21 @@ that lower ranked documents have more influence. This value must be greater than
 equal to `1`. Defaults to `60`.
 end::rrf-rank-constant[]

-tag::rrf-window-size[]
-`window_size`::
+tag::rrf-rank-window-size[]
+`rank_window_size`::
 (Optional, integer)
 +
 This value determines the size of the individual result sets per
 query. A higher value will improve result relevance at the cost of performance. The final
 ranked result set is pruned down to the search request's <<search-size-param, size>>.
-`window_size` must be greater than or equal to `size` and greater than or equal to `1`.
+`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
 Defaults to the `size` parameter.
-end::rrf-window-size[]
+end::rrf-rank-window-size[]
+
+tag::rrf-filter[]
+`filter`::
+(Optional, <<query-dsl, query object or list of query objects>>)
++
+Applies the specified <<query-dsl-bool-query, boolean query filter>> to all of the specified sub-retrievers,
+according to each retriever's specifications.
+end::rrf-filter[]
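
To make the renamed `rank_window_size` parameter and the new `filter` parameter concrete, the following Python sketch builds an `rrf` retriever request body as a plain dictionary. The helper name `build_rrf_request`, the index fields `text`, `vector`, and `region`, and the query values are hypothetical and chosen only for illustration. The checks mirror the documented constraints (`rank_window_size` at least `size` and at least `1`, `rank_constant` at least `1`); they are not Elasticsearch's own validation code.

```python
from typing import Any, Optional


def build_rrf_request(
    retrievers: list[dict[str, Any]],
    size: int = 10,
    rank_window_size: Optional[int] = None,
    rank_constant: int = 60,
    filter_query: Optional[Any] = None,
) -> dict[str, Any]:
    """Build a search body that uses an rrf retriever.

    Hypothetical helper for illustration; not part of any Elasticsearch client.
    """
    if len(retrievers) < 2:
        raise ValueError("rrf combines two or more child retrievers")
    if rank_window_size is None:
        rank_window_size = size  # documented default: the `size` parameter
    if rank_window_size < 1 or rank_window_size < size:
        raise ValueError("rank_window_size must be >= size and >= 1")
    if rank_constant < 1:
        raise ValueError("rank_constant must be >= 1")

    rrf: dict[str, Any] = {
        "retrievers": retrievers,
        "rank_window_size": rank_window_size,
        "rank_constant": rank_constant,
    }
    if filter_query is not None:
        # Top-level filter: applied to all sub-retrievers.
        rrf["filter"] = filter_query
    return {"retriever": {"rrf": rrf}, "size": size}


# Hypothetical field names and values.
example_body = build_rrf_request(
    retrievers=[
        {"standard": {"query": {"match": {"text": "apple pie"}}}},
        {"knn": {"field": "vector", "query_vector": [0.1, 0.2, 0.3],
                 "k": 10, "num_candidates": 50}},
    ],
    size=10,
    rank_window_size=50,
    filter_query={"term": {"region": "Austria"}},
)
```

The resulting dictionary could be sent as the body of a `_search` request; the single `filter` entry stands in for repeating the same filter inside each child retriever.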

docs/reference/search/retriever.asciidoc

Lines changed: 18 additions & 5 deletions
@@ -198,7 +198,7 @@ GET my-embeddings/_search

 An <<rrf, RRF>> retriever returns top documents based on the RRF formula,
 equally weighting two or more child retrievers.
-Reciprocal rank fusion (RRF) is a method for combining multiple result
+Reciprocal rank fusion (RRF) is a method for combining multiple result
 sets with different relevance indicators into a single result set.

 ===== Parameters
@@ -207,7 +207,9 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-retrievers]

 include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-constant]

-include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-window-size]
+include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-window-size]
+
+include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-filter]

 ===== Restrictions

@@ -225,7 +227,7 @@ A simple hybrid search example (lexical search + dense vector search) combining
 ----
 GET /restaurants/_search
 {
-"retriever": {
+"retriever": {
 "rrf": { <1>
 "retrievers": [ <2>
 {
@@ -340,6 +342,10 @@ Currently you can:
 ** Refer to the <<text-similarity-reranker-retriever-example-eland,example>> on this page for a step-by-step guide.

 ===== Parameters
+`retriever`::
+(Required, <<retriever, retriever>>)
++
+The child retriever that generates the initial set of top documents to be re-ranked.

 `field`::
 (Required, `string`)
@@ -366,6 +372,13 @@ The number of top documents to consider in the re-ranking process. Defaults to `
 +
 Sets a minimum threshold score for including documents in the re-ranked results. Documents with similarity scores below this threshold will be excluded. Note that score calculations vary depending on the model used.

+`filter`::
+(Optional, <<query-dsl, query object or list of query objects>>)
++
+Applies the specified <<query-dsl-bool-query, boolean query filter>> to the child <<retriever, retriever>>.
+If the child retriever already specifies any filters, then this top-level filter is applied in conjunction
+with the filter defined in the child retriever.
+
 ===== Restrictions

 A text similarity re-ranker retriever is a compound retriever. Child retrievers may not use elements that are restricted by having a compound retriever as part of the retriever tree.
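
The "applied in conjunction" wording for the re-ranker's `filter` can be read as a logical AND between the top-level filter and any filter the child retriever already defines. The Python sketch below is a toy model of that reading, assuming only `term` filters and in-memory documents; the function names and the sample fields `lang` and `year` are hypothetical, and the code does not show how Elasticsearch rewrites the retriever tree internally.

```python
from typing import Any, Union

Filter = dict[str, Any]


def as_filter_list(value: Union[Filter, list[Filter], None]) -> list[Filter]:
    """Normalize 'query object or list of query objects' to a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    return [value]


def effective_filters(top_level: Union[Filter, list[Filter], None],
                      child: Union[Filter, list[Filter], None]) -> list[Filter]:
    """Child filters plus the top-level filter; every entry must match (AND)."""
    return as_filter_list(child) + as_filter_list(top_level)


def matches_all(doc: dict[str, Any], filters: list[Filter]) -> bool:
    """Toy evaluator: supports only {"term": {field: value}} filters."""
    return all(
        doc.get(field) == expected
        for f in filters
        for field, expected in f["term"].items()
    )


docs = [
    {"id": 1, "lang": "en", "year": 2024},
    {"id": 2, "lang": "de", "year": 2024},
    {"id": 3, "lang": "en", "year": 2023},
]
child_filter = {"term": {"lang": "en"}}        # defined on the child retriever
top_level_filter = {"term": {"year": 2024}}    # defined on the re-ranker

filters = effective_filters(top_level_filter, child_filter)
kept = [d["id"] for d in docs if matches_all(d, filters)]
```

Under this model only documents that satisfy both filters reach the re-ranking step, so `kept` contains just the document that is both English and from 2024.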
@@ -441,13 +454,13 @@ eland_import_hub_model \
 +
 [source,js]
 ----
-PUT _inference/rerank/my-msmarco-minilm-model
+PUT _inference/rerank/my-msmarco-minilm-model
 {
 "service": "elasticsearch",
 "service_settings": {
 "num_allocations": 1,
 "num_threads": 1,
-"model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
+"model_id": "cross-encoder__ms-marco-minilm-l-6-v2"
 }
 }
 ----

docs/reference/search/rrf.asciidoc

Lines changed: 2 additions & 4 deletions
@@ -1,9 +1,7 @@
 [[rrf]]
 === Reciprocal rank fusion

-preview::["This functionality is in technical preview and may be changed or removed in a future release.
-The syntax will likely change before GA.
-Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features."]
+preview::["This functionality is in technical preview and may be changed or removed in a future release. The syntax will likely change before GA. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features."]

 https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf[Reciprocal rank fusion (RRF)]
 is a method for combining multiple result sets with different relevance indicators into a single result set.
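
As a rough illustration of how `rank_constant` and `rank_window_size` interact, the Python sketch below fuses ranked lists with the standard RRF formula, in which each document's score is the sum of `1 / (rank_constant + rank)` over the result sets that contain it, with ranks starting at 1. The function name `rrf_fuse`, the document IDs, and the tie-break by document ID are assumptions made for this example; this is not Elasticsearch's implementation.

```python
from collections import defaultdict


def rrf_fuse(result_sets: list[list[str]],
             rank_constant: int = 60,
             rank_window_size: int = 10,
             size: int = 10) -> list[tuple[str, float]]:
    """Fuse ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_sets:
        # Only the top `rank_window_size` hits of each result set take part.
        for rank, doc_id in enumerate(results[:rank_window_size], start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest score first; ties broken by document ID (an assumption).
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    # The fused list is pruned down to `size`.
    return ranked[:size]


lexical_hits = ["a", "b", "c"]   # hypothetical lexical ranking
vector_hits = ["c", "a", "d"]    # hypothetical vector ranking
fused = rrf_fuse([lexical_hits, vector_hits], rank_constant=60,
                 rank_window_size=3, size=4)
fused_ids = [doc_id for doc_id, _ in fused]
```

Documents that rank well in both lists (`a` and `c` here) end up ahead of documents that appear in only one, and a larger `rank_constant` flattens the difference between adjacent ranks.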
@@ -43,7 +41,7 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-retrievers]

 include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-constant]

-include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-window-size]
+include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-window-size]

 An example request using RRF:

docs/reference/search/search-your-data/retrievers-reranking/retrievers-overview.asciidoc

Lines changed: 16 additions & 16 deletions
@@ -13,23 +13,23 @@ For implementation details, including notable restrictions, check out the

 [discrete]
 [[retrievers-overview-types]]
-==== Retriever types
+==== Retriever types

 Retrievers come in various types, each tailored for different search operations.
 The following retrievers are currently available:

-* <<standard-retriever,*Standard Retriever*>>. Returns top documents from a
-traditional https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl.html[query].
-Mimics a traditional query but in the context of a retriever framework. This
-ensures backward compatibility as existing `_search` requests remain supported.
-That way you can transition to the new abstraction at your own pace without
+* <<standard-retriever,*Standard Retriever*>>. Returns top documents from a
+traditional https://www.elastic.co/guide/en/elasticsearch/reference/master/query-dsl.html[query].
+Mimics a traditional query but in the context of a retriever framework. This
+ensures backward compatibility as existing `_search` requests remain supported.
+That way you can transition to the new abstraction at your own pace without
 mixing syntaxes.
-* <<knn-retriever,*kNN Retriever*>>. Returns top documents from a <<search-api-knn,knn search>>,
+* <<knn-retriever,*kNN Retriever*>>. Returns top documents from a <<search-api-knn,knn search>>,
 in the context of a retriever framework.
 * <<rrf-retriever,*RRF Retriever*>>. Combines and ranks multiple first-stage retrievers using
-the reciprocal rank fusion (RRF) algorithm. Allows you to combine multiple result sets
+the reciprocal rank fusion (RRF) algorithm. Allows you to combine multiple result sets
 with different relevance indicators into a single result set.
-An RRF retriever is a *compound retriever*, where its `filter` element is
+An RRF retriever is a *compound retriever*, where its `filter` element is
 propagated to its sub retrievers.
 +
 Sub retrievers may not use elements that are restricted by having a compound retriever as part of the retriever tree.
@@ -38,7 +38,7 @@ See the <<rrf-using-multiple-standard-retrievers,RRF documentation>> for detaile
 Requires first creating a `rerank` task using the <<put-inference-api,{es} Inference API>>.

 [discrete]
-==== What makes retrievers useful?
+==== What makes retrievers useful?

 Here's an overview of what makes retrievers useful and how they differ from regular queries.

@@ -140,7 +140,7 @@ GET example-index/_search
 ],
 "rank":{
 "rrf":{
-"window_size":50,
+"rank_window_size":50,
 "rank_constant":20
 }
 }
@@ -155,14 +155,14 @@ GET example-index/_search

 Here are some important terms:

-* *Retrieval Pipeline*. Defines the entire retrieval and ranking logic to
+* *Retrieval Pipeline*. Defines the entire retrieval and ranking logic to
 produce top hits.
 * *Retriever Tree*. A hierarchical structure that defines how retrievers interact.
 * *First-stage Retriever*. Returns an initial set of candidate documents.
-* *Compound Retriever*. Builds on one or more retrievers,
+* *Compound Retriever*. Builds on one or more retrievers,
 enhancing document retrieval and ranking logic.
-* *Combiners*. Compound retrievers that merge top hits
-from multiple sub-retrievers.
+* *Combiners*. Compound retrievers that merge top hits
+from multiple sub-retrievers.
 * *Rerankers*. Special compound retrievers that reorder hits and may adjust the number of hits, with distinctions between first-stage and second-stage rerankers.

 [discrete]
@@ -180,4 +180,4 @@ Refer to the {kibana-ref}/playground.html[Playground documentation] for more inf
 [[retrievers-overview-api-reference]]
 ==== API reference

-For implementation details, including notable restrictions, check out the <<retriever,reference documentation>> in the Search API docs.
+For implementation details, including notable restrictions, check out the <<retriever,reference documentation>> in the Search API docs.

server/src/main/java/org/elasticsearch/action/search/RankFeaturePhase.java

Lines changed: 3 additions & 3 deletions
@@ -28,8 +28,8 @@

 /**
 * This search phase is responsible for executing any re-ranking needed for the given search request, iff that is applicable.
-* It starts by retrieving {@code num_shards * window_size} results from the query phase and reduces them to a global list of
-* the top {@code window_size} results. It then reaches out to the shards to extract the needed feature data,
+* It starts by retrieving {@code num_shards * rank_window_size} results from the query phase and reduces them to a global list of
+* the top {@code rank_window_size} results. It then reaches out to the shards to extract the needed feature data,
 * and finally passes all this information to the appropriate {@code RankFeatureRankCoordinatorContext} which is responsible for reranking
 * the results. If no rank query is specified, it proceeds directly to the next phase (FetchSearchPhase) by first reducing the results.
 */
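
The reduction described in this javadoc (up to `num_shards * rank_window_size` shard-level hits reduced to a global top `rank_window_size`) can be sketched in a few lines. The example below is written in Python rather than Java to keep it short, and it is a simplified model under stated assumptions: hits are `(doc_id, score)` tuples, ranking is by score only, and the function name `reduce_shard_results` is hypothetical. It does not reproduce the logic of `RankFeaturePhase` or `SearchPhaseController`.

```python
import heapq

Hit = tuple[str, float]  # (doc_id, score)


def reduce_shard_results(shard_results: list[list[Hit]],
                         rank_window_size: int) -> list[Hit]:
    """Reduce per-shard hits to a global top `rank_window_size` list."""
    # Each shard contributes at most `rank_window_size` hits.
    per_shard = [
        heapq.nlargest(rank_window_size, hits, key=lambda hit: hit[1])
        for hits in shard_results
    ]
    # At most num_shards * rank_window_size candidates reach the coordinator.
    candidates = [hit for hits in per_shard for hit in hits]
    # The coordinator keeps the global top `rank_window_size`, best first.
    return heapq.nlargest(rank_window_size, candidates, key=lambda hit: hit[1])


shards = [
    [("a", 0.9), ("b", 0.2), ("c", 0.5)],   # hypothetical shard 0
    [("d", 0.8), ("e", 0.7)],               # hypothetical shard 1
]
global_top = reduce_shard_results(shards, rank_window_size=2)
```

With two shards and a window of 2, four candidates reach the coordinator and the two highest-scoring ones survive; those would then be the documents whose feature data is fetched for re-ranking.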
@@ -88,7 +88,7 @@ public void onFailure(Exception e) {

 void innerRun() throws Exception {
 // if the RankBuilder specifies a QueryPhaseCoordinatorContext, it will be called as part of the reduce call
-// to operate on the first `window_size * num_shards` results and merge them appropriately.
+// to operate on the first `rank_window_size * num_shards` results and merge them appropriately.
 SearchPhaseController.ReducedQueryPhase reducedQueryPhase = queryPhaseResults.reduce();
 RankFeaturePhaseRankCoordinatorContext rankFeaturePhaseRankCoordinatorContext = coordinatorContext(context.getRequest().source());
 if (rankFeaturePhaseRankCoordinatorContext != null) {

server/src/main/java/org/elasticsearch/search/rank/context/QueryPhaseRankCoordinatorContext.java

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 /**
 * {@link QueryPhaseRankCoordinatorContext} is running on the coordinator node and is
 * responsible for combining the query phase results from the shards and rank them accordingly.
-* The output is a `window_size` ranked list of ordered results from all shards.
+* The output is a `rank_window_size` ranked list of ordered results from all shards.
 * Note: Currently this can use only sort by score; sort by field is not supported.
 */
 public abstract class QueryPhaseRankCoordinatorContext {

server/src/main/java/org/elasticsearch/search/rank/context/QueryPhaseRankShardContext.java

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@
 import java.util.List;

 /**
-* {@link QueryPhaseRankShardContext} is used to generate the top {@code window_size}
+* {@link QueryPhaseRankShardContext} is used to generate the top {@code rank_window_size}
 * results on each shard. It specifies the queries to run during {@code QueryPhase} and is responsible for combining all query scores and
 * order all results through the {@link QueryPhaseRankShardContext#combineQueryPhaseResults} method.
 */
