Commit 39b1b40

Fix to properly account min_score when having multiple nested retrievers (elastic#142212) (elastic#142365)
1 parent 9f87b5b commit 39b1b40

6 files changed (+176 additions, -6 deletions)


docs/reference/elasticsearch/rest-apis/retrievers.md

Lines changed: 10 additions & 0 deletions
@@ -57,6 +57,16 @@ The [`from`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operati
 [Aggregations](/reference/aggregations/index.md) are globally specified as part of a search request. The query used for an aggregation is the combination of all leaf retrievers as `should` clauses in a [boolean query](/reference/query-languages/query-dsl/query-dsl-bool-query.md).
 
 
+### Using `min_score` with compound retrievers [retriever-min-score-compound]
+
+When using `min_score` with compound retrievers (such as [`rrf`](retrievers/rrf-retriever.md) or [`linear`](retrievers/linear-retriever.md)), documents are filtered **after** the compound scoring has been applied. This is important to understand because:
+
+1. **Document collection**: Each child retriever collects documents up to its `rank_window_size` limit.
+2. **Score computation**: The compound retriever computes final scores (e.g., RRF ranking, linear combination with normalization).
+3. **Threshold filtering**: Documents with a final score below `min_score` are excluded from the results.
+
+Because `min_score` is applied after score normalization or RRF computation, the `total_hits` value reflects only the documents that pass the threshold after the compound scoring, not the total number of documents matched by the child retrievers.
+
 ### Restrictions on search parameters when specifying a retriever [retriever-restrictions]
 
 When a retriever is specified as part of a search, the following elements are not allowed at the top-level:
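The three steps described in the new `retrievers.md` section can be modeled in a few lines of Java. This is a simplified, hypothetical sketch (`CompoundMinScoreSketch`, `rrf` and `applyMinScore` are illustrative names, not Elasticsearch APIs). It assumes an RRF-style combination with the default `rank_constant` of 60 and 1-based ranks, and it shows why `total_hits` counts only the documents that survive `min_score` rather than every document the children matched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of min_score on a compound (RRF-style) retriever; not Elasticsearch code.
public class CompoundMinScoreSketch {

    // Assumption: the default RRF rank_constant of 60.
    static final int RANK_CONSTANT = 60;

    // Each child list holds doc ids already sorted by that child's own score.
    public static Map<String, Double> rrf(List<List<String>> children, int rankWindowSize) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> child : children) {
            // Step 1: a child contributes at most rankWindowSize documents.
            int n = Math.min(rankWindowSize, child.size());
            for (int i = 0; i < n; i++) {
                // Step 2: add 1 / (k + rank) for this child, using 1-based ranks.
                scores.merge(child.get(i), 1.0 / (RANK_CONSTANT + i + 1), Double::sum);
            }
        }
        return scores;
    }

    // Step 3: drop documents whose final score is below minScore, best first.
    public static List<String> applyMinScore(Map<String, Double> scores, double minScore) {
        List<String> hits = new ArrayList<>();
        scores.entrySet().stream()
            .filter(e -> e.getValue() >= minScore)
            .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
            .forEach(e -> hits.add(e.getKey()));
        return hits;
    }

    public static void main(String[] args) {
        List<List<String>> children = List.of(
            List.of("a", "b", "c", "d"),
            List.of("b", "e", "a"));
        List<String> hits = applyMinScore(rrf(children, 3), 0.02);
        // total_hits counts survivors of the threshold, not every child match.
        System.out.println("total_hits=" + hits.size() + " hits=" + hits);
    }
}
```

With a window of 3, document `d` never reaches the compound stage, and a threshold of 0.02 then keeps only `b` and `a`, so this toy run reports two hits even though the children matched five distinct documents.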

docs/reference/elasticsearch/rest-apis/retrievers/retrievers-examples.md

Lines changed: 104 additions & 1 deletion
@@ -199,7 +199,7 @@ This returns the following response based on the final rrf score for each result
 ::::
 
 
-### Using the expanded format with weights
+### Using the expanded format with weights
 ```{applies_to}
 stack: ga 9.2
 ```
@@ -1405,6 +1405,109 @@ GET retrievers_example/_search
 
 
 
+## Example: Multi-level nested retrievers with min_score [retrievers-examples-nested-retrievers-min-score]
+
+This example demonstrates how `min_score` works with deeply nested compound retrievers. The retriever tree structure is:
+
+- **Outer RRF** (rank_window_size: 20)
+  - **Inner RRF** (rank_window_size: 50)
+    - **Linear** (rank_window_size: 100, minmax normalizer, min_score: 0.5)
+      - **Standard** (query_string query)
+
+Documents are first matched by the innermost `standard` retriever using a query_string query. The `linear` retriever then normalizes these scores using the minmax normalizer and filters based on the specified `min_score`. Then, the inner `rrf` retriever computes the scores for the subset of documents that pass `min_score`, and finally the outer `rrf` retriever produces the final ranking.
+
+The `total_hits` value in this scenario is constrained by the `rank_window_size` parameters, as parent retrievers only have access to the top documents from their children.
+
+**Pagination behavior**: When using `from` and `size` at the top level of the search request, pagination is limited to the documents available at the outermost retriever's `rank_window_size`. In this example, even though the inner retrievers process more documents (100 for `linear`, 50 for inner `rrf`), the outer `rrf` only receives 50 documents and produces a final set of 20 documents. Therefore, `from` and `size` can only paginate through these top 20 documents.
+
+```console
+GET /retrievers_example/_search
+{
+  "retriever": {
+    "rrf": {
+      "retrievers": [
+        {
+          "rrf": {
+            "retrievers": [
+              {
+                "linear": {
+                  "min_score": 0.5,
+                  "retrievers": [
+                    {
+                      "retriever": {
+                        "standard": {
+                          "query": {
+                            "query_string": {
+                              "query": "artificial intelligence",
+                              "default_field": "text"
+                            }
+                          }
+                        }
+                      }
+                    }
+                  ],
+                  "rank_window_size": 100,
+                  "normalizer": "minmax"
+                }
+              }
+            ],
+            "rank_window_size": 50
+          }
+        }
+      ],
+      "rank_window_size": 20
+    }
+  },
+  "size": 10
+}
+```
+% TEST[continued]
+
+In this example:
+1. The `standard` retriever matches documents containing "artificial intelligence" in the text field.
+2. The `linear` retriever normalizes these scores to a 0-1 range using minmax normalization and filters documents with score less than 0.5.
+3. The inner `rrf` computes RRF scores based on document ranks.
+4. The outer `rrf` produces the final RRF scores.
+5. The `total_hits` value reflects only the documents passing the inner `min_score` threshold.
+
+::::{dropdown} Example response
+```console-result
+{
+  "took": 42,
+  "timed_out": false,
+  "_shards": {
+    "total": 1,
+    "successful": 1,
+    "skipped": 0,
+    "failed": 0
+  },
+  "hits": {
+    "total": {
+      "value": 1,
+      "relation": "eq"
+    },
+    "max_score": 0.016393442,
+    "hits": [
+      {
+        "_index": "retrievers_example",
+        "_id": "2",
+        "_score": 0.016393442,
+        "_source": {
+          "vector": [0.12, 0.56, 0.78],
+          "text": "Artificial intelligence is transforming medicine, from advancing diagnostics and tailoring treatment plans to empowering predictive patient care for improved health outcomes.",
+          "year": 2023,
+          "topic": ["ai", "medicine"],
+          "timestamp": "2022-01-01T12:10:30"
+        }
+      }
+    ]
+  }
+}
+```
+% TESTRESPONSE[s/"took": 42/"took" : $body.took/]
+::::
+
+
 ## Example: Explainability with multiple retrievers [retrievers-examples-explain-multiple-rrf]
 
 By adding `explain: true` to the request, each retriever will now provide a detailed explanation of all the steps and calculations required to compute the final score. Composability is fully supported in the context of `explain`, and each retriever will provide its own explanation, as shown in the example below.
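The nested example added to `retrievers-examples.md` can be traced with a similar sketch. It is a hypothetical model with made-up `standard` scores, not Elasticsearch code, and it rests on stated assumptions: minmax normalization is computed over the top `rank_window_size` documents of the single child, identical scores normalize to 1.0 (Elasticsearch's handling of that edge case may differ), the child weight is 1, and `rrf` uses the default `rank_constant` of 60. `NestedMinScoreSketch`, `Hit`, `linearMinMax`, `rrfSingleChild` and `page` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical trace of the nested example: standard -> linear (minmax, min_score) -> rrf -> rrf.
public class NestedMinScoreSketch {

    // Assumption: the default RRF rank_constant of 60.
    static final int RANK_CONSTANT = 60;

    // A document id with a score; lists are kept sorted by descending score.
    public record Hit(String id, double score) {}

    // linear with one child (weight 1): minmax-normalize the top window, then apply min_score.
    public static List<Hit> linearMinMax(List<Hit> ranked, int rankWindowSize, double minScore) {
        List<Hit> window = ranked.subList(0, Math.min(rankWindowSize, ranked.size()));
        double max = window.stream().mapToDouble(Hit::score).max().orElse(0);
        double min = window.stream().mapToDouble(Hit::score).min().orElse(0);
        List<Hit> out = new ArrayList<>();
        for (Hit h : window) {
            // Assumption: identical scores normalize to 1.0; Elasticsearch may differ here.
            double norm = max == min ? 1.0 : (h.score() - min) / (max - min);
            if (norm >= minScore) {
                out.add(new Hit(h.id(), norm));
            }
        }
        return out; // minmax preserves order, so the list stays sorted
    }

    // rrf with a single child: rank r (1-based) in the top window scores 1 / (k + r).
    public static List<Hit> rrfSingleChild(List<Hit> ranked, int rankWindowSize) {
        List<Hit> out = new ArrayList<>();
        for (int i = 0; i < Math.min(rankWindowSize, ranked.size()); i++) {
            out.add(new Hit(ranked.get(i).id(), 1.0 / (RANK_CONSTANT + i + 1)));
        }
        return out;
    }

    // from/size can only page through what the outermost retriever returned.
    public static List<Hit> page(List<Hit> top, int from, int size) {
        int start = Math.min(from, top.size());
        return top.subList(start, Math.min(start + size, top.size()));
    }

    public static void main(String[] args) {
        // Made-up standard (query_string) scores, best first.
        List<Hit> standard = List.of(new Hit("2", 3.1), new Hit("5", 1.9), new Hit("7", 0.4));
        List<Hit> linear = linearMinMax(standard, 100, 0.5);
        List<Hit> outer = rrfSingleChild(rrfSingleChild(linear, 50), 20);
        System.out.println("total_hits=" + outer.size() + " page=" + page(outer, 0, 10));
    }
}
```

Because each `rrf` level here has a single child, the top document scores 1 / (60 + 1), about 0.016393442, at every level, which is consistent with the `_score` in the example response. In this toy run only two documents survive to the outermost level, so a request with `from: 5` would page past everything available.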

server/src/main/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilder.java

Lines changed: 10 additions & 4 deletions
@@ -9,6 +9,7 @@
 
 package org.elasticsearch.search.retriever;
 
+import org.elasticsearch.features.NodeFeature;
 import org.elasticsearch.index.query.BoolQueryBuilder;
 import org.elasticsearch.index.query.QueryBuilder;
 import org.elasticsearch.index.query.QueryRewriteContext;
@@ -29,6 +30,10 @@
  */
 public class RankDocsRetrieverBuilder extends RetrieverBuilder {
 
+    public static final NodeFeature NESTED_RETRIEVER_MIN_SCORE_TOTAL_HITS_FIX = new NodeFeature(
+        "nested_retriever_min_score_total_hits_fix"
+    );
+
     public static final String NAME = "rank_docs_retriever";
     final int rankWindowSize;
     final List<RetrieverBuilder> sources;
@@ -49,8 +54,9 @@ public String getName() {
         return NAME;
     }
 
-    private boolean sourceHasMinScore() {
-        return this.minScore != null || sources.stream().anyMatch(x -> x.minScore() != null);
+    @Override
+    protected boolean hasMinScore() {
+        return this.minScore != null || sources.stream().anyMatch(RetrieverBuilder::hasMinScore);
     }
 
     private boolean sourceShouldRewrite(QueryRewriteContext ctx) throws IOException {
@@ -125,15 +131,15 @@ public void extractToSearchSourceBuilder(SearchSourceBuilder searchSourceBuilder
                 );
             }
         } else {
-            rankQuery = new RankDocsQueryBuilder(rankDocResults, null, false);
+            rankQuery = new RankDocsQueryBuilder(rankDocResults, null, true);
         }
         rankQuery.queryName(retrieverName());
         // ignore prefilters of this level, they were already propagated to children
         searchSourceBuilder.query(rankQuery);
         if (searchSourceBuilder.size() < 0) {
             searchSourceBuilder.size(rankWindowSize);
         }
-        if (sourceHasMinScore()) {
+        if (hasMinScore()) {
             searchSourceBuilder.minScore(this.minScore == null ? Float.MIN_VALUE : this.minScore);
         }
 
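When this level has no `min_score` of its own but `hasMinScore()` reports one somewhere below it, the code above sets the request's `min_score` to `Float.MIN_VALUE`, the smallest positive `float` (about 1.4e-45). The hypothetical helper below isolates that ternary to show its effect under the usual rule that documents scoring below `min_score` are dropped: a score of exactly 0 is filtered out, while any positive score passes. `MinScoreThresholdSketch`, `effectiveMinScore` and `kept` are illustrative names, not Elasticsearch methods.

```java
// Hypothetical helper isolating the min_score fallback from extractToSearchSourceBuilder.
public class MinScoreThresholdSketch {

    // Returns the min_score to set on the request, or null when none is needed.
    // nestedHasMinScore stands in for "some source reports hasMinScore()".
    public static Float effectiveMinScore(Float ownMinScore, boolean nestedHasMinScore) {
        boolean hasMinScore = ownMinScore != null || nestedHasMinScore;
        if (!hasMinScore) {
            return null;
        }
        // Same ternary as the diff: fall back to the smallest positive float.
        return ownMinScore == null ? Float.MIN_VALUE : ownMinScore;
    }

    // A document is kept when its score is not below the threshold (no threshold keeps all).
    public static boolean kept(float score, Float threshold) {
        return threshold == null || score >= threshold;
    }

    public static void main(String[] args) {
        Float threshold = effectiveMinScore(null, true);
        System.out.println(threshold);                     // 1.4E-45
        System.out.println(kept(0.0f, threshold));         // false
        System.out.println(kept(0.016393442f, threshold)); // true
    }
}
```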

server/src/main/java/org/elasticsearch/search/retriever/RetrieverBuilder.java

Lines changed: 4 additions & 0 deletions
@@ -313,5 +313,9 @@ public String retrieverName() {
         return retrieverName;
     }
 
+    protected boolean hasMinScore() {
+        return this.minScore != null;
+    }
+
     // ---- END FOR TESTING ----
 }
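The two Java changes work together: `RetrieverBuilder#hasMinScore()` checks only a retriever's own `min_score`, and `RankDocsRetrieverBuilder` overrides it to recurse through its sources, whereas the removed `sourceHasMinScore()` looked only one level down. The sketch below is a simplified model with hypothetical `Node` and `RankDocsNode` classes, not the real builders. It assumes the nested compound retrievers from the YAML test appear as nested sources, so a `min_score` two levels below the top is missed by the old check and found by the new one.

```java
import java.util.List;

// Simplified model of the hasMinScore() change; Node and RankDocsNode are hypothetical.
public class HasMinScoreSketch {

    // Stands in for RetrieverBuilder: an optional min_score plus child sources.
    public static class Node {
        final Float minScore;
        final List<Node> sources;

        public Node(Float minScore, List<Node> sources) {
            this.minScore = minScore;
            this.sources = sources;
        }

        // Base behavior, like RetrieverBuilder#hasMinScore: only this node's own min_score.
        public boolean hasMinScore() {
            return minScore != null;
        }
    }

    // Stands in for RankDocsRetrieverBuilder after the fix.
    public static class RankDocsNode extends Node {
        public RankDocsNode(Float minScore, List<Node> sources) {
            super(minScore, sources);
        }

        // New check: recurse through sources via the (possibly overridden) hasMinScore().
        @Override
        public boolean hasMinScore() {
            return minScore != null || sources.stream().anyMatch(Node::hasMinScore);
        }

        // Removed check: inspects direct sources only, one level deep.
        public boolean oldSourceHasMinScore() {
            return minScore != null || sources.stream().anyMatch(x -> x.minScore != null);
        }
    }

    public static void main(String[] args) {
        Node linear = new Node(0.5f, List.of());
        Node innerRrf = new RankDocsNode(null, List.of(linear));
        RankDocsNode outerRrf = new RankDocsNode(null, List.of(innerRrf));
        System.out.println(outerRrf.oldSourceHasMinScore()); // false: min_score is two levels down
        System.out.println(outerRrf.hasMinScore());          // true: found through recursion
    }
}
```

Overriding a single protected method, rather than keeping a private helper, lets each node answer for its own subtree, so the check works at any nesting depth.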

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java

Lines changed: 3 additions & 1 deletion
@@ -15,6 +15,7 @@
 import java.util.Set;
 
 import static org.elasticsearch.search.retriever.CompoundRetrieverBuilder.INNER_RETRIEVERS_FILTER_SUPPORT;
+import static org.elasticsearch.search.retriever.RankDocsRetrieverBuilder.NESTED_RETRIEVER_MIN_SCORE_TOTAL_HITS_FIX;
 import static org.elasticsearch.xpack.rank.linear.L2ScoreNormalizer.LINEAR_RETRIEVER_L2_NORM;
 import static org.elasticsearch.xpack.rank.linear.LinearRetrieverBuilder.LINEAR_RETRIEVER_MINSCORE_FIX;
 import static org.elasticsearch.xpack.rank.linear.MinMaxScoreNormalizer.LINEAR_RETRIEVER_MINMAX_SINGLE_DOC_FIX;
@@ -42,7 +43,8 @@ public Set<NodeFeature> getTestFeatures() {
             RRFRetrieverBuilder.SIMPLIFIED_WEIGHTED_SUPPORT,
             LINEAR_RETRIEVER_TOP_LEVEL_NORMALIZER,
             LinearRetrieverBuilder.MULTI_INDEX_SIMPLIFIED_FORMAT_SUPPORT,
-            RRFRetrieverBuilder.MULTI_INDEX_SIMPLIFIED_FORMAT_SUPPORT
+            RRFRetrieverBuilder.MULTI_INDEX_SIMPLIFIED_FORMAT_SUPPORT,
+            NESTED_RETRIEVER_MIN_SCORE_TOTAL_HITS_FIX
         );
     }
 }

x-pack/plugin/rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/rrf/700_rrf_retriever_search_api_compatibility.yml

Lines changed: 45 additions & 0 deletions
@@ -1148,3 +1148,48 @@ setup:
   - match: { hits.hits.1._id: "3" }
 
 
+---
+"nested compound retrievers with min_score correctly compute total_hits":
+  - requires:
+      cluster_features: "nested_retriever_min_score_total_hits_fix"
+      reason: "requires fix for correct total_hits computation with nested min_score"
+
+  - do:
+      search:
+        index: test
+        body:
+          retriever:
+            rrf:
+              retrievers:
+                - rrf:
+                    retrievers:
+                      - linear:
+                          retrievers: [
+                            {
+                              retriever: {
+                                standard: {
+                                  query: {
+                                    query_string: {
+                                      query: "term1 OR term2 OR term3",
+                                      default_field: "text"
+                                    }
+                                  }
+                                }
+                              }
+                            }
+                          ]
+                          rank_window_size: 100
+                          min_score: 0.5
+                          normalizer: "minmax"
+                    rank_window_size: 100
+              rank_window_size: 100
+          size: 10
+
+  - match: { hits.total.value: 3 }
+  - match: { hits.total.relation: "eq" }
+  - length: { hits.hits: 3 }
+  - match: { hits.hits.0._id: "1" }
+  - match: { hits.hits.1._id: "2" }
+  - match: { hits.hits.2._id: "3" }
+
+
