Commit 4b1f912

Resolving bugs
1 parent c9bdb0c commit 4b1f912

4 files changed: 38 additions, 51 deletions

docs/reference/elasticsearch/rest-apis/retrievers.md

Lines changed: 27 additions & 33 deletions
@@ -60,7 +60,7 @@ A standard retriever returns top documents from a traditional [query](/reference
 `filter`
 : (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

-Applies a [boolean query filter](/reference/query-languages/query-dsl-bool-query.md) to this retriever, where all documents must match this query but do not contribute to the score.
+Applies a [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to this retriever, where all documents must match this query but do not contribute to the score.


 `search_after`
@@ -84,7 +84,7 @@ A standard retriever returns top documents from a traditional [query](/reference
 `min_score`
 : (Optional, `float`)

-Minimum [`_score`](/reference/query-languages/query-filter-context.md#relevance-scores) for matching documents. Documents with a lower `_score` are not included in the top documents.
+Minimum [`_score`](/reference/query-languages/query-dsl/query-filter-context.md#relevance-scores) for matching documents. Documents with a lower `_score` are not included in the top documents.


 `collapse`
@@ -271,12 +271,12 @@ Each entry specifies the following parameters:
 `weight`
 : (Optional, float)

-The weight that each score of this retriever's top docs will be multiplied with. Must be greater or equal to 0. Defaults to 1.0.
+The weight that each score of this retrievers top docs will be multiplied with. Must be greater or equal to 0. Defaults to 1.0.

 `normalizer`
 : (Optional, String)

-Specifies how we will normalize the retriever's scores, before applying the specified `weight`. Available values are: `minmax`, and `none`. Defaults to `none`.
+Specifies how we will normalize the retrievers scores, before applying the specified `weight`. Available values are: `minmax`, and `none`. Defaults to `none`.

 * `none`
 * `minmax` : A `MinMaxScoreNormalizer` that normalizes scores based on the following formula
@@ -291,24 +291,17 @@ See also [this hybrid search example](docs-content://solutions/search/retrievers
 `rank_window_size`
 : (Optional, integer)

-This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search request's [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param). `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`. Defaults to the `size` parameter.
-
+This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search request’s [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param). `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`. Defaults to the `size` parameter.

 `min_score`
 : (Optional, float)

-Minimum score threshold for documents to be included in the final result set. Documents with scores below this threshold will be filtered out. Must be greater than or equal to 0. Defaults to 0.
-
+Minimum score threshold for documents to be included in the final result set. Documents with scores below this threshold will be filtered out. Must be greater than or equal to 0 if explicitly set. If not set, defaults to minimum float value, meaning no documents are filtered based on score .

 `filter`
 : (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

-Applies the specified [boolean query filter](/reference/query-languages/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever's specifications.
-
-
-### Example: Hybrid search with min_score [linear-retriever-example]
-
-This example demonstrates how to use the Linear retriever to combine a standard retriever with a kNN retriever, applying weights, normalization, and a minimum score threshold:
+Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.

 ```console
 GET /restaurants/_search
@@ -353,23 +346,24 @@ GET /restaurants/_search
 }
 ```

-1. Defines a retriever tree with a Linear retriever.
-2. The sub-retrievers array.
-3. The first sub-retriever is a `standard` retriever.
-4. The weight applied to the scores from the standard retriever (2.0).
-5. The normalization method applied to the standard retriever's scores.
-6. The second sub-retriever is a `knn` retriever.
-7. The weight applied to the scores from the kNN retriever (1.0).
-8. The normalization method applied to the kNN retriever's scores.
-9. The rank window size for the Linear retriever.
-10. The minimum score threshold - documents with a combined score below 1.5 will be filtered out from the final result set.
+1. Defines a retriever tree using the `linear` retriever type.
+2. The array of retrievers to be combined.
+3. A `standard` retriever used for traditional full-text search.
+4. Weight applied to the score from the `standard` retriever.
+5. Normalization method (`minmax`) applied to the `standard` retriever score.
+6. A `knn` retriever used for vector-based similarity search.
+7. Weight applied to the score from the `knn` retriever.
+8. Normalization method (`minmax`) applied to the `knn` retriever score.
+9. The number of top documents considered for scoring in the linear combination.
+10. Minimum score threshold for the final result set — documents below this combined score will be excluded.


 ## RRF Retriever [rrf-retriever]

 An [RRF](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers. Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.


+
 #### Parameters [rrf-retriever-parameters]

 `retrievers`
@@ -387,13 +381,13 @@ An [RRF](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) retriever
 `rank_window_size`
 : (Optional, integer)

-This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search request's [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param). `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`. Defaults to the `size` parameter.
+This value determines the size of the individual result sets per query. A higher value will improve result relevance at the cost of performance. The final ranked result set is pruned down to the search requests [size](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search#search-size-param). `rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`. Defaults to the `size` parameter.


 `filter`
 : (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

-Applies the specified [boolean query filter](/reference/query-languages/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever's specifications.
+Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retrievers specifications.



@@ -502,12 +496,12 @@ For compound retrievers like `rrf`, the `window_size` parameter defines the tota

 When using the `rescorer`, an error is returned if the following conditions are not met:

-* The minimum configured rescore's `window_size` is:
+* The minimum configured rescores `window_size` is:

 * Greater than or equal to the `size` of the parent retriever for nested `rescorer` setups.
 * Greater than or equal to the `size` of the search request when used as the primary retriever in the tree.

-* And the maximum rescore's `window_size` is:
+* And the maximum rescores `window_size` is:

 * Smaller than or equal to the `size` or `rank_window_size` of the child retriever.

@@ -530,7 +524,7 @@ When using the `rescorer`, an error is returned if the following conditions are
 `filter`
 : (Optional. [query object or list of query objects](/reference/query-languages/querydsl.md))

-Applies a [boolean query filter](/reference/query-languages/query-dsl-bool-query.md) to the retriever, ensuring that all documents match the filter criteria without affecting their scores.
+Applies a [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to the retriever, ensuring that all documents match the filter criteria without affecting their scores.



@@ -699,7 +693,7 @@ score = ln(score), if score < 0
 `filter`
 : (Optional, [query object or list of query objects](/reference/query-languages/querydsl.md))

-Applies the specified [boolean query filter](/reference/query-languages/query-dsl-bool-query.md) to the child `retriever`. If the child retriever already specifies any filters, then this top-level filter is applied in conjuction with the filter defined in the child retriever.
+Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to the child `retriever`. If the child retriever already specifies any filters, then this top-level filter is applied in conjuction with the filter defined in the child retriever.



@@ -733,7 +727,7 @@ Follow these steps:
 }
 ```

-1. [Adaptive allocations](docs-content://explore-analyze/machine-learning/nlp/ml-nlp-auto-scale.md#nlp-model-adaptive-allocations) will be enabled with the minimum of 1 and the maximum of 10 allocations.
+1. [Adaptive allocations](docs-content://deploy-manage/autoscaling/trained-model-autoscaling.md#enabling-autoscaling-through-apis-adaptive-allocations) will be enabled with the minimum of 1 and the maximum of 10 allocations.

 2. Define a `text_similarity_rerank` retriever:

@@ -867,7 +861,7 @@ Follow these steps to load the model and create a semantic re-ranker.

 ## Query Rules Retriever [rule-retriever]

-The `rule` retriever enables fine-grained control over search results by applying contextual [query rules](/reference/elasticsearch/rest-apis/searching-with-query-rules.md#query-rules) to pin or exclude documents for specific queries. This retriever has similar functionality to the [rule query](/reference/query-languages/query-dsl-rule-query.md), but works out of the box with other retrievers.
+The `rule` retriever enables fine-grained control over search results by applying contextual [query rules](/reference/elasticsearch/rest-apis/searching-with-query-rules.md#query-rules) to pin or exclude documents for specific queries. This retriever has similar functionality to the [rule query](/reference/query-languages/query-dsl/query-dsl-rule-query.md), but works out of the box with other retrievers.

 ### Prerequisites [_prerequisites_16]

@@ -996,7 +990,7 @@ The [`from`](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operati

 ### Using aggregations with a retriever tree [retriever-aggregations]

-[Aggregations](/reference/data-analysis/aggregations/index.md) are globally specified as part of a search request. The query used for an aggregation is the combination of all leaf retrievers as `should` clauses in a [boolean query](/reference/query-languages/query-dsl-bool-query.md).
+[Aggregations](/reference/aggregations/index.md) are globally specified as part of a search request. The query used for an aggregation is the combination of all leaf retrievers as `should` clauses in a [boolean query](/reference/query-languages/query-dsl/query-dsl-bool-query.md).


 ### Restrictions on search parameters when specifying a retriever [retriever-restrictions]

x-pack/plugin/rank-rrf/src/internalClusterTest/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverIT.java

Lines changed: 8 additions & 9 deletions
@@ -913,13 +913,9 @@ public void testLinearWithMinScore() {
 SearchRequestBuilder req = prepareSearchWithPIT(source);
 ElasticsearchAssertions.assertResponse(req, resp -> {
 assertNotNull(resp.pointInTimeId());
-// TotalHits reflects the original query scope before compound minScore filtering.
-// Asserting on hits.length verifies the retriever's minScore correctly filtered the returned hits.
-// assertNotNull(resp.getHits().getTotalHits()); // getTotalHits() might still be non-null
-// assertThat(resp.getHits().getTotalHits().value(), equalTo(1L)); // This assertion is incorrect based on expected behavior
-// assertThat(resp.getHits().getTotalHits().relation(), equalTo(TotalHits.Relation.EQUAL_TO)); // Relation also reflects
-// pre-filtering count
 assertThat(resp.getHits().getHits().length, equalTo(1)); // Verify actual returned hits count
+// The total hits count reflects matches before min_score filtering.
+assertThat(resp.getHits().getTotalHits().value(), equalTo(2L));
 assertThat(resp.getHits().getAt(0).getId(), equalTo("doc_2"));
 assertThat((double) resp.getHits().getAt(0).getScore(), closeTo(30.0f, 0.001f));
 });
@@ -944,7 +940,8 @@ public void testLinearWithMinScore() {
 ElasticsearchAssertions.assertResponse(req, resp -> {
 assertNotNull(resp.pointInTimeId());
 assertNotNull(resp.getHits().getTotalHits());
-assertThat(resp.getHits().getTotalHits().value(), equalTo(3L));
+// The total hits count reflects matches before min_score filtering.
+assertThat(resp.getHits().getTotalHits().value(), equalTo(6L));
 assertThat(resp.getHits().getTotalHits().relation(), equalTo(TotalHits.Relation.EQUAL_TO));
 assertThat(resp.getHits().getHits().length, equalTo(3));
 assertThat(resp.getHits().getAt(0).getScore(), equalTo(30.0f));
@@ -992,7 +989,8 @@ public void testLinearWithMinScoreAndNormalization() {
 ElasticsearchAssertions.assertResponse(req, resp -> {
 assertNull(resp.pointInTimeId());
 assertNotNull(resp.getHits().getTotalHits());
-assertThat(resp.getHits().getTotalHits().value(), equalTo(4L));
+// The total hits count reflects matches before min_score filtering.
+assertThat(resp.getHits().getTotalHits().value(), equalTo(6L));
 assertThat(resp.getHits().getTotalHits().relation(), equalTo(TotalHits.Relation.EQUAL_TO));
 assertThat(resp.getHits().getHits().length, equalTo(4));
 assertThat(resp.getHits().getAt(0).getId(), equalTo("doc_2"));
@@ -1023,7 +1021,8 @@ public void testLinearWithMinScoreAndNormalization() {
 ElasticsearchAssertions.assertResponse(req, resp -> {
 assertNotNull(resp.pointInTimeId());
 assertNotNull(resp.getHits().getTotalHits());
-assertThat(resp.getHits().getTotalHits().value(), equalTo(3L));
+// The total hits count reflects matches before min_score filtering.
+assertThat(resp.getHits().getTotalHits().value(), equalTo(6L));
 assertThat(resp.getHits().getHits().length, equalTo(3));
 assertThat(resp.getHits().getAt(0).getId(), equalTo("doc_2"));
 assertThat((double) resp.getHits().getAt(0).getScore(), closeTo(1.9f, 0.1f));

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java

Lines changed: 1 addition & 3 deletions
@@ -191,9 +191,8 @@ protected RankDoc[] combineInnerRetrieverResults(List<ScoreDoc[]> rankResults, b
 }
 // sort the results based on the final score, tiebreaker based on smaller doc id
 LinearRankDoc[] sortedResults = docsToRankResults.values().toArray(LinearRankDoc[]::new);
-Arrays.sort(sortedResults); // Sorts descending by score (highest first)
+Arrays.sort(sortedResults);

-// Find the number of results that meet the minScore threshold
 int validCount = 0;
 while (validCount < sortedResults.length && sortedResults[validCount].score >= minScore) {
 validCount++;
@@ -207,7 +206,6 @@ protected RankDoc[] combineInnerRetrieverResults(List<ScoreDoc[]> rankResults, b
 topResults[rank].rank = rank + 1;
 }

-System.out.println("topResults: " + topResults.length);
 return topResults;
 }
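For readers following the `minScore` logic in this hunk, here is a minimal, self-contained sketch of the same idea: sort by score descending with the smaller doc id as tiebreaker, count the prefix of documents at or above `minScore`, and assign 1-based ranks. `ScoredDoc` and `topAboveMinScore` are hypothetical names, not part of the Elasticsearch codebase, and the cap at `rankWindowSize` is an assumption because those lines are elided from the diff.

```java
import java.util.Arrays;

class MinScoreCutoffSketch {
    /** Hypothetical stand-in for LinearRankDoc: doc id, combined score, 1-based rank. */
    static final class ScoredDoc {
        final int doc;
        final float score;
        int rank;

        ScoredDoc(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static ScoredDoc[] topAboveMinScore(ScoredDoc[] docs, float minScore, int rankWindowSize) {
        ScoredDoc[] sorted = docs.clone();
        // Highest score first; ties are broken by the smaller doc id.
        Arrays.sort(sorted, (a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.doc, b.doc);
        });
        // The array is sorted descending, so the qualifying docs form a prefix.
        int validCount = 0;
        while (validCount < sorted.length && sorted[validCount].score >= minScore) {
            validCount++;
        }
        // Assumption: results are capped at rankWindowSize (not shown in the diff).
        ScoredDoc[] top = Arrays.copyOf(sorted, Math.min(rankWindowSize, validCount));
        for (int rank = 0; rank < top.length; rank++) {
            top[rank].rank = rank + 1;
        }
        return top;
    }
}
```

Because the cutoff relies on the sort order, the loop can stop at the first document below the threshold instead of scanning the whole array.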

x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/MinMaxScoreNormalizer.java

Lines changed: 2 additions & 6 deletions
@@ -9,8 +9,6 @@

 import org.apache.lucene.search.ScoreDoc;

-import static org.elasticsearch.xpack.rank.linear.LinearRetrieverBuilder.DEFAULT_SCORE;
-
 public class MinMaxScoreNormalizer extends ScoreNormalizer {

 public static final MinMaxScoreNormalizer INSTANCE = new MinMaxScoreNormalizer();
@@ -55,10 +53,8 @@ public ScoreDoc[] normalizeScores(ScoreDoc[] docs) {
 boolean minEqualsMax = Math.abs(min - max) < EPSILON;
 for (int i = 0; i < docs.length; i++) {
 float score;
-if (Float.isNaN(docs[i].score)) {
-score = DEFAULT_SCORE;
-} else if (minEqualsMax) {
-score = docs[i].score;
+if (minEqualsMax) {
+score = min;
 } else {
 score = (docs[i].score - min) / (max - min);
 }
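As a rough illustration of the behavior after this change, the sketch below applies min-max normalization to a plain `float[]` and returns `min` for every entry when all scores are (nearly) equal, which avoids a division by zero. The class name, the array-based signature, and the `EPSILON` value are assumptions for illustration; the real method works on `ScoreDoc[]` and its constant is not shown in the diff.

```java
class MinMaxSketch {
    // Assumed tolerance; the actual EPSILON value is not visible in this diff.
    static final float EPSILON = 1e-6f;

    static float[] normalize(float[] scores) {
        float[] out = new float[scores.length];
        if (scores.length == 0) {
            return out;
        }
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float s : scores) {
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        boolean minEqualsMax = Math.abs(min - max) < EPSILON;
        for (int i = 0; i < scores.length; i++) {
            if (minEqualsMax) {
                // Degenerate range: keep the shared score rather than dividing by zero.
                out[i] = min;
            } else {
                out[i] = (scores[i] - min) / (max - min);
            }
        }
        return out;
    }
}
```

For example, under these assumptions `normalize(new float[] {10f, 20f, 30f})` maps the scores onto the range 0 to 1, while `normalize(new float[] {5f, 5f, 5f})` leaves every score at 5.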
