Commit 375814d

Adding linear retriever to support weighted sums of sub-retrievers (#120222)
1 parent e48a205 commit 375814d

File tree

30 files changed

+3139
-40
lines changed

docs/changelog/120222.yaml

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
pr: 120222
2+
summary: Adding linear retriever to support weighted sums of sub-retrievers
3+
area: "Search"
4+
type: enhancement
5+
issues: []

docs/reference/rest-api/common-parms.asciidoc

Lines changed: 43 additions & 4 deletions
@@ -1338,7 +1338,7 @@ that lower ranked documents have more influence. This value must be greater than
13381338
equal to `1`. Defaults to `60`.
13391339
end::rrf-rank-constant[]
13401340

1341-
tag::rrf-rank-window-size[]
1341+
tag::compound-retriever-rank-window-size[]
13421342
`rank_window_size`::
13431343
(Optional, integer)
13441344
+
@@ -1347,15 +1347,54 @@ query. A higher value will improve result relevance at the cost of performance.
13471347
ranked result set is pruned down to the search request's <<search-size-param, size>>.
13481348
`rank_window_size` must be greater than or equal to `size` and greater than or equal to `1`.
13491349
Defaults to the `size` parameter.
1350-
end::rrf-rank-window-size[]
1350+
end::compound-retriever-rank-window-size[]
13511351

1352-
tag::rrf-filter[]
1352+
tag::compound-retriever-filter[]
13531353
`filter`::
13541354
(Optional, <<query-dsl, query object or list of query objects>>)
13551355
+
13561356
Applies the specified <<query-dsl-bool-query, boolean query filter>> to all of the specified sub-retrievers,
13571357
according to each retriever's specifications.
1358-
end::rrf-filter[]
1358+
end::compound-retriever-filter[]
1359+
1360+
tag::linear-retriever-components[]
1361+
`retrievers`::
1362+
(Required, array of objects)
1363+
+
1364+
A list of the sub-retrievers' configuration, that we will take into account and whose result sets
1365+
we will merge through a weighted sum. Each configuration can have a different weight and normalization depending
1366+
on the specified retriever.
1367+
1368+
Each entry specifies the following parameters:
1369+
1370+
* `retriever`::
1371+
(Required, a <<retriever, retriever>> object)
1372+
+
1373+
Specifies the retriever for which we will compute the top documents. The retriever will produce `rank_window_size`
1374+
results, which will later be merged based on the specified `weight` and `normalizer`.
1375+
1376+
* `weight`::
1377+
(Optional, float)
1378+
+
1379+
The weight by which each score of this retriever's top docs will be multiplied. Must be greater than or equal to 0. Defaults to 1.0.
1380+
1381+
* `normalizer`::
1382+
(Optional, String)
1383+
+
1384+
Specifies how we will normalize the retriever's scores, before applying the specified `weight`.
1385+
Available values are: `minmax`, and `none`. Defaults to `none`.
1386+
1387+
** `none`
1388+
** `minmax` :
1389+
A `MinMaxScoreNormalizer` that normalizes scores based on the following formula
1390+
+
1391+
```
1392+
score = (score - min) / (max - min)
1393+
```
1394+
1395+
See also <<retrievers-examples-linear-retriever, this hybrid search example>> using a linear retriever on how to
1396+
independently configure and apply normalizers to retrievers.
1397+
end::linear-retriever-components[]
13591398

13601399
tag::knn-rescore-vector[]
13611400
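The `linear-retriever-components` text added in this file describes a weighted sum over optionally normalized scores. The following is a minimal Python sketch of that arithmetic, not the Elasticsearch implementation. The function names, the `scores` dictionaries, the tie-breaking by document id, the treatment of a document missing from a retriever as contributing nothing, and the handling of `max == min` are all assumptions made for illustration; the documentation text above does not specify them.

```python
from collections import defaultdict


def minmax(scores):
    """Apply score = (score - min) / (max - min) to every score."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: the docs do not say what happens when all scores are
        # equal; this sketch returns 0.0 to avoid dividing by zero.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


# The two documented normalizer values: `none` (the default) and `minmax`.
NORMALIZERS = {"none": lambda scores: dict(scores), "minmax": minmax}


def linear_combine(entries, rank_window_size):
    """Merge per-retriever scores through a weighted sum.

    Each entry is a dict with hypothetical key `scores` (doc id -> score)
    plus the documented optional keys `weight` and `normalizer`.
    """
    totals = defaultdict(float)
    for entry in entries:
        weight = entry.get("weight", 1.0)  # documented default: 1.0
        if weight < 0:
            raise ValueError("weight must be greater than or equal to 0")
        normalize = NORMALIZERS[entry.get("normalizer", "none")]
        # Normalize first, then apply the weight, as the docs describe.
        for doc, score in normalize(entry["scores"]).items():
            totals[doc] += weight * score
    # Highest combined score first; ties broken by doc id (an assumption).
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:rank_window_size]


entries = [
    {"scores": {"a": 10.0, "b": 5.0, "c": 0.0}, "weight": 2.0, "normalizer": "minmax"},
    {"scores": {"b": 0.5, "d": 0.25}},  # weight 1.0, normalizer `none`
]
print(linear_combine(entries, rank_window_size=3))
```

In this sketch the first retriever's raw scores span 0 to 10, so `minmax` rescales them to the range 0 to 1 before the weight of 2.0 is applied, which keeps them comparable with the second retriever's already small scores.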

docs/reference/search/retriever.asciidoc

Lines changed: 27 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,9 @@ A <<standard-retriever, retriever>> that replaces the functionality of a traditi
2828
`knn`::
2929
A <<knn-retriever, retriever>> that replaces the functionality of a <<search-api-knn, knn search>>.
3030

31+
`linear`::
32+
A <<linear-retriever, retriever>> that linearly combines the scores of other retrievers for the top documents.
33+
3134
`rescorer`::
3235
A <<rescorer-retriever, retriever>> that replaces the functionality of the <<rescore, query rescorer>>.
3336

@@ -45,6 +48,8 @@ A <<rule-retriever, retriever>> that applies contextual <<query-rules>> to pin o
4548

4649
A standard retriever returns top documents from a traditional <<query-dsl, query>>.
4750

51+
[discrete]
52+
[[standard-retriever-parameters]]
4853
===== Parameters:
4954

5055
`query`::
@@ -195,6 +200,8 @@ Documents matching these conditions will have increased relevancy scores.
195200

196201
A kNN retriever returns top documents from a <<knn-search, k-nearest neighbor search (kNN)>>.
197202

203+
[discrete]
204+
[[knn-retriever-parameters]]
198205
===== Parameters
199206

200207
`field`::
@@ -265,21 +272,37 @@ GET /restaurants/_search
265272
This value must be fewer than or equal to `num_candidates`.
266273
<5> The size of the initial candidate set from which the final `k` nearest neighbors are selected.
267274

275+
[[linear-retriever]]
276+
==== Linear Retriever
277+
A retriever that normalizes and linearly combines the scores of other retrievers.
278+
279+
[discrete]
280+
[[linear-retriever-parameters]]
281+
===== Parameters
282+
283+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=linear-retriever-components]
284+
285+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
286+
287+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
288+
268289
[[rrf-retriever]]
269290
==== RRF Retriever
270291

271292
An <<rrf, RRF>> retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
272293
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.
273294

295+
[discrete]
296+
[[rrf-retriever-parameters]]
274297
===== Parameters
275298

276299
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-retrievers]
277300

278301
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-constant]
279302

280-
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-window-size]
303+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
281304

282-
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-filter]
305+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-filter]
283306

284307
[discrete]
285308
[[rrf-retriever-example-hybrid]]
@@ -540,6 +563,8 @@ score = ln(score), if score < 0
540563
----
541564
====
542565

566+
[discrete]
567+
[[text-similarity-reranker-retriever-parameters]]
543568
===== Parameters
544569

545570
`retriever`::
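
The `linear` retriever section added in this file lists `retrievers`, `rank_window_size`, and `filter` as its parameters. As a hedged illustration of how those parameters might fit together, the Python sketch below assembles a search request body and checks the documented constraints. The helper name `build_linear_request`, the field names `title` and `embedding`, and the query values are hypothetical, and the nesting under a top-level `retriever` key is assumed to follow the same shape as the other retrievers documented on this page.

```python
import json


def build_linear_request(entries, size=10, rank_window_size=None, filter_query=None):
    """Assemble a request body for a `linear` retriever (illustrative helper)."""
    if rank_window_size is None:
        rank_window_size = size  # documented default: the `size` parameter
    if rank_window_size < 1 or rank_window_size < size:
        raise ValueError("rank_window_size must be >= size and >= 1")
    for entry in entries:
        if entry.get("weight", 1.0) < 0:
            raise ValueError("weight must be greater than or equal to 0")
        if entry.get("normalizer", "none") not in ("none", "minmax"):
            raise ValueError("normalizer must be `minmax` or `none`")
    linear = {"retrievers": entries, "rank_window_size": rank_window_size}
    if filter_query is not None:
        linear["filter"] = filter_query  # applied to all sub-retrievers
    return {"size": size, "retriever": {"linear": linear}}


body = build_linear_request(
    [
        {
            # Hypothetical lexical sub-retriever on a `title` field.
            "retriever": {"standard": {"query": {"match": {"title": "pizza"}}}},
            "weight": 2.0,
            "normalizer": "minmax",
        },
        {
            # Hypothetical vector sub-retriever on an `embedding` field.
            "retriever": {
                "knn": {
                    "field": "embedding",
                    "query_vector": [0.1, 0.2, 0.3],
                    "k": 10,
                    "num_candidates": 50,
                }
            },
            "weight": 1.0,
            "normalizer": "minmax",
        },
    ],
    size=10,
    rank_window_size=50,
)
print(json.dumps(body, indent=2))
```

Validating on the client side is only a convenience here; the server remains the authority on which values it accepts.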

docs/reference/search/rrf.asciidoc

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -45,7 +45,7 @@ include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-retrievers]
4545

4646
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-constant]
4747

48-
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=rrf-rank-window-size]
48+
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=compound-retriever-rank-window-size]
4949

5050
An example request using RRF:
5151

@@ -791,11 +791,11 @@ A more specific example of highlighting in RRF can also be found in the <<retrie
791791

792792
==== Inner hits in RRF
793793

794-
The `rrf` retriever supports <<inner-hits,inner hits>> functionality, allowing you to retrieve
795-
related nested or parent/child documents alongside your main search results. Inner hits can be
796-
specified as part of any nested sub-retriever and will be propagated to the top-level parent
797-
retriever. Note that the inner hit computation will take place only at end of `rrf` retriever's
798-
evaluation on the top matching documents, and not as part of the query execution of the nested
794+
The `rrf` retriever supports <<inner-hits,inner hits>> functionality, allowing you to retrieve
795+
related nested or parent/child documents alongside your main search results. Inner hits can be
796+
specified as part of any nested sub-retriever and will be propagated to the top-level parent
797+
retriever. Note that the inner hit computation will take place only at end of `rrf` retriever's
798+
evaluation on the top matching documents, and not as part of the query execution of the nested
799799
sub-retrievers.
800800

801801
[IMPORTANT]
