Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -113,7 +113,9 @@ First, let’s examine how to combine two different types of queries: a `kNN` qu
While these queries may produce scores in different ranges, we can use Reciprocal Rank Fusion (`rrf`) to combine the results and generate a merged final result list.

To implement this in the retriever framework, we start with the top-level element: our `rrf` retriever.
This retriever operates on top of two other retrievers: a `knn` retriever and a `standard` retriever. Our query structure would look like this:
This retriever operates on top of two other retrievers: a `knn` retriever and a `standard` retriever.
We can specify weights to adjust the influence of each retriever on the final ranking.
In this example, we're giving the `standard` retriever twice the influence of the `knn` retriever:

```console
GET /retrievers_example/_search
Expand All @@ -122,26 +124,32 @@ GET /retrievers_example/_search
"rrf": {
"retrievers": [
{
"standard": {
"query": {
"query_string": {
"query": "(information retrieval) OR (artificial intelligence)",
"default_field": "text"
"retriever": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old syntax still works when not specifying weights, correct? Should we keep both examples in the docs?

"standard": {
"query": {
"query_string": {
"query": "(information retrieval) OR (artificial intelligence)",
"default_field": "text"
}
}
}
}
},
"weight": 2.0
},
{
"knn": {
"field": "vector",
"query_vector": [
0.23,
0.67,
0.89
],
"k": 3,
"num_candidates": 5
}
"retriever": {
"knn": {
"field": "vector",
"query_vector": [
0.23,
0.67,
0.89
],
"k": 3,
"num_candidates": 5
}
},
"weight": 1.0
}
],
"rank_window_size": 10,
Expand Down
102 changes: 100 additions & 2 deletions docs/reference/elasticsearch/rest-apis/retrievers/rrf-retriever.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ applies_to:

# RRF retriever [rrf-retriever]

An [RRF](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) retriever returns top documents based on the RRF formula, equally weighting two or more child retrievers.
An [RRF](/reference/elasticsearch/rest-apis/reciprocal-rank-fusion.md) retriever returns top documents based on the RRF formula, combining two or more child retrievers.
Reciprocal rank fusion (RRF) is a method for combining multiple result sets with different relevance indicators into a single result set.


Expand All @@ -32,7 +32,8 @@ Combining `query` and `retrievers` is not supported.
: (Optional, array of retriever objects)

A list of child retrievers to specify which sets of returned top documents will have the RRF formula applied to them.
Each child retriever carries an equal weight as part of the RRF formula. Two or more child retrievers are required.
Two or more child retrievers are required.
Each retriever can optionally include a weight to adjust its influence on the final ranking.

`rank_constant`
: (Optional, integer)
Expand All @@ -53,6 +54,34 @@ Combining `query` and `retrievers` is not supported.

Applies the specified [boolean query filter](/reference/query-languages/query-dsl/query-dsl-bool-query.md) to all of the specified sub-retrievers, according to each retriever’s specifications.

Each entry in the `retrievers` array can be specified in two ways:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems confusing. I would replace this paragraph with bringing back the earlier example, and have two complete examples, one specifying weights and one not.


**Without custom weight** (uses default weight of `1.0`):
```json
{ "standard": { "query": {...} } }
```

**With custom weight** {applies_to}`stack: ga 9.2`:
```json
{ "retriever": { "standard": { "query": {...} } }, "weight": 2.0 }
```

When you need to specify a custom weight, wrap your retriever in an object with `retriever` and `weight` fields. {applies_to}`stack: ga 9.2`

The wrapped form supports these parameters:

`retriever`
: (Optional, a retriever object)

Specifies a child retriever. Any valid retriever type can be used (e.g., `standard`, `knn`, `text_similarity_reranker`, etc.).

`weight` {applies_to}`stack: ga 9.2`
: (Optional, float)

The weight that each score of this retriever's top docs will be multiplied in the RRF formula. Higher values increase this retriever's influence on the final ranking. Must be non-negative.

When weight is not specified, all retrievers are equally weighted against each other (each with a weight of 1.0).

## Example: Hybrid search [rrf-retriever-example-hybrid]

A simple hybrid search example (lexical search + dense vector search) combining a `standard` retriever with a `knn` retriever using RRF:
Expand Down Expand Up @@ -182,6 +211,75 @@ GET /restaurants/_search
5. The rank constant for the RRF retriever.
6. The rank window size for the RRF retriever.

## Example: Weighted hybrid search [rrf-retriever-example-weighted]

{applies_to}`stack: ga 9.2`

This example demonstrates how to use weights to adjust the influence of different retrievers in the RRF ranking.
In this case, we're giving the `standard` retriever more importance (weight 2.0) compared to the `knn` retriever (weight 1.0):

```console
GET /restaurants/_search
{
"retriever": {
"rrf": {
"retrievers": [
{
"retriever": { <1>
"standard": {
"query": {
"multi_match": {
"query": "Austria",
"fields": ["city", "region"]
}
}
}
},
"weight": 2.0 <2>
},
{
"retriever": { <3>
"knn": {
"field": "vector",
"query_vector": [10, 22, 77],
"k": 10,
"num_candidates": 10
}
},
"weight": 1.0 <4>
}
],
"rank_constant": 60,
"rank_window_size": 50
}
}
}
```
% TEST[continued]

1. The first retriever in weighted format.
2. This retriever has a weight of 2.0, giving it twice the influence of the kNN retriever.
3. The second retriever in weighted format.
4. This retriever has a weight of 1.0 (default weight).

::::{note}
You can mix weighted and non-weighted formats in the same query.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK, provided you link to the more explicit examples in the retriever docs that I suggested above.

The direct format (without explicit `retriever` wrapper) uses the default weight of `1.0`:

```json
{
"rrf": {
"retrievers": [
{ "standard": { "query": {...} } },
{ "retriever": { "knn": {...} }, "weight": 2.0 }
]
}
}
```

In this example, the `standard` retriever uses weight `1.0` (default), while the `knn` retriever uses weight `2.0`.
::::

## Example: Hybrid search with sparse vectors [rrf-retriever-example-hybrid-sparse]

A more complex hybrid search example (lexical search + ELSER sparse vector search + dense vector search) using RRF:
Expand Down
Loading