Adds Internal mechanisms and retriever for MMR based result diversification #135880
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 135873 | ||
| summary: Adds retriever for result diversification using MMR | ||
| area: Search | ||
| type: enhancement | ||
| issues: [ ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| --- | ||
| applies_to: | ||
| stack: all | ||
| serverless: | ||
| --- | ||
|
|
||
| # Diversify retriever [diversify-retriever] | ||
|
|
||
| The diversify retriever pares down the results of another retriever so that | ||
| the top-N results are diversified. | ||
| This is particularly useful when you need results that are relevant but not | ||
| similar to each other, for example to | ||
| provide more diverse context to a RAG prompt. | ||
|
|
||
| Using MMR (Maximal Marginal Relevance) diversification, the retriever discards | ||
| any inner retriever results that are too similar to each other, based on | ||
| the `field` parameter and in reference to the `query_vector`, if one is provided. | ||
| Note that the order of the results from the inner retriever is not changed. | ||
|
|
||
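To make the selection step described above concrete, here is a minimal Python sketch of greedy MMR selection. It is not the implementation in this PR: the helper names (`cosine`, `mmr_select`), the use of cosine similarity, and the classic scoring formula `lam * sim(doc, query) - (1 - lam) * max sim(doc, selected)` are all assumptions for illustration, and the retriever's actual weighting convention for `lambda` may differ. In line with the note above, the kept documents are returned in their original order.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_select(doc_vectors, query_vector, lam, k):
    """Hypothetical helper: greedily keep up to k documents using classic MMR.

    doc_vectors is a list of vectors in the inner retriever's rank order.
    Returns the indices of the kept documents, sorted so the original order
    is preserved (assumption: mirrors the "order is not changed" note).
    """
    remaining = list(range(len(doc_vectors)))
    selected = []
    while remaining and len(selected) < k:
        best_idx, best_score = None, -math.inf
        for i in remaining:
            relevance = cosine(doc_vectors[i], query_vector)
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(doc_vectors[i], doc_vectors[j]) for j in selected),
                default=0.0,
            )
            score = lam * relevance - (1.0 - lam) * redundancy
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return sorted(selected)


docs = [
    [1.0, 0.0],   # doc 0
    [0.99, 0.1],  # doc 1: near-duplicate of doc 0
    [0.0, 1.0],   # doc 2: very different from docs 0 and 1
]
kept = mmr_select(docs, query_vector=[1.0, 0.2], lam=0.5, k=2)
print(kept)
```

With these toy vectors and `lam=0.5`, only one of the two near-duplicate documents survives, so the call keeps documents 1 and 2. With `lam=1.0` the redundancy term vanishes and the two most query-similar documents (0 and 1) are kept instead.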
| ## Parameters [diversify-retriever-parameters] | ||
|
|
||
| `type` | ||
| : (Required, string) | ||
|
|
||
| The type of diversification to use. Currently only `mmr` (maximal marginal relevance) is supported. | ||
|
|
||
| `field` | ||
| : (Required, string) | ||
|
|
||
| The name of the field whose values are used for the diversification process. | ||
| The field must be a `dense_vector` type. | ||
|
|
||
| `num_candidates` | ||
| : (Required, integer) | ||
|
|
||
| The maximum number of results (the top-N) to return. | ||
|
|
||
| `retriever` | ||
| : (Required, retriever object) | ||
|
|
||
| A single child retriever whose top documents will have the diversification applied to them. | ||
| Note that although some of the inner retriever's results may be removed, the relative order of the remaining results will not change. | ||
|
|
||
| `query_vector` | ||
Contributor
Do we also support the `query_vector_builder` option? If not, let's add it as a follow-up.
Contributor (Author)
Good call - I'll add a follow-up to this one.
Contributor (Author)
FYI - there's a task on the backlog for adding
| : (Optional, array of `float` or `byte`) | ||
|
|
||
| Query vector. Must have the same number of dimensions as the vector field you are searching against. | ||
| Must be either an array of floats or a hex-encoded byte vector. | ||
|
|
||
| `lambda` | ||
| : (Required if `mmr` is used, float) | ||
|
|
||
| A number between 0.0 and 1.0 that specifies how much weight the diversification gives to the query vector as opposed to the field values. | ||
|
|
||
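The parameter rules above can be gathered into a small request-body builder. The following Python sketch is illustrative only: `build_diversify_retriever` is a hypothetical helper, not part of any Elasticsearch client, and its checks simply restate the constraints listed above (required parameters, `lambda` between 0.0 and 1.0, optional `query_vector`). The positivity check on `num_candidates` is an extra assumption.

```python
import json


def build_diversify_retriever(field, num_candidates, retriever, lam,
                              query_vector=None, diversify_type="mmr"):
    """Hypothetical helper that assembles a `diversify` retriever body."""
    if diversify_type != "mmr":
        raise ValueError("only the 'mmr' type is described in this document")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must be between 0.0 and 1.0")
    # Assumption: a non-positive result count is treated as invalid here.
    if num_candidates < 1:
        raise ValueError("num_candidates must be a positive integer")
    body = {
        "type": diversify_type,
        "field": field,
        "lambda": lam,
        "num_candidates": num_candidates,
        "retriever": retriever,
    }
    # query_vector is optional, so it is only emitted when provided.
    if query_vector is not None:
        body["query_vector"] = list(query_vector)
    return {"retriever": {"diversify": body}}


request = build_diversify_retriever(
    field="my_dense_vector_field",
    num_candidates=3,
    retriever={"standard": {"query": {"match": {"title": "elasticsearch"}}}},
    lam=0.7,
    query_vector=[0.1, 0.2, 0.3],
)
print(json.dumps(request, indent=2))
```

Serializing through `json.dumps` guarantees a syntactically valid body, which avoids hand-written JSON mistakes such as a missing comma between parameters.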
| ## Example | ||
|
|
||
| The following example uses an MMR diversification retriever to diversify and | ||
| return the top three results from the inner standard retriever. | ||
| The lambda is set at 0.7, which favors the weight from the comparisons of the | ||
| vectors in `my_dense_vector_field` over the query vector for determining the | ||
| differences between the documents. | ||
|
|
||
| ```console | ||
| GET my_index/_search | ||
| { | ||
| "retriever": { | ||
| "diversify": { | ||
| "type": "mmr", | ||
| "field": "my_dense_vector_field", | ||
| "lambda": 0.7, | ||
| "num_candidates": 3, | ||
| "query_vector": [0.1, 0.2, 0.3], | ||
| "retriever": { | ||
| "standard": { | ||
| "query": { | ||
| "match": { | ||
| "title": "elasticsearch" | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| } | ||
| ``` | ||