-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Add a generic rescorer
retriever based on the search request's rescore functionality
#118585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from 11 commits
Commits
Show all changes
61 commits
Select commit
Hold shift + click to select a range
979e34c
Add an inference metadata fields instead of storing the inference in …
jimczi 08f58e7
Merge remote-tracking branch 'upstream/main' into inference_metadata_…
jimczi 8f5e234
iter
jimczi 78ff84f
Merge remote-tracking branch 'upstream/main' into inference_metadata_…
jimczi 1d1e819
iter
jimczi 4f29bd8
iter
jimczi 50222ed
spotless
jimczi 5ab3e35
iter
jimczi 9acf74d
iter
jimczi 7bcd2b1
Merge remote-tracking branch 'upstream/main' into inference_metadata_…
jimczi cdae3cf
iter
jimczi fe60268
iter
jimczi 184174f
Merge remote-tracking branch 'upstream/main' into inference_metadata_…
jimczi f204cc3
iter
jimczi d710efc
iter
jimczi b0a15cc
Merge remote-tracking branch 'upstream/main' into inference_metadata_…
jimczi 22db126
iter
jimczi 825d626
iter
jimczi 2cb7963
iter
jimczi bcce2ea
Remove value fetcher as it also retrieves copy_to fields
jimczi d1b8d61
Merge remote-tracking branch 'upstream/main' into inference_metadata_…
jimczi 9deee7e
Merge branch 'main' into inference_metadata_fields
Mikep86 1e76873
Merge branch 'main' into inference_metadata_fields
Mikep86 9b36f55
Merge branch 'main' into inference_metadata_fields
Mikep86 14eeb27
Merge branch 'main' into inference_metadata_fields
Mikep86 cc0f394
Merge branch 'main' into inference_metadata_fields
Mikep86 3117cf6
Support `_shard_doc` as a sort tiebreaker for query rescoring
jimczi 9feae33
Add a new `rescorer` retriever
jimczi dd0d830
Update docs/changelog/118585.yaml
jimczi de13467
Add inference metadata fields feature flag and fix texts (#118190)
Mikep86 5966d1c
remove yaml test now that _shard_doc can be used without a PIT
jimczi 768d716
Add downstream validation of the rank_window_size in the compound ret…
jimczi 9634945
address review comments
jimczi 54458e3
add missing size validation
jimczi c9d21f5
fix license header
jimczi 61f025e
Merge branch 'main' into inference_metadata_fields
Mikep86 c0c3b08
Merge remote-tracking branch 'upstream/main' into rescorer_retriever
jimczi f0c4236
fix doc
jimczi 44a5132
[CI] Auto commit changes from spotless
71c4823
fix tests
jimczi fa45c50
Merge branch 'main' into inference_metadata_fields
Mikep86 cb86fd4
Inference Metadata Fields - Chunk On Delimiter (#118694)
Mikep86 e80fca1
Merge branch 'main' into inference_metadata_fields
Mikep86 80c0bae
Merge remote-tracking branch 'upstream/main' into rescorer_retriever
jimczi b1ddf79
fix wrong link in docs
jimczi 9533c7b
Merge branch 'main' into inference_metadata_fields
Mikep86 3a3e8cf
Update docs/reference/search/retriever.asciidoc
jimczi 3f421ef
Merge remote-tracking branch 'upstream/inference_metadata_fields' int…
jimczi 6e2a44c
improve error message related to window size
jimczi 582d88a
Merge remote-tracking branch 'origin/rescorer_retriever' into rescore…
jimczi 8d21423
Revert "Merge remote-tracking branch 'upstream/inference_metadata_fie…
jimczi fd66bd8
Merge remote-tracking branch 'upstream/main' into rescorer_retriever
jimczi 84dcda0
revert changes after wrong merge
jimczi 2ef709a
revert
jimczi e754439
Merge remote-tracking branch 'upstream/main' into rescorer_retriever
jimczi 701066e
adapt error message
jimczi ce1d2d5
fix error message (bis)
jimczi fd40ce2
[CI] Auto commit changes from spotless
bfad0ea
Merge remote-tracking branch 'upstream/main' into rescorer_retriever
jimczi 9ea3fcc
Merge remote-tracking branch 'origin/rescorer_retriever' into rescore…
jimczi 653e855
fix error message (part 3)
jimczi File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
pr: 118585 | ||
summary: Add a generic `rescorer` retriever based on the search request's rescore | ||
functionality | ||
area: Ranking | ||
type: feature | ||
issues: | ||
- 118327 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
225 changes: 225 additions & 0 deletions
225
...src/yamlRestTest/resources/rest-api-spec/test/search.retrievers/30_rescorer_retriever.yml
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,225 @@ | ||
setup: | ||
- requires: | ||
cluster_features: [ "search.retriever.rescorer.enabled" ] | ||
reason: "Support for rescorer retriever" | ||
|
||
- do: | ||
indices.create: | ||
index: test | ||
body: | ||
settings: | ||
number_of_shards: 1 | ||
number_of_replicas: 0 | ||
mappings: | ||
properties: | ||
available: | ||
type: boolean | ||
features: | ||
type: rank_features | ||
|
||
- do: | ||
bulk: | ||
refresh: true | ||
index: test | ||
body: | ||
- '{"index": {"_id": 1 }}' | ||
- '{"features": { "first_stage": 1, "second_stage": 10}, "available": true, "group": 1}' | ||
- '{"index": {"_id": 2 }}' | ||
- '{"features": { "first_stage": 2, "second_stage": 9}, "available": false, "group": 1}' | ||
- '{"index": {"_id": 3 }}' | ||
- '{"features": { "first_stage": 3, "second_stage": 8}, "available": false, "group": 3}' | ||
- '{"index": {"_id": 4 }}' | ||
- '{"features": { "first_stage": 4, "second_stage": 7}, "available": true, "group": 1}' | ||
- '{"index": {"_id": 5 }}' | ||
- '{"features": { "first_stage": 5, "second_stage": 6}, "available": true, "group": 3}' | ||
- '{"index": {"_id": 6 }}' | ||
- '{"features": { "first_stage": 6, "second_stage": 5}, "available": false, "group": 2}' | ||
- '{"index": {"_id": 7 }}' | ||
- '{"features": { "first_stage": 7, "second_stage": 4}, "available": true, "group": 3}' | ||
- '{"index": {"_id": 8 }}' | ||
- '{"features": { "first_stage": 8, "second_stage": 3}, "available": true, "group": 1}' | ||
- '{"index": {"_id": 9 }}' | ||
- '{"features": { "first_stage": 9, "second_stage": 2}, "available": true, "group": 2}' | ||
- '{"index": {"_id": 10 }}' | ||
- '{"features": { "first_stage": 10, "second_stage": 1}, "available": false, "group": 1}' | ||
|
||
--- | ||
"Rescorer retriever basic": | ||
pmpailis marked this conversation as resolved.
Show resolved
Hide resolved
|
||
- do: | ||
search: | ||
index: test | ||
body: | ||
retriever: | ||
rescorer: | ||
rescore: | ||
window_size: 10 | ||
query: | ||
rescore_query: | ||
rank_feature: | ||
field: "features.second_stage" | ||
linear: { } | ||
query_weight: 0 | ||
retriever: | ||
standard: | ||
query: | ||
rank_feature: | ||
field: "features.first_stage" | ||
linear: { } | ||
size: 2 | ||
|
||
- match: { hits.total.value: 10 } | ||
- match: { hits.hits.0._id: "1" } | ||
- match: { hits.hits.0._score: 10.0 } | ||
- match: { hits.hits.1._id: "2" } | ||
- match: { hits.hits.1._score: 9.0 } | ||
|
||
- do: | ||
search: | ||
index: test | ||
body: | ||
retriever: | ||
rescorer: | ||
rescore: | ||
window_size: 3 | ||
query: | ||
rescore_query: | ||
rank_feature: | ||
field: "features.second_stage" | ||
linear: {} | ||
query_weight: 0 | ||
retriever: | ||
standard: | ||
query: | ||
rank_feature: | ||
field: "features.first_stage" | ||
linear: {} | ||
size: 2 | ||
|
||
- match: {hits.total.value: 10} | ||
- match: {hits.hits.0._id: "8"} | ||
- match: { hits.hits.0._score: 3.0 } | ||
- match: {hits.hits.1._id: "9"} | ||
- match: { hits.hits.1._score: 2.0 } | ||
|
||
--- | ||
"Rescorer retriever with pre-filters": | ||
- do: | ||
search: | ||
index: test | ||
body: | ||
retriever: | ||
rescorer: | ||
filter: | ||
match: | ||
available: true | ||
rescore: | ||
window_size: 10 | ||
query: | ||
rescore_query: | ||
rank_feature: | ||
field: "features.second_stage" | ||
linear: { } | ||
query_weight: 0 | ||
retriever: | ||
standard: | ||
query: | ||
rank_feature: | ||
field: "features.first_stage" | ||
linear: { } | ||
size: 2 | ||
|
||
- match: { hits.total.value: 6 } | ||
- match: { hits.hits.0._id: "1" } | ||
- match: { hits.hits.0._score: 10.0 } | ||
- match: { hits.hits.1._id: "4" } | ||
- match: { hits.hits.1._score: 7.0 } | ||
|
||
- do: | ||
search: | ||
index: test | ||
body: | ||
retriever: | ||
rescorer: | ||
rescore: | ||
window_size: 4 | ||
query: | ||
rescore_query: | ||
rank_feature: | ||
field: "features.second_stage" | ||
linear: { } | ||
query_weight: 0 | ||
retriever: | ||
standard: | ||
filter: | ||
match: | ||
available: true | ||
query: | ||
rank_feature: | ||
field: "features.first_stage" | ||
linear: { } | ||
size: 2 | ||
|
||
- match: { hits.total.value: 6 } | ||
- match: { hits.hits.0._id: "5" } | ||
- match: { hits.hits.0._score: 6.0 } | ||
- match: { hits.hits.1._id: "7" } | ||
- match: { hits.hits.1._score: 4.0 } | ||
|
||
--- | ||
"Rescorer retriever and collapsing": | ||
- do: | ||
search: | ||
index: test | ||
body: | ||
retriever: | ||
rescorer: | ||
rescore: | ||
window_size: 10 | ||
query: | ||
rescore_query: | ||
rank_feature: | ||
field: "features.second_stage" | ||
linear: { } | ||
query_weight: 0 | ||
retriever: | ||
standard: | ||
query: | ||
rank_feature: | ||
field: "features.first_stage" | ||
linear: { } | ||
collapse: | ||
field: group | ||
size: 3 | ||
|
||
- match: { hits.total.value: 10 } | ||
- match: { hits.hits.0._id: "1" } | ||
- match: { hits.hits.0._score: 10.0 } | ||
- match: { hits.hits.1._id: "3" } | ||
- match: { hits.hits.1._score: 8.0 } | ||
- match: { hits.hits.2._id: "6" } | ||
- match: { hits.hits.2._score: 5.0 } | ||
|
||
--- | ||
"Rescorer retriever and invalid window size": | ||
- do: | ||
catch: "/\\[rescorer\\] requires \\[rank_window_size: 5\\] be greater than or equal to \\[size: 10\\]/" | ||
search: | ||
index: test | ||
body: | ||
retriever: | ||
rescorer: | ||
rescore: | ||
window_size: 5 | ||
query: | ||
rescore_query: | ||
rank_feature: | ||
field: "features.second_stage" | ||
linear: { } | ||
query_weight: 0 | ||
retriever: | ||
standard: | ||
query: | ||
rank_feature: | ||
field: "features.first_stage" | ||
linear: { } | ||
size: 10 |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we name this
window_size
to be in sync with the top-levelrescore
API? We have made a distinction on that and retrievers (where the top-level documents that a retriever operates upon are specified throughrank_window_size
), but I can see how that would make it more straightforward for users to move to the new framework.Either is fine though I guess :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I am re-using the rescorer builder parser to ease the integration. I like
window_size
better, what do you think of accepting both in theCompoundRetrieverBuilder
? I can rename in the documentation to consistently refer towindow_size
while still acceptingrank_window_size
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm I thinks its fine accepting either argument but I'm not so sure about the documentation changes tbh. This was renamed in #106253, to differentiate at the time between rescore's window size (as one is applied on the shards and pagination/expected results could behave slightly differently). I don't have a strong preference however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am also fine keeping the distinction. The
window_size
here refers to rescorer so it would be inconsistent to name itrank_window_size
while the rescorer doc keeps the other name?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, ++ to keep it as
window_size
just for the rescorer retriever. (should we also accept rankWindowSize and/or properly handle the param references inCompoundRetrieverBuilder#validate
) ?