-
Notifications
You must be signed in to change notification settings - Fork 25.5k
Rework RRF to be evaluated during rewrite phase #112648
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework RRF to be evaluated during rewrite phase #112648
Conversation
…rough_rewrite_phase
…rough_rewrite_phase
…rough_rewrite_phase
…:pmpailis/elasticsearch into rework_rrf_to_work_through_rewrite_phase
…rough_rewrite_phase
…csRetrieverBuilderTests.java Co-authored-by: Benjamin Trent <[email protected]>
… the newly introduced cluster feature
Merged all different yaml tests files in ff05d13 :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. barring any breaking changes decision around _rank
and how rrf
behavior has changed (though since its all tech-preview, I personally think this is OK).
int rank = 1; | ||
for (ScoreDoc scoreDoc : rrfRankResult) { | ||
final int findex = index; | ||
final int frank = rank; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let me be frank
.... 😂
x-pack/plugin/rank-rrf/src/test/java/org/elasticsearch/xpack/rank/rrf/RRFRankDocTests.java
Outdated
Show resolved
Hide resolved
.../rank-rrf/src/yamlRestTest/resources/rest-api-spec/test/rrf/350_rrf_retriever_pagination.yml
Show resolved
Hide resolved
…uring different values in mutation tests
@elasticmachine update branch |
@elasticmachine update branch |
@benwtrent and @jimczi a huge thank you for reviewing and all support throughout ❤️ |
💚 Backport successful
|
In this PR we introduce a different way to evaluate the
rrf
retriever, by leveraging the query rewrite step at the beginning of query execution.The main points of this PR are:
CompoundRetrieverBuilder
class that defines a class of retrievers that operate on the top results of other retriever(s) and combine their results appropriately. All nested retrievers, after being rewritten themselves, are executed asynchronously throughmsearch
and then the compound retriever is responsible for merging the top results from each sub-retriever through the abstractcombineInnerRetrieverResults
method.RankDocsRetrieverBuilder
when extracting to source. The idea now is that we enhance this to also compute topDocs for the innerRetrievers when eitherprofile
orexplain
are present.profile
andexplain
currently require for the inner retrievers to be evaluated twice. This is something that we can build upon and improve maybe in subsequent PRs.RankDocsSortField
to encoderank
andscore
for each result (now namedRankDocsAndScoreSortField
)The high level idea now is that all compound retrievers (including
rrf
) will try to rewrite themselves into a "simpler"RankDocsRetrieverBuilder
by rewriting and executing asynchronously all nested retrievers. This gives us the option to support any arbitrary nesting of retrievers, as well as more fine-grained search params likecollapse
,highlight
, and nested queries.If any of the subsearches fail, we fail the request and return a 5xx response.
Note: We still support running
rrf
rank withsub_searches
but this uses the old approach which is a bit different as no nesting is supported (explain/profile are also evaluated differently).