-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Simplified Linear Retriever #129200
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplified Linear Retriever #129200
Conversation
|
Hi @Mikep86, I've created a changelog YAML for you. |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Pinging @elastic/search-eng (Team:SearchOrg) |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
server/src/main/java/org/elasticsearch/search/retriever/CompoundRetrieverBuilder.java
Outdated
Show resolved
Hide resolved
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Show resolved
Hide resolved
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Show resolved
Hide resolved
| RetrieverBuilder rewritten = this; | ||
|
|
||
| ResolvedIndices resolvedIndices = ctx.getResolvedIndices(); | ||
| if (resolvedIndices != null && query != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can still add a global prefilter on the top-level retriever, right? Should we also account for it when generating the expansions or is this something not to be supported?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should work as is since rewriting with pre-filters happens after this rewrite logic. I can add a test to confirm.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of things to follow up on here:
- There are a whole class of bugs in the retriever framework right now caused by incomplete copies during rewrite. We should adopt a copy constructor approach (similar to how queries handle copy generation) to address all of these issues. This is outside the scope of this PR though, so I did the quick fix for now.
- I would have loved to add a unit test that has more access to the retriever structure to verify that the filters are being propagated as pre-filters, but that is difficult to do right now with the current rewrite logic. We rewrite all the way to a
RankDocsRetrieverBuilder, with no stopping cue for when all the pre-search rewrite activities (such as pre-filter propagation) are complete. It would help a lot with testability if we could refactor the rewrite process to add such a stopping cue. In the meantime, I added a YAML test to cover this, but that doesn't give us visibility into whether the filters are applied as actual pre-filters.
...rrf/src/main/java/org/elasticsearch/xpack/rank/simplified/SimplifiedInnerRetrieverUtils.java
Show resolved
Hide resolved
|
Just finished a first pass, seems really nice :) Will we also include doc changes & additional examples in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, some minor feedback
...lugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilder.java
Outdated
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Outdated
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Outdated
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
Yes, all that will come in a follow-up docs-focused PR :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work 👏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the approach, @Mikep86, it’s less invasive than I expected! 😉
I do have some concerns about the growing number of parameters in the linear retriever that aren't compatible with each other, but that's an inherent trade-off with the decision to add this functionality there, so I’m comfortable with it.
...rrf/src/main/java/org/elasticsearch/xpack/rank/simplified/SimplifiedInnerRetrieverUtils.java
Outdated
Show resolved
Hide resolved
| - match: { hits.hits.1._id: "2" } | ||
| - lte: { hits.hits.1._score: 2.0 } | ||
| - match: { hits.hits.2._id: "1" } | ||
| - lte: { hits.hits.2._score: 2.0 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also add a test with a filter? We need to make sure that the filter is propagated correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, working on a unit test to verify pre-filter propagation now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...rf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderParsingTests.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work @Mikep86 ❤️ Only minor comment is the prefilter tests addition, but other than that it looks really nice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| @@ -0,0 +1,5 @@ | |||
| pr: 129200 | |||
| summary: Simplified Linear Retriever | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be updated? Probably something like Add simplified syntax and hybrid support to linear retriever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this description is still succinct and accurate, good as is
💔 Backport failed
You can use sqren/backport to manually backport by running |
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
(cherry picked from commit fc77640) # Conflicts: # x-pack/plugin/inference/build.gradle # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
(cherry picked from commit fc77640) # Conflicts: # x-pack/plugin/inference/build.gradle # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/RankRRFFeatures.java # x-pack/plugin/rank-rrf/src/main/java/org/elasticsearch/xpack/rank/rrf/RRFRankBuilder.java
Adds a simplified syntax for the
linearretriever:fieldsis optional. If it is not provided, we query the fields defined by theindex.query.default_fieldindex setting (which is*by default).This syntax automatically handles querying a mix of lexical fields (i.e. fields that support lexical search via
match) andsemantic_textfields. The fields are divided into lexical and semantic groups to create a 50/50 weight distribution between the two in the final score. This is achieved by creating a retriever tree that looks like:The end result is a score that ranges between 0-2, with up to 1 coming from the lexical matches and up to 1 coming from the semantic matches.
Common logic for generating the retriever tree is in
SimplifiedInnerRetrieverUtils, which will also be used by the simplifiedrrfretriever (see #128633 for a preview of that).