-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Simplified linear and RRF retrievers #128633
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplified linear and RRF retrievers #128633
Conversation
|
Hi @Mikep86, I've created a changelog YAML for you. |
- Do not allow querying remote indices - Handle when index wildcard pattern does not match any indices
| - do: | ||
| catch: bad_request | ||
| search: | ||
| index: test-index | ||
| body: | ||
| retriever: | ||
| linear: | ||
| fields: [ "text_1", "text_2" ] | ||
| query: "" | ||
|
|
||
| - contains: { error.root_cause.0.reason: "[linear] [query] cannot be empty" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should handle this by rewriting to a MatchNoneQueryBuilder instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to think about whether we want to do a match_none here or even a match_all if the query is not provided. Or even perhaps an exists query if fields are provided. We could look at how match_phrase supports a zero terms query.
However, I think all of that could be done as followup work and for this PR throwing is probably fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass - looks really good, nice work! 👏 I left a few commends and questions!
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
| - do: | ||
| catch: bad_request | ||
| search: | ||
| index: test-index | ||
| body: | ||
| retriever: | ||
| linear: | ||
| fields: [ "text_1", "text_2" ] | ||
| query: "" | ||
|
|
||
| - contains: { error.root_cause.0.reason: "[linear] [query] cannot be empty" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be interesting to think about whether we want to do a match_none here or even a match_all if the query is not provided. Or even perhaps an exists query if fields are provided. We could look at how match_phrase supports a zero terms query.
However, I think all of that could be done as followup work and for this PR throwing is probably fine.
...-rrf/src/yamlRestTest/resources/rest-api-spec/test/linear/20_linear_retriever_simplified.yml
Show resolved
Hide resolved
.../rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java
Show resolved
Hide resolved
.../rank-rrf/src/test/java/org/elasticsearch/xpack/rank/linear/LinearRetrieverBuilderTests.java
Show resolved
Hide resolved
| assertEquals("[linear] does not support the simplified query format when querying remote indices", iae.getMessage()); | ||
| } | ||
|
|
||
| private static ResolvedIndices createMockResolvedIndices( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some duplication between this and the RRF tests, could that be cleaned up a bit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100%, there is duplication here. We could probably refactor this into a base class with extensions for linear & rrf. I didn't focus on this immediately in the interest of getting a complete PR out first. Time permitting, we can refcator in this PR or save it for a quick follow-up after feature freeze.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it as a followup post FF
| var localIndicesMetadata = resolvedIndices.getConcreteLocalIndicesMetadata(); | ||
| if (localIndicesMetadata.size() > 1) { | ||
| throw new IllegalArgumentException( | ||
| "[" + NAME + "] does not support the simplified query format when querying multiple indices" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same nitpick here about error messages. Also, could some more of this type of checking be shared with the linear retriever instead of duplicated?
| + " }" | ||
| + " }" | ||
| + "}"; | ||
| String restContent = """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, much more readable 😁
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you split this pull request in two? One for the simplified linear retriever and one for RRF.
I wouldn't expect RRF and linear to implement the same logic and I find it difficult to read since we don't really describe how it works in the description.
|
Closing this in favor of separate PRs for linear & RRF. Linear retriever PR is first: #129200 |
Adds simplified syntaxes for the
linearandrrfretrievers:This PR isn't quite complete as YAML tests for the
rrfretriever haven't been added yet. But everything else should be in a reviewable state.