-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add query_vector_builder Support to Diversify Retriever
#139094
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add query_vector_builder Support to Diversify Retriever
#139094
Conversation
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
…versify_retriever
|
FYI - I added a new YAML test at |
|
Pinging @elastic/search-eng (Team:SearchOrg) |
|
Pinging @elastic/search-relevance (Team:Search - Relevance) |
…versify_retriever
kderusso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice start!
| Must be either an array of floats or a hex-encoded byte vector. | ||
| If you provide a `query_vector`, you cannot also provide a `query_vector_builder`. | ||
|
|
||
| `query_vector_builder` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't throw 🍅 at me 🙈
I think we should verify that we want the same query_vector_builder API here. There's some legacy API crust, for example the API asks for a model_id etc. and there's opportunity to modernize it with inference_id. This would be more work, though, so I'm not raising it as a blocker to the PR more as a discussion if that's something we want to bite off or if this is OK.
If we decide to go forward with query_vector_builder, perhaps not for this PR but the docs could use a bit of work here to define allowed parameters, and have something directly linkable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what you're saying - but I think it's a larger, and more discrete PR (as it also touches other users of the QueryVectorBuilder). The docs however are probably a good idea.
| && this.diversificationField.equals(other.diversificationField) | ||
| && Objects.equals(this.lambda, other.lambda) | ||
| && Objects.equals(this.queryVector, other.queryVector); | ||
| && ((queryVector == null && other.queryVector == null) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't Objects.equals take care of the null checks already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily in this case, as the query_vector is now a supplier, so we need to make sure the suppled value is equal, but if the supplier is null it'll throw when the .get() is issued.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: Typo in filename
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤦
| @@ -0,0 +1,127 @@ | |||
| setup: | |||
| - requires: | |||
| cluster_features: [ "retriever.result_diversification_mmr" ] | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can re-use the cluster feature here, we need a new one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll double check. This was a carry over from the original YAML tests for the retriever, but may not be needed here...
| retriever: | ||
| knn: | ||
| field: "textvector" | ||
| query_vector: [ 0.5, 0.2, 0.4, 0.4 ] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is using a query vector, not a query vector builder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's testing the query vector builder in the diversify retriever only - the inner retriever is irrelevant. But, I can change it.
| - length: { hits.hits: 3 } | ||
|
|
||
| --- | ||
| "Validate query_vector or query_vector_builder but not both": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we also validate an incorrectly formatted query vector builder?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be caught by the parser, but to be sure, I'll add a test in.
(DRAFT - needs tests and docs)
Adds
query_vector_buildersupport to the diversify retriever