-
Notifications
You must be signed in to change notification settings - Fork 25.6k
KNN Query visit_percentage #133753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KNN Query visit_percentage #133753
Conversation
…ticsearch into query_visit_percentage
…ticsearch into query_visit_percentage
…ticsearch into query_visit_percentage
`confidence_interval` | ||
: (Optional, float) Only applicable to `int8_hnsw`, `int4_hnsw`, `int8_flat`, and `int4_flat` index types. The confidence interval to use when quantizing the vectors. Can be any value between and including `0.90` and `1.0` or exactly `0`. When the value is `0`, this indicates that dynamic quantiles should be calculated for optimized quantization. When between `0.90` and `1.0`, this value restricts the values used when calculating the quantization thresholds. For example, a value of `0.95` will only use the middle 95% of the values when calculating the quantization thresholds (e.g. the highest and lowest 2.5% of values will be ignored). Defaults to `1/(dims + 1)` for `int8` quantized vectors and `0` for `int4` for dynamic quantile calculation. | ||
|
||
`default_visit_percentage` {applies_to}`stack: ga 9.1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we didn't add these in a previous PR I just added them here but could be convinced to pull docs out to a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benwtrent how do you feel about the docs going in with the belief we are going to yank out the feature flag immediately here. Would you be more comfortable with me pulling all of these out to a separate PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read through again, looking good. Just need to test null cases.
I left some comments on ones that I found, but then I figured, eh, that is too many comments. So I stopped.
Basically, anything that is a "random input" deal, should test the null case. Additionally, the dense vector type tests should verify that when the value is passed, it actually is applied to the query.
The "refresh" popped up on the screen in the middle of the commenting. Sorry if any of them are on an old commit :(
server/src/internalClusterTest/java/org/elasticsearch/search/query/RescoreKnnVectorQueryIT.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/index/mapper/vectors/DenseVectorFieldTypeTests.java
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/search/retriever/KnnRetrieverBuilderParsingTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/search/retriever/RankDocsRetrieverBuilderTests.java
Outdated
Show resolved
Hide resolved
...er/src/test/java/org/elasticsearch/search/vectors/AbstractKnnVectorQueryBuilderTestCase.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/search/vectors/KnnSearchBuilderTests.java
Outdated
Show resolved
Hide resolved
server/src/test/java/org/elasticsearch/search/vectors/KnnSearchBuilderTests.java
Outdated
Show resolved
Hide resolved
test/framework/src/main/java/org/elasticsearch/search/RandomSearchRequestGenerator.java
Outdated
Show resolved
Hide resolved
I think I got all of them? I'm double checking here and reviewing tests in general. But I believe I got fixed up what you saw. I appreciate the feedback too; that was super helpful. Still curious if you think I should punt the docs update out of this PR or not given that stuff is still behind a feature flag or if you feel like we'll take the flag out shortly and it won't matter. |
Let's remove the docs changes |
…ticsearch into query_visit_percentage
docs changes pulled out to here: #134082 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would just make sure that the new tests you added/mutated all pass with various seeds.
I would also maybe do some release tests verification locally to make sure you handled the feature flag.
I am particularly wondering about things like the more general "RandomSearchRequestGenerator" change and if adding the value, then the generated object gets serialized to xcontent, and then parsed (which doesn't get the field if the flag isn't set) will be missing the object.
I don't know if we have tests that do that, but I know that by default release tests do not run in PRs.
I've tried to run all of the tests multiple times and on several seeds and not seen any problems. Tried breaking |
Updating the query interface for knn to include
visit_percentage
Usage example with a basic knn query (note this PR updates retrievers as well). This is setup such that
visit_percentage
overridesnum_candidates
forbbq_disk
: