-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Implement v_hamming #132959
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement v_hamming #132959
Conversation
🔍 Preview links for changed docs |
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
|
Hi @svilen-mihaylov-elastic, I've created a changelog YAML for you. |
| } | ||
|
|
||
| public static float calculateSimilarity(float[] leftScratch, float[] rightScratch) { | ||
| byte[] a = new byte[leftScratch.length]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Core of change. We assume here the floats as in range (0, 256), convert to byte vectors, and do the same as in ES815BitFlatVectorsFormat
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per feedback, returning raw distance, not normalized between 0.0 and 1.0 as above.
leemthompo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a couple docs things: clearer release note text + need to fix applies_to :)
...ugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Hamming.java
Outdated
Show resolved
Hide resolved
| @Param( | ||
| name = "left", | ||
| type = { "dense_vector" }, | ||
| description = "first dense_vector to calculate hamming distance between" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be "first dense_vector to calculate Hamming distance"
the same for the other param.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we could be even more concise here, considering the context that the inputs are for the Hamming distance calculation is already clear?
For example:
First input vector
Second input vector
Since the type already specifies dense_vector, maybe the param description doesn't need to repeat it either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can update to the shortened version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leemthompo Following a quick conversation with @ioanatia, it's probably better to update the descriptions of this and other similarity functions together for consistency's sake. Have you had a chance to look at for example L1_Norm and L2_Norm? This CR is simply following the same pattern. If that's fine with you, we can start a separate CR with your proposed changes across all similarity functions. How does this sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @FunctionInfo( | ||
| returnType = "double", | ||
| preview = true, | ||
| description = "Calculates the hamming distance between two dense_vectors.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, but this should be Hamming
| for (int i = 0; i < leftScratch.length; i++) { | ||
| b[i] = (byte) rightScratch[i]; | ||
| } | ||
| return ((a.length * Byte.SIZE) - VectorUtil.xorBitCount(a, b)) / (float) (a.length * Byte.SIZE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we need to return just VectorUtil.xorBitCount(a, b)
...ugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Hamming.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Liam Thompson <[email protected]>
…expression/function/vector/Hamming.java Co-authored-by: Liam Thompson <[email protected]>
…expression/function/vector/Hamming.java Co-authored-by: Liam Thompson <[email protected]>
tteofili
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Implements #132056