Conversation

@svilen-mihaylov-elastic
Contributor

@svilen-mihaylov-elastic svilen-mihaylov-elastic commented Aug 14, 2025

Implements #132056

@github-actions
Contributor

github-actions bot commented Aug 14, 2025

🔍 Preview links for changed docs

@svilen-mihaylov-elastic svilen-mihaylov-elastic added the :Search Relevance/ES|QL, >feature, and priority:normal labels Aug 15, 2025
@svilen-mihaylov-elastic svilen-mihaylov-elastic marked this pull request as ready for review August 15, 2025 13:04
@elasticsearchmachine elasticsearchmachine added the Team:Search Relevance label Aug 15, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

@elasticsearchmachine
Collaborator

Hi @svilen-mihaylov-elastic, I've created a changelog YAML for you.

@svilen-mihaylov-elastic svilen-mihaylov-elastic requested a review from a team August 15, 2025 13:05
}

public static float calculateSimilarity(float[] leftScratch, float[] rightScratch) {
byte[] a = new byte[leftScratch.length];
Contributor Author

Core of the change. Here we assume the floats are in the range (0, 256), convert them to byte vectors, and then do the same as ES815BitFlatVectorsFormat.
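
For illustration, the conversion and bit count described in this comment can be sketched in plain Java. This is a minimal sketch, not the code in the PR: the class name `HammingSketch` and the helpers `toBytes` and `hammingDistance` are hypothetical, and `Integer.bitCount` stands in for Lucene's `VectorUtil.xorBitCount`, on the assumption that the Lucene call counts the bits that differ between two byte arrays.

```java
class HammingSketch {
    // Hypothetical helper: narrows each float to a byte with a plain cast,
    // so 255.0f becomes (byte) 0xFF. Assumes whole-number values that fit in 8 bits.
    static byte[] toBytes(float[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Counts the bit positions at which the two byte vectors differ.
    static int hammingDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits so sign extension of negative bytes adds no spurious set bits.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        byte[] left = toBytes(new float[] { 1f, 2f });
        byte[] right = toBytes(new float[] { 3f, 2f });
        // 1 (0b01) and 3 (0b11) differ in one bit; 2 and 2 differ in none.
        System.out.println(hammingDistance(left, right)); // prints 1
    }
}
```

The mask matters because Java promotes `byte` operands to `int` before the XOR, so a negative byte would otherwise contribute 24 extra set bits.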

Contributor Author

Per feedback, returning raw distance, not normalized between 0.0 and 1.0 as above.

Contributor

@leemthompo leemthompo left a comment

Just a couple docs things: clearer release note text + need to fix applies_to :)

@Param(
name = "left",
type = { "dense_vector" },
description = "first dense_vector to calculate hamming distance between"
Contributor

should be "first dense_vector to calculate Hamming distance"
the same for the other param.

Contributor

Perhaps we could be even more concise here, since it's already clear from context that the inputs are for the Hamming distance calculation?

For example:

First input vector
Second input vector

Since the type already specifies dense_vector, maybe the param description doesn't need to repeat it either.

Contributor Author

I can update to the shortened version.

Contributor Author

@svilen-mihaylov-elastic svilen-mihaylov-elastic Aug 18, 2025

@leemthompo Following a quick conversation with @ioanatia, it's probably better to update the descriptions of this and the other similarity functions together, for consistency's sake. Have you had a chance to look at, for example, L1_Norm and L2_Norm? This CR simply follows the same pattern. If that's fine with you, we can start a separate CR with your proposed changes across all similarity functions. How does this sound?

Contributor

@svilen-mihaylov-elastic SGTM

  • A lot of these functions aren't actually published yet. Remember that you'll need to include them in the relevant functions page like this
  • You'll also need to add the specific functions to the relevant lists
  • Also this comment applies to all of these functions.

@FunctionInfo(
returnType = "double",
preview = true,
description = "Calculates the hamming distance between two dense_vectors.",
Contributor

nit, but this should be Hamming

for (int i = 0; i < leftScratch.length; i++) {
b[i] = (byte) rightScratch[i];
}
return ((a.length * Byte.SIZE) - VectorUtil.xorBitCount(a, b)) / (float) (a.length * Byte.SIZE);
Contributor

we need to return just VectorUtil.xorBitCount(a, b)
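
To make the difference concrete: the expression under review yields a similarity between 0.0 and 1.0, while the change requested here returns the raw count of differing bits. The following is a minimal sketch in plain Java under stated assumptions. The names `HammingScore`, `rawDistance`, and `normalizedSimilarity` are hypothetical, and `Integer.bitCount` stands in for `VectorUtil.xorBitCount`, assuming that call counts the differing bits of two byte arrays.

```java
class HammingScore {
    // Raw Hamming distance: the number of bit positions where a and b differ.
    static int rawDistance(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // Mask to 8 bits because bytes are promoted to int before the XOR.
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    // Normalized form, shaped like the expression under review:
    // the fraction of bits that match, so identical vectors score 1.0.
    static float normalizedSimilarity(byte[] a, byte[] b) {
        int totalBits = a.length * Byte.SIZE;
        return (totalBits - rawDistance(a, b)) / (float) totalBits;
    }

    public static void main(String[] args) {
        byte[] a = { 0x0F };
        byte[] b = { 0x00 };
        System.out.println(rawDistance(a, b));          // prints 4
        System.out.println(normalizedSimilarity(a, b)); // prints 0.5
    }
}
```

Note that the two forms point in opposite directions: a larger raw distance means less similar vectors, while a larger normalized value means more similar ones.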

…expression/function/vector/Hamming.java

Co-authored-by: Liam Thompson <[email protected]>
…expression/function/vector/Hamming.java

Co-authored-by: Liam Thompson <[email protected]>
Contributor

@tteofili tteofili left a comment

LGTM

@svilen-mihaylov-elastic svilen-mihaylov-elastic merged commit bcfd399 into elastic:main Aug 19, 2025
34 checks passed

Labels

>feature, priority:normal, :Search Relevance/ES|QL, Team:Search Relevance, v9.2.0

5 participants