@carlosdelest
Member

@carlosdelest carlosdelest commented Jul 4, 2025

tracked in #130828
Implements CosineSimilarityFunction for ES|QL, and adds basic infrastructure for other vector similarity functions.

FROM colors
 | EVAL similarity = V_COSINE(rgb_vector, [0, 255, 255]) 
 | SORT similarity DESC

Adds a base class, VectorSimilarityFunction, that provides the building block for vector similarity functions.

There are pending validations that should be done for the function parameters:

  • Same number of dimensions
  • Same underlying dense_vector field type

We can work on these validations as follow-ups, as they may depend on the field_caps API returning that information.
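For reference, a minimal sketch of the computation and of the first pending validation (same number of dimensions). This is not the code in this PR: the class name `CosineSketch`, the method name `cosine`, and the choice to reject zero-magnitude vectors are assumptions made for illustration only.

```java
/** Illustrative helper only; names are hypothetical and not taken from the PR. */
final class CosineSketch {
    private CosineSketch() {}

    static float cosine(float[] a, float[] b) {
        // Pending validation from the description: both vectors need the same number of dimensions.
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch: " + a.length + " vs " + b.length);
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        // Assumption of this sketch: a zero-magnitude vector is rejected rather than scored.
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine similarity is undefined for a zero vector");
        }
        return (float) (dot / Math.sqrt(normA * normB));
    }
}
```

Whether the real function should fail, return null, or emit a warning on mismatched dimensions is left open here, since the description defers that decision to follow-up work.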

@carlosdelest carlosdelest added >non-issue :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0 labels Jul 4, 2025
FloatBlock leftBlock = (FloatBlock) left.get(context).eval(page);
FloatBlock rightBlock = (FloatBlock) right.get(context).eval(page)
) {
int positionCount = page.getPositionCount();
Member Author

@ChrisHegarty I'm wondering if this is the right way to provide an evaluation for dense_vector based operations. Besides vector similarity functions, we will create vector operations (add, subtract, dot product, etc).

Do you think we should create the necessary infrastructure for template-based evaluators, or is this ad-hoc evaluation good enough?

Is there anything we should be careful about when doing ad-hoc evaluation for vectorization purposes?

Contributor

I think that this is fine. The vectorization that we are looking for here is in the comparison operation itself, that is, when comparing float[] values. Ultimately, though, we would want to be able to compare against mmap'ed off-heap data, but that is completely separate and can come later, since it would require a block backed by a memory segment. We had something similar(ish), though different, with big array blocks. Would need to re-check the details.
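To make the shape of the ad-hoc evaluation concrete, here is a rough sketch under stated assumptions. `FlatVectors` is a hypothetical stand-in for a float block holding fixed-width vectors, and `Similarity` is a hypothetical interface; neither is an ES|QL type. The point is only that the per-position loop is plain bookkeeping, while the `float[]` comparison is the piece where vectorized code would plug in.

```java
final class AdHocEvalSketch {
    private AdHocEvalSketch() {}

    /** Hypothetical stand-in for a float block: fixed-width vectors stored back to back. */
    record FlatVectors(float[] values, int dims) {
        int positionCount() {
            return values.length / dims;
        }

        void copyInto(int position, float[] scratch) {
            System.arraycopy(values, position * dims, scratch, 0, dims);
        }
    }

    /** The comparison on float[] is the part a vectorized implementation would replace. */
    interface Similarity {
        float compare(float[] a, float[] b);
    }

    static float dotProduct(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    static float[] evaluate(FlatVectors left, FlatVectors right, Similarity similarity) {
        int positionCount = left.positionCount();
        float[] result = new float[positionCount];
        // Scratch arrays are reused across positions to avoid a per-row allocation.
        float[] leftScratch = new float[left.dims()];
        float[] rightScratch = new float[right.dims()];
        for (int p = 0; p < positionCount; p++) {
            left.copyInto(p, leftScratch);
            right.copyInto(p, rightScratch);
            result[p] = similarity.compare(leftScratch, rightScratch);
        }
        return result;
    }
}
```

Calling `AdHocEvalSketch.evaluate(left, right, AdHocEvalSketch::dotProduct)` yields one score per position. Comparing directly against off-heap data would remove the copy into scratch arrays, which is the later step mentioned above.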

@carlosdelest carlosdelest changed the title ESQL: Basic infrastructure for dense vector similarity functions ESQL: dense_vector cosine similarity function Jul 7, 2025
if (f instanceof In in) {
return processIn(in);
}
if (f instanceof VectorFunction) {
Member Author

Needed to change the order to ensure VectorFunction instances are processed first, as similarity functions are scalar functions as well and would otherwise be caught by the scalar function check.
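A small sketch of why the order of the `instanceof` checks matters. All types below are hypothetical stand-ins declared inside the example, not the ES|QL classes: a type that is both a scalar function and a vector function is handled by whichever check comes first.

```java
final class DispatchOrderSketch {
    private DispatchOrderSketch() {}

    // Hypothetical marker types standing in for the real function hierarchy.
    interface ScalarFunction {}

    interface VectorFunction {}

    record Round() implements ScalarFunction {}

    record Cosine() implements ScalarFunction, VectorFunction {}

    /** Vector check first: a similarity function reaches the vector branch. */
    static String process(Object f) {
        if (f instanceof VectorFunction) {
            return "vector";
        }
        if (f instanceof ScalarFunction) {
            return "scalar";
        }
        return "other";
    }

    /** Scalar check first: the vector branch is unreachable for similarity functions. */
    static String processWrongOrder(Object f) {
        if (f instanceof ScalarFunction) {
            return "scalar";
        }
        if (f instanceof VectorFunction) {
            return "vector";
        }
        return "other";
    }
}
```

With these definitions, `process(new Cosine())` takes the vector branch, while `processWrongOrder(new Cosine())` falls into the scalar branch and never reaches the vector-specific handling.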

required_capability: cosine_vector_similarity_function

row vector = [1, 2, 3]
| eval similarity = round(v_cosine(vector, [0, 1, 2]), 3)
Member Author

For this to work properly, we need to implement a conversion function so we can convert non-foldable values to dense_vector.

@carlosdelest carlosdelest marked this pull request as ready for review July 7, 2025 11:58
@elasticsearchmachine elasticsearchmachine added Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) and removed Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch labels Jul 7, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@carlosdelest carlosdelest requested a review from tteofili July 7, 2025 11:58
Member

@kderusso kderusso left a comment

Nice work @carlosdelest !

/**
* Defines the named writables for vector functions in ESQL.
*/
public final class VectorWritables {
Member

Not sure if we need this utility class just yet, but I'll assume you have plans to add more :)

Member Author

Haha yeah, it's a bit premature yet - but we will be adding a number of vector similarity functions soon enough, and I wanted to provide places where it would be easy to look for them.

}
var wrapper = BlockUtils.wrapperFor(blockFactory, ElementType.fromJava(multiValue.get(0).getClass()), positions);
// dense_vector create internally float values, even if they are specified as doubles
ElementType elementType = lit.dataType() == DataType.DENSE_VECTOR
Member

Should this logic be in its own method?

Member Author

I'd say no, as this is a one-liner for getting the correct ElementType: there's no more logic than a specific check for dense_vector. If more special cases come into play, then let's extract it, as it will become confusing.


import static org.apache.lucene.index.VectorSimilarityFunction.COSINE;

public class CosineSimilarity extends VectorSimilarityFunction {
Contributor

Do you need to subclass different types of functions here? Why not just have an enum which specifies the type in VectorSimilarityFunction?

Member Author

Good point - I think subclassing aligns better with the current way ESQL functions work. I'm also not sure that docs generation works with enums as of now.

Happy to review this when adding more functions though!
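For context, the enum-based alternative raised in this thread could look roughly like the sketch below. It is not what this PR does (the PR keeps one subclass per function), and `SimilarityKindSketch`, `Kind`, and `apply` are hypothetical names used only to illustrate the trade-off.

```java
final class SimilarityKindSketch {
    private SimilarityKindSketch() {}

    /** One constant per similarity; a single function class would hold a Kind field. */
    enum Kind {
        COSINE {
            @Override
            float apply(float[] a, float[] b) {
                return (float) (dot(a, b) / Math.sqrt(dot(a, a) * dot(b, b)));
            }
        },
        DOT_PRODUCT {
            @Override
            float apply(float[] a, float[] b) {
                return (float) dot(a, b);
            }
        };

        abstract float apply(float[] a, float[] b);

        // Shared helper; assumes both arrays have the same length.
        static double dot(float[] a, float[] b) {
            double sum = 0.0;
            for (int i = 0; i < a.length; i++) {
                sum += (double) a[i] * b[i];
            }
            return sum;
        }
    }
}
```

The enum keeps all similarities in one place, but each ES|QL function would still need its own registered name and generated docs, which is the concern noted in the reply above.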

Contributor

@ioanatia ioanatia left a comment

are we missing the docs that will be generated for the v_cosine function?
otherwise LGTM

@github-actions
Contributor

github-actions bot commented Jul 8, 2025

🔍 Preview links for changed docs

/**
* Base class for vector similarity functions, which compute a similarity score between two dense vectors
*/
public abstract class VectorSimilarityFunction extends BinaryScalarFunction implements EvaluatorMapper, VectorFunction {
Member Author

Now VectorSimilarityFunction extends BinaryScalarFunction. That brings some simplifications to the code as we already have two params.

public void testDenseVectorImplicitCastingSimilarityFunctions() {
if (EsqlCapabilities.Cap.COSINE_VECTOR_SIMILARITY_FUNCTION.isEnabled()) {
checkDenseVectorImplicitCastingSimilarityFunction("v_cosine(vector, [0.342, 0.164, 0.234])", List.of(0.342f, 0.164f, 0.234f));
checkDenseVectorImplicitCastingSimilarityFunction("v_cosine(vector, [1, 2, 3])", List.of(1f, 2f, 3f));
Member Author

Checks casting is done for non-float values, and creates a float Literal
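The casting being checked can be pictured with the sketch below, which narrows any numeric literal list to floats. `DenseVectorCastSketch` and `toFloats` are hypothetical names; the real implicit casting lives in the ES|QL analyzer and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

final class DenseVectorCastSketch {
    private DenseVectorCastSketch() {}

    /** Narrows integers or doubles to floats, mirroring what the test expects of the literal. */
    static List<Float> toFloats(List<? extends Number> values) {
        List<Float> floats = new ArrayList<>(values.size());
        for (Number value : values) {
            floats.add(value.floatValue());
        }
        return List.copyOf(floats);
    }
}
```

For example, `toFloats(List.of(1, 2, 3))` produces the same list as `List.of(1f, 2f, 3f)`, matching the second expectation in the test above.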

import static org.elasticsearch.xpack.esql.core.type.DataType.DOUBLE;
import static org.hamcrest.Matchers.equalTo;

public abstract class AbstractVectorSimilarityFunctionTestCase extends AbstractScalarFunctionTestCase {
Member Author

New test case added that extends AbstractScalarFunctionTestCase. This brings quite a few tests like checking what happens with null values, evaluator type checks, etc.

import java.util.function.Supplier;

@FunctionName("v_cosine")
public class CosineSimilarityTests extends AbstractVectorSimilarityFunctionTestCase {
Member Author

Test cases for new functions should be simple; all the heavy lifting is done in the abstract class.

@carlosdelest
Member Author

are we missing the docs that will be generated for the v_cosine function?

@ioanatia 🤦 yes we were. There was no AbstractFunctionTestCase test. I just added that along with some changes to implicit casting. Thanks!

@carlosdelest carlosdelest requested a review from ioanatia July 8, 2025 12:05
@carlosdelest carlosdelest enabled auto-merge (squash) July 8, 2025 14:58
@carlosdelest carlosdelest merged commit f1ddd4c into elastic:main Jul 15, 2025
32 of 33 checks passed

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

6 participants