ESQL: dense_vector cosine similarity function #130641
FloatBlock leftBlock = (FloatBlock) left.get(context).eval(page);
FloatBlock rightBlock = (FloatBlock) right.get(context).eval(page)
) {
int positionCount = page.getPositionCount();
@ChrisHegarty I'm wondering if this is the right way to provide an evaluation for dense_vector-based operations. Besides vector similarity functions, we will create vector operations (add, subtract, dot product, etc.).
Do you think we should create the necessary infrastructure for template-based evaluators, or is this ad-hoc evaluation good enough?
Is there anything we should be careful about when doing ad-hoc evaluation for vectorization purposes?
I think that this is fine. The vectorization that we are looking for here is in the comparison operation itself, that is, when comparing float[]s. Ultimately, though, we would want to be able to compare against mmap'ed off-heap data, but that is completely separate and can come later, since it would require a block backed by a memory segment. We had something similar(ish), though different, with big array blocks. Would need to re-check the details.
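For reference, the comparison over float[] values mentioned here can be sketched as a plain loop. This is only an illustration, not the PR's evaluator: the class name CosineSketch and the method cosine are hypothetical, and the value returned by the function in the PR may be scaled differently from the raw cosine computed here.

```java
// Hypothetical sketch: raw cosine similarity between two on-heap float arrays.
final class CosineSketch {

    /** Computes dot(a, b) / (|a| * |b|); returns NaN when either vector is all zeros. */
    static float cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ: " + a.length + " != " + b.length);
        }
        float dot = 0f;
        float normA = 0f;
        float normB = 0f;
        // Single pass over both arrays; a simple loop like this is the part
        // a JIT compiler or a SIMD-aware library can vectorize.
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (float) (dot / Math.sqrt((double) normA * normB));
    }

    public static void main(String[] args) {
        System.out.println(cosine(new float[] {1f, 2f, 3f}, new float[] {0f, 1f, 2f}));
    }
}
```

For the vectors [1, 2, 3] and [0, 1, 2] the raw cosine is 8/√70, roughly 0.956.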
if (f instanceof In in) {
return processIn(in);
}
if (f instanceof VectorFunction) {
Needed to change the order to ensure VectorFunction instances are processed first, as similarity functions are scalar functions as well.
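A minimal sketch of why the check order matters, using made-up stand-in types rather than the real ESQL classes (every name below is hypothetical): when a type implements both interfaces, whichever instanceof check comes first wins.

```java
// Hypothetical stand-ins for the real ESQL types; only the dispatch order is the point.
final class DispatchOrderSketch {
    interface Function {}
    interface ScalarFunction extends Function {}
    interface VectorFunction extends Function {}

    // A similarity function is both a scalar function and a vector function.
    static final class Similarity implements ScalarFunction, VectorFunction {}
    static final class PlainScalar implements ScalarFunction {}

    static String process(Function f) {
        // The more specific marker has to be tested first; otherwise the
        // scalar branch would capture similarity functions as well.
        if (f instanceof VectorFunction) {
            return "vector";
        }
        if (f instanceof ScalarFunction) {
            return "scalar";
        }
        return "other";
    }
}
```

Swapping the two checks would send Similarity down the scalar branch, which is the situation the reordering avoids.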
required_capability: cosine_vector_similarity_function

row vector = [1, 2, 3]
| eval similarity = round(v_cosine(vector, [0, 1, 2]), 3)
For this to work properly, we need to implement a conversion function so we can convert non-foldable values to dense_vector.
Pinging @elastic/es-analytical-engine (Team:Analytics)
Nice work @carlosdelest !
/**
 * Defines the named writables for vector functions in ESQL.
 */
public final class VectorWritables {
Not sure if we need this utility class just yet, but I'll assume you have plans to add more :)
Haha yeah, it's a bit premature yet - but we will be adding a number of vector similarity functions soon enough, and I wanted to provide places where it would be easy to look for them.
}
var wrapper = BlockUtils.wrapperFor(blockFactory, ElementType.fromJava(multiValue.get(0).getClass()), positions);
// dense_vector create internally float values, even if they are specified as doubles
ElementType elementType = lit.dataType() == DataType.DENSE_VECTOR
Should this logic be in its own method?
I'd say no, as this is a one-liner for getting the correct ElementType - there's no more logic than doing a specific check for dense_vector. I'd say, if more special cases come into play then let's add it, as it will become confusing.
import static org.apache.lucene.index.VectorSimilarityFunction.COSINE;

public class CosineSimilarity extends VectorSimilarityFunction {
Do you need to subclass different types of functions here? Why not just have an enum which specifies the type in VectorSimilarityFunction?
Good point - I think subclassing aligns better with the current way ESQL functions work. I'm also not sure that docs generation works with enums as of now.
Happy to review this when adding more functions though!
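For comparison, the enum-based alternative raised in the question above could look roughly like the sketch below. It is purely illustrative: SimilarityKindSketch and its constants are hypothetical names, DOT_PRODUCT is included only because dot product was mentioned earlier as a planned operation, and none of this is code from the PR.

```java
// Hypothetical sketch of an enum selecting the similarity computation,
// instead of one subclass per similarity function.
enum SimilarityKindSketch {
    COSINE {
        @Override
        float compare(float[] a, float[] b) {
            return (float) (dot(a, b) / Math.sqrt((double) dot(a, a) * dot(b, b)));
        }
    },
    DOT_PRODUCT {
        @Override
        float compare(float[] a, float[] b) {
            return dot(a, b);
        }
    };

    /** Computes the similarity of two vectors with the same number of dimensions. */
    abstract float compare(float[] a, float[] b);

    // Shared helper: sum of element-wise products.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vector dimensions differ");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }
}
```

The trade-off is the one discussed here: an enum keeps the math in one place, while one class per function fits tooling that expects a class for each ESQL function.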
…ch-functions-basics' into non-issue/esql-vector-search-functions-basics
…-search-functions-basics
are we missing the docs that will be generated for the v_cosine function?
otherwise LGTM
…milarityFunction to extend BinaryScalarFunction
/**
 * Base class for vector similarity functions, which compute a similarity score between two dense vectors
 */
public abstract class VectorSimilarityFunction extends BinaryScalarFunction implements EvaluatorMapper, VectorFunction {
Now VectorSimilarityFunction extends BinaryScalarFunction. That brings some simplifications to the code as we already have two params.
public void testDenseVectorImplicitCastingSimilarityFunctions() {
if (EsqlCapabilities.Cap.COSINE_VECTOR_SIMILARITY_FUNCTION.isEnabled()) {
checkDenseVectorImplicitCastingSimilarityFunction("v_cosine(vector, [0.342, 0.164, 0.234])", List.of(0.342f, 0.164f, 0.234f));
checkDenseVectorImplicitCastingSimilarityFunction("v_cosine(vector, [1, 2, 3])", List.of(1f, 2f, 3f));
Checks casting is done for non-float values, and creates a float Literal
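The effect this test checks can be pictured with a small stand-alone sketch, assuming the cast simply narrows each numeric element to a float. FloatCastSketch and toFloats are hypothetical names and not the analyzer code in the PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn a literal list of integers or doubles into floats,
// the element type a dense_vector holds internally.
final class FloatCastSketch {

    static List<Float> toFloats(List<? extends Number> values) {
        List<Float> floats = new ArrayList<>(values.size());
        for (Number value : values) {
            // Narrowing conversion; doubles may lose precision here.
            floats.add(value.floatValue());
        }
        return floats;
    }

    public static void main(String[] args) {
        System.out.println(toFloats(List.of(1, 2, 3)));
    }
}
```

Integer and double literals both end up as Float elements, which matches the shape of the expected lists in the test above.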
import static org.elasticsearch.xpack.esql.core.type.DataType.DOUBLE;
import static org.hamcrest.Matchers.equalTo;

public abstract class AbstractVectorSimilarityFunctionTestCase extends AbstractScalarFunctionTestCase {
New test case added that extends AbstractScalarFunctionTestCase. This brings quite a few tests like checking what happens with null values, evaluator type checks, etc.
import java.util.function.Supplier;

@FunctionName("v_cosine")
public class CosineSimilarityTests extends AbstractVectorSimilarityFunctionTestCase {
Test cases for new functions should be simple; all the heavy lifting is done in the abstract class.
@ioanatia 🤦 yes we were. There were no |
…-search-functions-basics
…ch-functions-basics' into non-issue/esql-vector-search-functions-basics
tracked in #130828
Implements CosineSimilarityFunction for ES|QL, and adds basic infrastructure for other vector similarity functions.
Adds a base class, VectorSimilarityFunction, that provides the building block for vector similarity functions.
There are pending validations that should be done for the function parameters:
We can work on these validations as follow ups, as they may depend on field_caps API returning that information.