-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL - Add semantic_text support for knn function #133806
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ESQL - Add semantic_text support for knn function #133806
Conversation
…-semantic-text' into non-issue/esql-knn-support-semantic-text
import org.junit.ClassRule; | ||
|
||
@ThreadLeakFilters(filters = TestClustersThreadFilter.class) | ||
public class KnnSemanticTextIT extends KnnSemanticTextTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added both single and multi node ITs, that extend from a common superclass (used SeamnticMatchTestCase
as a template)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds a test inference endpoint for text embedding tasks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adds a semantic_text
field that uses a dense_vector
field
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
…-semantic-text' into non-issue/esql-knn-support-semantic-text
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work!
/** | ||
* Tests kNN queries on semantic_text fields. Mostly checks errors on the data node that can't be checked in other tests. | ||
*/ | ||
public class KnnSemanticTextTestCase extends ESRestTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we should add test cases here for using knn over 2 different dense endpoints. Note that this may conflict with this open PR: #133675
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's not needed, as the SemanticQueryBuilder is not used on this approach - it's just knn being done over two dense_vector fields. The inference_id
mechanism for retrieving the embeddings from the text is not used in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work
just one comment!
|
||
private TypeResolution resolveField() { | ||
return isNotNull(field(), sourceText(), FIRST).and(isType(field(), dt -> dt == DENSE_VECTOR, sourceText(), FIRST, "dense_vector")); | ||
return isNotNull(field(), sourceText(), FIRST).and( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we also update:
Line 105 in cb33548
@Param(name = "field", type = { "dense_vector" }, description = "Field that the query will target.") Expression field, |
and regenerate the docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, done in d8ef102
…pport-semantic-text # Conflicts: # x-pack/plugin/esql/qa/testFixtures/src/main/resources/knn-function.csv-spec # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java
🔍 Preview links for changed docs |
ℹ️ Important: Docs version tagging👋 Thanks for updating the docs! Just a friendly reminder that our docs are now cumulative. This means all 9.x versions are documented on the same page and published off of the main branch, instead of creating separate pages for each minor version. We use applies_to tags to mark version-specific features and changes. Expand for a quick overviewWhen to use applies_to tags:✅ At the page level to indicate which products/deployments the content applies to (mandatory) What NOT to do:❌ Don't remove or replace information that applies to an older version 🤔 Need help?
|
knn
function should be used withsemantic_text
fields, provided that the inference service is a text embedding task.As
field_caps
returns text as data type forsemantic_text
fields, we need to check fortext
on the knn verification. The query will fail on the data node in case the underlying field mapping is not adense_vector
.Closes #132066