Member

@carlosdelest carlosdelest commented Aug 29, 2025

The knn function should be usable with semantic_text fields, provided that the inference endpoint performs a text embedding task.

Because field_caps reports text as the data type for semantic_text fields, the knn verification needs to accept text as well as dense_vector. The query will still fail on the data node if the underlying field mapping is not a dense_vector.
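
As a rough illustration of the check described above, the sketch below accepts both dense_vector and text at verification time. The class, enum, and method names are hypothetical stand-ins and do not come from the Elasticsearch codebase; the PR's actual change is made in resolveField().

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch; none of these names come from the Elasticsearch codebase. */
public class KnnFieldTypeSketch {

    /** Simplified stand-in for the data types visible at verification time. */
    enum FieldType { DENSE_VECTOR, TEXT, KEYWORD, LONG }

    // text is accepted because field_caps reports semantic_text fields as text.
    private static final Set<FieldType> ACCEPTED = EnumSet.of(FieldType.DENSE_VECTOR, FieldType.TEXT);

    /** Returns true when the type would pass this simplified knn field check. */
    static boolean isAcceptedKnnField(FieldType type) {
        return type != null && ACCEPTED.contains(type);
    }

    public static void main(String[] args) {
        for (FieldType type : FieldType.values()) {
            System.out.println(type + " accepted: " + isAcceptedKnnField(type));
        }
    }
}
```

Accepting text at this stage means a plain text field also passes verification; as noted above, that case is only rejected later, on the data node.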

Closes #132066

@carlosdelest carlosdelest added >non-issue :Analytics/ES|QL AKA ESQL Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch v9.2.0 :Search Relevance/ES|QL Search functionality in ES|QL labels Aug 29, 2025
import org.junit.ClassRule;

@ThreadLeakFilters(filters = TestClustersThreadFilter.class)
public class KnnSemanticTextIT extends KnnSemanticTextTestCase {
Member Author

Added both single-node and multi-node ITs that extend a common superclass (used SemanticMatchTestCase as a template)

Member Author

Adds a test inference endpoint for text embedding tasks

Member Author

Adds a semantic_text field that uses a dense_vector field

@carlosdelest carlosdelest marked this pull request as ready for review September 1, 2025 16:12
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 1, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-search-relevance (Team:Search Relevance)

Member

@kderusso kderusso left a comment

Nice work!

/**
* Tests kNN queries on semantic_text fields. Mostly checks errors on the data node that can't be checked in other tests.
*/
public class KnnSemanticTextTestCase extends ESRestTestCase {
Member

I wonder if we should add test cases here for using knn over 2 different dense endpoints. Note that this may conflict with this open PR: #133675

Member Author

I don't think it's needed, as the SemanticQueryBuilder is not used in this approach; it's just knn being run over two dense_vector fields. The inference_id mechanism for retrieving the embeddings from the text is not used in this function.

Contributor

@ioanatia ioanatia left a comment

nice work
just one comment!


private TypeResolution resolveField() {
return isNotNull(field(), sourceText(), FIRST).and(isType(field(), dt -> dt == DENSE_VECTOR, sourceText(), FIRST, "dense_vector"));
return isNotNull(field(), sourceText(), FIRST).and(
Contributor

should we also update:

@Param(name = "field", type = { "dense_vector" }, description = "Field that the query will target.") Expression field,

and regenerate the docs?

Member Author

Good catch, done in d8ef102

…pport-semantic-text

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/knn-function.csv-spec
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/analysis/VerifierTests.java

@carlosdelest carlosdelest enabled auto-merge (squash) September 3, 2025 12:29
@carlosdelest carlosdelest merged commit b5a3e17 into elastic:main Sep 4, 2025
33 checks passed
elasticsearchmachine pushed a commit that referenced this pull request Sep 6, 2025
After merging two KNN PRs, the release tests started failing. This fixes
those tests.

Original PRs: #133806, #133753

Development

Successfully merging this pull request may close these issues.

ES|QL: Add support for semantic_text in knn function

4 participants