@afoucret
Contributor

@afoucret afoucret commented Jul 11, 2025

Summary

Implements the TEXT_EMBEDDING function for ES|QL, which generates dense vector embeddings from text using an inference model.

Function Signature: TEXT_EMBEDDING(text: string, inference_id: string) -> dense_vector

Example Usage:

FROM documents 
| WHERE KNN(embedding_field, TEXT_EMBEDDING(content, "my-embedding-deployment"), 10)

Implementation Status

Completed in this PR:

  • TEXT_EMBEDDING_FUNCTION capability with snapshot build gating

  • Core Function Infrastructure

    • TextEmbedding function class with proper type validation and serialization
    • InferenceFunction interface for inference-based functions
    • Function registration in EsqlFunctionRegistry
  • Analysis of the inference function (validate existence and type of the inference endpoint)

    • Refactored pre-analysis so that it can collect inference ids from both inference plans and inference functions
    • Added validation for inference function in the analysis
  • Add a pre-optimizer async phase to the ES|QL query execution

  • Documentation generated from the annotations

  • Execute the inference in the pre-optimizer

  • Integration tests and end-to-end validation
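The pre-analysis change listed above (collecting inference ids from both inference plans and inference functions) can be pictured with a small tree walk. This is only a minimal sketch under stated assumptions: `InferenceIdCollectorSketch`, `SketchNode`, and the `"my-reranker"` id are hypothetical names invented for illustration and are not the classes touched by this PR. The idea is that a plan node and a function call can both carry an inference id, and a single traversal gathers them before the endpoints are validated.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are the real ES|QL classes.
final class InferenceIdCollectorSketch {

    /** A simplified node standing in for either a plan node or a function call. */
    static final class SketchNode {
        final String label;
        final String inferenceId; // null when the node references no inference endpoint
        final List<SketchNode> children;

        SketchNode(String label, String inferenceId, SketchNode... children) {
            this.label = label;
            this.inferenceId = inferenceId;
            this.children = Arrays.asList(children);
        }
    }

    /** Walks the tree in pre-order and returns each inference id once, in first-seen order. */
    static Set<String> collect(SketchNode root) {
        Set<String> ids = new LinkedHashSet<>();
        walk(root, ids);
        return ids;
    }

    private static void walk(SketchNode node, Set<String> ids) {
        if (node.inferenceId != null) {
            ids.add(node.inferenceId);
        }
        for (SketchNode child : node.children) {
            walk(child, ids);
        }
    }
}
```

For a tree in which a plan node with id `"my-reranker"` sits above a `KNN` call wrapping `TEXT_EMBEDDING(..., "my-embedding-deployment")`, `collect` returns both ids, so one resolution step could check that each endpoint exists and has the expected type.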

🚧 TODO (Before Merge):

  • Better CSV tests
  • Integration tests

Notes

The function is enabled only in snapshot builds.
The TEXT_EMBEDDING function is tracked in #131022

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.2.0 labels Jul 11, 2025
@afoucret afoucret marked this pull request as draft July 11, 2025 21:27
@github-actions
Contributor

github-actions bot commented Jul 11, 2025

🔍 Preview links for changed docs

@afoucret afoucret changed the title [DRAFT] ESQL: Add EMBED_TEXT function for dense vector embeddings [DRAFT] ESQL: Add TEXT_EMBEDDING function for dense vector embeddings Jul 11, 2025
@afoucret afoucret force-pushed the esql-text-embedding-function branch 5 times, most recently from 8588320 to c30c0ec Compare July 17, 2025 13:50
@afoucret afoucret force-pushed the esql-text-embedding-function branch from c30c0ec to 63c5539 Compare July 17, 2025 15:09
@afoucret afoucret force-pushed the esql-text-embedding-function branch 4 times, most recently from 1e6b1ce to 94fdad1 Compare July 28, 2025 12:53
afoucret and others added 13 commits August 1, 2025 14:20
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/analysis/Analyzer.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/vector/Knn.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/inference/InferenceResolver.java
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/inference/InferenceResolverTests.java
Implements the core evaluation logic for the TEXT_EMBEDDING function in ES|QL:
- Add InferenceFunctionEvaluator interface for all inference functions
- Implement TextEmbeddingFunctionEvaluator with support for float/byte/bit vectors
- Integration with InferenceRunner for async model execution
- Proper conversion of embedding results to DENSE_VECTOR data type
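As a rough illustration of the conversion step named in this commit message, the sketch below widens an embedding into a `float[]` that could back a dense vector value. It is an assumption-laden example, not the PR's code: `EmbeddingConversionSketch` and `toDenseVector` are hypothetical names, and treating bit embeddings as packed bytes that go through the same byte overload is an assumption, since the actual handling is not shown in this conversation.

```java
// Hypothetical sketch: not the TextEmbeddingFunctionEvaluator from this PR.
final class EmbeddingConversionSketch {

    /** Float embeddings are copied as-is. */
    static float[] toDenseVector(float[] values) {
        return values.clone();
    }

    /**
     * Byte embeddings are widened element by element.
     * Assumption: bit embeddings arrive as packed bytes and use this same path.
     */
    static float[] toDenseVector(byte[] values) {
        float[] out = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i]; // implicit widening from byte to float keeps the sign
        }
        return out;
    }
}
```

Calling `EmbeddingConversionSketch.toDenseVector(new byte[] {1, -2, 127})` yields a three-element float array holding the same numeric values.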
Integrates the TEXT_EMBEDDING function with the ESQL execution pipeline:
- Update PreOptimizer to handle TEXT_EMBEDDING function evaluation
- Add TextEmbedding function definition and type validation
- Integrate with InferenceServices for model execution
- Add comprehensive tests in PreOptimizerTests
- Update session and execution components for async function support
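The pre-optimizer integration described in these commit messages can be sketched as an asynchronous rewrite that replaces a TEXT_EMBEDDING call with a literal vector before regular optimization runs. Everything below is hypothetical and simplified: `PreOptimizerSketch`, its nested expression types, and the `EmbeddingClient` interface are illustrative names, and the sketch assumes the text argument is a constant string. It is not the PreOptimizer or InferenceRunner from this PR.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: simplified stand-ins, not the real ES|QL classes.
final class PreOptimizerSketch {

    interface Expr {}

    /** A constant value, for example an already computed embedding. */
    static final class Literal implements Expr {
        final Object value;
        Literal(Object value) { this.value = value; }
    }

    /** TEXT_EMBEDDING(text, inference_id), with a constant text argument (assumption). */
    static final class TextEmbeddingCall implements Expr {
        final String text;
        final String inferenceId;
        TextEmbeddingCall(String text, String inferenceId) {
            this.text = text;
            this.inferenceId = inferenceId;
        }
    }

    /** KNN(field, query, k), where query may still be an unevaluated function call. */
    static final class Knn implements Expr {
        final String field;
        final Expr query;
        final int k;
        Knn(String field, Expr query, int k) {
            this.field = field;
            this.query = query;
            this.k = k;
        }
    }

    /** Hypothetical async inference client. */
    interface EmbeddingClient {
        CompletableFuture<float[]> embed(String inferenceId, String text);
    }

    /** Replaces TEXT_EMBEDDING calls with literals once the async inference completes. */
    static CompletableFuture<Expr> preOptimize(Expr expr, EmbeddingClient client) {
        if (expr instanceof TextEmbeddingCall) {
            TextEmbeddingCall call = (TextEmbeddingCall) expr;
            return client.embed(call.inferenceId, call.text)
                .<Expr>thenApply(vector -> new Literal(vector));
        }
        if (expr instanceof Knn) {
            Knn knn = (Knn) expr;
            return preOptimize(knn.query, client)
                .<Expr>thenApply(query -> new Knn(knn.field, query, knn.k));
        }
        // Nothing to evaluate ahead of time: keep the expression unchanged.
        return CompletableFuture.completedFuture(expr);
    }
}
```

Under these assumptions, the later optimizer phases only ever see `KNN(embedding_field, <literal vector>, 10)`, so they would not need to know about inference at all.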
@afoucret afoucret force-pushed the esql-text-embedding-function branch from 94fdad1 to 71d591b Compare August 1, 2025 12:38
@afoucret
Contributor Author

Replaced by #134573

@afoucret afoucret closed this Sep 18, 2025
Labels

needs:triage Requires assignment of a team area label v9.2.0

2 participants