Merged
Changes from 9 commits
Commits
19 commits
4c8db9e
Introducing InferenceFunctionEvaluator to allow folding of constant i…
afoucret Sep 30, 2025
28e1185
Fold text embedding function to a constant during logical plan pre-op…
afoucret Sep 30, 2025
ffb3ad4
Add CSV tests for the TEXT_EMBEDDING function.
afoucret Sep 30, 2025
0c9b903
Merge branch 'main' of https://github.com/elastic/elasticsearch into …
afoucret Oct 1, 2025
dd5ec23
Update the doc generation now that dense vector is enabled.
afoucret Oct 1, 2025
aaec4e1
lint
afoucret Oct 1, 2025
2674d6c
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 2, 2025
3976822
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 2, 2025
4872c0c
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 2, 2025
7f9e9f4
Merge branch 'main' of https://github.com/elastic/elasticsearch into …
afoucret Oct 3, 2025
69928fb
Update renamed capability dense_vector_field_type_released in CSV tests.
afoucret Oct 3, 2025
bf7d5df
Adding a CSV tests with fork.
afoucret Oct 3, 2025
8656357
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 6, 2025
029cee6
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 6, 2025
7e97208
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 6, 2025
5fc7461
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 6, 2025
a2822e3
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 6, 2025
2b45433
Fixing flakiness in CSV tests.
afoucret Oct 7, 2025
f475dbb
Merge branch 'main' into esql_text_embedding_function_evaluator
afoucret Oct 7, 2025

@@ -77,6 +77,7 @@
import static org.elasticsearch.xpack.esql.action.EsqlCapabilities.Cap.RERANK;
import static org.elasticsearch.xpack.esql.action.EsqlCapabilities.Cap.SEMANTIC_TEXT_FIELD_CAPS;
import static org.elasticsearch.xpack.esql.action.EsqlCapabilities.Cap.SOURCE_FIELD_MAPPING;
+import static org.elasticsearch.xpack.esql.action.EsqlCapabilities.Cap.TEXT_EMBEDDING_FUNCTION;
import static org.elasticsearch.xpack.esql.qa.rest.RestEsqlTestCase.assertNotPartial;
import static org.elasticsearch.xpack.esql.qa.rest.RestEsqlTestCase.hasCapabilities;

@@ -205,7 +206,8 @@ protected boolean requiresInferenceEndpoint() {
SEMANTIC_TEXT_FIELD_CAPS.capabilityName(),
RERANK.capabilityName(),
COMPLETION.capabilityName(),
-KNN_FUNCTION_V5.capabilityName()
+KNN_FUNCTION_V5.capabilityName(),
+TEXT_EMBEDDING_FUNCTION.capabilityName()
).anyMatch(testCase.requiredCapabilities::contains);
}
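
The hunk above gates inference-dependent CSV tests on a list of capability names. As a rough, self-contained illustration of that `anyMatch` check (not the project's actual test class), the sketch below uses a hypothetical `Cap` enum that stands in for `EsqlCapabilities.Cap`. It assumes capability names are the lower-cased enum constants, which matches the `required_capability` strings used in the CSV tests.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Stream;

public class InferenceCapabilityCheck {
    // Hypothetical stand-in for EsqlCapabilities.Cap; only the constants visible in the diff are listed.
    enum Cap {
        SEMANTIC_TEXT_FIELD_CAPS,
        RERANK,
        COMPLETION,
        KNN_FUNCTION_V5,
        TEXT_EMBEDDING_FUNCTION;

        // Assumption: the capability name is the lower-cased constant name.
        String capabilityName() {
            return name().toLowerCase(Locale.ROOT);
        }
    }

    // A test needs an inference endpoint if it requires at least one inference-related capability.
    static boolean requiresInferenceEndpoint(List<String> requiredCapabilities) {
        return Stream.of(Cap.values())
            .map(Cap::capabilityName)
            .anyMatch(requiredCapabilities::contains);
    }

    public static void main(String[] args) {
        // prints "true": text_embedding_function is in the gated list
        System.out.println(requiresInferenceEndpoint(List.of("text_embedding_function", "dense_vector_field_type")));
        // prints "false": dense_vector_field_type alone is not in the gated list
        System.out.println(requiresInferenceEndpoint(List.of("dense_vector_field_type")));
    }
}
```

Without the new `TEXT_EMBEDDING_FUNCTION` entry, a test that requires only `text_embedding_function` would not be recognized as needing an inference endpoint.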

@@ -1,15 +1,68 @@
-placeholder
+text_embedding using a row source operator
required_capability: text_embedding_function
-required_capability: not_existing_capability
required_capability: dense_vector_field_type

// tag::embedding-eval[]
ROW input="Who is Victor Hugo?"
| EVAL embedding = TEXT_EMBEDDING("Who is Victor Hugo?", "test_dense_inference")
;
// end::embedding-eval[]

input:keyword | embedding:dense_vector
Contributor

Can we get a test with multiple TEXT_EMBEDDING calls that use different query strings?

Contributor Author

I added a CSV test with FORK that should cover this:

FROM semantic_text METADATA _score
| FORK (EVAL query_embedding = TEXT_EMBEDDING("be excellent to each other", "test_dense_inference") | WHERE KNN(semantic_text_dense_field, query_embedding))
       (EVAL query_embedding = TEXT_EMBEDDING("live long and prosper", "test_dense_inference") | WHERE KNN(semantic_text_dense_field, query_embedding))
| KEEP semantic_text_field, query_embedding, _score, _fork
| EVAL _score = ROUND(_score, 4)
| SORT _score DESC
| LIMIT 10
;

semantic_text_field:text                                              | query_embedding:dense_vector | _fork:keyword | _score:double
be excellent to each other                                            | [45.0, 55.0, 54.0]           | fork1         | 1.0
live long and prosper                                                 | [50.0, 57.0, 56.0]           | fork2         | 1.0
be excellent to each other                                            | [50.0, 57.0, 56.0]           | fork2         | 0.0295
live long and prosper                                                 | [45.0, 55.0, 54.0]           | fork1         | 0.0295
all we have to decide is what to do with the time that is given to us | [45.0, 55.0, 54.0]           | fork1         | 0.0214
all we have to decide is what to do with the time that is given to us | [50.0, 57.0, 56.0]           | fork2         | 0.0109

Contributor Author

It also tests the inference function in the context of FORK.

Who is Victor Hugo? | [56.0, 50.0, 48.0]
;


text_embedding using a row source operator with query build using CONCAT
required_capability: text_embedding_function
required_capability: dense_vector_field_type

ROW input="Who is Victor Hugo?"
| EVAL embedding = TEXT_EMBEDDING(CONCAT("Who is ", "Victor Hugo?"), "test_dense_inference")
;

input:keyword | embedding:dense_vector
Who is Victor Hugo? | [56.0, 50.0, 48.0]
;


text_embedding with knn on semantic_text_dense_field
required_capability: text_embedding_function
required_capability: dense_vector_field_type
required_capability: knn_function_v5
required_capability: semantic_text_field_caps

FROM semantic_text METADATA _score
| EVAL query_embedding = TEXT_EMBEDDING("be excellent to each other", "test_dense_inference")
| WHERE KNN(semantic_text_dense_field, query_embedding)
| KEEP semantic_text_field, query_embedding, _score
| EVAL _score = ROUND(_score, 4)
| SORT _score DESC
| LIMIT 10
;

semantic_text_field:text | query_embedding:dense_vector | _score:double
be excellent to each other | [45.0, 55.0, 54.0] | 1.0
live long and prosper | [45.0, 55.0, 54.0] | 0.0295
all we have to decide is what to do with the time that is given to us | [45.0, 55.0, 54.0] | 0.0214
;

text_embedding with knn (inline) on semantic_text_dense_field
required_capability: text_embedding_function
required_capability: dense_vector_field_type
required_capability: knn_function_v5
required_capability: semantic_text_field_caps

FROM semantic_text METADATA _score
| WHERE KNN(semantic_text_dense_field, TEXT_EMBEDDING("be excellent to each other", "test_dense_inference"))
| KEEP semantic_text_field, _score
| EVAL _score = ROUND(_score, 4)
| SORT _score DESC
| LIMIT 10
;

semantic_text_field:text | _score:double
be excellent to each other | 1.0
live long and prosper | 0.0295
all we have to decide is what to do with the time that is given to us | 0.0214
;
@@ -88,7 +88,7 @@ public void esql(
indexResolver,
enrichPolicyResolver,
preAnalyzer,
-new LogicalPlanPreOptimizer(new LogicalPreOptimizerContext(foldContext)),
+new LogicalPlanPreOptimizer(new LogicalPreOptimizerContext(foldContext, services.inferenceService())),
functionRegistry,
new LogicalPlanOptimizer(new LogicalOptimizerContext(cfg, foldContext)),
mapper,
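
The commit messages describe folding TEXT_EMBEDDING into a constant during logical plan pre-optimization, which is presumably why the pre-optimizer context now receives the inference service. The sketch below illustrates that general idea on a toy expression tree. Every type in it (`Expr`, `Literal`, `Field`, `Concat`, `TextEmbedding`, `InferenceClient`) is hypothetical and is not the Elasticsearch API. It is also synchronous for brevity, which is an assumption about nothing more than this example.

```java
import java.util.List;

public class FoldInferenceSketch {
    // Hypothetical toy expression tree; these are not the ES|QL expression classes.
    interface Expr {}
    record Literal(Object value) implements Expr {}
    record Field(String name) implements Expr {}
    record Concat(Expr left, Expr right) implements Expr {}
    record TextEmbedding(Expr text, Expr inferenceId) implements Expr {}

    // Hypothetical stand-in for whatever the inference service exposes.
    interface InferenceClient {
        List<Float> embed(String text, String inferenceId);
    }

    // Folds bottom-up: a TEXT_EMBEDDING whose arguments reduce to literals becomes a literal vector.
    static Expr fold(Expr expr, InferenceClient client) {
        if (expr instanceof Concat c) {
            Expr left = fold(c.left(), client);
            Expr right = fold(c.right(), client);
            if (left instanceof Literal l && right instanceof Literal r) {
                return new Literal(String.valueOf(l.value()) + r.value());
            }
            return new Concat(left, right);
        }
        if (expr instanceof TextEmbedding t) {
            Expr text = fold(t.text(), client);
            Expr id = fold(t.inferenceId(), client);
            if (text instanceof Literal l && id instanceof Literal i) {
                // One inference call at planning time instead of one per row.
                return new Literal(client.embed(String.valueOf(l.value()), String.valueOf(i.value())));
            }
            // Arguments are not constant, so the call is left in place.
            return new TextEmbedding(text, id);
        }
        return expr;
    }

    public static void main(String[] args) {
        // Fake client: the "embedding" is just the text length, so the example needs no model.
        InferenceClient fake = (text, id) -> List.of((float) text.length());
        Expr call = new TextEmbedding(
            new Concat(new Literal("Who is "), new Literal("Victor Hugo?")),
            new Literal("test_dense_inference")
        );
        System.out.println(fold(call, fake));
    }
}
```

In this toy model, the CONCAT case from the CSV tests folds to the same literal as the plain string case, because the concatenation is reduced before the embedding call is evaluated.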