ESQL: Add asynchronous pre-optimization step for logical plan #131440

afoucret · 2025-07-17T12:28:32Z

Description

This PR adds a new asynchronous pre-optimization step to the ES|QL logical plan execution pipeline.
The pre-optimization step sits between the Analyzer and the Optimizer, allowing asynchronous operations to run once the logical plan has been analyzed and before it is optimized.
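The ordering described above can be sketched with a callback-style listener. This is a minimal illustration only, assuming a hypothetical `Listener` interface, a `String` stand-in for the logical plan, and hypothetical method names; it is not the code added by this PR. The point it shows is that the optimizer runs inside the pre-optimizer's completion callback, so the optimizer can only start once the asynchronous work has finished.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface, modelled loosely on listener-style async APIs.
interface Listener<T> {
    void onResponse(T value);
    void onFailure(Exception e);
}

final class PreOptimizationPipelineSketch {

    // Records the order in which the stages ran.
    static final List<String> TRACE = new ArrayList<>();

    // Analyzer stand-in: synchronous; the plan is modelled as a String.
    static String analyze(String parsed) {
        TRACE.add("analyze");
        return "analyzed(" + parsed + ")";
    }

    // Pre-optimizer stand-in: asynchronous by contract. A real step could wait on an
    // external service before calling the listener; this one completes immediately.
    static void preOptimize(String analyzed, Listener<String> listener) {
        TRACE.add("preOptimize");
        listener.onResponse("preOptimized(" + analyzed + ")");
    }

    // Optimizer stand-in: synchronous, only invoked once pre-optimization has finished.
    static String optimize(String preOptimized) {
        TRACE.add("optimize");
        return "optimized(" + preOptimized + ")";
    }

    // Chains analyze -> preOptimize -> optimize, routing any failure to the caller's listener.
    static void run(String parsed, Listener<String> listener) {
        final String analyzed;
        try {
            analyzed = analyze(parsed);
        } catch (Exception e) {
            listener.onFailure(e);
            return;
        }
        preOptimize(analyzed, new Listener<>() {
            @Override
            public void onResponse(String preOptimized) {
                final String optimized;
                try {
                    optimized = optimize(preOptimized);
                } catch (Exception e) {
                    listener.onFailure(e);
                    return;
                }
                listener.onResponse(optimized);
            }

            @Override
            public void onFailure(Exception e) {
                listener.onFailure(e);
            }
        });
    }
}
```

Nesting the optimizer call in the callback, rather than blocking on a future, keeps the calling thread free while the pre-optimizer waits on remote work.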

Context / Use case

This infrastructure is required for the TEXT_EMBEDDING function implementation (issue #131022).
By evaluating text embeddings before query optimization, we ensure they benefit from all subsequent constant optimizations.
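Why pre-evaluation helps constant optimizations can be illustrated with a toy expression tree. This is a sketch under stated assumptions: `Expr`, `Literal`, `TextEmbedding`, `Dimensions` and both methods are hypothetical, and the `embed` function stands in for what would really be an asynchronous inference call. Once the embedding has been replaced by a literal vector, a folding rule that only fires on literal inputs can collapse its parent expression.

```java
import java.util.function.Function;

interface Expr {}

record Literal(Object value) implements Expr {}

// Needs an external inference call, so a synchronous optimizer cannot fold it (hypothetical node).
record TextEmbedding(Expr text, String inferenceId) implements Expr {}

// Example parent expression: the number of dimensions of a vector (hypothetical node).
record Dimensions(Expr vector) implements Expr {}

final class ConstantPreEvaluationSketch {

    // Pre-optimization: replace TEXT_EMBEDDING over a literal string with a literal vector.
    static Expr preOptimize(Expr e, Function<String, float[]> embed) {
        if (e instanceof TextEmbedding te && te.text() instanceof Literal lit) {
            return new Literal(embed.apply((String) lit.value()));
        }
        if (e instanceof Dimensions d) {
            return new Dimensions(preOptimize(d.vector(), embed));
        }
        return e;
    }

    // Constant folding: only possible once the input vector is a Literal.
    static Expr fold(Expr e) {
        if (e instanceof Dimensions d && d.vector() instanceof Literal lit) {
            return new Literal(((float[]) lit.value()).length);
        }
        return e;
    }
}
```

Without the pre-optimization pass, `fold` leaves `Dimensions(TextEmbedding(...))` untouched; after it, the whole expression folds to a single literal.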

The PreMapper was originally placed before the Optimizer, but it has been moved after it so that it can be used for this purpose.

Key Changes

- Added a new PRE_OPTIMIZED stage to the LogicalPlan
- Created the LogicalPlanPreOptimizer class to handle asynchronous pre-optimization
- Created LogicalPreOptimizerContext to support the pre-optimization process
- Updated EsqlSession to include the pre-optimization step in the execution flow
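One way to picture the new PRE_OPTIMIZED stage is as a step in an ordered lifecycle that a plan must pass through before optimization. The sketch below assumes a hypothetical `Stage` enum and `StagedPlan` class; the real LogicalPlan API in this PR may track stages differently.

```java
// Hypothetical stage model; the actual LogicalPlan API in the PR may differ.
enum Stage { PARSED, ANALYZED, PRE_OPTIMIZED, OPTIMIZED }

final class StagedPlan {
    private Stage stage = Stage.PARSED;

    Stage stage() {
        return stage;
    }

    // Moves to the next stage, rejecting attempts to skip or repeat one.
    void advanceTo(Stage next) {
        if (next.ordinal() != stage.ordinal() + 1) {
            throw new IllegalStateException("cannot move from " + stage + " to " + next);
        }
        stage = next;
    }

    // True once pre-optimization has completed (or any later stage has been reached).
    boolean preOptimized() {
        return stage.compareTo(Stage.PRE_OPTIMIZED) >= 0;
    }
}
```

Enforcing strict ordering makes it an error for the optimizer to run on a plan that skipped pre-optimization.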

elasticsearchmachine · 2025-07-17T12:28:56Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

benchmarks/src/main/java/org/elasticsearch/benchmark/_nightly/esql/QueryPlanningBenchmark.java

.../esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanPreOptimizerTests.java

afoucret · 2025-07-18T13:52:04Z

@astefan Got a couple of CI failures caused by the branch being outdated, but everything is green now.

astefan

A few notes and comments:

- The text_embedding function is currently the main consumer of this PR. It is supposed to call an external service and produce what is, essentially, an array of floats that can later be used in the query.
- Due to its nature, the performance of the text_embedding function depends mainly on the availability of the external service, which can be a bottleneck.
- Because this step resembles enrich policy discovery or lookup index resolution, it was decided that resolving the text_embedding output value is best placed in the overall "discovery" set of steps (index resolution, analyzer, logical optimizer, physical optimizer). This set of steps is performed on the coordinator node.
- Another argument in favor of the pre-optimizer step is that the Literal produced by the text_embedding function can be used by further optimization rules in the LogicalPlanOptimizer. I personally do not consider this a strong argument: a similar optimization step could run on each data node.
- Also, calling the external service has a cost. One argument in favor of calling this service on the coordinator node only is that the cost is greatly reduced this way, since it is essentially one call. There are some downsides to this decision:
  - the coordinator becomes a bottleneck
  - the steps that still validate the correctness of the query (everything that comes after the pre-optimization step: logical optimizer, physical optimizer, local logical optimizer, local physical optimizer) may wait an unnecessarily long time before responding to the user about a query that may turn out to be invalid
  - shard- or index-specific optimizations cannot be applied. For example, if the field used in the text_embedding function is null or has no value on some of the shards, I am assuming that calling the external service for that specific value/field is unnecessary.
- I think it is OK to start with this pre-optimizer step only for constant values (for example TEXT_EMBEDDING("Who is Victor Hugo?", "test_dense_inference")), purely from the point of view of calling the external service only once.
  - There is also a LocalLogicalPlanOptimizer. If there is any shard/index-specific behavior, my preference would be to make this "external service" call from each relevant data node, or to have heuristic logic that decides which approach is better: one call from the coordinator or multiple calls from each data node.
- I am not convinced that EsqlSession should change the way it handles the listeners while running the entire flow of analyzer, logical optimizer and physical optimizer, but I understand the need for this call to be async. If it doesn't break any existing tests or behavior, it's OK with me.
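The suggestion to start with constant inputs, so that each value costs a single external call, can be sketched as a per-query cache on the coordinator. This is a hypothetical illustration: `CoordinatorEmbeddingCacheSketch`, `resolveConstants` and the `service` function are assumptions, and a real key would likely also include the inference endpoint id, not just the text.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CoordinatorEmbeddingCacheSketch {

    // Resolves each distinct constant input once and reuses the result for duplicates.
    // `service` stands in for the external inference call (hypothetical); keying only on
    // the text is a simplification, a real key would also carry the inference endpoint id.
    static Map<String, float[]> resolveConstants(List<String> constants, Function<String, float[]> service) {
        Map<String, float[]> resolved = new HashMap<>();
        for (String text : constants) {
            // computeIfAbsent only invokes the service for texts not yet resolved.
            resolved.computeIfAbsent(text, service);
        }
        return resolved;
    }
}
```

With this approach, a query that repeats the same constant string still triggers exactly one call per distinct value, which is the cost argument for resolving on the coordinator; per-shard decisions such as skipping null fields remain out of reach at this stage.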

astefan

LGTM


ESQL: Add asynchronous pre-optimization step for logical plan

0568a4d

afoucret added >non-issue :Analytics/ES|QL AKA ESQL v9.2.0 labels Jul 17, 2025

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 17, 2025

[CI] Auto commit changes from spotless

5da14bb

nik9000 requested a review from astefan July 17, 2025 13:25

afoucret commented Jul 17, 2025

benchmarks/src/main/java/org/elasticsearch/benchmark/_nightly/esql/QueryPlanningBenchmark.java

afoucret commented Jul 17, 2025

.../esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanPreOptimizerTests.java

afoucret and others added 2 commits July 17, 2025 15:37

Fix copy/paste

38aa860

Revert useless change in QueryPlanningBenchmark

a4d27ac

afoucret commented Jul 17, 2025

.../esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanPreOptimizerTests.java

Lint

d4e857d

afoucret commented Jul 17, 2025

.../esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanPreOptimizerTests.java

Lint again

b061fc7

afoucret mentioned this pull request Jul 11, 2025

ES|QL: Add TEXT_EMBEDDING function #131022

Closed

6 tasks

afoucret and others added 3 commits July 17, 2025 17:05

Fix CsvTests

26d815e

Merge branch 'main' into esql-logical-plan-pre-optimizer

d2355b8

Merge branch 'main' into esql-logical-plan-pre-optimizer

4b0cca8

astefan reviewed Jul 18, 2025

astefan self-requested a review July 21, 2025 13:12

astefan approved these changes Jul 21, 2025

afoucret merged commit c666679 into elastic:main Jul 21, 2025
33 checks passed

3 participants