Enrich after main field caps #134290

idegtiarenko · 2025-09-08T10:14:01Z

This change moves enrich resolution after main index resolution.
This lets us avoid an additional field caps (FC) call (compared to #133947), at the expense of resolving all fields in queries that use enrich.

Queries such as from employees | enrich languages_policy | keep emp_no will now have to request all fields.
However, I think this is still acceptable: most queries do not use keep/drop, so their list of fields is not pruned anyway.
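A minimal sketch of the trade-off described above. It assumes a heavily simplified, hypothetical query model (a list of Command records), uses "*" as a stand-in for IndexResolver.ALL_FIELDS, and treats keep as the only command that prunes the field list. It approximates the idea, not the actual FieldNameUtils logic.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FieldNamesSketch {
    // Hypothetical stand-in for IndexResolver.ALL_FIELDS.
    static final Set<String> ALL_FIELDS = Set.of("*");

    // Hypothetical, heavily simplified query model: one record per pipe command.
    record Command(String name, List<String> fields) {}

    static Set<String> fieldsToRequest(List<Command> query) {
        boolean hasEnrich = query.stream().anyMatch(c -> c.name().equals("enrich"));
        if (hasEnrich) {
            // Enrich match fields are unknown until the policy is resolved,
            // which now happens after the main field caps call: ask for everything.
            return ALL_FIELDS;
        }
        boolean hasKeep = query.stream().anyMatch(c -> c.name().equals("keep"));
        if (hasKeep == false) {
            // Nothing narrows the output, so all fields are requested anyway.
            return ALL_FIELDS;
        }
        // Otherwise request only the fields the query references.
        Set<String> fields = new LinkedHashSet<>();
        for (Command c : query) {
            fields.addAll(c.fields());
        }
        return fields;
    }

    public static void main(String[] args) {
        List<Command> withEnrich = List.of(
            new Command("from", List.of()),
            new Command("enrich", List.of()),
            new Command("keep", List.of("emp_no")));
        List<Command> withoutEnrich = List.of(
            new Command("from", List.of()),
            new Command("keep", List.of("emp_no")));
        System.out.println(fieldsToRequest(withEnrich));    // [*]
        System.out.println(fieldsToRequest(withoutEnrich)); // [emp_no]
    }
}
```

The keep query with enrich is exactly the case the description mentions: it used to request only a few fields and now requests all of them.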

Related to: ES-12837

elasticsearchmachine · 2025-09-19T09:53:34Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

idegtiarenko · 2025-09-19T09:54:54Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

-                    listener.delegateFailure((l, indexResolution) -> {
-                        l.onResponse(result.withIndexResolution(indexResolution));
+                    listener.delegateFailureAndWrap((l, indexResolution) -> {
+                        EsqlCCSUtils.updateExecutionInfoWithUnavailableClusters(executionInfo, indexResolution.failures());


This was moved to an earlier stage from analyzeWithRetry.
It is needed to record failures in executionInfo, so that the following steps (lookup and enrich resolution) are aware of failed clusters and can skip them.
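To illustrate that data dependency, here is a minimal sketch built on a hypothetical, much-reduced ExecutionInfoSketch class: clusters that failed during main index resolution are marked SKIPPED, and later steps only contact clusters that are still RUNNING. The names loosely mirror EsqlExecutionInfo and EsqlCCSUtils.updateExecutionInfoWithUnavailableClusters, but every signature here is an assumption, not the real API.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class ExecutionInfoSketch {
    enum Status { RUNNING, SKIPPED }

    // cluster alias -> current status (hypothetical simplification of EsqlExecutionInfo)
    final Map<String, Status> clusters = new LinkedHashMap<>();

    // Same idea as updateExecutionInfoWithUnavailableClusters: every cluster
    // that failed during main index resolution is marked as skipped.
    void markUnavailable(Set<String> failedClusters) {
        for (String alias : failedClusters) {
            clusters.computeIfPresent(alias, (k, v) -> Status.SKIPPED);
        }
    }

    // Later steps (lookup, enrich) only contact clusters that are still running.
    Set<String> runningClusterAliases() {
        return clusters.entrySet().stream()
            .filter(e -> e.getValue() == Status.RUNNING)
            .map(Map.Entry::getKey)
            .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        ExecutionInfoSketch info = new ExecutionInfoSketch();
        info.clusters.put("local", Status.RUNNING);
        info.clusters.put("remote_a", Status.RUNNING);
        info.clusters.put("remote_b", Status.RUNNING);
        info.markUnavailable(Set.of("remote_b"));
        System.out.println(info.runningClusterAliases()); // [local, remote_a]
    }
}
```

If the failures were only recorded later (inside analyzeWithRetry), enrich and lookup resolution would still try to reach remote_b.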

idegtiarenko · 2025-09-19T09:55:35Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+                    LOGGER.debug("No more clusters to search, ending analysis stage");
+                    throw new NoClustersToSearchException();
+                }
+                return r;


This was moved to an earlier stage from analyzeWithRetry.
There is no need to resolve anything else (such as lookup, enrich or inference) if the query cannot be executed anyway.
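A minimal sketch of the early exit, assuming a hypothetical preAnalyze method that simply records which resolution steps ran. Once main index resolution leaves no running cluster, a (here locally defined) NoClustersToSearchException stops the pipeline before lookup, enrich or inference resolution is attempted.

```java
import java.util.ArrayList;
import java.util.List;

class EarlyStopSketch {
    // Hypothetical counterpart of the NoClustersToSearchException in the diff.
    static class NoClustersToSearchException extends RuntimeException {}

    // Returns the names of the resolution steps that actually ran.
    static List<String> preAnalyze(List<String> runningClusters) {
        List<String> ran = new ArrayList<>();
        ran.add("main");
        if (runningClusters.isEmpty()) {
            // Every cluster failed or was filtered out: the query cannot run,
            // so skip all remaining resolution work.
            throw new NoClustersToSearchException();
        }
        ran.add("lookup");
        ran.add("enrich");
        ran.add("inference");
        return ran;
    }

    public static void main(String[] args) {
        System.out.println(preAnalyze(List.of("local"))); // [main, lookup, enrich, inference]
        try {
            preAnalyze(List.of());
        } catch (NoClustersToSearchException e) {
            System.out.println("no clusters left, analysis stopped early");
        }
    }
}
```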

idegtiarenko · 2025-09-19T13:05:48Z

x-pack/plugin/esql/qa/testFixtures/src/main/resources/enrich.csv-spec

 fieldsInOtherIndicesBug
 required_capability: enrich_load
 required_capability: fix_replace_missing_field_with_null_duplicate_name_id_in_layout
+required_capability: dense_vector_field_type


This explains the addition of required_capability: dense_vector_field_type here and below.

Previously, enrich resolution happened before the main field caps call in order to resolve the matchField, which is later added to the list of fields in the main field caps call.

In order to make the main field caps call before enrich resolution, we have to request all fields whenever the query contains any enrich (as we do not know their matchField yet). This list of fields is kept and serialized within the plan. The two affected queries happen to query from *, which includes an index with a dense_vector field, and this field type is not supported prior to 9.2. I am adding this capability so that a plan containing this field can be deserialized on data nodes.
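As I understand the csv-spec framework, a required_capability line makes a test run only when the cluster under test reports that capability, which keeps these two queries away from older nodes that cannot deserialize a dense_vector field. The sketch below models that gate with a hypothetical shouldRun method and made-up per-node capability sets; it is not the actual test runner code.

```java
import java.util.List;
import java.util.Set;

class CapabilityGateSketch {
    // Hypothetical gate: a spec test runs only if every node advertises every required capability.
    static boolean shouldRun(List<String> required, List<Set<String>> nodeCapabilities) {
        return nodeCapabilities.stream().allMatch(caps -> caps.containsAll(required));
    }

    public static void main(String[] args) {
        List<String> required = List.of("enrich_load", "dense_vector_field_type");
        Set<String> newNode = Set.of("enrich_load", "dense_vector_field_type");
        // e.g. a pre-9.2 node in a mixed-version cluster
        Set<String> oldNode = Set.of("enrich_load");
        System.out.println(shouldRun(required, List.of(newNode, newNode))); // true
        System.out.println(shouldRun(required, List.of(newNode, oldNode))); // false
    }
}
```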

luigidellaquila

LGTM, I just left a couple of minor comments.

We'll pay some performance penalty with this, but I don't think we have alternatives for remote enriches.

The only thing we could attempt is to resolve _local enriches first, only if there are no remote enriches at all. This would let us preserve the logic that reduces the field_caps to strictly needed fields in a (probably very limited) set of cases, but it also complicates the code a lot.

I'm not sure it's worth the effort and the complication, TBH (I certainly wouldn't do it now), especially because we expect people to use JOINs much more frequently than ENRICH.
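A sketch of the alternative described in this review, which this PR does not implement. It assumes a hypothetical two-valued Mode (local-only vs. possibly remote) and a hypothetical resolveMatchField lookup; the real ES|QL enrich modes and resolver differ. When every enrich is local, its match field can be resolved up front and added to a pruned field list; otherwise we fall back to all fields.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class LocalEnrichFirstSketch {
    // Hypothetical simplification: an enrich is either local-only or may involve remote clusters.
    enum Mode { LOCAL, REMOTE }

    record Enrich(String policy, Mode mode) {}

    // Stand-in for IndexResolver.ALL_FIELDS.
    static final Set<String> ALL_FIELDS = Set.of("*");

    // resolveMatchField is a hypothetical local policy lookup (policy name -> match field).
    static Set<String> fieldsToRequest(Set<String> referencedFields, List<Enrich> enriches,
                                       Function<String, String> resolveMatchField) {
        boolean allLocal = enriches.stream().allMatch(e -> e.mode() == Mode.LOCAL);
        if (allLocal == false) {
            // Remote policies can only be resolved once the target clusters are known,
            // i.e. after the main field caps call, so fall back to all fields.
            return ALL_FIELDS;
        }
        // Only local policies: resolve them first and keep the field list pruned.
        Set<String> fields = new LinkedHashSet<>(referencedFields);
        for (Enrich e : enriches) {
            fields.add(resolveMatchField.apply(e.policy()));
        }
        return fields;
    }

    public static void main(String[] args) {
        Function<String, String> policies = name -> "language_code"; // hypothetical lookup
        Set<String> referenced = Set.of("emp_no");
        System.out.println(fieldsToRequest(referenced, List.of(new Enrich("languages_policy", Mode.LOCAL)), policies)); // [emp_no, language_code]
        System.out.println(fieldsToRequest(referenced, List.of(new Enrich("languages_policy", Mode.REMOTE)), policies)); // [*]
    }
}
```

Even in this toy form the extra branch is visible, which is the complication the reviewer weighs against the (probably small) number of queries that would benefit.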

luigidellaquila · 2025-09-22T08:05:04Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+                return r;
+            })
            .<PreAnalysisResult>andThen((l, r) -> preAnalyzeLookupIndices(preAnalysis.lookupIndices().iterator(), r, executionInfo, l))
+            .<PreAnalysisResult>andThen((l, r) -> {


nit: before this change, the resolution order was: enrich, inference, main, lookup
now it is: main, lookup, inference, enrich

Are there any reasons why we didn't keep the original order (apart from moving main first, of course), i.e. main, enrich, inference, lookup?

No particular reason, actually. Any order works as long as main comes first.

I'd maybe put inference last, because enrich and lookup deal with remote clusters (and inference currently doesn't), where there's a high chance that something may go wrong; if it does, there's no need to even spend time on inference.
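The argument for ordering can be sketched with CompletableFuture, swapped in here for the ActionListener chaining that EsqlSession actually uses. Each hypothetical step records its name, and a failure in an earlier, remote-facing step means the later steps (inference last) are never invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class ResolutionOrderSketch {
    // Hypothetical step: records its name, or fails if it is the simulated failing step.
    static CompletableFuture<List<String>> step(List<String> log, String name, String failingStep) {
        if (name.equals(failingStep)) {
            return CompletableFuture.failedFuture(new IllegalStateException(name + " failed"));
        }
        log.add(name);
        return CompletableFuture.completedFuture(log);
    }

    static List<String> run(String failingStep) {
        List<String> log = new ArrayList<>();
        CompletableFuture<List<String>> pipeline = step(log, "main", failingStep)
            .thenCompose(l -> step(l, "lookup", failingStep))
            .thenCompose(l -> step(l, "enrich", failingStep))
            // Inference does not talk to remote clusters, so it goes last:
            // it is skipped whenever an earlier, more failure-prone step fails.
            .thenCompose(l -> step(l, "inference", failingStep));
        try {
            pipeline.join();
        } catch (CompletionException e) {
            // The failure propagated; the remaining steps were never invoked.
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run("none"));   // [main, lookup, enrich, inference]
        System.out.println(run("enrich")); // [main, lookup]
    }
}
```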

luigidellaquila · 2025-09-22T08:10:22Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/FieldNameUtils.java

+        if (hasEnriches) {
+            // we do not know names of the enrich policy match fields before hand. We need to resolve all fields in thisc ase
+            return new PreAnalysisResult(IndexResolver.ALL_FIELDS, wildcardJoinIndices);


This could have an impact on performance, but I don't think we have alternatives.

Nit: in the comment, "thisc ase" -> "this case".

smalyshev · 2025-09-22T23:52:37Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

+            .<PreAnalysisResult>andThen((l, r) -> {
+                inferenceService.inferenceResolver().resolveInferenceIds(parsed, l.map(r::withInferenceResolution));
+            })
            .<LogicalPlan>andThen((l, r) -> analyzeWithRetry(parsed, requestFilter, preAnalysis, executionInfo, r, l))


Here I am a bit worried about this situation:

Let's say we have two clusters, Local and Remote, and a filter.

The first call to preAnalyzeMainIndices filters out all Remote indices, so we consider only Local and do all the resolutions only for Local.

Now analysis fails, and we retry it without the filter. This time the list of indices includes both Local and Remote.

Because of that, we're going to send the request to both Local and Remote, but we did not check lookup indices or policies there. It is true that Remote does not actually need them because of the filter, but is the filter applied early enough? What if the planner on Remote needs something about some index or policy and it's not there? I'm not sure what would happen...

That is a valid concern. I believe that in such a case we need to retry the entire analysis, including index resolution.
I opened ES-12978 for this. It should not be a problem until we have flat resolution; for now, the list of remotes is still known beforehand.
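A toy model of the scenario and of the proposed fix (tracked in ES-12978, not part of this PR). The PreAnalysis record, resolveMainIndices and both retry methods are hypothetical: retrying only main index resolution leaves Remote without resolved policies or lookup indices, while redoing the whole pre-analysis keeps both sets aligned.

```java
import java.util.HashSet;
import java.util.Set;

class RetryWholeAnalysisSketch {
    // Hypothetical pre-analysis outcome: clusters the query targets, and
    // clusters for which enrich policies / lookup indices were resolved.
    record PreAnalysis(Set<String> targetClusters, Set<String> resolvedFor) {
        Set<String> missingResolutions() {
            Set<String> missing = new HashSet<>(targetClusters);
            missing.removeAll(resolvedFor);
            return missing;
        }
    }

    // Hypothetical main index resolution: the request filter excludes Remote.
    static Set<String> resolveMainIndices(boolean useFilter) {
        return useFilter ? Set.of("local") : Set.of("local", "remote");
    }

    // The scenario above: only main indices are re-resolved on retry,
    // while enrich/lookup results from the filtered first pass are reused.
    static PreAnalysis retryMainOnly() {
        Set<String> firstPass = resolveMainIndices(true);
        return new PreAnalysis(resolveMainIndices(false), firstPass);
    }

    // The proposed fix: redo the entire pre-analysis without the filter.
    static PreAnalysis retryEverything() {
        Set<String> clusters = resolveMainIndices(false);
        return new PreAnalysis(clusters, clusters);
    }

    public static void main(String[] args) {
        System.out.println(retryMainOnly().missingResolutions());   // [remote]
        System.out.println(retryEverything().missingResolutions()); // []
    }
}
```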

smalyshev

I am a bit worried that switching to "all fields" is going to cause trouble for us later, especially given that this also applies to stateful, where the old way of resolving is still fine. But I guess we'll see.

smalyshev · 2025-09-22T23:54:22Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java


        doResolvePolicies(
-            new HashSet<>(executionInfo.getClusters().keySet()),
+            executionInfo.clusterInfo.isEmpty() ? new HashSet<>() : executionInfo.getRunningClusterAliases().collect(toSet()),


Why do you need this check? Wouldn't the stream take care of it anyway?

executionInfo.getRunningClusterAliases() calls getClusterStates(Cluster.Status.RUNNING), which contains the following assertion:

elasticsearch/x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlExecutionInfo.java

Lines 317 to 320 in 04ff798

public Stream<Cluster> getClusterStates(Cluster.Status status) {
    assert clusterInfo.isEmpty() == false : "ClusterMap in EsqlExecutionInfo must not be empty";
    return clusterInfo.values().stream().filter(cluster -> cluster.getStatus() == status);
}

I assume that forces us to perform an "in CCS" check. Please let me know if this could be done more simply.
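Why the guard matters, as a minimal sketch: streaming over an empty map would simply yield an empty set, but the assertion in getClusterStates fires first when assertions are enabled (as in tests). RunningAliasesGuardSketch is a hypothetical reduction of EsqlExecutionInfo; per the comment above, an empty clusterInfo corresponds to the non-CCS case.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class RunningAliasesGuardSketch {
    enum Status { RUNNING, SKIPPED }

    // Hypothetical reduction of EsqlExecutionInfo: alias -> status, empty outside CCS.
    final Map<String, Status> clusterInfo = new HashMap<>();

    Stream<String> getRunningClusterAliases() {
        // Same invariant as EsqlExecutionInfo#getClusterStates.
        assert clusterInfo.isEmpty() == false : "ClusterMap in EsqlExecutionInfo must not be empty";
        return clusterInfo.entrySet().stream()
            .filter(e -> e.getValue() == Status.RUNNING)
            .map(Map.Entry::getKey);
    }

    // The guard from the diff: without it, a non-CCS query would trip the assertion under -ea.
    Set<String> clustersToResolvePoliciesFor() {
        return clusterInfo.isEmpty() ? new HashSet<>() : getRunningClusterAliases().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        RunningAliasesGuardSketch info = new RunningAliasesGuardSketch();
        System.out.println(info.clustersToResolvePoliciesFor()); // []
        info.clusterInfo.put("remote_a", Status.RUNNING);
        info.clusterInfo.put("remote_b", Status.SKIPPED);
        System.out.println(info.clustersToResolvePoliciesFor()); // [remote_a]
    }
}
```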

idegtiarenko added 3 commits September 8, 2025 11:54

resolve all fields when enrich is used

0fe67dc

resolve enriches after main indices

3c1ce1f

comment data dependency

6c37a8f

idegtiarenko added >non-issue, Team:Analytics, :Analytics/ES|QL and v9.2.0 labels Sep 8, 2025

idegtiarenko added 11 commits September 8, 2025 13:28

fix unit test

5582140

Merge branch 'main' into es-12837_enrich_after_main_field_caps

db25c08

Merge branch 'main' into es-12837_enrich_after_main_field_caps

5cf06b6

# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/session/FieldNameUtilsTests.java

connect to only running clusters when resolving enrich

1362fa9

fix exception propagation

4613736

fix enrich with row

d4446bd

only update if resolution is valid

43171a6

Merge branch 'main' into es-12837_enrich_after_main_field_caps

fcce03e

# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java

fix merge

fe40176

reorder

f87d225

postpone inferenceService resolution

5489c5b

idegtiarenko marked this pull request as ready for review September 19, 2025 09:53

idegtiarenko requested review from luigidellaquila and smalyshev September 19, 2025 09:53

idegtiarenko commented Sep 19, 2025

idegtiarenko changed the title from "Es 12837 enrich after main field caps" to "Enrich after main field caps" Sep 19, 2025

add required capabilities

146e3ec

idegtiarenko commented Sep 19, 2025

idegtiarenko mentioned this pull request Sep 19, 2025

resolve remotes with field caps call #133947

Closed

Merge branch 'main' into es-12837_enrich_after_main_field_caps

546bedf

luigidellaquila approved these changes Sep 22, 2025

fix typo

4957da3

smalyshev reviewed Sep 22, 2025

smalyshev approved these changes Sep 22, 2025

idegtiarenko added 2 commits September 23, 2025 09:00

Merge branch 'main' into es-12837_enrich_after_main_field_caps

41d179d

add comment to the tests

b00fd59

idegtiarenko merged commit 4bb963e into elastic:main Sep 23, 2025
34 checks passed

idegtiarenko mentioned this pull request Sep 23, 2025

Simplify EsqlSession#analyzeWithRetry #135182

Closed

idegtiarenko deleted the es-12837_enrich_after_main_field_caps branch September 23, 2025 08:41

4 participants
