Skip to content

Conversation

@idegtiarenko
Copy link
Contributor

@idegtiarenko idegtiarenko commented Sep 8, 2025

This change moves enrich resolution after main index resolution.
This allows us to avoid additional FC call (compared to #133947) at expense of resolving all fields in cases with enrich.

Queries such as from employees | enrich languages_policy | keep emp_no would have to request all fields,
however I think it is still okay as most of the queries do not have keep/drop so the list of fields is not pruned anyways.

Related to: ES-12837

@idegtiarenko idegtiarenko added >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL v9.2.0 labels Sep 8, 2025
@idegtiarenko idegtiarenko marked this pull request as ready for review September 19, 2025 09:53
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

listener.delegateFailure((l, indexResolution) -> {
l.onResponse(result.withIndexResolution(indexResolution));
listener.delegateFailureAndWrap((l, indexResolution) -> {
EsqlCCSUtils.updateExecutionInfoWithUnavailableClusters(executionInfo, indexResolution.failures());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pulled to an earlier stage from analyzeWithRetry.
It is required to record failures into executionInfo so that following steps (lookup and enrich resolution) are aware about failed clusters and could skip them.

LOGGER.debug("No more clusters to search, ending analysis stage");
throw new NoClustersToSearchException();
}
return r;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pulled to an earlier stage from analyzeWithRetry.
No need to resolve anything else (such as lookup, enrich, inference) if the query could not be executed anyways.

@idegtiarenko idegtiarenko changed the title Es 12837 enrich after main field caps Enrich after main field caps Sep 19, 2025
fieldsInOtherIndicesBug
required_capability: enrich_load
required_capability: fix_replace_missing_field_with_null_duplicate_name_id_in_layout
required_capability: dense_vector_field_type
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This and below addition of required_capability: dense_vector_field_type.

Previously enrich resolutions were happening before main field caps in order to resolve matchField that is later added to the list of fields in the main field caps call.

In order to make main field caps call before enrich we have to request all fields in case there is any enrich in the query (as we do not know what might be their matchField yet). This list is kept and serialized within the plan. For this two affected queries we happen to query from * that includes a dense_vector with corresponding dense_vector field. It is not supported prior to 9.2. I am adding this capability in order to be able to deserialize a plan with this field on data nodes.

Copy link
Contributor

@luigidellaquila luigidellaquila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I just left a couple of minor comments.

We'll pay some performance penalty with this, but I don't think we have alternatives for remote enriches.

The only thing we could attempt is to resolve _local enriches first, only if there are no remote enriches at all. This would let us preserve the logic that reduces the field_caps to strictly needed fields in a (probably very limited) set of cases, but it also complicates the code a lot.

I'm not sure it's worth the effort and the complication TBH (for sure I wouldn't do it now) especially because we expect people to use JOINs much more frequently than ENRICH

return r;
})
.<PreAnalysisResult>andThen((l, r) -> preAnalyzeLookupIndices(preAnalysis.lookupIndices().iterator(), r, executionInfo, l))
.<PreAnalysisResult>andThen((l, r) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: before this change, the resolution order was: enrich, inference, main, lookup
now it is: main, lookup, inference, enrich

Are there any reasons why we didn't keep the original order (apart from main of course), ie. main, enrich, inference, lookup?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No particular reason actually. It could be any as long as main is first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd put inference last maybe, because enrich & lookup deal with remote clusters (and inference currently doesn't), where there's a high chance something may go wrong, and if it does, there's no need to even spending time on inference.

Comment on lines 240 to 242
if (hasEnriches) {
// we do not know names of the enrich policy match fields before hand. We need to resolve all fields in thisc ase
return new PreAnalysisResult(IndexResolver.ALL_FIELDS, wildcardJoinIndices);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could have an impact on performance, but I don't think we have alternatives.

Nit: in the comment in thisc ase -> in this case

.<PreAnalysisResult>andThen((l, r) -> {
inferenceService.inferenceResolver().resolveInferenceIds(parsed, l.map(r::withInferenceResolution));
})
.<LogicalPlan>andThen((l, r) -> analyzeWithRetry(parsed, requestFilter, preAnalysis, executionInfo, r, l))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I am a bit worried about this situation:

  1. Let's say we have two clusters, Local and Remote, and a filter.
  2. The first call to preAnalyzeMainIndices filters out all Remote indices, so we consider it only for Local and do all the resolutions only for Local.
  3. Now analysis fails, and we retry it without filter. This time the list of indices comes in with both Local and Remote.
  4. Because of that, we're going to send the request to both Local and Remote. But we did not check lookup indices or policies there. It is true that Remote does not actually need to use them because of the filter, but is filter applied early enough? What if the planner on Remote needs something about some index or policy and it's not there? I'm not sure what would happen...

Copy link
Contributor Author

@idegtiarenko idegtiarenko Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a valid concern. I believe in such case we need to retry the entire analysis including index resolution.
I opened ES-12978 for this. This should not be a problem until we have flat resolution. For now list of remotes is still known beforehand.

Copy link
Contributor

@smalyshev smalyshev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried a bit that switching to "all field" is going to cause trouble for us later, especially given that this would apply to stateful too, where old way of resolving is still fine. But I guess we'll see.


doResolvePolicies(
new HashSet<>(executionInfo.getClusters().keySet()),
executionInfo.clusterInfo.isEmpty() ? new HashSet<>() : executionInfo.getRunningClusterAliases().collect(toSet()),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need this check? Wouldn't the stream take care of it anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executionInfo.getRunningClusterAliases() calls getClusterStates(Cluster.Status.RUNNING) that inside has the following assertion:

public Stream<Cluster> getClusterStates(Cluster.Status status) {
assert clusterInfo.isEmpty() == false : "ClusterMap in EsqlExecutionInfo must not be empty";
return clusterInfo.values().stream().filter(cluster -> cluster.getStatus() == status);
}

I assume that enforces us to perform a "in ccs" check. Please let me know if that could be done simpler

@idegtiarenko idegtiarenko merged commit 4bb963e into elastic:main Sep 23, 2025
34 checks passed
@idegtiarenko idegtiarenko deleted the es-12837_enrich_after_main_field_caps branch September 23, 2025 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants