

@alex-spies
Contributor

@alex-spies alex-spies commented Oct 31, 2025

Relates #137534

When deciding which types are supported, we did not use the correct minimum transport version during the enrich resolution step. What's more, the EnrichPolicyResolver did not account for the fact that the node requesting resolution might be on a version that doesn't support the types in the resolved mapping, which led to serialization bugs when we tried to enable the DATE_RANGE type. The minimum transport version was also inconsistent for ROW | ENRICH and ROW | LOOKUP JOIN queries, and for FROM remote:* | ENRICH queries.

For ENRICH, this is a non-issue in practice, because the only affected data types were AGGREGATE_METRIC_DOUBLE and DENSE_VECTOR, which cannot yet be used in enrich policies to begin with. (See #137699 and #127350).

ROW | LOOKUP JOIN queries were broken (or at least potentially inconsistent) because a new coordinator might have assumed that the two new data types were supported, while an old lookup node didn't actually support them.
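To make the failure mode concrete, here is a minimal sketch. It makes several assumptions: transport versions are modelled as plain ints, the version ids and the `DataType`/`supportedOn`/`minimum` names are hypothetical (not Elasticsearch's actual classes), and a type counts as supported only if the minimum version across all participating nodes is at or above the version that introduced it. It contrasts checking against the coordinator's own version with checking against the cluster-wide minimum.

```java
import java.util.List;

/**
 * Hypothetical model of the bug; not Elasticsearch's actual API.
 * Transport versions are plain ints, and each data type records the
 * (made-up) version that first knows how to serialize it.
 */
public class MinVersionCheck {

    enum DataType {
        KEYWORD(0),
        DENSE_VECTOR(200),            // made-up version ids
        AGGREGATE_METRIC_DOUBLE(200);

        final int sinceVersion;

        DataType(int sinceVersion) {
            this.sinceVersion = sinceVersion;
        }

        /** A type may be sent over the wire only if every participant understands it. */
        boolean supportedOn(int minTransportVersion) {
            return minTransportVersion >= sinceVersion;
        }
    }

    /** Minimum over all node versions taking part in the query. */
    static int minimum(List<Integer> nodeVersions) {
        return nodeVersions.stream().mapToInt(Integer::intValue).min().orElseThrow();
    }

    public static void main(String[] args) {
        int coordinatorVersion = 210;                    // new coordinator
        List<Integer> clusterNodes = List.of(210, 150);  // includes an old lookup node

        // Buggy: deciding support from the coordinator's own version.
        boolean buggy = DataType.DENSE_VECTOR.supportedOn(coordinatorVersion);
        // Fixed: deciding support from the cluster-wide minimum.
        boolean fixed = DataType.DENSE_VECTOR.supportedOn(minimum(clusterNodes));

        System.out.println("coordinator-only check: " + buggy); // true
        System.out.println("cluster-minimum check: " + fixed);  // false
    }
}
```

With the coordinator-only check, the new coordinator would plan to ship DENSE_VECTOR values to a node that cannot deserialize them; the cluster-minimum check rejects the type up front.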

  • Pass the determined overall minimum transport version into the query profile for debugging and testing.
  • When resolving enrich policies, pass the minimum version determined in the main index resolution to the enrich policy resolver. This is important for CCS, where the enrich policies are resolved locally on the remote clusters.
  • For consistency, do the same for lookup joins (technically not required, since lookup joins perform their field caps call on the coordinator, which is simpler).
  • For both enrich and lookup resolution, retrieve the determined minimum version and use it on the coordinator. This is important for ROW | ENRICH and ROW | LOOKUP JOIN queries.
  • Retrieve the coordinating cluster's minimum transport version from the cluster state and use that as baseline so that ROW | ENRICH and ROW | LOOKUP JOIN queries don't just use the coordinator node's version (which may be newer than the version of a node executing enrich/lookup joins for us).
  • Add a bunch of tests:
      • Add tests for FROM remote_only:*
      • Add tests for FROM | ENRICH, both for queries on the local cluster and with CCS.
      • Add tests for FROM | LOOKUP JOIN, both on the local cluster and with CCS.
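The resolution flow in the list above can be sketched roughly as follows. This is a hypothetical illustration, not the actual `EsqlSession`/`EnrichPolicyResolver` code. The version ids are made-up ints. `resolveOnRemote` is a hypothetical stand-in for (possibly remote) enrich/lookup resolution, and it simply drops fields whose type the requester can't deserialize; that is a simplification, and the real resolver may instead mark such fields as unsupported. The point is that a single minimum version starts from the cluster state baseline, only ever decreases, and is passed into and back out of each resolution step.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of threading one minimum transport version through
 * the resolution steps. Class, method and field names are illustrative only.
 */
public class MinVersionFlow {

    /** Made-up mapping: field name -> version that introduced the field's type. */
    record ResolvedMapping(Map<String, Integer> fieldTypeSince) {}

    /** Result of a resolution step: the mapping plus the minimum version it determined. */
    record Resolution(ResolvedMapping mapping, int minVersion) {}

    /**
     * Stand-in for enrich/lookup resolution on a (possibly remote) cluster: it receives
     * the requester's minimum version, drops fields whose type the requester could not
     * deserialize (a simplification), and reports the minimum it determined.
     */
    static Resolution resolveOnRemote(Map<String, Integer> localMapping, int requesterMinVersion, int remoteClusterMin) {
        int effectiveMin = Math.min(requesterMinVersion, remoteClusterMin);
        Map<String, Integer> filtered = new LinkedHashMap<>();
        localMapping.forEach((field, since) -> {
            if (since <= effectiveMin) {
                filtered.put(field, since);
            }
        });
        return new Resolution(new ResolvedMapping(filtered), effectiveMin);
    }

    public static void main(String[] args) {
        // 1. Baseline: the coordinating cluster's minimum from the cluster state,
        //    not the coordinator node's own (possibly newer) version.
        int minVersion = 180;

        // 2. Main index resolution (absent for ROW queries) may lower it further,
        //    e.g. when a remote cluster runs an older version.
        List<Integer> remoteMinsFromIndexResolution = List.of(190, 170);
        for (int v : remoteMinsFromIndexResolution) {
            minVersion = Math.min(minVersion, v);
        }

        // 3. Enrich/lookup resolution receives the current minimum and hands back
        //    the minimum it determined; the coordinator keeps using that value.
        Map<String, Integer> enrichPolicyMapping = Map.of("host", 0, "embedding", 200);
        Resolution enrich = resolveOnRemote(enrichPolicyMapping, minVersion, 175);
        minVersion = Math.min(minVersion, enrich.minVersion());

        System.out.println("final min version: " + minVersion);                             // 170
        System.out.println("enrich fields: " + enrich.mapping().fieldTypeSince().keySet()); // [host]
    }
}
```

Because every step takes a minimum, a ROW query that skips main index resolution still ends up with the cluster state baseline rather than the coordinator node's own version.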

@elasticsearchmachine
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@alex-spies alex-spies added the auto-backport and test-release labels Oct 31, 2025
@alex-spies alex-spies marked this pull request as ready for review October 31, 2025 17:16
@elasticsearchmachine elasticsearchmachine added the Team:Analytics label Oct 31, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@alex-spies alex-spies marked this pull request as draft November 4, 2025 18:25
@alex-spies
Contributor Author

Thanks for the review, @idegtiarenko, and for your very good suggestion to simplify this by using the cluster state!

@alex-spies alex-spies added the test-full-bwc label Nov 28, 2025
@alex-spies
Contributor Author

Ok, CI is green except for 2 unrelated release tests that are failing. These are

This is safe to merge.

@alex-spies alex-spies merged commit 4a14f83 into elastic:main Dec 2, 2025
28 of 32 checks passed
@alex-spies alex-spies deleted the fix-enrich-lj-resolution-min-transport-version branch December 2, 2025 13:36
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
9.2 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 137431

@alex-spies
Contributor Author

💚 All backports created successfully

Status Branch Result
9.2

Questions?

Please refer to the Backport tool documentation

alex-spies added a commit to alex-spies/elasticsearch that referenced this pull request Dec 2, 2025
…rsion (elastic#137431)

When deciding which types are supported, we did not use the correct minimum transport version during the enrich resolution in case of CCS and ROW queries. What's more, the EnrichPolicyResolver did not account for the fact that the node requesting resolution might be on a version that doesn't support the types in the resolved mapping, which led to serialization bugs surfacing when trying to enable the DATE_RANGE type.

- Initialize the minimum transport version with the minimum version from the cluster state before any resolution steps. That makes ROW queries correct.
- Send the determined minimum transport version along with the enrich resolution request so that remote clusters don't send un-deserializable data types back.
- Add the determined minimum transport version to the profile.
- Add a bunch of tests.

(cherry picked from commit 4a14f83)

# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.3.csv
#	x-pack/plugin/esql/qa/server/single-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/PushExpressionToLoadIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/TransportEsqlQueryAction.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/IndexResolver.java
alex-spies added a commit that referenced this pull request Dec 3, 2025
…ort version (#137431) (#138893)

* ESQL: Fix enrich and lookup join resolution based on min transport version (#137431)

When deciding which types are supported, we did not use the correct minimum transport version during the enrich resolution in case of CCS and ROW queries. What's more, the EnrichPolicyResolver did not account for the fact that the node requesting resolution might be on a version that doesn't support the types in the resolved mapping, which led to serialization bugs surfacing when trying to enable the DATE_RANGE type.

- Initialize the minimum transport version with the minimum version from the cluster state before any resolution steps. That makes ROW queries correct.
- Send the determined minimum transport version along with the enrich resolution request so that remote clusters don't send un-deserializable data types back.
- Add the determined minimum transport version to the profile.
- Add a bunch of tests.

(cherry picked from commit 4a14f83)

# Conflicts:
#	server/src/main/resources/transport/upper_bounds/9.3.csv
#	x-pack/plugin/esql/qa/server/single-node/src/javaRestTest/java/org/elasticsearch/xpack/esql/qa/single_node/PushExpressionToLoadIT.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/enrich/EnrichPolicyResolver.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plugin/TransportEsqlQueryAction.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/EsqlSession.java
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/session/IndexResolver.java

* Align PreAnalysisResult with main

Make pre-initialization of minimumTransportVersion consistent with main.

* Use min version from the correct field caps

In 9.2, this was in the forked field caps response, not the original
field caps response.

---------

Co-authored-by: elasticsearchmachine <[email protected]>

Labels

:Analytics/ES|QL, auto-backport, backport pending, >bug, Team:Analytics, test-full-bwc, test-release, v9.2.3, v9.3.0

