Ban Limit + MvExpand before remote Enrich #135051

smalyshev · 2025-09-19T00:49:01Z

Remote ENRICH (and, in fact, any remote operation) is not compatible with LIMIT followed by MV_EXPAND. Consider:

FROM *:events | SORT @timestamp | LIMIT 2 | MV_EXPAND ip | ENRICH _remote:clientip_policy ON ip

Semantically, this must take the top two events and then expand them. However, this cannot be executed remotely: we would have to take the top 2 events on each node, expand them, apply Enrich, and bring the results to the coordinator, but there we can no longer select the top 2, because that selection has to happen pre-expand. We do not know which expanded rows come from the true top rows and which come from "false" top rows that should have been thrown out. The query can only be executed correctly if MV_EXPAND runs on the coordinator, which contradicts remote Enrich.
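
To make the problem concrete, here is a minimal, self-contained simulation. It is only a sketch: the `Event` and `Row` records, the two-node sample data, and the method names are all made up for illustration and are not part of the ES|QL code base.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy model of "SORT @timestamp | LIMIT n | MV_EXPAND ip" (hypothetical names, not ES|QL classes).
final class LimitThenExpandDemo {
    record Event(long timestamp, List<String> ips) {}
    record Row(long timestamp, String ip) {}

    // SORT @timestamp | LIMIT n over a list of events.
    static List<Event> topEvents(List<Event> events, int limit) {
        return events.stream()
            .sorted(Comparator.comparingLong(Event::timestamp))
            .limit(limit)
            .collect(Collectors.toList());
    }

    // MV_EXPAND ip: one output row per value of the multivalued field.
    static List<Row> mvExpand(List<Event> events) {
        return events.stream()
            .flatMap(e -> e.ips().stream().map(ip -> new Row(e.timestamp(), ip)))
            .collect(Collectors.toList());
    }

    // Intended semantics: the limit is global and applied before the expansion.
    static List<Row> intended(List<List<Event>> nodes, int limit) {
        List<Event> all = nodes.stream().flatMap(List::stream).collect(Collectors.toList());
        return mvExpand(topEvents(all, limit));
    }

    // Naive remote execution: per-node limit and expand, then a second limit on the coordinator.
    static List<Row> naiveRemote(List<List<Event>> nodes, int limit) {
        List<Row> merged = new ArrayList<>();
        for (List<Event> node : nodes) {
            merged.addAll(mvExpand(topEvents(node, limit)));
        }
        return merged.stream()
            .sorted(Comparator.comparingLong(Row::timestamp))
            .limit(limit)
            .collect(Collectors.toList());
    }

    // Assumed sample data: two nodes with two events each.
    static List<List<Event>> sampleNodes() {
        List<Event> nodeA = List.of(new Event(1, List.of("a1", "a2", "a3")), new Event(4, List.of("a4")));
        List<Event> nodeB = List.of(new Event(2, List.of("b1")), new Event(3, List.of("b2")));
        return List.of(nodeA, nodeB);
    }

    public static void main(String[] args) {
        System.out.println("intended:     " + intended(sampleNodes(), 2));
        System.out.println("naive remote: " + naiveRemote(sampleNodes(), 2));
    }
}
```

With this sample data the intended plan yields four rows (three from the event at timestamp 1, one from the event at timestamp 2), while the naive remote plan yields only two. Dropping the coordinator-side limit does not help either, since the rows from the events at timestamps 3 and 4 would then leak into the result.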

With the current hack it would silently return wrong data (it would apply LIMIT after joining the remote data, without accounting for MvExpand), but even if we fix the hack I don't think the query can be executed with the correct semantics, at least not without subplans.

The same problem would happen with remote join, except that limits are already banned before lookup join, so we're good there. The same would probably happen for any other expanding operation, but I think joins and MV_EXPAND are the only ones that exist right now.

elasticsearchmachine · 2025-09-19T00:50:04Z

Hi @smalyshev, I've created a changelog YAML for you.

elasticsearchmachine · 2025-09-19T13:58:11Z

Pinging @elastic/es-analytical-engine (Team:Analytics)

luigidellaquila

Thanks @smalyshev, the implementation looks correct for this specific case.
I just left a comment about validation in general and about other cases we could be missing; see below.

luigidellaquila · 2025-09-22T12:14:21Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Enrich.java

+    private void checkMvExpandAfterLimit(Failures failures) {
+        this.forEachDown(MvExpand.class, u -> {
+            u.forEachDown(p -> {
+                if (p instanceof Limit || p instanceof TopN) {


I noticed that the logic for JOIN is a bit different; in particular, post optimization, it also checks for the presence of a PipelineBreaker, while ENRICH only checks for ExecutesOn.
Do you think it makes sense to unify the two, or at least to make these two checks consistent?

Join and Enrich are different: Enrich is cardinality-preserving while Join is not. That makes some pipeline breakers compatible with Enrich but not with Join. I agree that the PipelineBreaker usage is not ideal there, as it's not exactly meant for that, and in the future we may refine the meaning of each, but Enrich and Join will probably stay different, unless we move to handling them with subplans, which would resolve the cardinality problem (not for free, of course). For now I think PipelineBreaker is a good stand-in for what we need, but longer term it will probably need to change.

This is also the reason for this particular change, btw: MV_EXPAND changes cardinality, which gives Enrich essentially the same issue that remote JOIN has had from the start. The order of LIMIT and the cardinality-changing operation comes out wrong: semantically we expect the LIMIT to be global over the whole dataset, but in reality we apply it per node and delay the global one until we're back at the coordinator. This only works if none of the operations in between is cardinality-changing.
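
A small sketch of that last point, under stated assumptions: rows are modelled as integer sort keys, `ENRICH_LIKE` stands in for any one-row-in, one-row-out operation, and `EXPAND_LIKE` for any operation that can emit several rows per input. None of these names exist in Elasticsearch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy comparison of "global LIMIT, then op" against "per-node LIMIT, op, coordinator LIMIT".
final class CardinalityDemo {
    record Row(int key, String value) {}

    // Cardinality-preserving: exactly one output row per input row.
    static final Function<Integer, List<Row>> ENRICH_LIKE =
        k -> List.of(new Row(k, "enriched-" + k));

    // Cardinality-changing: two output rows per input row.
    static final Function<Integer, List<Row>> EXPAND_LIKE =
        k -> List.of(new Row(k, k + "/a"), new Row(k, k + "/b"));

    // What the query means: sort everything, keep `limit` rows, then apply the operation.
    static List<Row> global(List<List<Integer>> nodes, int limit, Function<Integer, List<Row>> op) {
        return nodes.stream().flatMap(List::stream).sorted().limit(limit)
            .flatMap(k -> op.apply(k).stream())
            .collect(Collectors.toList());
    }

    // What remote execution does: limit and apply the operation per node, limit again on the coordinator.
    static List<Row> distributed(List<List<Integer>> nodes, int limit, Function<Integer, List<Row>> op) {
        List<Row> merged = new ArrayList<>();
        for (List<Integer> node : nodes) {
            node.stream().sorted().limit(limit)
                .flatMap(k -> op.apply(k).stream())
                .forEach(merged::add);
        }
        return merged.stream()
            .sorted(Comparator.comparingInt(Row::key))
            .limit(limit)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<Integer>> nodes = List.of(List.of(1, 4), List.of(2, 3));
        System.out.println(global(nodes, 2, ENRICH_LIKE).equals(distributed(nodes, 2, ENRICH_LIKE)));
        System.out.println(global(nodes, 2, EXPAND_LIKE).equals(distributed(nodes, 2, EXPAND_LIKE)));
    }
}
```

For the sample input the two strategies agree for the one-to-one operation and disagree for the expanding one, which is the asymmetry described above.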

alex-spies

Looks good, nice catch!

alex-spies · 2025-09-22T16:12:27Z

docs/changelog/135051.yaml

+pr: 135051
+summary: Ban Limit + `MvExpand` before remote Enrich
+area: ES|QL
+type: enhancement


nit: maybe that's more of a bug fix.

I guess though we don't have a bug filed for it...

alex-spies · 2025-09-22T16:15:36Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Enrich.java

+    @Override
+    public void postAnalysisVerification(Failures failures) {
+        if (this.mode == Mode.REMOTE) {
+            checkMvExpandAfterLimit(failures);


note: since this triggers after analysis, the condition p instanceof TopN (while correct) will never be true - we don't create TopN nodes during analysis, only OrderBy nodes.

Yes, we have to do it at the analysis stage to avoid confusion with synthetic limits that are pushed down, but I wasn't sure whether there's any possible way to have a TopN at the analysis stage.

alex-spies · 2025-09-22T16:16:57Z

...l/src/internalClusterTest/java/org/elasticsearch/xpack/esql/action/CrossClusterEnrichIT.java

+            | LIMIT 2
+            | eval ip= TO_STR(host)
+            | MV_EXPAND host
+            | %s


We could add tests that have random nodes in between the mv expand and the enrich. Or between the limit and the mv expand (although we already have this to some extent).

Do we have any "random harmless commands" code anywhere in the tests? Or should I just add a couple of fixed ones?
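
For the "couple of fixed ones" option, something along these lines might do. This is a hypothetical helper, not existing test infrastructure: the class name, the `FILLERS` list, and the choice of filler commands are all assumptions, and the base query is borrowed from the PR description rather than from CrossClusterEnrichIT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds query variants with a filler command in each interesting position.
final class EnrichQueryVariants {
    // Assumed "harmless" commands: they should not affect whether the query is rejected.
    static final List<String> FILLERS = List.of("EVAL filler = 1", "WHERE ip IS NOT NULL");

    static List<String> variants(String enrichCommand) {
        List<String> queries = new ArrayList<>();
        for (String filler : FILLERS) {
            // Filler between LIMIT and MV_EXPAND.
            queries.add(String.join(" | ",
                "FROM *:events", "SORT @timestamp", "LIMIT 2", filler, "MV_EXPAND ip", enrichCommand));
            // Filler between MV_EXPAND and ENRICH.
            queries.add(String.join(" | ",
                "FROM *:events", "SORT @timestamp", "LIMIT 2", "MV_EXPAND ip", filler, enrichCommand));
        }
        return queries;
    }

    public static void main(String[] args) {
        variants("ENRICH _remote:clientip_policy ON ip").forEach(System.out::println);
    }
}
```

Each generated query keeps LIMIT before MV_EXPAND before the remote ENRICH, so every variant would be expected to fail verification with the same message.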

alex-spies · 2025-09-22T16:18:37Z

x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/plan/logical/Enrich.java

+        this.forEachDown(MvExpand.class, u -> {
+            u.forEachDown(p -> {
+                if (p instanceof Limit || p instanceof TopN) {
+                    failures.add(fail(this, "MV_EXPAND after LIMIT is incompatible with remote ENRICH"));


note: technically, that's only true if we cannot push the remote enrich past the mv_expand. Which we sometimes could! (There are more optimizations that could be applied to MV_EXPAND, in general.)

But since we currently don't do this, this check will strictly prohibit only queries that we can't properly run anyway, so this is fine.

Maybe we could add a comment, though?
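
For readers unfamiliar with the plan-walking idiom in the snippet above, here is a stripped-down model of the check. It is a sketch only: the `Node` class is a toy single-child tree, not the real `LogicalPlan` API, and node kinds are plain strings instead of classes. It assumes `forEachDown` visits the node itself before descending to its child.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Toy model of the "MV_EXPAND after LIMIT" verification (not the real Elasticsearch classes).
final class PlanCheckSketch {
    static final class Node {
        final String kind;
        final Node child; // the node that feeds this one; null for the source

        Node(String kind, Node child) {
            this.kind = kind;
            this.child = child;
        }

        // Visit this node, then everything below it.
        void forEachDown(Consumer<Node> action) {
            action.accept(this);
            if (child != null) {
                child.forEachDown(action);
            }
        }
    }

    // Collect a failure for every Limit found below an MvExpand that is below the given Enrich.
    static List<String> checkMvExpandAfterLimit(Node enrich) {
        List<String> failures = new ArrayList<>();
        enrich.forEachDown(n -> {
            if (n.kind.equals("MvExpand")) {
                n.forEachDown(p -> {
                    if (p.kind.equals("Limit")) {
                        failures.add("MV_EXPAND after LIMIT is incompatible with remote ENRICH");
                    }
                });
            }
        });
        return failures;
    }

    public static void main(String[] args) {
        // FROM | SORT | LIMIT | MV_EXPAND | ENRICH: the shape the PR rejects.
        Node rejected = new Node("Enrich", new Node("MvExpand",
            new Node("Limit", new Node("OrderBy", new Node("From", null)))));
        // FROM | MV_EXPAND | LIMIT | ENRICH: the limit comes after the expansion.
        Node accepted = new Node("Enrich", new Node("Limit",
            new Node("MvExpand", new Node("From", null))));
        System.out.println(checkMvExpandAfterLimit(rejected));
        System.out.println(checkMvExpandAfterLimit(accepted));
    }
}
```

The toy plan uses separate `OrderBy` and `Limit` nodes rather than a `TopN`, matching the note elsewhere in this review that `TopN` is not created during analysis.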

elasticsearchmachine · 2025-09-22T18:03:55Z

Hi @smalyshev, I've updated the changelog YAML for you.

smalyshev · 2025-09-22T18:13:35Z

@alex-spies do you think we should backport it to 9.1/8.19?


smalyshev · 2025-09-23T21:00:19Z

💚 All backports created successfully

Status	Branch	Result
✅	9.1
✅	8.19


Ban Limit + MvExpand before remote Enrich

9edcd72

smalyshev added the :Analytics/ES|QL AKA ESQL label Sep 19, 2025

elasticsearchmachine added the v9.2.0 label Sep 19, 2025

smalyshev added >enhancement v9.2.0 and removed v9.2.0 labels Sep 19, 2025

Update docs/changelog/135051.yaml

7c6c9b1

smalyshev added 2 commits September 18, 2025 18:51

spotless

3f2e2e9

Move check pre optimizer

3d909ce

smalyshev marked this pull request as ready for review September 19, 2025 13:57

smalyshev requested a review from alex-spies September 19, 2025 13:58

elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Sep 19, 2025

smalyshev requested review from astefan, bpintea and luigidellaquila September 19, 2025 13:58

luigidellaquila reviewed Sep 22, 2025


smalyshev requested a review from luigidellaquila September 22, 2025 14:24

alex-spies approved these changes Sep 22, 2025


smalyshev added >bug and removed >enhancement labels Sep 22, 2025

Update docs/changelog/135051.yaml

f6ee41a

Merge branch 'main' into remote-enrich-limit

3029092

Add comments

e3871aa

smalyshev merged commit 7f1d2dc into elastic:main Sep 22, 2025
34 checks passed

smalyshev deleted the remote-enrich-limit branch September 22, 2025 19:29

gmjehovich pushed a commit to gmjehovich/elasticsearch that referenced this pull request Sep 22, 2025

Ban Limit + MvExpand before remote Enrich (elastic#135051)

c965073

* Ban Limit + MvExpand before remote Enrich

DonalEvans pushed a commit to DonalEvans/elasticsearch that referenced this pull request Sep 22, 2025

Ban Limit + MvExpand before remote Enrich (elastic#135051)

c1bc48d

* Ban Limit + MvExpand before remote Enrich

smalyshev mentioned this pull request Sep 23, 2025

[9.1] Ban Limit + MvExpand before remote Enrich (#135051) #135310

Merged

smalyshev mentioned this pull request Sep 23, 2025

[8.19] Ban Limit + MvExpand before remote Enrich (#135051) #135311

Merged

smalyshev added a commit that referenced this pull request Sep 25, 2025

[9.1] Ban Limit + MvExpand before remote Enrich (#135051) (#135310)

2549a2c

* Ban Limit + MvExpand before remote Enrich (#135051) * Ban Limit + MvExpand before remote Enrich (cherry picked from commit 7f1d2dc)

smalyshev added a commit that referenced this pull request Sep 25, 2025

[8.19] Ban Limit + MvExpand before remote Enrich (#135051) (#135311)

04be591

* Ban Limit + MvExpand before remote Enrich (#135051) * Ban Limit + MvExpand before remote Enrich (cherry picked from commit 7f1d2dc)

smalyshev added a commit to smalyshev/elasticsearch that referenced this pull request Sep 25, 2025

Add release note for elastic#135051

51c8726

This was referenced Sep 25, 2025

Add release note for #135051 #135475

Merged

Remote ENRICH + MV_EXPAND + LIMIT produces incorrect data #130153

Closed

smalyshev added a commit that referenced this pull request Sep 29, 2025

Add release note for #135051 (#135475)

64b8574

* Add release note for #135051

This was referenced Sep 29, 2025

[8.19] Add release note for #135051 (#135475) #135627

Merged

[9.1] Add release note for #135051 (#135475) #135628

Merged

smalyshev added a commit to smalyshev/elasticsearch that referenced this pull request Sep 29, 2025

Add release note for elastic#135051 (elastic#135475)

4b2b74d

* Add release note for elastic#135051

smalyshev added a commit to smalyshev/elasticsearch that referenced this pull request Sep 29, 2025

Add release note for elastic#135051 (elastic#135475)

01d7870

* Add release note for elastic#135051

elasticsearchmachine pushed a commit that referenced this pull request Sep 29, 2025

Add release note for #135051 (#135475) (#135628)

8b24b91

* Add release note for #135051

elasticsearchmachine pushed a commit that referenced this pull request Sep 29, 2025

Add release note for #135051 (#135475) (#135627)

ce8af96

* Add release note for #135051
