ESQL: improve mv_expand planning #103582

astefan · 2023-12-19T16:05:18Z

This is a collection of improvements made to how mv_expand is planned.
There are two mains areas in terms of optimization rules:

limit after mv_expand can be duplicated before mv_expand when there is no sort after mv_expand. Here I used @luigidellaquila's approach to link a limit to a mv_expand as a way of not copying multiple times the same limit past mv_expand. The edge case query that really needs this is this query from test | where false | mv_expand first_name.
mv_expand can be pushed down past by sort in certain situations:
- every time there is a sort after mv_expand the mv_expand can be pushed down
- when there is no sort after mv_expand and the limit cannot be duplicated past mv_expand

While trying out different queries there is one where the two rules above do not apply:

from test
            | sort last_name
            | keep first_name
            | mv_expand first_name
            | where first_name is not null

This query is present here in tests https://github.com/elastic/elasticsearch/pull/103582/files#diff-427544eaca2f06adb1b5850278a22573fb696564260a34a8f5580ff66bbbd95aR1058.

Also, while testing I realized we can push down mv_expand past keep, but we have a faulty handling of mv_expand. Reported here.

Fix small issues with the optimizer rule

…101266_fix

alex-spies

I think the general approach is good. I didn't go through all the test cases yet, but otherwise made some comments.

I think as our optimizer grows and rules become more complex, it'd be useful to have unit tests for individual rules which test only the given, single rule (in contrast to testing the whole optimizer on given queries). I find it difficult to verify if the rules will produce equivalent plans in all situations.

alex-spies · 2023-12-21T17:36:56Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+        return plan instanceof Eval || plan instanceof Project || plan instanceof RegexExtract || plan instanceof Enrich;
+    }

+    protected static class PushDownMvExpandPastOrderBy extends OptimizerRules.OptimizerRule<OrderBy> {


suggestion: a small example as javadoc would immensely help understand how this rule works.

Question: how is pushing MvExpand past an OrderBy beneficial?

suggestion: a small example as javadoc would immensely help understand how this rule works.

My advice: take a look at the tests in LogicalPlanOptimizerTests and their javadoc, where the optimized plan is visible. Alternatively, trace logging on org.elasticsearch.xpack.esql.optimizer to see what each rule is doing when.

Question: how is pushing MvExpand past an OrderBy beneficial?

In the description of the PR I have identified two cases for this type of pushdown. If the semantics of the query remains the same (and they are in those two situations I mentioned), it is beneficial to form a TopN by having sort (OrderBy) closer to a limit. Since every query now has a limit, the aim is to have the sort closer to it.

Thanks.

I still think it should be obvious what a rule does from either its name or a javadoc; the optimizer tests are not sufficient as they contain complete optimization runs, so there's no fast way to see what a rule does (in isolation). This rule is also sufficiently complex that skimming the code does not give an immediate idea of what to expect in terms of its behavior.

Wait, sorry - this is actually one of the simpler rules - I thought I left this comment further down.

Still - pushing down mv_expand past a sort is not an obvious optimization (obvious would be e.g. pushing down filters to reduce number of rows). A short comment explaining the reasoning for why it exists in the first place would be important IMHO.

alex-spies · 2023-12-21T17:48:01Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

+            | sort last_name ASC
+            | mv_expand emp_no


If we wanted to create an early TopN, we could "duplicate" the filter - inserting a filter here that only permits documents with at least one emp_no value that is >10050 would do this; not sure how feasible that is, though.

alex-spies · 2023-12-22T15:20:02Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+        @Override
+        protected LogicalPlan rule(Limit limit) {
+            var child = limit.child();
+            // let the PushDownAndCombineLimits rule to push Limit further down


missing a verb?

alex-spies · 2023-12-22T15:32:08Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+                    if (mvExpand.child() instanceof OrderBy) {
+                        return limit.replaceChild(replaceChildUntilMvExpandAndOrderBy((UnaryPlan) child, mvExpand));


Since we're pushing down MvExpand rather than introducing an additional limit, shouldn't this case be handled in a separate rule?

Maybe, I am not sure. The line you referenced here is, basically, an exception to the general rule of duplicating the limit. And this exception here doesn't fulfill the same conditions as PushDownMvExpandPastOrderBy.

We can leave it but I think it should be documented; this is definitely unexpected.

alex-spies · 2023-12-22T15:32:43Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+        }
+    }
+
+    static class DuplicateLimitAfterMvExpand extends OptimizerRules.OptimizerRule<Limit> {


This should contain an example + reasoning for when this is possible as javadoc IMO.

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

alex-spies · 2023-12-22T16:00:20Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+            if (mvExpand != null) {
+                // if the filter is on the expanded values and the order of the results doesn't change, push down mv_expand past sort
+                if (isFiltered && hasOrderBy == false) {


I think mvExpand == null and hasOrderBy are mutually exclusive. If so, maybe this can be written a bit easier.

alex-spies

Alright, had another go. Left some observations, and overall still agree with the approach :)

It's great that you agreed so many tests. This must have been fun, getting all the edge cases right. Thanks Andrei!

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

alex-spies · 2023-12-22T17:36:48Z

...gin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java

+     */
+    public void testDontPushDownLimit_ButPushMvExpandPastSort2() {
+        LogicalPlan plan = optimizedPlan("""
+            from test


Looks like this is a duplicate of the previous test.

You are right. I had variations of different tests spawning from previous ones.

luigidellaquila

Thanks @astefan, this is a complex topic and rationalizing it is not easy at all.
I left a comment about sort pushdown, that seems to miss some cases.
In general, I'd still prefer to keep it more simple, ie. push limit past mv_expand (keeping the limit inside mv_expand), plus meaningful error messages for cases that we cannot cover (ie, unbounded sort that still cannot be optimized)

luigidellaquila · 2024-01-02T11:05:30Z

...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

+        return plan instanceof Eval || plan instanceof Project || plan instanceof RegexExtract || plan instanceof Enrich;
+    }

+    protected static class PushDownMvExpandPastOrderBy extends OptimizerRules.OptimizerRule<OrderBy> {


I tried to reconstruct the behavior of this rule and the reasoning behind it, and I have to say that it took me some time and I'm still unsure I got all the nuances.

This is a problem by itself: the rule tries to analyze a potentially large portion of the plan and then makes changes in positions that are very far from the entry point of the rule; it's hard to understand now, so it will be even harder to maintain in the long term, when the original intentions will be harder to recollect and when the language (and the planner) will be richer and more complex.

This said, I tried to understand if the rule is correct in all cases, and my conclusion is that it's not.
My understanding is that the intention is the following: if you have a plan with sort | mv_expand | ... | sort, you replace it with mv_expand | sort | ... | sort, in the assumption that eventually the first sort will be removed (because redundant?).
This is correct as long as there are no commands in between that drop possible candidate results, or for which the order actually matters.

The case that we are ignoring now is LIMIT: a query like sort | mv_expand | ... | limit | ... | sort is different from mv_expand | sort | ... | limit | ... | sort, especially when the first sort happens on the expanded key. Both descendantMvExpand() and replaceChildUntilMvExpandAndOrderBy() are not taking this case into consideration.

Cases we could miss tomorrow are all the commands that rely on the intermediate order, that include inline stats, commands that access previous records and other cases we can hardly completely define now.

In short, this rule would be much better defined as a local rule that converts sort | mv_expand -> mv_expand | sort, but unfortunately it's not generally valid, and defining all the cases when it's valid is risky and error prone. Also, the general usefulness of this change is not completely clear (to me at least).
We can try to fix the above, but we are paying a high cost to maintainability for arguably small advantages.

++, agree that it's hard to verify this rule for all cases, esp. as we might add new operators that might invalidate it in the future, silently.

If we want to keep the sort | mv_expand | ... | sort -> mv_expand | sort | ... | sort, we probably should restrict the operators that can go in the ... part explicitly, ideally adding at least a unit test per such operator to verify that output doesn't change after optimization. That is a lot of work, though :/

astefan added 5 commits December 6, 2023 20:55

Add Analyzer level default limit for certain queries using mv_expand

495de5f

Fix small issues with the optimizer rule

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

6abd99f

…101266_fix

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

a8c41b0

…101266_fix

Change the way mv_expand is handled

777486e

Merge branch 'main' of https://github.com/elastic/elasticsearch into …

73d7405

…101266_fix

elasticsearchmachine added the v8.13.0 label Dec 19, 2023

astefan requested a review from alex-spies December 21, 2023 08:23

alex-spies reviewed Dec 22, 2023

View reviewed changes

luigidellaquila requested changes Jan 2, 2024

View reviewed changes

This was referenced Jan 3, 2024

ES|QL: Implement StickyLimit plan and push LIMIT past MV_EXPAND #103111

Closed

ESQL: mv_expand default limit fix #103080

Closed

Fix LIMIT pushdown with MV_EXPAND #102545

Closed

elasticsearchmachine added v8.14.0 and removed v8.13.0 labels Feb 14, 2024

elasticsearchmachine added v8.15.0 and removed v8.14.0 labels Apr 17, 2024

elasticsearchmachine added v8.16.0 and removed v8.15.0 labels Jul 4, 2024

mark-vieira added v9.0.0 and removed v8.16.0 labels Sep 11, 2024

costin mentioned this pull request Oct 28, 2024

ES|QL: fix LIMIT pushdown past MV_EXPAND #115624

Merged

elasticsearchmachine added v9.1.0 and removed v9.0.0 labels Jan 30, 2025

elasticsearchmachine added v9.2.0 and removed v9.1.0 labels Jun 26, 2025

astefan closed this Aug 21, 2025

		if (mvExpand.child() instanceof OrderBy) {
		return limit.replaceChild(replaceChildUntilMvExpandAndOrderBy((UnaryPlan) child, mvExpand));

ESQL: improve mv_expand planning #103582

ESQL: improve mv_expand planning #103582

Uh oh!

Conversation

astefan commented Dec 19, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

alex-spies left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alex-spies left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

luigidellaquila left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

astefan commented Dec 19, 2023 •

edited

Loading