-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ES|QL: Remove redundant sorts from execution plan #121156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 14 commits
4910db5
a0b463a
78c86de
10117dc
4d623d7
0ec1539
7ec3534
937e0dc
3569a16
df50502
3af3c11
16bfba0
5c1ed70
c73246d
1c260e1
a5d30ef
f881d2d
319cc69
7a9c8ab
2d34e50
f2af414
2303d8b
9b22923
9083477
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| pr: 121156 | ||
| summary: Remove redundant sorts from execution plan | ||
| area: ES|QL | ||
| type: bug | ||
| issues: [] |
This file was deleted.
This file was deleted.
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,116 @@ | ||||||
| /* | ||||||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||||||
| * or more contributor license agreements. Licensed under the Elastic License | ||||||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||||||
| * 2.0. | ||||||
| */ | ||||||
|
|
||||||
| package org.elasticsearch.xpack.esql.optimizer.rules.logical; | ||||||
|
|
||||||
| import org.elasticsearch.xpack.esql.plan.logical.Aggregate; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Drop; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Enrich; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Eval; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Filter; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.InlineStats; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.LogicalPlan; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Lookup; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.MvExpand; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.OrderBy; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Project; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.RegexExtract; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.Rename; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.TopN; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.UnaryPlan; | ||||||
| import org.elasticsearch.xpack.esql.plan.logical.join.Join; | ||||||
|
|
||||||
| import java.util.ArrayList; | ||||||
| import java.util.IdentityHashMap; | ||||||
| import java.util.List; | ||||||
|
|
||||||
| /** | ||||||
| * SORT cannot be executed without a LIMIT, as ES|QL doesn't support unbounded sort (yet). | ||||||
| * <p> | ||||||
| * The planner tries to push down LIMIT and transform all the unbounded sorts into a TopN. | ||||||
| * In some cases it's not possible though, eg. | ||||||
| * <p> | ||||||
| * from test | sort x | lookup join lookup on x | sort y | ||||||
| * <p> | ||||||
| * from test | sort x | mv_expand x | sort y | ||||||
| * <p> | ||||||
| * "sort y" will become a TopN, but "sort x" will remain unbounded, so the query could not be executed. | ||||||
|
||||||
| * "sort y" will become a TopN, but "sort x" will remain unbounded, so the query could not be executed. | |
| * "sort y" will become a TopN due to the addition of the default Limit, but "sort x" will remain unbounded, so the query could not be executed. |
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think transformUp cannot be correct here.
In case of multiple redundant Sorts, it will remove the bottom most one, first. But that removal is implemented by a recursive change of the plan via replaceChildren, which creates a new plan instance at every level above the first change (the bottom-most Sort). Since we're using an IdentityHashMap, all intermediate Sorts will not match anymore, and will require another run of this rule to be pruned.
I was able to reproduce this with
row x = [1,2,3], y = 1 | sort x | mv_expand x | sort x | mv_expand x | sort y
Limit[1000[INTEGER],false] = Limit[1000[INTEGER],false]
\_OrderBy[[Order[y{r}#61,ASC,LAST]]] = \_OrderBy[[Order[y{r}#61,ASC,LAST]]]
\_MvExpand[x{r}#69,x{r}#70] = \_MvExpand[x{r}#69,x{r}#70]
\_OrderBy[[Order[x{r}#69,ASC,LAST]]] = \_OrderBy[[Order[x{r}#69,ASC,LAST]]]
\_MvExpand[x{r}#59,x{r}#69] = \_MvExpand[x{r}#59,x{r}#69]
\_OrderBy[[Order[x{r}#59,ASC,LAST]]] ! \_Row[[[1, 2, 3][INTEGER] AS x, 1[INTEGER] AS y]]
\_Row[[[1, 2, 3][INTEGER] AS x, 1[INTEGER] AS y]] !
It took two iterations to prune the two redundant OrderBys.
Either, we should update the rule to only ever remove one redundant order by (that'll just require more looping), or fix this by using transformDown. If we go the latter way (IMHO preferable to avoid too many loops), we should add a test where we manually trigger the rule only once and check that it got all the redundant OrderBys immediately.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-> collection.contains(p) ? ((UnaryPlan) p).child() ? p
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hide the implementation by returning the keySet as a Collection<OrderBy> on which call contains
alex-spies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this long list is gonna be dangerous/brittle to correctly maintain, esp. as we add more commands (currently in the process with CHANGE_POINT!)
Could we, instead, please add an abstract method on LogicalPlan (or UnaryPlan), like boolean dependsOnSortOrder()? Or, alternatively, have a marker interface for plans that do not depend on sorts (so that missing the marker is correct by default)?
In a follow-up, I think we can tackle queries like
row x = [1,2,3], y = 1 | sort y | mv_expand x | where x > 2
by an optimizer rule that we run after the main operator optimization batch. It would look for a Limit and then try to pull up OrderBys to it whenever that's valid. To check if it's valid, I think we need something like a method boolean LogicalPlan.canBePushedDownPastOrderBy(List<OrderBy> orders) to check all the intermediate plans that would tell us if performing the OrderBy after the respective plan node would be equivalent. (Or maybe we just want to have another round of push downs past order bys, after the operator optimizations, which would be more aggressive and include push downs of MV_EXPAND and LOOKUP JOINs - which we generally don't want to push down unless it's necessary.)
Depending on sorting and being able to be pushed past sorting are both properties that are easier to check in the context of each relevant plan node, rather than in an optimizer rule that needs to exhaustively check all relevant plan nodes (while exhaustiveness cannot be enforced by the compiler).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on a marker interface (say SortAware) under capabilities package so that commands can signal this behavior and the rule can pick it up without extra code modifications.
The rule would then perform a top-down transform and keep only the first sort for a SortAware (by default Limit to form topN).
P.S. LogicalPlan is a base class, utility methods (assuming they are used beyond just one rule) should be added elsewhere under PlannerUtils or LogicalPlans.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is safe now, as long as we don't have window functions (and, in general, functions/aggs that rely on the order of the input).
We can decide to keep it like this for now and review it when we introduce such capabilities, or we can be more paranoid about future regressions and discard such cases, but in this case we won't be able to completely avoid unbounded sorts.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -10,6 +10,7 @@ | |
| import org.elasticsearch.common.io.stream.StreamInput; | ||
| import org.elasticsearch.common.io.stream.StreamOutput; | ||
| import org.elasticsearch.xpack.esql.capabilities.PostAnalysisVerificationAware; | ||
| import org.elasticsearch.xpack.esql.capabilities.PostOptimizationVerificationAware; | ||
| import org.elasticsearch.xpack.esql.capabilities.TelemetryAware; | ||
| import org.elasticsearch.xpack.esql.common.Failures; | ||
| import org.elasticsearch.xpack.esql.core.capabilities.Resolvables; | ||
|
|
@@ -25,7 +26,7 @@ | |
|
|
||
| import static org.elasticsearch.xpack.esql.common.Failure.fail; | ||
|
|
||
| public class OrderBy extends UnaryPlan implements PostAnalysisVerificationAware, TelemetryAware { | ||
| public class OrderBy extends UnaryPlan implements PostAnalysisVerificationAware, PostOptimizationVerificationAware, TelemetryAware { | ||
| public static final NamedWriteableRegistry.Entry ENTRY = new NamedWriteableRegistry.Entry(LogicalPlan.class, "OrderBy", OrderBy::new); | ||
|
|
||
| private final List<Order> order; | ||
|
|
@@ -109,4 +110,9 @@ public void postAnalysisVerification(Failures failures) { | |
| } | ||
| }); | ||
| } | ||
|
|
||
| @Override | ||
| public void postOptimizationVerification(Failures failures) { | ||
| failures.add(fail(this, "Unbounded sort not supported yet, please add a limit")); | ||
|
||
| } | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++ to the deletion, the rule was unfortunately incorrect.