-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Remove redundant TopN
#131833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ESQL: Remove redundant TopN
#131833
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics) |
# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
@elasticsearchmachine test this please |
Hi, @smalyshev. It looks like some CI tests are failing, possibly due to the branch being out-of-date with main. Could you try updating the branch before re-running the tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kanoshiou and apologies that it took us so long to get back to you! Your contributions are very much appreciated!
Unfortunately, I think the new version of PruneRedundantOrderBy
is affecting too many queries, including ones where removing an upstream TopN is incorrect. I give examples below.
I think we should limit the scope by adding a new rule that only looks for subsequent TopNs, specifically, and place that rule into the cleanup
batch of the LogicalPlanOptimizer
. That's more obviously correct. Cases when there are query plan nodes between two TopN
s and the upstream one should be pruned are trickier to handle and likely require to mark query plans in some way when they are compatible with this optimization.
// A child TopN is redundant if it has the same sort order as the parent. | ||
// We do not need to compare their values because `PushDownAndCombineLimits` | ||
// has already pushed down the lower limit value | ||
if (childTopN.order().equals(parentTopN.order())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution: equals
will currently still return true
even if we order by 2 attributes that have the same name+type, but not the same NameId
- i.e. attributes that mean something else.
This is a different bug that needs fixing separately, but I wanted to point out that currently it will prevent this from rightfully triggering in a couple situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneRedundantOrderBy.java
Outdated
Show resolved
Hide resolved
result.add(childTopN); | ||
toCheck.push(childTopN.child()); | ||
} | ||
} else if (p instanceof SortAgnostic) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new version of the rule is incorrect, I'm afraid.
We continue going down the plan as long as the children are SortAgnostic
. However, being SortAgnostic
doesn't automatically imply that the plans behave the same when we feed them an unlimited number of rows.
For instance, INLINE STATS
is SortAgnostic
(resp. InlineJoin
is). But removing a TopN
from before an INLINE STATS
changes the meaning of the query.
Also, queries can become much more expensive: For instance, MvExpand
is SortAgnostic
, but SORT a | LIMIT 10 | MV_EXPAND b | SORT a | LIMIT 10
is generally cheaper to run than just MV_EXPAND b | SORT a | LIMIT 3
.
Thank you for your review @alex-spies! I've made several changes based on your feedback. However, I'm still finding it challenging to handle cases where there are query plan nodes between two From the examples you provided in your comment, removing the second |
Hi @alex-spies, I’ve updated the branch. The optimization now removes redundant child |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Redundant
TopN
operations were not being removed from the execution plan, leading to unnecessary performance overhead.We can safely consider a child
TopN
plan redundant if it shares the same sort order as its parent. There's no need to compare their limit values, sincePushDownAndCombineLimits
already pushes down the lower limit value, making it safe to remove the childTopN
.Also, the
PruneRedundantOrderBy
rule was only executed before theReplaceLimitAndSortAsTopN
, so it could only detect redundantOrderBy
operations but not redundantTopN
operations that were created later in the optimization pipeline.Closes #131233