-
Notifications
You must be signed in to change notification settings - Fork 25.6k
ESQL: Remove redundant TopN
#131833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ESQL: Remove redundant TopN
#131833
Conversation
|
Pinging @elastic/es-analytical-engine (Team:Analytics) |
# Conflicts: # x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
|
@elasticsearchmachine test this please |
|
Hi, @smalyshev. It looks like some CI tests are failing, possibly due to the branch being out-of-date with main. Could you try updating the branch before re-running the tests? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kanoshiou and apologies that it took us so long to get back to you! Your contributions are very much appreciated!
Unfortunately, I think the new version of PruneRedundantOrderBy is affecting too many queries, including ones where removing an upstream TopN is incorrect. I give examples below.
I think we should limit the scope by adding a new rule that only looks for subsequent TopNs, specifically, and place that rule into the cleanup batch of the LogicalPlanOptimizer. That's more obviously correct. Cases when there are query plan nodes between two TopNs and the upstream one should be pruned are trickier to handle and likely require to mark query plans in some way when they are compatible with this optimization.
| // A child TopN is redundant if it has the same sort order as the parent. | ||
| // We do not need to compare their values because `PushDownAndCombineLimits` | ||
| // has already pushed down the lower limit value | ||
| if (childTopN.order().equals(parentTopN.order())) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caution: equals will currently still return true even if we order by 2 attributes that have the same name+type, but not the same NameId - i.e. attributes that mean something else.
This is a different bug that needs fixing separately, but I wanted to point out that currently it will prevent this from rightfully triggering in a couple situations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
...rc/main/java/org/elasticsearch/xpack/esql/optimizer/rules/logical/PruneRedundantOrderBy.java
Outdated
Show resolved
Hide resolved
| result.add(childTopN); | ||
| toCheck.push(childTopN.child()); | ||
| } | ||
| } else if (p instanceof SortAgnostic) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new version of the rule is incorrect, I'm afraid.
We continue going down the plan as long as the children are SortAgnostic. However, being SortAgnostic doesn't automatically imply that the plans behave the same when we feed them an unlimited number of rows.
For instance, INLINE STATS is SortAgnostic (resp. InlineJoin is). But removing a TopN from before an INLINE STATS changes the meaning of the query.
Also, queries can become much more expensive: For instance, MvExpand is SortAgnostic, but SORT a | LIMIT 10 | MV_EXPAND b | SORT a | LIMIT 10 is generally cheaper to run than just MV_EXPAND b | SORT a | LIMIT 3.
|
Thank you for your review @alex-spies! I've made several changes based on your feedback. However, I'm still finding it challenging to handle cases where there are query plan nodes between two From the examples you provided in your comment, removing the second |
|
Hi @alex-spies, I’ve updated the branch. The optimization now removes redundant child |
# Conflicts: # x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java

Redundant
TopNoperations were not being removed from the execution plan, leading to unnecessary performance overhead.We can safely consider a child
TopNplan redundant if it shares the same sort order as its parent. There's no need to compare their limit values, sincePushDownAndCombineLimitsalready pushes down the lower limit value, making it safe to remove the childTopN.Also, the
PruneRedundantOrderByrule was only executed before theReplaceLimitAndSortAsTopN, so it could only detect redundantOrderByoperations but not redundantTopNoperations that were created later in the optimization pipeline.Closes #131233