Skip to content

Conversation

kanoshiou
Copy link
Contributor

@kanoshiou kanoshiou commented Jul 24, 2025

Redundant TopN operations were not being removed from the execution plan, leading to unnecessary performance overhead.

We can safely consider a child TopN plan redundant if it shares the same sort order as its parent. There's no need to compare their limit values, since PushDownAndCombineLimits already pushes down the lower limit value, making it safe to remove the child TopN.

Also, the PruneRedundantOrderBy rule was only executed before the ReplaceLimitAndSortAsTopN, so it could only detect redundant OrderBy operations but not redundant TopN operations that were created later in the optimization pipeline.

Closes #131233

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jul 24, 2025
@gmarouli gmarouli added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Jul 25, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
@smalyshev
Copy link
Contributor

@elasticsearchmachine test this please

@kanoshiou
Copy link
Contributor Author

Hi, @smalyshev. It looks like some CI tests are failing, possibly due to the branch being out-of-date with main. Could you try updating the branch before re-running the tests?

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou and apologies that it took us so long to get back to you! Your contributions are very much appreciated!

Unfortunately, I think the new version of PruneRedundantOrderBy is affecting too many queries, including ones where removing an upstream TopN is incorrect. I give examples below.

I think we should limit the scope by adding a new rule that only looks for subsequent TopNs, specifically, and place that rule into the cleanup batch of the LogicalPlanOptimizer. That's more obviously correct. Cases when there are query plan nodes between two TopNs and the upstream one should be pruned are trickier to handle and likely require to mark query plans in some way when they are compatible with this optimization.

// A child TopN is redundant if it has the same sort order as the parent.
// We do not need to compare their values because `PushDownAndCombineLimits`
// has already pushed down the lower limit value
if (childTopN.order().equals(parentTopN.order())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution: equals will currently still return true even if we order by 2 attributes that have the same name+type, but not the same NameId - i.e. attributes that mean something else.

This is a different bug that needs fixing separately, but I wanted to point out that currently it will prevent this from rightfully triggering in a couple situations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I’m currently encountering this bug while resolving another issue. This is sad 😭

image

Is this bug touching too much code, so it’s hard to fix?

result.add(childTopN);
toCheck.push(childTopN.child());
}
} else if (p instanceof SortAgnostic) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version of the rule is incorrect, I'm afraid.

We continue going down the plan as long as the children are SortAgnostic. However, being SortAgnostic doesn't automatically imply that the plans behave the same when we feed them an unlimited number of rows.

For instance, INLINE STATS is SortAgnostic (resp. InlineJoin is). But removing a TopN from before an INLINE STATS changes the meaning of the query.

Also, queries can become much more expensive: For instance, MvExpand is SortAgnostic, but SORT a | LIMIT 10 | MV_EXPAND b | SORT a | LIMIT 10 is generally cheaper to run than just MV_EXPAND b | SORT a | LIMIT 3.

@alex-spies alex-spies self-assigned this Sep 9, 2025
@kanoshiou
Copy link
Contributor Author

Thank you for your review @alex-spies! I've made several changes based on your feedback. However, I'm still finding it challenging to handle cases where there are query plan nodes between two TopN. Any guidance on how to approach these scenarios would be greatly appreciated.

From the examples you provided in your comment, removing the second TopN in such cases could introduce performance issues (except for nodes like DROP and KEEP, perhaps?) or even alter query semantics. In your experience, under what conditions can we safely remove the redundant TopN without compromising correctness or efficiency?

@kanoshiou
Copy link
Contributor Author

Hi @alex-spies, I’ve updated the branch. The optimization now removes redundant child TopN when specific plans appear between this TopN and its parent TopN.

kanoshiou and others added 3 commits October 10, 2025 09:58
# Conflicts:
#	x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL - Redundant sorts are not removed

5 participants