Skip to content

Conversation

@kanoshiou
Copy link
Contributor

@kanoshiou kanoshiou commented Jul 24, 2025

Redundant TopN operations were not being removed from the execution plan, leading to unnecessary performance overhead.

We can safely consider a child TopN plan redundant if it shares the same sort order as its parent. There's no need to compare their limit values, since PushDownAndCombineLimits already pushes down the lower limit value, making it safe to remove the child TopN.

Also, the PruneRedundantOrderBy rule was only executed before the ReplaceLimitAndSortAsTopN, so it could only detect redundant OrderBy operations but not redundant TopN operations that were created later in the optimization pipeline.

Closes #131233

@elasticsearchmachine elasticsearchmachine added v9.2.0 needs:triage Requires assignment of a team area label external-contributor Pull request authored by a developer outside the Elasticsearch team labels Jul 24, 2025
@gmarouli gmarouli added :Analytics/ES|QL AKA ESQL and removed needs:triage Requires assignment of a team area label labels Jul 25, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jul 25, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

# Conflicts:
#	x-pack/plugin/esql/src/test/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizerTests.java
@smalyshev
Copy link
Contributor

@elasticsearchmachine test this please

@kanoshiou
Copy link
Contributor Author

Hi, @smalyshev. It looks like some CI tests are failing, possibly due to the branch being out-of-date with main. Could you try updating the branch before re-running the tests?

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kanoshiou and apologies that it took us so long to get back to you! Your contributions are very much appreciated!

Unfortunately, I think the new version of PruneRedundantOrderBy is affecting too many queries, including ones where removing an upstream TopN is incorrect. I give examples below.

I think we should limit the scope by adding a new rule that only looks for subsequent TopNs, specifically, and place that rule into the cleanup batch of the LogicalPlanOptimizer. That's more obviously correct. Cases when there are query plan nodes between two TopNs and the upstream one should be pruned are trickier to handle and likely require to mark query plans in some way when they are compatible with this optimization.

// A child TopN is redundant if it has the same sort order as the parent.
// We do not need to compare their values because `PushDownAndCombineLimits`
// has already pushed down the lower limit value
if (childTopN.order().equals(parentTopN.order())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution: equals will currently still return true even if we order by 2 attributes that have the same name+type, but not the same NameId - i.e. attributes that mean something else.

This is a different bug that needs fixing separately, but I wanted to point out that currently it will prevent this from rightfully triggering in a couple situations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I’m currently encountering this bug while resolving another issue. This is sad 😭

image

Is this bug touching too much code, so it’s hard to fix?

result.add(childTopN);
toCheck.push(childTopN.child());
}
} else if (p instanceof SortAgnostic) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new version of the rule is incorrect, I'm afraid.

We continue going down the plan as long as the children are SortAgnostic. However, being SortAgnostic doesn't automatically imply that the plans behave the same when we feed them an unlimited number of rows.

For instance, INLINE STATS is SortAgnostic (resp. InlineJoin is). But removing a TopN from before an INLINE STATS changes the meaning of the query.

Also, queries can become much more expensive: For instance, MvExpand is SortAgnostic, but SORT a | LIMIT 10 | MV_EXPAND b | SORT a | LIMIT 10 is generally cheaper to run than just MV_EXPAND b | SORT a | LIMIT 3.

@alex-spies alex-spies self-assigned this Sep 9, 2025
@kanoshiou
Copy link
Contributor Author

Thank you for your review @alex-spies! I've made several changes based on your feedback. However, I'm still finding it challenging to handle cases where there are query plan nodes between two TopN. Any guidance on how to approach these scenarios would be greatly appreciated.

From the examples you provided in your comment, removing the second TopN in such cases could introduce performance issues (except for nodes like DROP and KEEP, perhaps?) or even alter query semantics. In your experience, under what conditions can we safely remove the redundant TopN without compromising correctness or efficiency?

@kanoshiou
Copy link
Contributor Author

Hi @alex-spies, I’ve updated the branch. The optimization now removes redundant child TopN when specific plans appear between this TopN and its parent TopN.

Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for your effort @kanoshiou ! This one's not easy!

I still think we should limit the scope to ... | TOP N | TOP M | ... . It's very difficult to get the rule right if there are other nodes between the TOP N and TOP M. We have something similar in HoistRemoteEnrichTopN, but I don't think this can be applied here 1:1. In fact, the complexity increased sine HoistRemoteEnrichTopN was added because now TOP Ns can also be local, meaning, they don't necessitate breaking the pipeline. (And getting HoistRemoteEnrichTopN was highly complex, too.)

Comment on lines 96 to 98
|| p instanceof Filter
|| p instanceof Insist
|| p instanceof Limit
Copy link
Contributor

@alex-spies alex-spies Nov 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing past limits and filters is incorrect. There is probably a class of nodes where TOP N | COMMAND | TOP M can be simplified to COMMAND | TOP min(N,M), but these cannot be included:

SORT a | LIMIT 10 | WHERE b > 10 | SORT c | LIMIT 1

vs.

LIMIT 10 | WHERE b > 10 | SORT c | LIMIT 1

The first takes the top 10 as, removes those with b <= 10 and then gives the top c from this subset.

The second takes random 10 rows, then takes those with b > 10, and finally gives the top c from this subset. But the subsets are different.

- Refactor PruneRedundantTopN to use a more straightforward approach for removing redundant TopN operations
- Replace complex breadth-first recursive search with a simpler while loop traversal
- Remove unnecessary method for checking removable child nodes
- Improve performance and readability of TopN pruning logic
- Update test case to reflect simplified optimization approach
@kanoshiou
Copy link
Contributor Author

Thanks for the review and the detailed explanation, @alex-spies!

You're absolutely right about the complexity of handling non-consecutive TopN nodes. I've simplified PruneRedundantTopN to only handle the straightforward case of consecutive TopN nodes (... | TOP N | TOP M | ...), which avoids the correctness issues you mentioned with filters and other operations in between.

# Conflicts:
#	x-pack/plugin/esql/qa/testFixtures/src/main/resources/topN.csv-spec
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >enhancement external-contributor Pull request authored by a developer outside the Elasticsearch team Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v9.3.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ES|QL - Redundant sorts are not removed

5 participants