Support DISTINCT ORDER BY LIMIT query use GroupedTopKAggregateStream #19638

@haohuaijin

Description

Is your feature request related to a problem or challenge?

Currently, GroupedTopKAggregateStream supports two types of query:

select id, max(time) from t group by id order by max(time) desc limit 10
select id, min(time) from t group by id order by min(time) asc limit 10

We have another use case that I think can also use GroupedTopKAggregateStream to speed things up, such as the queries below:

select distinct id from t order by id desc/asc limit 10
select id from t group by id order by id desc/asc limit 10

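This works because if a certain id = x is in the global top 10 (the second phase of AggregateExec), then x must appear in the local top 10 (the first phase of AggregateExec) of at least one partition: in any partition that contains x, fewer than 10 distinct ids can rank ahead of it, otherwise x would not be in the global top 10 either.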
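To make the argument concrete, here is a minimal sketch in plain Rust. It is not DataFusion code and does not use GroupedTopKAggregateStream; the function names are hypothetical and a `BTreeSet` stands in for the real aggregation state. It only illustrates that pruning each partition to its local top k before the final phase gives the same result as deduplicating everything first.

```rust
use std::collections::BTreeSet;

/// Phase one (hypothetical): each partition keeps only its `k` largest distinct ids.
fn local_top_k_desc(partition: &[i64], k: usize) -> Vec<i64> {
    let distinct: BTreeSet<i64> = partition.iter().copied().collect();
    // BTreeSet iterates in ascending order, so reverse for `ORDER BY id DESC`.
    distinct.into_iter().rev().take(k).collect()
}

/// Phase two (hypothetical): merge the pruned partitions and take the global top `k`.
fn two_phase_top_k_desc(partitions: &[Vec<i64>], k: usize) -> Vec<i64> {
    let merged: BTreeSet<i64> = partitions
        .iter()
        .flat_map(|p| local_top_k_desc(p, k))
        .collect();
    merged.into_iter().rev().take(k).collect()
}

/// Reference result: deduplicate all rows first, then apply the limit.
fn naive_top_k_desc(partitions: &[Vec<i64>], k: usize) -> Vec<i64> {
    let all: BTreeSet<i64> = partitions.iter().flatten().copied().collect();
    all.into_iter().rev().take(k).collect()
}
```

For any input, `two_phase_top_k_desc(&parts, k)` and `naive_top_k_desc(&parts, k)` should return the same ids, while the first phase emits at most k ids per partition instead of every distinct id.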

Describe the solution you'd like

  1. Modify the TopKAggregation optimizer rule to pass the information for this case to AggregateExec.
  2. Modify GroupedTopKAggregateStream to support this case.

Describe alternatives you've considered

No response

Additional context

No response

Metadata

Labels

enhancement (New feature or request)
