[FEA] Reduce overhead when computing compound aggregations in hash-based groupby

A compound aggregation is an aggregation that depends on other aggregations. For example, `MEAN` depends on `SUM` and `COUNT_VALID`. As such, when computing a compound aggregation, we first need to compute the aggregations it depends on. However, computing these intermediate results typically involves unnecessary work, which can add up to significant overhead when the number of aggregations is large.

For example:
 * To compute `MIN`/`MAX` of strings, we first compute `ARG_MIN`/`ARG_MAX`, which produces a gather map used to gather the input. However, these `ARG_MIN`/`ARG_MAX` aggregations also launch kernels to compute a null mask and null count for the gather map, and neither is ever used.
 * Similarly, to compute `M2`, we first compute `SUM` and `SUM_OF_SQUARES`. These aggregations also launch kernels to compute a null mask and null count for the intermediate sums, which again go unused.
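
To make the second bullet concrete, here is a minimal sketch of how `M2` for a single group can be derived from the two intermediate sums plus the valid count. It uses the textbook identity M2 = Σx² − (Σx)²/n and is not taken from the libcudf kernels; the function name `m2_from_sums` and the choice to return `None` for an empty group are assumptions made for illustration.

```python
def m2_from_sums(total, total_sq, count):
    """Sum of squared deviations from the mean for one group.

    Uses M2 = sum(x^2) - sum(x)^2 / n. Only the values of the two
    intermediate sums and the count are read here; no null mask or
    null count of the intermediates is needed.
    """
    if count == 0:
        return None  # assumption: an empty group yields a null result
    return total_sq - total * total / count


# One group with four valid values.
values = [1.0, 2.0, 3.0, 4.0]
total = sum(values)                    # intermediate sum
total_sq = sum(v * v for v in values)  # intermediate sum of squares
m2 = m2_from_sums(total, total_sq, len(values))
```

The point is only that the final result reads the intermediate values directly, so any null mask or null count computed for them is wasted work.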

We can do better by skipping the null mask and null count computation when it is not needed. It is easy to tell whether an aggregation was requested by the user or is only needed as an intermediate result for a compound aggregation, so we can compute its null mask and null count only in the former case.
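
The proposal could be sketched roughly as follows. This is a simplified, hypothetical planning step in Python, not the libcudf implementation: the `DEPENDENCIES` table, the `STRING_MIN`/`STRING_MAX` keys, and `plan_aggregations` are names invented for this example, and only one level of dependencies is handled.

```python
# Hypothetical dependency table based on the examples in this issue.
DEPENDENCIES = {
    "MEAN": ["SUM", "COUNT_VALID"],
    "STRING_MIN": ["ARG_MIN"],  # illustrative key for MIN over strings
    "STRING_MAX": ["ARG_MAX"],  # illustrative key for MAX over strings
}


def plan_aggregations(requested):
    """Map each aggregation to compute to a flag: is its null mask needed?

    An aggregation needs a null mask and null count only if the user
    requested it; pure intermediates are marked False.
    """
    plan = {}
    for agg in requested:
        for dep in DEPENDENCIES.get(agg, []):
            # Intermediate by default; stays True if already user-requested.
            plan.setdefault(dep, False)
        plan[agg] = True  # user-visible output
    return plan


only_mean = plan_aggregations(["MEAN"])
mean_and_sum = plan_aggregations(["MEAN", "SUM"])
```

With only `MEAN` requested, `SUM` and `COUNT_VALID` are marked as intermediates and their null mask kernels can be skipped. If the user also requests `SUM`, it is marked as user-visible regardless of the order of the requests.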
