
Optimize DistinctCountHLL aggregation for high-cardinality dictionary-encoded columns #17336

@praveenc7

Description


Problem

The DISTINCTCOUNTHLL aggregation function suffers severe performance degradation when processing high-cardinality dictionary-encoded columns (e.g., ~14 million distinct values). Profiling shows that ~50% of CPU time is spent in RoaringBitmap operations:

[Profiling screenshot: RoaringBitmap operations dominate CPU time]

For dictionary-encoded columns, the current implementation uses RoaringBitmap to track dictionary IDs during aggregation. While memory-efficient for low cardinality, this approach has O(n log n) insertion complexity that becomes prohibitively expensive for high-cardinality columns (>100K distinct values).

Queries on high-cardinality columns (1M - 15M distinct values, e.g., user IDs or member IDs) take about 6 - 10 seconds, with RoaringBitmap operations dominating query execution time. As a result, HLL provides no performance benefit over an exact distinct count.

Proposed Solution

Implement adaptive cardinality handling that dynamically switches from RoaringBitmap to HyperLogLog:

  1. Low cardinality: use RoaringBitmap (memory-efficient, exact counts)
  2. High cardinality: convert to HyperLogLog (O(1) insertions)

Tested with POC code that chooses HyperLogLog for the high-cardinality column; query time improved from ~8 sec to ~700 ms.
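The switching logic above can be sketched as follows. This is a minimal illustration, not the actual POC: the class name, the `SWITCH_THRESHOLD` constant, the use of `java.util.BitSet` as a stand-in for RoaringBitmap, and the toy HLL implementation are all assumptions for the sketch; a real implementation would use Pinot's existing RoaringBitmap and HLL structures.

```java
import java.util.BitSet;

// Sketch of adaptive cardinality handling: exact bitmap of dictionary IDs
// at low cardinality, converted to an HLL sketch once a threshold is crossed.
public class AdaptiveDistinctCounter {
  private static final int SWITCH_THRESHOLD = 100_000; // hypothetical cutoff
  private static final int P = 12;                     // 2^12 = 4096 HLL registers
  private static final int M = 1 << P;

  private BitSet bitmap = new BitSet(); // exact phase (stand-in for RoaringBitmap)
  private byte[] registers;             // HLL phase; null while still exact
  private int exactCount = 0;

  public void add(int dictId) {
    if (registers == null) {
      if (!bitmap.get(dictId)) {
        bitmap.set(dictId);
        exactCount++;
        if (exactCount > SWITCH_THRESHOLD) {
          convertToHll(); // one-time O(threshold) conversion
        }
      }
    } else {
      hllAdd(hash64(dictId)); // O(1) per insertion after the switch
    }
  }

  // Replay every dictionary ID seen so far into the HLL, then drop the bitmap.
  private void convertToHll() {
    registers = new byte[M];
    for (int id = bitmap.nextSetBit(0); id >= 0; id = bitmap.nextSetBit(id + 1)) {
      hllAdd(hash64(id));
    }
    bitmap = null;
  }

  private void hllAdd(long h) {
    int idx = (int) (h >>> (64 - P)); // top P bits pick the register
    // Rank = leading zeros of the remaining bits + 1; sentinel bit caps the rank.
    byte rank = (byte) (Long.numberOfLeadingZeros((h << P) | (1L << (P - 1))) + 1);
    if (rank > registers[idx]) {
      registers[idx] = rank;
    }
  }

  public long estimate() {
    if (registers == null) {
      return exactCount; // still exact
    }
    double sum = 0;
    int zeros = 0;
    for (byte r : registers) {
      sum += 1.0 / (1L << r);
      if (r == 0) {
        zeros++;
      }
    }
    double alpha = 0.7213 / (1 + 1.079 / M);
    double e = alpha * M * (double) M / sum;
    if (e <= 2.5 * M && zeros > 0) {
      e = M * Math.log((double) M / zeros); // small-range correction
    }
    return Math.round(e);
  }

  public boolean isExact() {
    return registers == null;
  }

  private static long hash64(long x) { // splitmix64 finalizer as a stand-in hash
    x += 0x9E3779B97F4A7C15L;
    x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
    x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
    return x ^ (x >>> 31);
  }
}
```

The one-time conversion cost is bounded by the threshold, so the amortized per-row cost stays O(1) regardless of column cardinality, while low-cardinality columns keep exact counts.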
