-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Description
Problem
The DISTINCTCOUNTHLL aggregation function suffers from severe performance degradation when processing high-cardinality dictionary-encoded columns (14 Million). Profiling shows that 50% of CPU time is spent in RoaringBitmap operations:
For dictionary-encoded columns, the current implementation uses RoaringBitmap to track dictionary IDs during aggregation. While memory-efficient for low cardinality, this approach has O(n log n) insertion complexity that becomes prohibitively expensive for high-cardinality columns (>100K distinct values).
Queries on high-cardinality columns (1M - 15M) (e.g., user IDs, member) takes about 6 - 10sec. RoaringBitmap operations dominate query execution time. No performance benefit from using HLL over distinct count
Proposed Solution
Implement adaptive cardinality handling that dynamically switches from RoaringBitmap to HyperLogLog:
- Low cardinality : Use RoaringBitmap (memory efficient, exact counts)
- High cardinality : Convert to HyperLogLog (O(1) insertions)
Tested with a POC code where we choose HyperLogLog for High- cardinality column and observed improvements from 8sec -> 700ms