Commit f9575d5

GWphua and Phua Guan Wei authored
Exact Cardinality Count extension (#18021)
* Add Bitmap64 extension
* Update max intermediate size
* Changes to Bitmap64 after build
* Replace cardinality count name
* Unit tests for bitmap64
* Unit tests for bitmap aggregator
* tidy up for tests and Counter
* Fix checkstyle
* Tidy README
* Tidy review suggestions
* Fix checkstyle
* Clean up dependencies in pom file
* Docs for extension
* Fix Spelling check
* Add wikipedia datasource walkthrough
* Add SQL Test

Revert "Add wikipedia datasource walkthrough"

This reverts commit 83dfef9.

Revert "Add SQL Test"

This reverts commit e81a0fdc2f07b71958bc32734abe16bacbac920d.

* Fix Cannot find class error
* Fix Assert cannot be found
* Fix test case
* Fix SqlAggregator test cases
* Fix checkstyle
* Fix registerSerde problem
* Add JavaDocs for interface
* Fix null check not passing
* Add dockerfiles for DruidExactCardinalityIT
* Add integration tests for Druid Exact Cardinality
* Checkstyle
* Change groupId to contrib
* Change docker file to keep things running
* Set fullDatasourceName after initialization to prevent NPE with config
* Fix adding contrib pom changes to distribution/pom.xml
* Setup IT for contrib packages
* Add DruidExactCardinality into IT
* Add case to check cardinality ignores duplicates
* Add case to check cardinality works on rolled-up columns:
* Fix fullDatasourceName can be a local variable
* Bitmap64 exactcount update
* Add ExactCount extension into github actions
* Fix resources/cluster docker file
* Add additional line to eof
* Change docs to specify columns
* Eliminate unnecessary byte array copying
* Change Bitmap64 function to filter out numeric types
* Prevent copying of array in Output stream
* Fix checkstyle
* Fix UnsupportedOperationException
* Fix SQL type allowable
* Remove unnecessary build config
* Add unit test for String Column
* Type checking for String in ExactCount
* Implement unit tests to confirm String column check
* Update difference between exact count + distinct count
* Fix spelling mistake
* Address review feedback P1 - Rework ExposedByteArrayOutputStream
* Address unclear exception message when decoding using base64
* Comments for 1KiB max intermediate size
* Allow Bitmap64 to take directly from data input
* Use underlying byte array
* Checkstyle
* Fix array writing
* Rename to druid-bitmap-exact-count
* Missing rename
* Update to snapshot 35
* Renamed to druid-exact-count-bitmap
* Druid exact count bitmap

---------

Co-authored-by: Phua Guan Wei <[email protected]>
1 parent 7023b45 commit f9575d5

47 files changed: 4614 additions, 4 deletions
.github/workflows/revised-its.yml

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ jobs:
       fail-fast: false
       matrix:
         jdk: [17]
-        it: [HighAvailability, MultiStageQuery, Catalog, BatchIndex, MultiStageQueryWithMM, InputSource, InputFormat, Security, Query]
+        it: [HighAvailability, MultiStageQuery, Catalog, BatchIndex, MultiStageQueryWithMM, InputSource, InputFormat, Security, Query, DruidExactCountBitmap]
         indexer: [middleManager]
     uses: ./.github/workflows/reusable-revised-its.yml
     if: ${{ needs.changes.outputs.core == 'true' || needs.changes.outputs.common-extensions == 'true' }}

distribution/pom.xml

Lines changed: 2 additions & 0 deletions
@@ -462,6 +462,8 @@
       <argument>org.apache.druid.extensions.contrib:grpc-query</argument>
       <argument>-c</argument>
       <argument>org.apache.druid.extensions.contrib:druid-ranger-security</argument>
+      <argument>-c</argument>
+      <argument>org.apache.druid.extensions.contrib:druid-exact-count-bitmap</argument>
     </arguments>
   </configuration>
 </execution>

docs/configuration/extensions.md

Lines changed: 1 addition & 0 deletions
@@ -86,6 +86,7 @@ All of these community extensions can be downloaded using [pull-deps](../operati
 |druid-ddsketch|Support for DDSketch approximate quantiles based on [DDSketch](https://github.com/datadog/sketches-java) | [link](../development/extensions-contrib/ddsketch-quantiles.md)|
 |druid-deltalake-extensions|Support for ingesting Delta Lake tables.|[link](../development/extensions-contrib/delta-lake.md)|
 |druid-distinctcount|DistinctCount aggregator|[link](../development/extensions-contrib/distinctcount.md)|
+|druid-exact-count-bitmap|Support for exact cardinality counting using Roaring Bitmap over a Long column.|[link](../development/extensions-contrib/druid-exact-count-bitmap.md)|
 |druid-iceberg-extensions|Support for ingesting Iceberg tables.|[link](../development/extensions-contrib/iceberg.md)|
 |druid-redis-cache|A cache implementation for Druid based on Redis.|[link](../development/extensions-contrib/redis-cache.md)|
 |druid-time-min-max|Min/Max aggregator for timestamp.|[link](../development/extensions-contrib/time-min-max.md)|
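
The new docs row describes the extension as exact cardinality counting with a Roaring Bitmap over a Long column. As a rough illustration of that idea only (not the extension's code), the sketch below sets one bit per value, so duplicates collapse, and merges partial results with a bitwise OR. It assumes `java.util.BitSet` as a stand-in for a Roaring Bitmap, restricts values to a small non-negative range, and uses hypothetical names (`ExactCountSketch`, `toBitmap`, `exactDistinctCount`, `merge`, `MAX_VALUE`) that do not come from the commit.

```java
import java.util.BitSet;

// Illustrative sketch only: BitSet stands in for a Roaring Bitmap, and all
// names here are hypothetical rather than taken from the extension.
final class ExactCountSketch {
    // Assumed upper bound so the sketch never allocates a huge BitSet.
    static final long MAX_VALUE = 1L << 20;

    // Builds a bitmap with one bit per distinct value.
    static BitSet toBitmap(long[] values) {
        BitSet bits = new BitSet();
        for (long v : values) {
            if (v < 0 || v >= MAX_VALUE) {
                throw new IllegalArgumentException("value out of sketch range: " + v);
            }
            // Setting a bit that is already set changes nothing,
            // which is why duplicates do not inflate the count.
            bits.set((int) v);
        }
        return bits;
    }

    // Exact distinct count is the number of set bits.
    static int exactDistinctCount(long[] values) {
        return toBitmap(values).cardinality();
    }

    // Partial results combine with a bitwise OR; inputs are left unmodified.
    static BitSet merge(BitSet left, BitSet right) {
        BitSet merged = (BitSet) left.clone();
        merged.or(right);
        return merged;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 2, 3, 3, 3};
        System.out.println(exactDistinctCount(values)); // prints 3

        BitSet a = toBitmap(new long[] {1, 2});
        BitSet b = toBitmap(new long[] {2, 3});
        System.out.println(merge(a, b).cardinality()); // prints 3
    }
}
```

Keeping the bitmap rather than a running number is what makes the merge exact: two partial counts of 2 cannot be added without knowing the overlap, whereas OR-ing the bitmaps accounts for the shared value.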