Skip to content
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ For a precision threshold of `c`, the implementation that we are using requires

The following chart shows how the error varies before and after the threshold:

![cardinality error](/images/cardinality_error.png "")
![cardinality error](/reference/query-languages/images/cardinality_error.png "")

For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Original file line number Diff line number Diff line change
Expand Up @@ -65,9 +65,23 @@ Computing exact counts requires loading values into a hash set and returning its

This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:

:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
:::
* configurable precision, which decides on how to trade memory for accuracy,
* excellent accuracy on low-cardinality sets,
* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.

For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.

The following chart shows how the error varies before and after the threshold:

![cardinality error](/reference/aggregations/images/cardinality_error.png "")

For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
most datasets show consistently good accuracy. Also note that even with a threshold as low as 100,
the error remains very low (1-6% as seen in the above graph) even when counting millions of items.

The HyperLogLog++ algorithm depends on the leading zeros of hashed values, the exact distributions of
hashes in a dataset can affect the accuracy of the cardinality.

## Pre-computed hashes [_pre_computed_hashes]

Expand Down
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading