Commit f2facfc

[DOCS] Fix use of cardinality_error.png

1 parent dd1db50 · commit f2facfc

4 files changed: +17 -3 lines changed

docs/reference/aggregations/_snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ For a precision threshold of `c`, the implementation that we are using requires
 
 The following chart shows how the error varies before and after the threshold:
 
-![cardinality error](/images/cardinality_error.png "")
+![cardinality error](/reference/query-languages/images/cardinality_error.png "")
 
 For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
 this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
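
As a quick sanity check of the `c * 8` bytes figure quoted in the hunk header: Elasticsearch's default `precision_threshold` is 3000 (a detail from the wider docs, not from this diff), which works out to roughly 3000 * 8 = 24000 bytes, i.e. about 24 KB of memory per bucket, regardless of how many distinct values are counted.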
cardinality_error.png (binary image file, 12.5 KB)

docs/reference/aggregations/search-aggregations-metrics-cardinality-aggregation.md

Lines changed: 16 additions & 2 deletions
@@ -65,9 +65,23 @@ Computing exact counts requires loading values into a hash set and returning its
 
 This `cardinality` aggregation is based on the [HyperLogLog++](https://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) algorithm, which counts based on the hashes of the values with some interesting properties:
 
-:::{include} _snippets/search-aggregations-metrics-cardinality-aggregation-explanation.md
-:::
+* configurable precision, which decides how to trade memory for accuracy,
+* excellent accuracy on low-cardinality sets,
+* fixed memory usage: no matter if there are tens or billions of unique values, memory usage only depends on the configured precision.
 
+For a precision threshold of `c`, the implementation that we are using requires about `c * 8` bytes.
+
+The following chart shows how the error varies before and after the threshold:
+
+![cardinality error](/reference/aggregations/images/cardinality_error.png "")
+
+For all 3 thresholds, counts have been accurate up to the configured threshold. Although not guaranteed,
+this is likely to be the case. Accuracy in practice depends on the dataset in question. In general,
+most datasets show consistently good accuracy. Also note that even with a threshold as low as 100,
+the error remains very low (1-6% as seen in the above graph) even when counting millions of items.
+
+The HyperLogLog++ algorithm depends on the leading zeros of hashed values; the exact distribution of
+hashes in a dataset can affect the accuracy of the cardinality.
 
 ## Pre-computed hashes [_pre_computed_hashes]
 
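Since this hunk inlines the explanation of the `cardinality` aggregation, a minimal request using it may help orient readers. This is a sketch based on the standard search API; the index name `my-index` and field `type` are chosen for illustration and are not part of this commit:

```console
GET /my-index/_search
{
  "size": 0,
  "aggs": {
    "type_count": {
      "cardinality": {
        "field": "type",
        "precision_threshold": 100
      }
    }
  }
}
```

With `precision_threshold` set to 100, counts up to roughly 100 unique values should be near-exact; per the text above, the error is expected to stay low (1-6%) even well beyond that.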

cardinality_error.png (binary image file, 12.5 KB)
