Commit a7d39eb

Add note about random sampler consistency (elastic#107479) (elastic#107524)
1 parent f189516 commit a7d39eb

1 file changed: +12 -1 lines changed


docs/reference/aggregations/bucket/random-sampler-aggregation.asciidoc

Lines changed: 12 additions & 1 deletion
@@ -94,6 +94,17 @@ higher sampling rates, the relative error is still low.
 
 NOTE: This represents the result of aggregations against a typical positively skewed APM data set which also has outliers in the upper tail. The linear dependence of the relative error on the sample size is found to hold widely, but the slope depends on the variation in the quantity being aggregated. As such, the variance in your own data may
 cause relative error rates to increase or decrease at a different rate.
+[[random-sampler-consistency]]
+==== Random sampler consistency
+
+For a given `probability` and `seed`, the random sampler aggregation is consistent when sampling unchanged data from the same shard.
+However, because this is background random sampling, whether a particular document is included in the sampled set depends on the current number of segments.
+
+This means that replica and primary shards could return different values, because different documents are sampled.
+
+If the shard changes via document addition, update, deletion, or segment merging, the particular documents sampled could change, and thus the resulting statistics could change.
+
+The statistics produced by the random sampler aggregation are approximate and should be treated as such.
 
 [[random-sampler-special-cases]]
 ==== Random sampling special cases
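
The consistency note added in this hunk can be illustrated with a toy model. The sketch below is not Elasticsearch's implementation: the function `sample_shard`, the per-segment seeding formula, and the segment layouts are all hypothetical, chosen only to show why a fixed `probability` and `seed` give repeatable results on an unchanged shard but may pick different documents once the segment layout differs.

```python
import random


def sample_shard(segments, probability, seed):
    """Toy model: sample each segment with its own RNG.

    Assumption (hypothetical, for illustration only): the RNG for a
    segment is derived from the user-supplied seed and the segment's
    ordinal within the shard.
    """
    sampled = []
    for ordinal, segment in enumerate(segments):
        rng = random.Random(seed * 1_000_003 + ordinal)
        sampled.extend(doc for doc in segment if rng.random() < probability)
    return sampled


docs = list(range(1000))

# The same documents in two made-up segment layouts, standing in for a
# primary vs. a replica, or for a shard before and after a merge.
layout_a = [docs[:500], docs[500:]]
layout_b = [docs[:300], docs[300:700], docs[700:]]

first = sample_shard(layout_a, probability=0.1, seed=42)
again = sample_shard(layout_a, probability=0.1, seed=42)
other = sample_shard(layout_b, probability=0.1, seed=42)

print("same layout, same sample:", first == again)
print("different layout, same sample:", first == other)
print("sample sizes:", len(first), len(other))
```

In this model both layouts sample roughly 10% of the documents, so aggregate statistics stay close, but the specific documents differ. That is the sense in which the results are approximate.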
@@ -105,6 +116,6 @@ for a bucket is `10,000` with `probability: 0.1`, the actual number of documents
 
 An exception to this is <<search-aggregations-metrics-cardinality-aggregation, cardinality aggregation>>. Unique item
 counts are not suitable for automatic scaling. When interpreting the cardinality count, compare it
-to the number of sampled docs provided in the top level `doc_count` within the random_sampler aggregation. It gives
+to the number of sampled docs provided in the top level `doc_count` within the random_sampler aggregation. It gives
 you an idea of unique values as a percentage of total values. It may not reflect, however, the exact number of unique values
 for the given field.
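
The comparison described in this hunk (cardinality count against the sampler's top-level `doc_count`) can be sketched as below. This is a minimal illustration: the helper name `unique_value_ratio`, the sub-aggregation name `unique_users`, and the numbers are made up, and the dictionary is assumed to be the `random_sampler` portion of a parsed search response.

```python
def unique_value_ratio(sampler_result, cardinality_agg_name):
    """Return unique values as a fraction of the sampled documents.

    `sampler_result` is assumed to be the parsed random_sampler
    aggregation result: a top-level `doc_count` (sampled docs) plus a
    cardinality sub-aggregation that reports its count under `value`.
    """
    sampled_docs = sampler_result["doc_count"]
    if sampled_docs == 0:
        return 0.0  # nothing was sampled, so there is no ratio to report
    return sampler_result[cardinality_agg_name]["value"] / sampled_docs


# Made-up response fragment, for illustration only.
response_fragment = {"doc_count": 1000, "unique_users": {"value": 250}}

ratio = unique_value_ratio(response_fragment, "unique_users")
print(f"unique values are about {ratio:.0%} of sampled docs")
```

The ratio describes the sample only. As the text says, it may not reflect the exact number of unique values in the full data set.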
