Commit b0c2aee

formatting
1 parent 49a7212 commit b0c2aee

1 file changed: +13 -3 lines changed

develop-docs/application/dynamic-sampling/extrapolation.mdx

Lines changed: 13 additions & 3 deletions
@@ -44,9 +44,19 @@ Sentry allows the user to aggregate data in different ways - the following aggre
 | percentiles | yes |
 | count_unique | no |
 
-Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles.
-
-[Insert text about how different extrapolation mechanisms work]
+Each of these aggregates has its own way of dealing with extrapolation, because, for example, counts have to be extrapolated slightly differently from percentiles. To extrapolate, the sampling weights are used as follows:
+
+- **Count**: Calculate the sum of the sampling weights.
+Example: the query `count()` becomes `round(sum(sampling weight))`.
+- **Sum**: Multiply each value by `sampling weight`.
+Example: the query `sum(foo)` becomes `sum(foo * sampling weight)`.
+- **Average**: Use `avgWeighted` with the sampling weight.
+Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)`.
+- **Percentiles**: Use `*TDigestWeighted` with `sampling_weight_2`.
+We use the integer weight column since the weighted quantile functions in ClickHouse do not support floating-point weights. Furthermore, performance and accuracy tests have shown that the t-digest function provides the best runtime performance (see Resources below).
+Example: the query `quantile(0.95)(foo)` becomes `quantileTDigestWeighted(0.95)(foo, sampling_weight_2)`.
+- **Max / Min**: No extrapolation.
+Possible extrapolation for these values is still to be investigated.
 
 As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples may turn into a much larger absolute impact, e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctuation of 100,000 via the noise introduced by a sample rate of 0.00001 is not as salient.
 
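
The rewrite rules added in this hunk can be illustrated outside of ClickHouse. The following is a minimal Python sketch, not Sentry's implementation: the `rows` data and all function names are hypothetical, the integer weight is assumed to be the rounded floating-point weight (standing in for `sampling_weight_2`), and the percentile uses an exact weighted quantile as a stand-in for the t-digest approximation.

```python
# Hypothetical sampled rows: (value, sampling_weight).
# Each stored row is assumed to represent `sampling_weight` original events.
rows = [(120.0, 10.0), (80.0, 10.0), (200.0, 3.3), (50.0, 1.0)]


def extrapolated_count(rows):
    """count() -> round(sum(sampling weight))"""
    return round(sum(weight for _, weight in rows))


def extrapolated_sum(rows):
    """sum(foo) -> sum(foo * sampling weight)"""
    return sum(value * weight for value, weight in rows)


def extrapolated_avg(rows):
    """avg(foo) -> avgWeighted(foo, sampling weight)"""
    total_weight = sum(weight for _, weight in rows)
    return sum(value * weight for value, weight in rows) / total_weight


def weighted_quantile(rows, q):
    """Exact weighted quantile over integer weights.

    Stand-in for quantileTDigestWeighted, which is approximate. The integer
    weight is assumed here to be the rounded float weight.
    """
    weighted = sorted((value, round(weight)) for value, weight in rows)
    threshold = q * sum(weight for _, weight in weighted)
    cumulative = 0
    for value, weight in weighted:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return weighted[-1][0]
```

Max and min are left out on purpose, since the hunk states that they are not extrapolated.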

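
The closing context paragraph about noise can be made concrete with a little arithmetic. This sketch assumes that each stored sample stands for `1 / sample_rate` original events; the function names and the figure of two billion visits are hypothetical and chosen only for illustration.

```python
def extrapolated_swing(sample_fluctuation, sample_rate):
    """Absolute change in the extrapolated count when the number of stored
    samples changes by `sample_fluctuation`."""
    return sample_fluctuation / sample_rate


def relative_impact(sample_fluctuation, sample_rate, true_count):
    """The same swing, expressed as a fraction of the true event count."""
    return extrapolated_swing(sample_fluctuation, sample_rate) / true_count


# One sample more or less at a sample rate of 0.00001 moves the
# extrapolated count by roughly 100,000 events.
swing = extrapolated_swing(1, 0.00001)

# Against a hypothetical two billion visits, that swing is a tiny fraction.
share = relative_impact(1, 0.00001, 2_000_000_000)
```

Under these assumptions the absolute swing is large while its share of the total stays far below one percent, which matches the paragraph's point that the fluctuation is less salient on very high-traffic sites.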