Commit d147c2f

clean up extrapolation procedure explanations
1 parent 1fccf17 commit d147c2f

1 file changed, +4 -5 lines changed
develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx

Lines changed: 4 additions & 5 deletions
@@ -35,17 +35,16 @@ Sentry allows the user to aggregate data in different ways - the following aggre
 Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles.

 ### Extrapolation for different aggregates
-To extrapolate, the sampling weights have to be used in the following ways:
+To extrapolate, sampling weights are calculated as 1/(sample rate). The sampling weights then are used in the following ways:

 - **Count**: Calculate a sum of the sampling weight
 Example: the query `count()` becomes `round(sum(sampling weight))`.
 - **Sum**: Multiply each value with `sampling weight`.
 Example: the query `sum(foo)` becomes `sum(foo * sampling weight)`
-- **Average**: Use avgWeighted with sampling weight.
+- **Average**: Calculate the weighted average with sampling weight.
 Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)`
-- **Percentiles**: Use `*TDigestWeighted` with `sampling_weight_2`.
-We use the integer weight column since weighted functions in Clickhouse do not support floating point weights. Furthermore, performance and accuracy tests have shown that the t-digest function provides best runtime performance (see Resources below).
-Example: the query `quantile(0.95)(foo)` becomes `quantileTDigestWeighted(0.95)(foo, sampling_weight_2)`.
+- **Percentiles**: Calculate the weighted percentiles with sampling weight.
+Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling weight)`.

 As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctuation of 100,000 via the noise introduced by a sample rate of 0.00001 is not as critical.

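For readers of this change, the aggregate rules in the updated text can be sketched in plain Python. This is only an illustration of the arithmetic, not the query engine's implementation: the function names (`sampling_weight`, `extrapolated_count`, `weighted_percentile`, `count_impact`, and so on) are hypothetical, and the percentile uses a simple cumulative-weight rule that is assumed here for clarity and may differ from what the database computes.

```python
def sampling_weight(sample_rate: float) -> float:
    """Weight of one stored sample: 1 / (sample rate)."""
    return 1.0 / sample_rate


def extrapolated_count(weights):
    """count() -> round(sum(sampling weight)). Python's round() is used as a stand-in."""
    return round(sum(weights))


def extrapolated_sum(values, weights):
    """sum(foo) -> sum(foo * sampling weight)."""
    return sum(v * w for v, w in zip(values, weights))


def extrapolated_avg(values, weights):
    """avg(foo) -> weighted average of foo with the sampling weight."""
    return extrapolated_sum(values, weights) / sum(weights)


def weighted_percentile(values, weights, q):
    """Smallest value whose cumulative weight reaches q * total weight (assumed rule)."""
    pairs = sorted(zip(values, weights))
    threshold = q * sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]


def count_impact(sample_delta: int, sample_rate: float) -> float:
    """Absolute change in the extrapolated count caused by sample_delta stored samples."""
    return sample_delta * sampling_weight(sample_rate)


# Three stored samples, kept at sample rates 0.5, 0.25 and 0.5.
values = [10, 20, 30]
weights = [sampling_weight(r) for r in (0.5, 0.25, 0.5)]  # [2.0, 4.0, 2.0]

print(extrapolated_count(weights))                 # 8
print(extrapolated_sum(values, weights))           # 160.0
print(extrapolated_avg(values, weights))           # 20.0
print(weighted_percentile(values, weights, 0.95))  # 30
print(round(count_impact(1, 0.00001)))             # 100000
```

The last line mirrors the closing paragraph of the hunk: at a sample rate of 0.00001, a single stored sample more or less shifts the extrapolated count by roughly 100,000.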