incorporate review comments

shellmayr · shellmayr · commit 98c18a81b508 · 2024-11-18T13:48:41.000+01:00
diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
@@ -3,7 +3,7 @@ title: Extrapolation
 sidebar_order: 5
 ---
 
-Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates. 
+Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This bias especially impacts numerical attributes like latency, reducing their accuracy. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates. 
 
 ### Accuracy & Expressiveness
 What happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: 
@@ -16,10 +16,12 @@ Data can be any combination of accurate and expressive. To illustrate these prop
 ### Benefits of Extrapolation
 At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, as well as reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t extrapolate just for fun; it has some major benefits for the user:
 
+- **The numbers correspond to the real world**: When data is sampled, there is some math you need to do to infer what the real numbers are, e.g. when you have 1000 samples at 10% sample rate, there are 10000 requests to your application. With extrapolation, you don't have to know your sample rate to understand what your application is actually doing. Instead, you get a view on the real behavior without additional knowledge or math required on your end.
+
 - **Steady timeseries when sample rates change**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, there will be a sudden change in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire when this happens. 
 - **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples?
 
-## How does extrapolation work?
+## How Does Extrapolation Work?
 ### Aggregates
 
 Sentry allows the user to aggregate data in different ways - the following aggregates are generally available, along with whether they are extrapolatable or not:
@@ -37,16 +39,16 @@ Sentry allows the user to aggregate data in different ways - the following aggre
 Each of these aggregates has its own way of dealing with extrapolation, because counts, for example, have to be extrapolated slightly differently from percentiles. 
 
 ### Extrapolation for different aggregates
-To extrapolate, sampling weights are calculated as 1/(sample rate). The sampling weights then are used in the following ways:
+To extrapolate, sampling weights are calculated as `1/sample rate`. The sampling weights of each row are then used in the following ways:
 
 - **Count**: Calculate a sum of the sampling weight
-Example: the query `count()` becomes `round(sum(sampling weight))`.
-- **Sum**: Multiply each value with `sampling weight`.
-Example: the query `sum(foo)` becomes `sum(foo * sampling weight)`
+Example: the query `count()` becomes `round(sum(sampling_weight))`.
+- **Sum**: Multiply each value with `sampling_weight`.
+Example: the query `sum(foo)` becomes `sum(foo * sampling_weight)`
 - **Average**: Calculate the weighted average with sampling weight.
-Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)`
+Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling_weight)`
 - **Percentiles**: Calculate the weighted percentiles with sampling weight.
-Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling weight)`.
+Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling_weight)`.
 
 As long as there are sufficient samples, the sample rate itself does not matter as much. However, because of the extrapolation mechanism, what would be a fluctuation of a few samples may turn into a much larger absolute impact, e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctuation of 100,000 via the noise introduced by a sample rate of 0.00001 is not as critical. 
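The extrapolation rules listed in this hunk can be illustrated with a minimal Python sketch. The `Span` type, the function names, and the cumulative-weight percentile rule are assumptions made for illustration only. They are not Sentry's implementation, and a database's weighted quantile functions may choose or interpolate values differently.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical sampled row: a numeric value plus the rate it was sampled at."""
    value: float        # e.g. a latency in milliseconds
    sample_rate: float  # in the range (0, 1]


def sampling_weight(span: Span) -> float:
    # Each stored span stands in for 1 / sample_rate original spans.
    return 1.0 / span.sample_rate


def extrapolated_count(spans: list[Span]) -> int:
    # count() -> round(sum(sampling_weight))
    return round(sum(sampling_weight(s) for s in spans))


def extrapolated_sum(spans: list[Span]) -> float:
    # sum(foo) -> sum(foo * sampling_weight)
    return sum(s.value * sampling_weight(s) for s in spans)


def extrapolated_avg(spans: list[Span]) -> float:
    # avg(foo) -> weighted average with the sampling weight
    if not spans:
        raise ValueError("cannot average an empty set of spans")
    total_weight = sum(sampling_weight(s) for s in spans)
    return extrapolated_sum(spans) / total_weight


def extrapolated_percentile(spans: list[Span], q: float) -> float:
    # Assumed rule: return the smallest value whose cumulative weight
    # reaches the fraction q of the total weight.
    if not spans:
        raise ValueError("cannot take a percentile of an empty set of spans")
    ordered = sorted(spans, key=lambda s: s.value)
    threshold = q * sum(sampling_weight(s) for s in ordered)
    cumulative = 0.0
    for s in ordered:
        cumulative += sampling_weight(s)
        if cumulative >= threshold:
            return s.value
    return ordered[-1].value


# Two spans sampled at 10% and one span sampled at 100%.
spans = [Span(100.0, 0.1), Span(200.0, 0.1), Span(50.0, 1.0)]
print(extrapolated_count(spans))            # 21
print(extrapolated_sum(spans))              # 3050.0
print(extrapolated_avg(spans))
print(extrapolated_percentile(spans, 0.5))  # 100.0
```

In this example the unweighted median of the three stored values would also be 100, but the unweighted average (about 116.7) differs noticeably from the weighted one (about 145.2), because the two spans sampled at 10% each represent ten original spans.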
 
@@ -60,9 +62,9 @@ There are two modes that can be used to view data in Sentry: default mode and sa
 
 Depending on the context and the use case, one mode may be more useful than the other. 
 
-Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
+Generally, default mode is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There are scenarios where the user needs to temporarily switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation. Therefore, the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
 
-### General approach
+### General Approach
 
 In new product surfaces, the question of whether to use extrapolated or non-extrapolated data is a delicate one, and it needs to be deliberated with care. In the end, it’s a judgement call for the person implementing the feature, but these questions may serve as a guide on the way to a decision: