From 2d6c40d0673868cb3ad4aeab5077b27cd84a8387 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 11 Nov 2024 10:26:27 +0100 Subject: [PATCH 01/14] feat(extrapolation): add extrapolation develop docs --- .../dynamic-sampling/extrapolation.mdx | 95 +++++++++++++++++++ 1 file changed, 95 insertions(+) create mode 100644 develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx new file mode 100644 index 0000000000000..62a19aea63f19 --- /dev/null +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -0,0 +1,95 @@ +### Purpose of this document & outline + +This document serves as an introduction to extrapolation, informing how extrapolation will interact with different product surfaces and how to integrate it into the product for the users’ benefit. The document covers: + +- How data is extrapolated using samples and the connected sample rate for different aggregations & which aggregations cannot be extrapolated +- The effect of extrapolation on data accuracy +- What extrapolation means for the stability of aggregations +- The benefit of extrapolation for the user + - Sample rate changes do not break alerts + - Numbers correspond to the real occurrences when looking at sufficiently large groups +- Which use cases are better served by + - extrapolated data + - sample data + +### Introduction to Extrapolation + +Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. This means that beyond a certain volume, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have a 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, this misrepresents the volume of an application, and when different parts of the application have different sample rates, there is even an unfair bias, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency. + +To account for this fact, Sentry offers a feature called Extrapolation. Extrapolation smartly combines the data that was ingested to account for different sample rates in different parts of the application. However, low sample rates may cause the extrapolated data to be less accurate than if there was no sampling at all. + +So how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: + +- Accuracy refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minute random decisions can influence the result in major ways, in absolute numbers. Also, when traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty. +- Expressiveness refers to data being able to express something about the state of the observed system. For example, a single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics. Expressiveness therefore depends on the use case for the data. + +Affected Product Surface + +1. Explore +2. Alerts +3. Dashboards +4. 
Requests +5. Profiles + +### **Modes** + +There are two modes that can be used to view data in Sentry: extrapolated mode and sample mode. + +- Extrapolated mode extrapolates the ingested data as outlined above. +- Sample mode does not extrapolate and presents exactly the data that was ingested. + +Depending on the context and the use case, one mode may be more useful than the other. + +There is currently no way for Sentry to automatically switch from the extrapolated mode into sample mode based on query attributes, therefore the transition needs to be triggered by the user. However, Sentry can nudge the user, based on observed characteristics of a query, to switch from one mode to another. One example for this is when an ID column is detected: extrapolated aggregates for high-cardinality and low-volume ID columns are usually not very useful, because they may refer to a highly exaggerated volume of data that is not extrapolated correctly due to the high-cardinality nature of the column in question. + +## Aggregates + +Sentry allows the user to aggregate data in different ways - the following aggregates are generally available, along with whether they are extrapolatable or not: + +| **Aggregate** | **Can be extrapolated?** | +| --- | --- | +| mean | yes | +| min | no | +| count | yes | +| sum | yes | +| max | no | +| percentiles | yes | +| count_unique | no | + +Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. + +[Insert text about how different extrapolation mechanisms work] + +As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctation of 100.000 via the noise introduced by a sample rate of 0.00001 is not as salient. + +## Why do we even extrapolate? + +At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, and reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: + +1. **Steady data when the sample rate changes**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, suddenly you have a drop in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire on a change of sample rate. +2. **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? + +## How to deal with extrapolation in the product? + +### General approach + +In new product surfaces, the question of whether or not to use extrapolated vs non-extrapolated data is a delicate one, and it needs to be deliberated with care. In the end, it’s a judgement call on the person implementing the feature, but these questions may be a guide on the way to a decision: + +1. What should be the default, and how should the switch between modes work? + 1. 
In most scenarios, extrapolation should be on by default when looking at aggregates, and off when looking at samples. Switching, in most cases, should be a very conscious operations that users should be aware they are taking, and not an implicit switch that just happens to trigger when users navigate the UI. +2. Does it make sense to mix extrapolated data with non-extrapolated data? + 1. In most cases, mixing the two will be recipe for confusion. For example, offering two functions to compute an aggregate, like p90_raw and p90_extrapolated in a query interface will be very confusing to most users. Therefore, in most cases we should refrain from mixing this data implicitly. +3. When sample rates change over time, is consistency of data points over time important? + 1. In alerts, for example, consistency is very important, because noise affects the trust users have in the alerting system. A system that alerts everytime users switch sample rates is not very convenient to use, especially in larger teams. +4. Does the user care more about a truthful estimate of the aggregate data or about the actual events that happened? + 1. Some scenarios, like visualizing metrics over time, are based on aggregates, whereas a case of debugging a specific user’s problem hinges on actually seeing the specific events. The best mode depends on the intended usage of the product. + +### Confidence + +When users filter on data that has a very low count but also a low sample rate, yielding a highly extrapolated but low-sample dataset, developers and users should be careful with the conclusions they draw from the data. The storage platform provides confidence intervals along with the extrapolated estimates for the different aggregation types to indicate when there is elevated uncertainty in the data. These types of datasets are inherently noisy and may contain misleading information. When this is discovered, the user should either be very careful with the conclusions they draw from the aggregate data, or switch to non-extrapolated mode for investigation of the individual samples. + +## **Conclusion** + +- Extrapolation offers benefits in many parts of the product, but brings some inherent complexity. +- Some aggregates can be extrapolated, others cannot - we may add the capability to additional aggregates in the future. +- A lot of care should be taken about how to expose extrapolation and especially switching of the modes to the user. \ No newline at end of file From 81de4b29d28d34fdfe13f98b783df7adf0f7b665 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 11 Nov 2024 14:03:52 +0100 Subject: [PATCH 02/14] wip --- .../dynamic-sampling/extrapolation.mdx | 36 ++++++++----------- 1 file changed, 15 insertions(+), 21 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index 62a19aea63f19..31a33df0c6d1a 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -14,33 +14,33 @@ This document serves as an introduction to extrapolation, informing how extrapol ### Introduction to Extrapolation -Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. 
This means that beyond a certain volume, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have a 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, this misrepresents the volume of an application, and when different parts of the application have different sample rates, there is even an unfair bias, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency. +Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. This means that beyond a certain volume, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, this misrepresents the volume of an application, and when different parts of the application have different sample rates, there is even an unfair bias, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency. -To account for this fact, Sentry offers a feature called Extrapolation. Extrapolation smartly combines the data that was ingested to account for different sample rates in different parts of the application. However, low sample rates may cause the extrapolated data to be less accurate than if there was no sampling at all. +To account for this fact, Sentry offers a feature called Extrapolation. Extrapolation smartly combines the data that was ingested to account for different sample rates in different parts of the application. However, low sample rates will cause the extrapolated data to be less accurate than if there was no sampling at all. So how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: -- Accuracy refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minute random decisions can influence the result in major ways, in absolute numbers. Also, when traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty. -- Expressiveness refers to data being able to express something about the state of the observed system. For example, a single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics. Expressiveness therefore depends on the use case for the data. +- **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minute random decisions can influence the result in major ways, in absolute numbers. +- **Expressiveness** refers to data being able to express something about the state of the observed system. For example, a single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics. Expressiveness therefore depends on the use case for the data. 
Also, when traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty that reduce expressiveness. + +At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, and reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: + +1. **Steady data when the sample rate changes**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, suddenly you have a drop in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire on a change of sample rate. +2. **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? + -Affected Product Surface -1. Explore -2. Alerts -3. Dashboards -4. Requests -5. Profiles ### **Modes** -There are two modes that can be used to view data in Sentry: extrapolated mode and sample mode. +There are two modes that can be used to view data in Sentry: default mode and sample mode. -- Extrapolated mode extrapolates the ingested data as outlined above. +- Default mode extrapolates the ingested data as outlined below. - Sample mode does not extrapolate and presents exactly the data that was ingested. Depending on the context and the use case, one mode may be more useful than the other. -There is currently no way for Sentry to automatically switch from the extrapolated mode into sample mode based on query attributes, therefore the transition needs to be triggered by the user. However, Sentry can nudge the user, based on observed characteristics of a query, to switch from one mode to another. One example for this is when an ID column is detected: extrapolated aggregates for high-cardinality and low-volume ID columns are usually not very useful, because they may refer to a highly exaggerated volume of data that is not extrapolated correctly due to the high-cardinality nature of the column in question. +There is currently no way for Sentry to automatically switch from the default mode into sample mode based on query attributes, therefore the transition needs to be triggered by the user. However, Sentry can nudge the user, based on observed characteristics of a query, to switch from one mode to another. One example for this is when an ID column is detected: extrapolated aggregates for high-cardinality and low-volume ID columns are usually not very useful, because they may refer to a highly exaggerated volume of data that is not extrapolated correctly due to the high-cardinality nature of the column in question. 
## Aggregates @@ -48,7 +48,7 @@ Sentry allows the user to aggregate data in different ways - the following aggre | **Aggregate** | **Can be extrapolated?** | | --- | --- | -| mean | yes | +| avg | yes | | min | no | | count | yes | | sum | yes | @@ -62,12 +62,6 @@ Each of these aggregates has their own way of dealing with extrapolation, due to As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctation of 100.000 via the noise introduced by a sample rate of 0.00001 is not as salient. -## Why do we even extrapolate? - -At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, and reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: - -1. **Steady data when the sample rate changes**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, suddenly you have a drop in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire on a change of sample rate. -2. **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? ## How to deal with extrapolation in the product? @@ -86,7 +80,7 @@ In new product surfaces, the question of whether or not to use extrapolated vs n ### Confidence -When users filter on data that has a very low count but also a low sample rate, yielding a highly extrapolated but low-sample dataset, developers and users should be careful with the conclusions they draw from the data. The storage platform provides confidence intervals along with the extrapolated estimates for the different aggregation types to indicate when there is elevated uncertainty in the data. These types of datasets are inherently noisy and may contain misleading information. When this is discovered, the user should either be very careful with the conclusions they draw from the aggregate data, or switch to non-extrapolated mode for investigation of the individual samples. +When users filter on data that has a very low count but also a low sample rate, yielding a highly extrapolated but low-sample dataset, developers and users should be careful with the conclusions they draw from the data. The storage platform provides confidence intervals along with the extrapolated estimates for the different aggregation types to indicate when there is elevated uncertainty in the data. These types of datasets are inherently noisy and may contain misleading information. When this is discovered, the user should either be very careful with the conclusions they draw from the aggregate data, or switch to non-default mode for investigation of the individual samples. 
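
To make the uncertainty described above concrete, the following is a minimal sketch (not the storage platform's actual implementation) of how a confidence interval for an extrapolated count can be approximated under independent per-event sampling. The function name and the normal-approximation formula are illustrative assumptions, shown only to explain why low-count, low-sample-rate queries come with wide intervals:

```python
# Illustrative sketch only, not the storage platform's implementation.
# sample_rates lists the rate each ingested event was sampled with.
import math

def extrapolated_count_with_ci(sample_rates, z=1.96):
    weights = [1.0 / r for r in sample_rates]
    estimate = sum(weights)
    # Horvitz-Thompson-style variance estimate for an independently
    # sampled count: each sampled item contributes w * (w - 1).
    variance = sum(w * (w - 1.0) for w in weights)
    margin = z * math.sqrt(variance)
    return estimate, max(estimate - margin, 0.0), estimate + margin

# Ten events ingested at a 1% sample rate: estimate of 1000, wide interval.
print(extrapolated_count_with_ci([0.01] * 10))
```

With ten samples at a 1% sample rate, the point estimate is 1,000 events, but the interval spans roughly 380 to 1,620, which is exactly the situation where the user should be nudged toward looking at the individual samples instead.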
## **Conclusion** From 68034a45c478ed13ca01f1615602676fddd4b0cb Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 11 Nov 2024 14:05:39 +0100 Subject: [PATCH 03/14] wip --- .../dynamic-sampling/extrapolation.mdx | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index 31a33df0c6d1a..8138a6bed00d7 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -1,3 +1,8 @@ +--- +title: Fidelity and Biases +sidebar_order: 5 +--- + ### Purpose of this document & outline This document serves as an introduction to extrapolation, informing how extrapolation will interact with different product surfaces and how to integrate it into the product for the users’ benefit. The document covers: From c89497b2f658f555641bd361fdfbeb0f0d7830b8 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 11 Nov 2024 14:08:42 +0100 Subject: [PATCH 04/14] add metadata --- .../application-architecture/dynamic-sampling/extrapolation.mdx | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index 8138a6bed00d7..e8d5753083fa9 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -1,5 +1,5 @@ --- -title: Fidelity and Biases +title: Extrapolation sidebar_order: 5 --- From 34975779d64015f4f5c43fa958a3d525ec0b13df Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Mon, 11 Nov 2024 14:22:50 +0100 Subject: [PATCH 05/14] add opt-out section --- .../dynamic-sampling/extrapolation.mdx | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index e8d5753083fa9..02087d160b5bc 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -45,7 +45,7 @@ There are two modes that can be used to view data in Sentry: default mode and sa Depending on the context and the use case, one mode may be more useful than the other. -There is currently no way for Sentry to automatically switch from the default mode into sample mode based on query attributes, therefore the transition needs to be triggered by the user. However, Sentry can nudge the user, based on observed characteristics of a query, to switch from one mode to another. One example for this is when an ID column is detected: extrapolated aggregates for high-cardinality and low-volume ID columns are usually not very useful, because they may refer to a highly exaggerated volume of data that is not extrapolated correctly due to the high-cardinality nature of the column in question. +Generally, default makes sense for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less useful. 
There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the sample mode settings should be a transient view option that resets to default mode when the user opens the page the next time. ## Aggregates @@ -83,6 +83,10 @@ In new product surfaces, the question of whether or not to use extrapolated vs n 4. Does the user care more about a truthful estimate of the aggregate data or about the actual events that happened? 1. Some scenarios, like visualizing metrics over time, are based on aggregates, whereas a case of debugging a specific user’s problem hinges on actually seeing the specific events. The best mode depends on the intended usage of the product. + +### Opting Out of Extrapolation +Users may want to opt out of extrapolation for different reasons. It is always possible to set the sample rate to 100% and therefore send all data to Sentry, implicitly opting out of extrapolation and behaving in the same way as sample mode. + ### Confidence When users filter on data that has a very low count but also a low sample rate, yielding a highly extrapolated but low-sample dataset, developers and users should be careful with the conclusions they draw from the data. The storage platform provides confidence intervals along with the extrapolated estimates for the different aggregation types to indicate when there is elevated uncertainty in the data. These types of datasets are inherently noisy and may contain misleading information. When this is discovered, the user should either be very careful with the conclusions they draw from the aggregate data, or switch to non-default mode for investigation of the individual samples. From 628e319e87932e7cee803a24d39bbe30d94f3012 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Tue, 12 Nov 2024 11:08:18 +0100 Subject: [PATCH 06/14] editing & formatting --- .../dynamic-sampling/extrapolation.mdx | 51 +++++++------------ 1 file changed, 17 insertions(+), 34 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index 02087d160b5bc..b45d1dc073ee6 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -3,38 +3,21 @@ title: Extrapolation sidebar_order: 5 --- -### Purpose of this document & outline +Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. This means that when configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, this misrepresents the true volume of an application, and when different parts of the application have different sample rates, there is even an unfair bias, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency. -This document serves as an introduction to extrapolation, informing how extrapolation will interact with different product surfaces and how to integrate it into the product for the users’ benefit. 
The document covers: - -- How data is extrapolated using samples and the connected sample rate for different aggregations & which aggregations cannot be extrapolated -- The effect of extrapolation on data accuracy -- What extrapolation means for the stability of aggregations -- The benefit of extrapolation for the user - - Sample rate changes do not break alerts - - Numbers correspond to the real occurrences when looking at sufficiently large groups -- Which use cases are better served by - - extrapolated data - - sample data - -### Introduction to Extrapolation - -Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. This means that beyond a certain volume, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, this misrepresents the volume of an application, and when different parts of the application have different sample rates, there is even an unfair bias, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency. - -To account for this fact, Sentry offers a feature called Extrapolation. Extrapolation smartly combines the data that was ingested to account for different sample rates in different parts of the application. However, low sample rates will cause the extrapolated data to be less accurate than if there was no sampling at all. +To account for this fact, Sentry uses extrapolation to smartly combine the data that was ingested to account for sample rates in the application. However, low sample rates will cause the extrapolated data to be less accurate than if there was no sampling at all and the application was sampled at 100%. So how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: -- **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minute random decisions can influence the result in major ways, in absolute numbers. -- **Expressiveness** refers to data being able to express something about the state of the observed system. For example, a single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics. Expressiveness therefore depends on the use case for the data. Also, when traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty that reduce expressiveness. - -At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, and reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: - -1. **Steady data when the sample rate changes**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, suddenly you have a drop in all associated metrics. 
Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire on a change of sample rate. -2. **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? +- **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minor random decisions can influence the result in major ways. +- **Expressiveness** refers to data being able to express something about the state of the observed system. Expressiveness refers to the usefulness of the data for the user in a specific use case. +Data can be any combination of accurate and expressive. To illustrate these properties, let's look at some examples. A single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics that are not very expressive. When traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty that reduce expressiveness. +At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, and reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: +- **Steady data when the sample rate changes**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, suddenly you have a drop in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire on a change of sample rate. +- **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? ### **Modes** @@ -45,7 +28,7 @@ There are two modes that can be used to view data in Sentry: default mode and sa Depending on the context and the use case, one mode may be more useful than the other. -Generally, default makes sense for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less useful. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the sample mode settings should be a transient view option that resets to default mode when the user opens the page the next time. +Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time. 
## Aggregates @@ -74,14 +57,14 @@ As long as there are sufficient samples, the sample rate itself does not matter In new product surfaces, the question of whether or not to use extrapolated vs non-extrapolated data is a delicate one, and it needs to be deliberated with care. In the end, it’s a judgement call on the person implementing the feature, but these questions may be a guide on the way to a decision: -1. What should be the default, and how should the switch between modes work? - 1. In most scenarios, extrapolation should be on by default when looking at aggregates, and off when looking at samples. Switching, in most cases, should be a very conscious operations that users should be aware they are taking, and not an implicit switch that just happens to trigger when users navigate the UI. -2. Does it make sense to mix extrapolated data with non-extrapolated data? - 1. In most cases, mixing the two will be recipe for confusion. For example, offering two functions to compute an aggregate, like p90_raw and p90_extrapolated in a query interface will be very confusing to most users. Therefore, in most cases we should refrain from mixing this data implicitly. -3. When sample rates change over time, is consistency of data points over time important? - 1. In alerts, for example, consistency is very important, because noise affects the trust users have in the alerting system. A system that alerts everytime users switch sample rates is not very convenient to use, especially in larger teams. -4. Does the user care more about a truthful estimate of the aggregate data or about the actual events that happened? - 1. Some scenarios, like visualizing metrics over time, are based on aggregates, whereas a case of debugging a specific user’s problem hinges on actually seeing the specific events. The best mode depends on the intended usage of the product. +- What should be the default, and how should the switch between modes work? + - In most scenarios, extrapolation should be on by default when looking at aggregates, and off when looking at samples. Switching, in most cases, should be a very conscious operations that users should be aware they are taking, and not an implicit switch that just happens to trigger when users navigate the UI. +- Does it make sense to mix extrapolated data with non-extrapolated data? + - In most cases, mixing the two will be recipe for confusion. For example, offering two functions to compute an aggregate, like p90_raw and p90_extrapolated in a query interface will be very confusing to most users. Therefore, in most cases we should refrain from mixing this data implicitly. +- When sample rates change over time, is consistency of data points over time important? + - In alerts, for example, consistency is very important, because noise affects the trust users have in the alerting system. A system that alerts everytime users switch sample rates is not very convenient to use, especially in larger teams. +- Does the user care more about a truthful estimate of the aggregate data or about the actual events that happened? + - Some scenarios, like visualizing metrics over time, are based on aggregates, whereas a case of debugging a specific user’s problem hinges on actually seeing the specific events. The best mode depends on the intended usage of the product. 
### Opting Out of Extrapolation From 40288a1c35ab9eb0ee2bde2298e4e40433bdeedc Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Tue, 12 Nov 2024 11:38:03 +0100 Subject: [PATCH 07/14] formatting --- .../dynamic-sampling/extrapolation.mdx | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index b45d1dc073ee6..cba39c016c502 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -44,9 +44,19 @@ Sentry allows the user to aggregate data in different ways - the following aggre | percentiles | yes | | count_unique | no | -Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. - -[Insert text about how different extrapolation mechanisms work] +Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. To extrapolate, the sampling weights have to be used in the following ways: + +- **Count**: Calculate a sum of the sampling weight +Example: the query `count()` becomes `round(sum(sampling weight))`. +- **Sum**: Multiply each value with `sampling weight`. +Example: the query `sum(foo)` becomes `sum(foo * sampling weight)` +- **Average**: Use avgWeighted with sampling weight. +Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)` +- **Percentiles**: Use `*TDigestWeighted` with `sampling_weight_2`. +We use the integer weight column since weighted functions in Clickhouse do not support floating point weights. Furthermore, performance and accuracy tests have shown that the t-digest function provides best runtime performance (see Resources below). +Example: the query `quantile(0.95)(foo)` becomes `quantileTDigestWeighted(0.95)(foo, sampling_weight_2)`. +- **Max / Min**: No extrapolation. +There will be investigation into possible extrapolation for these values. As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctation of 100.000 via the noise introduced by a sample rate of 0.00001 is not as salient. From 1fccf17b76b0b265b7760edaac4b20cc761b40c9 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Tue, 12 Nov 2024 13:01:29 +0100 Subject: [PATCH 08/14] restructure --- .../dynamic-sampling/extrapolation.mdx | 44 +++++++++---------- 1 file changed, 21 insertions(+), 23 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index cba39c016c502..bb43a7d4ca3f4 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -3,34 +3,22 @@ title: Extrapolation sidebar_order: 5 --- -Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. 
This means that when configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, this misrepresents the true volume of an application, and when different parts of the application have different sample rates, there is even an unfair bias, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency. +Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias.To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates. -To account for this fact, Sentry uses extrapolation to smartly combine the data that was ingested to account for sample rates in the application. However, low sample rates will cause the extrapolated data to be less accurate than if there was no sampling at all and the application was sampled at 100%. - -So how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: +So what happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: - **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minor random decisions can influence the result in major ways. - **Expressiveness** refers to data being able to express something about the state of the observed system. Expressiveness refers to the usefulness of the data for the user in a specific use case. Data can be any combination of accurate and expressive. To illustrate these properties, let's look at some examples. A single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics that are not very expressive. When traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty that reduce expressiveness. -At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, and reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: +At first glance, extrapolation may seem unnecessarily complicated. 
However, for high-volume organizations, sampling is a way to control costs and egress volume, as well as reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: -- **Steady data when the sample rate changes**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, suddenly you have a drop in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire on a change of sample rate. +- **Steady data when sample rates change**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, there will be a sudden change in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire when this happens. - **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? -### **Modes** - -There are two modes that can be used to view data in Sentry: default mode and sample mode. - -- Default mode extrapolates the ingested data as outlined below. -- Sample mode does not extrapolate and presents exactly the data that was ingested. - -Depending on the context and the use case, one mode may be more useful than the other. - -Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time. - -## Aggregates +## How does extrapolation work? +### Aggregates Sentry allows the user to aggregate data in different ways - the following aggregates are generally available, along with whether they are extrapolatable or not: @@ -44,7 +32,10 @@ Sentry allows the user to aggregate data in different ways - the following aggre | percentiles | yes | | count_unique | no | -Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. To extrapolate, the sampling weights have to be used in the following ways: +Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. + +### Extrapolation for different aggregates +To extrapolate, the sampling weights have to be used in the following ways: - **Count**: Calculate a sum of the sampling weight Example: the query `count()` becomes `round(sum(sampling weight))`. @@ -55,13 +46,20 @@ Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)` - **Percentiles**: Use `*TDigestWeighted` with `sampling_weight_2`. We use the integer weight column since weighted functions in Clickhouse do not support floating point weights. 
Furthermore, performance and accuracy tests have shown that the t-digest function provides best runtime performance (see Resources below). Example: the query `quantile(0.95)(foo)` becomes `quantileTDigestWeighted(0.95)(foo, sampling_weight_2)`. -- **Max / Min**: No extrapolation. -There will be investigation into possible extrapolation for these values. - -As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctation of 100.000 via the noise introduced by a sample rate of 0.00001 is not as salient. +As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctation of 100.000 via the noise introduced by a sample rate of 0.00001 is not as critical. ## How to deal with extrapolation in the product? +### **Modes** + +There are two modes that can be used to view data in Sentry: default mode and sample mode. + +- Default mode extrapolates the ingested data as outlined below. +- Sample mode does not extrapolate and presents exactly the data that was ingested. + +Depending on the context and the use case, one mode may be more useful than the other. + +Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time. ### General approach From d147c2fbaa0810f9fb6311f8d8c49b56fce33e9b Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Tue, 12 Nov 2024 13:03:40 +0100 Subject: [PATCH 09/14] clean up extrpaolation procedure explanations --- .../dynamic-sampling/extrapolation.mdx | 9 ++++----- 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index bb43a7d4ca3f4..a040026ea63cb 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -35,17 +35,16 @@ Sentry allows the user to aggregate data in different ways - the following aggre Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. ### Extrapolation for different aggregates -To extrapolate, the sampling weights have to be used in the following ways: +To extrapolate, sampling weights are calculated as 1/(sample rate). The sampling weights then are used in the following ways: - **Count**: Calculate a sum of the sampling weight Example: the query `count()` becomes `round(sum(sampling weight))`. - **Sum**: Multiply each value with `sampling weight`. 
Example: the query `sum(foo)` becomes `sum(foo * sampling weight)` -- **Average**: Use avgWeighted with sampling weight. +- **Average**: Calculate the weighted average with sampling weight. Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)` -- **Percentiles**: Use `*TDigestWeighted` with `sampling_weight_2`. -We use the integer weight column since weighted functions in Clickhouse do not support floating point weights. Furthermore, performance and accuracy tests have shown that the t-digest function provides best runtime performance (see Resources below). -Example: the query `quantile(0.95)(foo)` becomes `quantileTDigestWeighted(0.95)(foo, sampling_weight_2)`. +- **Percentiles**: Calculate the weighted percentiles with sampling weight. +Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling weight)`. As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples, may turn into a much larger absolute impact e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctation of 100.000 via the noise introduced by a sample rate of 0.00001 is not as critical. From 12bd39449e1f08eb0a09bb1ea60e0415385920e9 Mon Sep 17 00:00:00 2001 From: Simon Hellmayr Date: Wed, 13 Nov 2024 15:22:35 +0100 Subject: [PATCH 10/14] structure & some wording --- .../dynamic-sampling/extrapolation.mdx | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx index a040026ea63cb..441cd3623846f 100644 --- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx +++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx @@ -3,18 +3,20 @@ title: Extrapolation sidebar_order: 5 --- -Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Of course, without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias.To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates. +Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. 
This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates. -So what happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: +### Accuracy & Expressiveness +What happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions: - **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minor random decisions can influence the result in major ways. - **Expressiveness** refers to data being able to express something about the state of the observed system. Expressiveness refers to the usefulness of the data for the user in a specific use case. Data can be any combination of accurate and expressive. To illustrate these properties, let's look at some examples. A single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics that are not very expressive. When traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty that reduce expressiveness. +### Benefits of Extrapolation At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, as well as reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user: -- **Steady data when sample rates change**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, there will be a sudden change in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire when this happens. +- **Steady timeseries when sample rates change**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, there will be a sudden change in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire when this happens. - **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples? ## How does extrapolation work? 
From 98c18a81b5086c34aab370548758d1105f388d10 Mon Sep 17 00:00:00 2001
From: Simon Hellmayr
Date: Thu, 14 Nov 2024 15:19:49 +0100
Subject: [PATCH 11/14] incorporate review coments

---
 .../dynamic-sampling/extrapolation.mdx | 22 ++++++++++---------
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
index 441cd3623846f..b354c908a9070 100644
--- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
@@ -3,7 +3,7 @@ title: Extrapolation
 sidebar_order: 5
 ---
-Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics attached to these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This effect is exacerbated for numerical attributes like latency, whose accuracy will be negatively affected by such a bias. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
+Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This bias especially impacts numerical attributes like latency, reducing their accuracy. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
### Accuracy & Expressiveness
What happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions:
@@ -16,10 +16,12 @@ Data can be any combination of accurate and expressive. To illustrate these prop
### Benefits of Extrapolation
At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, as well as reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user:
+- **The numbers correspond to the real world**: When data is sampled, there is some math you need to do to infer what the real numbers are, e.g. when you have 1000 samples at 10% sample rate, there are 10000 requests to your application. With extrapolation, you don’t have to know your sample rate to understand what your application is actually doing. Instead, you get a view on the real behavior without additional knowledge or math required on your end.
+
- **Steady timeseries when sample rates change**: Whenever you change sample rates, both the count and possibly the distribution of the values will change in some way. When you switch the sample rate from 10% to 1% for whatever reason, there will be a sudden change in all associated metrics. Extrapolation corrects for this, so your graphs are steady, and your alerts don’t fire when this happens.
- **Combining different sample rates**: When your endpoints don’t have the same sample rate, how are you supposed to know the true p90 when one of your endpoints is sampled at 1% and another at 100%, but all you get is the aggregate of the samples?
-## How does extrapolation work?
+## How Does Extrapolation Work?
### Aggregates
Sentry allows the user to aggregate data in different ways - the following aggregates are generally available, along with whether they are extrapolatable or not:
@@ -37,16 +39,16 @@ Sentry allows the user to aggregate data in different ways - the following aggre
Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles.
### Extrapolation for different aggregates
-To extrapolate, sampling weights are calculated as 1/(sample rate). The sampling weights then are used in the following ways:
+To extrapolate, sampling weights are calculated as `1/sample rate`. The sampling weights of each row are then used in the following ways:
- **Count**: Calculate a sum of the sampling weight
-Example: the query `count()` becomes `round(sum(sampling weight))`.
+Example: the query `count()` becomes `round(sum(sampling_weight))`.
-- **Sum**: Multiply each value with `sampling weight`.
+- **Sum**: Multiply each value with `sampling_weight`.
-Example: the query `sum(foo)` becomes `sum(foo * sampling weight)`
+Example: the query `sum(foo)` becomes `sum(foo * sampling_weight)`
- **Average**: Calculate the weighted average with sampling weight.
-Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling weight)`
+Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling_weight)`
- **Percentiles**: Calculate the weighted percentiles with sampling weight.
-Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling weight)`.
+Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling_weight)`.
As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples may turn into a much larger absolute impact, e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctuation of 100,000 via the noise introduced by a sample rate of 0.00001 is not as critical.
@@ -60,9 +62,9 @@ There are two modes that can be used to view data in Sentry: default mode and sa
Depending on the context and the use case, one mode may be more useful than the other.
-Generally, default mose is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There may be scenarios where the user will want to switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation, therefore the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
+Generally, default mode is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There are scenarios where the user needs to temporarily switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation. Therefore, the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
-### General approach
+### General Approach
In new product surfaces, the question of whether or not to use extrapolated vs non-extrapolated data is a delicate one, and it needs to be deliberated with care. In the end, it’s a judgement call on the person implementing the feature, but these questions may be a guide on the way to a decision:
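To make the query rewrites in the patch above concrete, here is an illustrative re-implementation in plain Python. The rows, the `duration_ms` field, and the simple cumulative-weight percentile are invented for the example; they only mimic the behaviour of the weighted aggregate functions (`avgWeighted`, `weightedPercentile`) referenced in the docs, not their actual implementation.

```python
# Each stored row carries sampling_weight = 1 / sample rate of the project/endpoint.
rows = [
    {"duration_ms": 120, "sampling_weight": 10.0},   # sampled at 10%
    {"duration_ms": 300, "sampling_weight": 10.0},
    {"duration_ms": 80,  "sampling_weight": 1.0},    # sampled at 100%
    {"duration_ms": 950, "sampling_weight": 100.0},  # sampled at 1%
]

def ext_count(rows):
    # count() -> round(sum(sampling_weight))
    return round(sum(r["sampling_weight"] for r in rows))

def ext_sum(rows, field):
    # sum(foo) -> sum(foo * sampling_weight)
    return sum(r[field] * r["sampling_weight"] for r in rows)

def ext_avg(rows, field):
    # avg(foo) -> avgWeighted(foo, sampling_weight)
    return ext_sum(rows, field) / sum(r["sampling_weight"] for r in rows)

def ext_percentile(rows, field, q):
    # quantile(q)(foo) -> weighted percentile: walk the sorted values until the
    # cumulative sampling weight reaches a fraction q of the total weight.
    ordered = sorted(rows, key=lambda r: r[field])
    total = sum(r["sampling_weight"] for r in ordered)
    cumulative = 0.0
    for r in ordered:
        cumulative += r["sampling_weight"]
        if cumulative >= q * total:
            return r[field]
    return ordered[-1][field]

print(ext_count(rows))                            # 121 spans represented by 4 stored rows
print(ext_sum(rows, "duration_ms"))               # 99280.0 ms of represented duration
print(ext_avg(rows, "duration_ms"))               # ~820.5 ms weighted average
print(ext_percentile(rows, "duration_ms", 0.95))  # 950, dominated by the 1%-sampled row
```

Note how the single row sampled at 1% dominates the extrapolated aggregates: this is exactly the low-sample situation the surrounding text warns about.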
From 9bd09a0ed612e708b8f1cb072d0c6f179ffe6983 Mon Sep 17 00:00:00 2001
From: Simon Hellmayr
Date: Mon, 18 Nov 2024 10:13:46 +0100
Subject: [PATCH 12/14] move modes upwards and add explanation for extremes vs percentiles

---
 .../dynamic-sampling/extrapolation.mdx | 29 ++++++++-----------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
index b354c908a9070..95edea6cd7c19 100644
--- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
@@ -11,7 +11,13 @@ What happens during extrapolation, how does one handle this type of data, and wh
- **Accuracy** refers to data being correct. For example, the measured number of spans corresponds to the actual number of spans that were executed. As sample rates decrease, accuracy also goes down, because minor random decisions can influence the result in major ways.
- **Expressiveness** refers to data being able to express something about the state of the observed system. Expressiveness refers to the usefulness of the data for the user in a specific use case.
-Data can be any combination of accurate and expressive. To illustrate these properties, let’s look at some examples. A single sample with specific tags and a full trace can be very expressive, and a large amount of spans can have very misleading characteristics that are not very expressive. When traffic is low and 100% of data is sampled, the system is fully accurate despite aggregates being affected by inherent statistical uncertainty that reduce expressiveness.
+### Modes
+Given these properties, there are two modes that can be used to view data in Sentry: default mode and sample mode.
+
+- **Default mode** extrapolates the ingested data as outlined below.
+- **Sample mode** does not extrapolate and presents exactly the data that was ingested.
+
+Depending on the context and the use case, one mode may be more useful than the other. Generally, default mode is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There are scenarios where the user needs to temporarily switch between modes, for example to examine the aggregate numbers first, and dive into the number of samples for investigation. In both modes, the user may investigate single samples to dig deeper into the details.
### Benefits of Extrapolation
At first glance, extrapolation may seem unnecessarily complicated. However, for high-volume organizations, sampling is a way to control costs and egress volume, as well as reduce the amount of redundant data sent to Sentry. Why don’t we just show the user the data they send? We don’t just extrapolate for fun, it actually has some major benefits to the user:
@@ -28,15 +34,15 @@ Sentry allows the user to aggregate data in different ways - the following aggre
| **Aggregate** | **Can be extrapolated?** |
| --- | --- |
-| avg | yes |
-| min | no |
| count | yes |
+| avg | yes |
| sum | yes |
-| max | no |
| percentiles | yes |
+| min | no |
+| max | no |
| count_unique | no |
-Each of these aggregates has their own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles.
+Each of these aggregates has its own way of dealing with extrapolation, due to the fact that e.g. counts have to be extrapolated in a slightly different way from percentiles. While `min` and `max` are technically percentiles, we currently do not offer extrapolation due to the decreased stability of extreme aggregates when sampling. For example, the `p50` will be more stable than the `p99`; `min` and `max` are just the extreme cases.
### Extrapolation for different aggregates
To extrapolate, sampling weights are calculated as `1/sample rate`. The sampling weights of each row are then used in the following ways:
@@ -53,20 +59,9 @@ Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo,
As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples may turn into a much larger absolute impact, e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctuation of 100,000 via the noise introduced by a sample rate of 0.00001 is not as critical.
## How to deal with extrapolation in the product?
-### **Modes**
-
-There are two modes that can be used to view data in Sentry: default mode and sample mode.
-
-- Default mode extrapolates the ingested data as outlined below.
-- Sample mode does not extrapolate and presents exactly the data that was ingested.
-
-Depending on the context and the use case, one mode may be more useful than the other.
-
-Generally, default mode is useful for all queries that aggregate on a dataset of sufficient volume. As absolute sample size decreases below a certain limit, default mode becomes less and less expressive. There are scenarios where the user needs to temporarily switch between modes, for example to examine the aggregate numbers first, and dive into single samples for investigation. Therefore, the extrapolation mode setting should be a transient view option that resets to default mode when the user opens the page the next time.
### General Approach
-In new product surfaces, the question of whether or not to use extrapolated vs non-extrapolated data is a delicate one, and it needs to be deliberated with care. In the end, it’s a judgement call on the person implementing the feature, but these questions may be a guide on the way to a decision:
+In new product surfaces, the question of whether or not to use extrapolated vs non-extrapolated data is a delicate one, and it needs to be deliberated with care. The extrapolation mode setting should generally be a transient view option that resets to default mode when the user opens the page the next time. In the end, it’s a judgement call on the person implementing the feature, but these questions may be a guide on the way to a decision:
- What should be the default, and how should the switch between modes work?
- In most scenarios, extrapolation should be on by default when looking at aggregates, and off when looking at samples. Switching, in most cases, should be a very conscious operation that users should be aware they are taking, and not an implicit switch that just happens to trigger when users navigate the UI.
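The stability argument the patch above uses for leaving `min` and `max` unextrapolated can be illustrated with a small simulation. This sketch assumes log-normally distributed latencies and a uniform 10% sample rate (both assumptions are made up for the illustration): over repeated sampling runs the median of the samples barely moves, while the sampled maximum scatters widely and usually falls short of the true maximum.

```python
# Rough simulation: mid percentiles are stable under sampling, extremes are not.
import random
import statistics

random.seed(1)
population = [random.lognormvariate(5, 1) for _ in range(100_000)]  # fake latencies (ms)

def sample(values, rate):
    return [v for v in values if random.random() < rate]

p50_estimates, max_estimates = [], []
for _ in range(20):
    kept = sample(population, 0.10)
    p50_estimates.append(statistics.median(kept))
    max_estimates.append(max(kept))

print("true p50:", round(statistics.median(population)), "sampled p50 range:",
      round(min(p50_estimates)), "to", round(max(p50_estimates)))
print("true max:", round(max(population)), "sampled max range:",
      round(min(max_estimates)), "to", round(max(max_estimates)))
```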
From a627d1c7a4f25795fee629513dea9e8c6175cd45 Mon Sep 17 00:00:00 2001
From: Simon Hellmayr
Date: Mon, 18 Nov 2024 12:28:32 +0100
Subject: [PATCH 13/14] polish wording

---
 .../application-architecture/dynamic-sampling/extrapolation.mdx | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
index 95edea6cd7c19..25d0b8f6bcbf5 100644
--- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
@@ -3,7 +3,7 @@ title: Extrapolation
 sidebar_order: 5
 ---
-Sentry’s system uses sampling to reduce the amount of data ingested, for reasons of both performance and cost. When configured, Sentry only ingests a fraction of the data according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This bias especially impacts numerical attributes like latency, reducing their accuracy. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
+Dynamic sampling reduces the amount of data ingested, for reasons of both performance and cost. When configured, on a fraction of the data is ingested, according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This bias especially impacts numerical attributes like latency, reducing their accuracy. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
### Accuracy & Expressiveness
What happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions:

From 635216df111811521a3b78fb1fa40b7dc3599537 Mon Sep 17 00:00:00 2001
From: Simon Hellmayr
Date: Mon, 18 Nov 2024 15:04:55 +0100
Subject: [PATCH 14/14] last review comments

---
 .../dynamic-sampling/extrapolation.mdx | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
index 25d0b8f6bcbf5..a055e60409370 100644
--- a/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
+++ b/develop-docs/application-architecture/dynamic-sampling/extrapolation.mdx
@@ -3,7 +3,7 @@ title: Extrapolation
 sidebar_order: 5
 ---
-Dynamic sampling reduces the amount of data ingested, for reasons of both performance and cost. When configured, on a fraction of the data is ingested, according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This bias especially impacts numerical attributes like latency, reducing their accuracy. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
+Dynamic sampling reduces the amount of data ingested, for reasons of both performance and cost. When configured, a fraction of the data is ingested, according to the specified sample rate of a project: if you sample at 10% and initially have 1000 requests to your site in a given timeframe, you will only see 100 spans in Sentry. Without making up for the sample rate, any metrics derived from these spans will misrepresent the true volume of the application. When different parts of the application have different sample rates, there will even be a bias towards some of them, skewing the total volume towards parts with higher sample rates. This bias especially impacts numerical attributes like latency, reducing their accuracy. To account for this fact, Sentry uses extrapolation to smartly combine the data to account for sample rates.
### Accuracy & Expressiveness
What happens during extrapolation, how does one handle this type of data, and when is extrapolated data accurate and expressive? Let’s start with some definitions:
@@ -54,7 +54,7 @@ Example: the query `sum(foo)` becomes `sum(foo * sampling_weight)`
- **Average**: Calculate the weighted average with sampling weight.
Example: the query `avg(foo)` becomes `avgWeighted(foo, sampling_weight)`
- **Percentiles**: Calculate the weighted percentiles with sampling weight.
-Example: the query `quantile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling_weight)`.
+Example: the query `percentile(0.95)(foo)` becomes `weightedPercentile(0.95)(foo, sampling_weight)`.
As long as there are sufficient samples, the sample rate itself does not matter as much, but due to the extrapolation mechanism, what would be a fluctuation of a few samples may turn into a much larger absolute impact, e.g. in terms of the view count. Of course, when a site gets billions of visits, a fluctuation of 100,000 via the noise introduced by a sample rate of 0.00001 is not as critical.
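As a back-of-the-envelope check on the sentence above (the visit count below is an assumed figure, purely for illustration):

```python
sample_rate = 0.00001
weight = 1 / sample_rate        # one stored sample stands in for 100,000 real events
visits = 2_000_000_000          # "billions of visits"
expected_samples = visits * sample_rate
print(weight)                   # 100000.0 -> gaining or losing a single sample moves
                                # the extrapolated count by about 100,000
print(expected_samples)         # 20000.0 samples expected, so a 100,000 swing is only
                                # ~0.005% of the total volume
```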
@@ -72,12 +72,10 @@ In new product surfaces, the question of whether or not to use extrapolated vs n
- Does the user care more about a truthful estimate of the aggregate data or about the actual events that happened?
- Some scenarios, like visualizing metrics over time, are based on aggregates, whereas a case of debugging a specific user’s problem hinges on actually seeing the specific events. The best mode depends on the intended usage of the product.
-
### Opting Out of Extrapolation
-Users may want to opt out of extrapolation for different reasons. It is always possible to set the sample rate to 100% and therefore send all data to Sentry, implicitly opting out of extrapolation and behaving in the same way as sample mode.
+Users may want to opt out of extrapolation for different reasons. It is always possible to set the sample rate for specific events to 100% and therefore send all data to Sentry, implicitly opting out of extrapolation and behaving in the same way as sample mode. Depending on their configuration, users may need to change Dynamic Sampling settings or their SDK’s traces sampler callback for this.
### Confidence
-
When users filter on data that has a very low count but also a low sample rate, yielding a highly extrapolated but low-sample dataset, developers and users should be careful with the conclusions they draw from the data. The storage platform provides confidence intervals along with the extrapolated estimates for the different aggregation types to indicate when there is elevated uncertainty in the data. These types of datasets are inherently noisy and may contain misleading information. When this is discovered, the user should either be very careful with the conclusions they draw from the aggregate data, or switch to non-default mode for investigation of the individual samples.

## **Conclusion**