docs/en/observability/apm/sampling.asciidoc
Lines changed: 60 additions & 11 deletions
@@ -29,23 +29,69 @@ data might be discarded purely due to chance.

 See <<apm-configure-head-based-sampling>> to get started.

-**Distributed tracing with head-based sampling**
+[float]
+[[distributed-tracing-examples]]
+===== Distributed tracing

 In a distributed trace, the sampling decision is still made when the trace is initiated.
 Each subsequent service respects the initial service's sampling decision, regardless of its configured sample rate;
 the result is a sampling percentage that matches the initiating service.

-In this example, `Service A` initiates four transactions and has sample rate of `.5` (`50%`).
-The sample rates of `Service B` and `Service C` are ignored.
+In the example in _Figure 1_, `Service A` initiates four transactions and has a sample rate of `.5` (`50%`).
+The upstream sampling decision is respected, so even if the sample rate is defined and is a different
+value in `Service B` and `Service C`, the sample rate will be `.5` (`50%`) for all services.

+.Upstream sampling decision is respected
 image::./images/dt-sampling-example-1.png[Distributed tracing and head based sampling example one]

-In this example, `Service A` initiates four transactions and has a sample rate of `1` (`100%`).
-Again, the sample rates of `Service B` and `Service C` are ignored.
+In the example in _Figure 2_, `Service A` initiates four transactions and has a sample rate of `1` (`100%`).
+Again, the upstream sampling decision is respected, so the sample rate for all services will
+be `1` (`100%`).

+.Upstream sampling decision is respected
 image::./images/dt-sampling-example-2.png[Distributed tracing and head based sampling example two]
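
The rule described in this section (the sampling decision is made once, at the start of the trace, and every later service inherits it) can be illustrated with a small simulation. This is a minimal sketch and not agent code: the function `trace_through_services`, the downstream rates `0.1` and `1.0`, and the seeded random generator are all assumptions made for illustration.

```python
import random


def trace_through_services(root_sample_rate, downstream_sample_rates, rng):
    """Return one sampling decision per service for a single distributed trace.

    Hypothetical helper: the first entry is the initiating service, the rest
    are the downstream services in call order.
    """
    # The decision is made once, when the trace is initiated.
    sampled = rng.random() < root_sample_rate
    # Each downstream service inherits that decision; its own configured
    # sample rate is never consulted.
    return [sampled] + [sampled for _ in downstream_sample_rates]


rng = random.Random(42)  # seeded only so repeated runs behave the same way
traces = [trace_through_services(0.5, [0.1, 1.0], rng) for _ in range(10_000)]

for index, name in enumerate(["Service A", "Service B", "Service C"]):
    observed = sum(trace[index] for trace in traces) / len(traces)
    print(f"{name}: {observed:.2%} of traces sampled")
```

All three services report the same observed percentage, close to the rate of the initiating service, even though the downstream services are configured with different rates.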

-**OpenTelemetry with head-based sampling**
+[float]
+===== Trace continuation strategies with distributed tracing
+
+In addition to setting the sample rate, you can also specify which _trace continuation strategy_ to use.
+There are three trace continuation strategies: `continue`, `restart`, and `restart_external`.
+
+The *`continue`* trace continuation strategy is the default and behaves similarly to the examples in
+the <<distributed-tracing-examples,Distributed tracing section>>.
+
+Use the *`restart_external`* trace continuation strategy on an Elastic-monitored service to start
+a new trace if the previous service did not have a `traceparent` header with `es` vendor data.
+This can be helpful if a transaction includes an Elastic-monitored service that is receiving requests
+from an unmonitored service.
+
+In the example in _Figure 3_, `Service A` is an Elastic-monitored service that initiates four transactions
+with a sample rate of `.25` (`25%`). Because `Service B` is unmonitored, the traces started in
+`Service A` will end there. `Service C` is an Elastic-monitored service that initiates four transactions
+that start new traces with a new sample rate of `.5` (`50%`). Because `Service D` is also an
+Elastic-monitored service, the upstream sampling decision defined in `Service C` is respected.
+The end result will be three sampled traces.
+
+.Using the `restart_external` trace continuation strategy
+image::./images/dt-sampling-continuation-strategy-restart_external.png[Distributed tracing and head based sampling with restart_external continuation strategy]
+
+Use the *`restart`* trace continuation strategy on an Elastic-monitored service to start
+a new trace regardless of whether the previous service had a `traceparent` header.
+This can be helpful if an Elastic-monitored service is publicly exposed, and you do not
+want tracing data to be spoofed by user requests.
+
+In the example in _Figure 4_, `Service A` and `Service B` are Elastic-monitored services that use the
+default trace continuation strategy. `Service A` has a sample rate of `.25` (`25%`), and that
+sampling decision is respected in `Service B`. `Service C` is an Elastic-monitored service that
+uses the `restart` trace continuation strategy and has a sample rate of `1` (`100%`).
+Because it uses `restart`, the upstream sample rate is _not_ respected in `Service C` and all four
+traces will be sampled as new traces in `Service C`. The end result will be five sampled traces.
+
+.Using the `restart` trace continuation strategy
+image::./images/dt-sampling-continuation-strategy-restart.png[Distributed tracing and head based sampling with restart continuation strategy]
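
The three strategies can be summarized as one decision: does the service join the incoming trace or start a new one? The sketch below models only the wording of this section. The function names and the two boolean inputs are hypothetical and are not part of any agent API. The second helper reproduces the arithmetic behind the "three sampled traces" and "five sampled traces" totals in Figures 3 and 4.

```python
def continues_upstream_trace(strategy, has_traceparent, has_es_vendor_data):
    """Return True if the service joins the incoming trace, False if it starts a new one.

    Hypothetical helper. `has_es_vendor_data` stands for "the incoming request
    carried `es` vendor data", as described for `restart_external`.
    """
    if not has_traceparent:
        return False  # nothing to continue, so a new trace is always started
    if strategy == "continue":
        return True  # default: respect the upstream trace and its sampling decision
    if strategy == "restart":
        return False  # always start a new trace
    if strategy == "restart_external":
        return has_es_vendor_data  # continue only for Elastic-monitored callers
    raise ValueError(f"unknown trace continuation strategy: {strategy!r}")


def expected_sampled_traces(segments):
    """Sum transactions * sample rate over each point where traces are started."""
    return sum(count * rate for count, rate in segments)


# Figure 3: Service A starts 4 traces at 25%, Service C restarts 4 traces at 50%.
figure_3 = expected_sampled_traces([(4, 0.25), (4, 0.5)])
# Figure 4: Service A starts 4 traces at 25%, Service C restarts 4 traces at 100%.
figure_4 = expected_sampled_traces([(4, 0.25), (4, 1.0)])
print(figure_3, figure_4)
```

These are expected values. With only four transactions per service, an individual run can sample more or fewer traces than the totals shown in the figures.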
+
+[float]
+===== OpenTelemetry

 Head-based sampling is implemented directly in the APM agents and SDKs.
 The sample rate must be propagated between services and the managed intake service in order to produce accurate metrics.
@@ -54,13 +100,16 @@ OpenTelemetry offers multiple samplers. However, most samplers do not propagate
 This results in inaccurate span-based metrics, like APM throughput, latency, and error metrics.

 For accurate span-based metrics when using head-based sampling with OpenTelemetry, you must use
-a [consistent probability sampler](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/).
+a https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/[consistent probability sampler].
 These samplers propagate the sample rate between services and the managed intake service, resulting in accurate metrics.
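
To see why the propagated sample rate matters for metrics, assume the receiving side scales each sampled event by the inverse of its sample rate. That scaling rule is an assumption made for this sketch, and the function name and the `sample_rate` field are hypothetical. Without the rate, only the raw number of sampled events is known, which undercounts the real traffic.

```python
def estimated_event_count(sampled_events):
    """Estimate how many events occurred, given only the sampled ones.

    Hypothetical helper: each sampled event is assumed to carry the sample
    rate that was in effect when its trace was started.
    """
    total = 0.0
    for event in sampled_events:
        rate = event["sample_rate"]
        if not 0 < rate <= 1:
            raise ValueError(f"sample rate must be in (0, 1], got {rate}")
        # An event sampled at rate r stands in for 1 / r events.
        total += 1.0 / rate
    return total


# 5 events sampled at 25% and 10 events sampled at 50%.
sampled = [{"sample_rate": 0.25}] * 5 + [{"sample_rate": 0.5}] * 10

naive = len(sampled)                        # 15: ignores the sample rate
estimated = estimated_event_count(sampled)  # 5 * 4 + 10 * 2 = 40.0
print(naive, estimated)
```

A sampler that does not propagate the rate leaves the receiver with only the naive count, which is why span-based throughput would be underreported in that case.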

-NOTE: OpenTelemetry does not offer consistent probability samplers in all languages.
+[NOTE]
+====
+OpenTelemetry does not offer consistent probability samplers in all languages.
 OpenTelemetry users should consider using tail-based sampling instead.
-+
+
 Refer to the documentation of your favorite OpenTelemetry agent or SDK for more information on the availability of consistent probability samplers.
+====

 [float]
 [[apm-tail-based-sampling]]
@@ -99,7 +148,7 @@ and will work with traces sent by either Elastic APM agents or OpenTelemetry SDK
 Due to <<apm-open-telemetry-tbs,OpenTelemetry tail-based sampling limitations>> when using https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor[tailsamplingprocessor], we recommend using APM Server tail-based sampling instead.

 [float]
-=== Sampled data and visualizations
+==== Sampled data and visualizations

 A sampled trace retains all data associated with it.
 A non-sampled trace drops all <<apm-data-model-spans,span>> and <<apm-data-model-transactions,transaction>> data^1^.
@@ -125,7 +174,7 @@ The {kib} apps that utilize RUM data depend on transaction events,
 so non-sampled RUM traces retain transaction data -- only span data is dropped.

 [float]
-=== Sample rates
+==== Sample rates

 What's the best sampling rate? Unfortunately, there isn't one.
 Sampling is dependent on your data, the throughput of your application, data retention policies, and other factors.