Commit 363af38

[apm] Add examples using trace_continuation_strategy to sampling docs (#4167)
* first attempt
* update diagrams
* reframe, restructure
* address feedback
* add titles to images, reference in text
1 parent a311710 commit 363af38

6 files changed (+60 -11 lines changed)


docs/en/observability/apm/sampling.asciidoc

Lines changed: 60 additions & 11 deletions
@@ -29,23 +29,69 @@ data might be discarded purely due to chance.
2929

3030
See <<apm-configure-head-based-sampling>> to get started.
3131

32-
**Distributed tracing with head-based sampling**
32+
[float]
33+
[[distributed-tracing-examples]]
34+
===== Distributed tracing
3335

3436
In a distributed trace, the sampling decision is still made when the trace is initiated.
3537
Each subsequent service respects the initial service's sampling decision, regardless of its configured sample rate;
3638
the result is a sampling percentage that matches the initiating service.
3739

38-
In this example, `Service A` initiates four transactions and has sample rate of `.5` (`50%`).
39-
The sample rates of `Service B` and `Service C` are ignored.
40+
In the example in _Figure 1_, `Service A` initiates four transactions and has a sample rate of `.5` (`50%`).
41+
The upstream sampling decision is respected, so even if the sample rate is defined and is a different
42+
value in `Service B` and `Service C`, the sample rate will be `.5` (`50%`) for all services.
4043

44+
.Upstream sampling decision is respected
4145
image::./images/dt-sampling-example-1.png[Distributed tracing and head based sampling example one]
4246

43-
In this example, `Service A` initiates four transactions and has a sample rate of `1` (`100%`).
44-
Again, the sample rates of `Service B` and `Service C` are ignored.
47+
In the example in _Figure 2_, `Service A` initiates four transactions and has a sample rate of `1` (`100%`).
48+
Again, the upstream sampling decision is respected, so the sample rate for all services will
49+
be `1` (`100%`).
4550

51+
.Upstream sampling decision is respected
4652
image::./images/dt-sampling-example-2.png[Distributed tracing and head based sampling example two]
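
The behavior in the two figures above can be approximated with a short simulation. This is only an illustrative sketch, not the agents' actual implementation: `head_sample` and `propagate_decision` are hypothetical names, and the sample rates given to `Service B` and `Service C` are made up to show that they are ignored.

```python
import random
from typing import Dict, List, Tuple


def head_sample(sample_rate: float, rng: random.Random) -> bool:
    """Make a head-based sampling decision for a newly initiated trace."""
    # rng.random() is in [0.0, 1.0), so a rate of 1.0 always samples
    # and a rate of 0.0 never does.
    return rng.random() < sample_rate


def propagate_decision(
    services: List[Tuple[str, float]], rng: random.Random
) -> Dict[str, bool]:
    """Return each service's sampling outcome for one trace.

    `services` lists (name, configured sample rate) in call order; the
    first entry initiates the trace. Downstream rates are deliberately
    ignored because the upstream decision is respected (assumption: every
    service uses the default `continue` behavior).
    """
    _, root_rate = services[0]
    sampled = head_sample(root_rate, rng)
    return {name: sampled for name, _rate in services}


rng = random.Random(0)
# Hypothetical downstream rates; only Service A's rate matters here.
chain = [("Service A", 0.5), ("Service B", 1.0), ("Service C", 0.1)]
for _ in range(4):
    print(propagate_decision(chain, rng))
```

Every dictionary printed holds the same value for all three services, because the decision is made once, in `Service A`.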
4753

48-
**OpenTelemetry with head-based sampling**
54+
[float]
55+
===== Trace continuation strategies with distributed tracing
56+
57+
In addition to setting the sample rate, you can also specify which _trace continuation strategy_ to use.
58+
There are three trace continuation strategies: `continue`, `restart`, and `restart_external`.
59+
60+
The *`continue`* trace continuation strategy is the default and behaves similarly to the examples in
61+
the <<distributed-tracing-examples,Distributed tracing section>>.
62+
63+
Use the *`restart_external`* trace continuation strategy on an Elastic-monitored service to start
64+
a new trace if the previous service did not have a `traceparent` header with `es` vendor data.
65+
This can be helpful if a transaction includes an Elastic-monitored service that is receiving requests
66+
from an unmonitored service.
67+
68+
In the example in _Figure 3_, `Service A` is an Elastic-monitored service that initiates four transactions
69+
with a sample rate of `.25` (`25%`). Because `Service B` is unmonitored, the traces started in
70+
`Service A` will end there. `Service C` is an Elastic-monitored service that initiates four transactions
71+
that start new traces with a new sample rate of `.5` (`50%`). Because `Service D` is also an
72+
Elastic-monitored service, the upstream sampling decision defined in `Service C` is respected.
73+
The end result will be three sampled traces.
74+
75+
.Using the `restart_external` trace continuation strategy
76+
image::./images/dt-sampling-continuation-strategy-restart_external.png[Distributed tracing and head based sampling with restart_external continuation strategy]
77+
78+
Use the *`restart`* trace continuation strategy on an Elastic-monitored service to start
79+
a new trace regardless of whether the previous service had a `traceparent` header.
80+
This can be helpful if an Elastic-monitored service is publicly exposed, and you do not
81+
want tracing data to possibly be spoofed by user requests.
82+
83+
In the example in _Figure 4_, `Service A` and `Service B` are Elastic-monitored services that use the
84+
default trace continuation strategy. `Service A` has a sample rate of `.25` (`25%`), and that
85+
sampling decision is respected in `Service B`. `Service C` is an Elastic-monitored service that
86+
uses the `restart` trace continuation strategy and has a sample rate of `1` (`100%`).
87+
Because it uses `restart`, the upstream sample rate is _not_ respected in `Service C` and all four
88+
traces will be sampled as new traces in `Service C`. The end result will be five sampled traces.
89+
90+
.Using the `restart` trace continuation strategy
91+
image::./images/dt-sampling-continuation-strategy-restart.png[Distributed tracing and head based sampling with restart continuation strategy]
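
The three strategies described above can be summarized as a single decision function. The sketch below is a simplified model based only on the descriptions in this section, not agent source code: `IncomingContext`, `decide`, and the `roll` parameter (a stand-in for the agent's random draw) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IncomingContext:
    """Hypothetical summary of what an incoming request carried."""
    has_traceparent: bool
    has_es_vendor_data: bool
    upstream_sampled: bool


def decide(
    strategy: str,
    incoming: Optional[IncomingContext],
    local_rate: float,
    roll: float,
) -> Tuple[bool, bool]:
    """Return (continues_upstream_trace, sampled).

    `roll` is a number in [0, 1) standing in for a random draw, which
    keeps the example deterministic.
    """
    if strategy not in ("continue", "restart", "restart_external"):
        raise ValueError(f"unknown trace continuation strategy: {strategy}")

    has_parent = incoming is not None and incoming.has_traceparent
    if strategy == "continue":
        continues = has_parent
    elif strategy == "restart":
        continues = False
    else:  # restart_external: continue only traces carrying `es` vendor data
        continues = has_parent and incoming.has_es_vendor_data

    if continues:
        # The upstream sampling decision is respected.
        return True, incoming.upstream_sampled
    # A new trace starts, so the local sample rate applies.
    return False, roll < local_rate


# A request from an unmonitored service: traceparent present, no `es` data.
external = IncomingContext(True, False, upstream_sampled=False)
print(decide("continue", external, 0.5, 0.3))          # (True, False)
print(decide("restart_external", external, 0.5, 0.3))  # (False, True)
```

With `continue`, the upstream decision wins even though it came from an unmonitored service; with `restart_external`, the service starts a new trace and applies its own sample rate.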
92+
93+
[float]
94+
===== OpenTelemetry
4995

5096
Head-based sampling is implemented directly in the APM agents and SDKs.
5197
The sample rate must be propagated between services and the managed intake service in order to produce accurate metrics.
@@ -54,13 +100,16 @@ OpenTelemetry offers multiple samplers. However, most samplers do not propagate
54100
This results in inaccurate span-based metrics, like APM throughput, latency, and error metrics.
55101

56102
For accurate span-based metrics when using head-based sampling with OpenTelemetry, you must use
57-
a [consistent probability sampler](https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/).
103+
a https://opentelemetry.io/docs/specs/otel/trace/tracestate-probability-sampling/[consistent probability sampler].
58104
These samplers propagate the sample rate between services and the managed intake service, resulting in accurate metrics.
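
To see why the propagated sample rate matters, consider how a backend might extrapolate from sampled spans to an estimated total. The sketch below assumes inverse-probability weighting, where each sampled span stands for `1 / sample_rate` original spans; this section does not specify the exact calculation the managed intake service performs, and `estimate_span_count` is a hypothetical name.

```python
from typing import Iterable


def estimate_span_count(sample_rates: Iterable[float]) -> float:
    """Estimate the original number of spans from the sampled ones.

    `sample_rates` holds one entry per *sampled* span: the sample rate
    that was propagated with it. A span sampled at rate 0.25 represents
    1 / 0.25 = 4 spans. Rates of 0 or less are skipped because they
    cannot be weighted.
    """
    return sum(1.0 / rate for rate in sample_rates if rate > 0)


# Five spans sampled at 50% represent roughly ten original spans.
print(estimate_span_count([0.5] * 5))  # 10.0
```

Without the propagated rate, the backend can only count the five spans it received, which would understate throughput by half in this example.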
59105

60-
NOTE: OpenTelemetry does not offer consistent probability samplers in all languages.
106+
[NOTE]
107+
====
108+
OpenTelemetry does not offer consistent probability samplers in all languages.
61109
OpenTelemetry users should consider using tail-based sampling instead.
62-
+
110+
63111
Refer to the documentation of your favorite OpenTelemetry agent or SDK for more information on the availability of consistent probability samplers.
112+
====
64113

65114
[float]
66115
[[apm-tail-based-sampling]]
@@ -99,7 +148,7 @@ and will work with traces sent by either Elastic APM agents or OpenTelemetry SDK
99148
Due to <<apm-open-telemetry-tbs,OpenTelemetry tail-based sampling limitations>> when using https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/processor/tailsamplingprocessor[tailsamplingprocessor], we recommend using APM Server tail-based sampling instead.
100149

101150
[float]
102-
=== Sampled data and visualizations
151+
==== Sampled data and visualizations
103152

104153
A sampled trace retains all data associated with it.
105154
A non-sampled trace drops all <<apm-data-model-spans,span>> and <<apm-data-model-transactions,transaction>> data^1^.
@@ -125,7 +174,7 @@ The {kib} apps that utilize RUM data depend on transaction events,
125174
so non-sampled RUM traces retain transaction data -- only span data is dropped.
126175

127176
[float]
128-
=== Sample rates
177+
==== Sample rates
129178

130179
What's the best sampling rate? Unfortunately, there isn't one.
131180
Sampling is dependent on your data, the throughput of your application, data retention policies, and other factors.
