Commit 30a21da

Composite Head Samplers (#4321)
## Changes

Migrating from open-telemetry/oteps#250.

This is another approach for introducing Composite (Head) Samplers. The previous one (open-telemetry/oteps#240) proved too large, with some controversial elements. This OTEP is a split-off from that one, focusing on just one area: new Composite Samplers.

Two prototypes exist for this functionality. We are seeking a third prototype.

- In the Java-contrib repo: https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/README.md
- In @jmacd's https://github.com/jmacd/go-sampler/blob/main/README.md
1 parent 72cc859 commit 30a21da

1 file changed: +321 -0 lines changed
@@ -0,0 +1,321 @@
# Composite Samplers Proposal

This proposal addresses head-based sampling as described by the [OpenTelemetry SDK](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#sampling).
It introduces additional _composite samplers_.
Composite samplers use other samplers (_delegates_ or _children_) to make sampling decisions.
The composite samplers invoke the delegate samplers, but ultimately make the final decision themselves.

The new samplers proposed here have been designed to work with Consistent Probability Samplers. For a detailed description of this concept, see [probability sampling (OTEP 235)](https://github.com/open-telemetry/oteps/blob/main/text/trace/0235-sampling-threshold-in-trace-state.md).
Also see Draft PR 3910 [Probability Samplers based on W3C Trace Context Level 2](https://github.com/open-telemetry/opentelemetry-specification/pull/3910).

**Table of contents:**

- [Motivation](#motivation)
- [The Goal](#the-goal)
  - [Example](#example)
- [Proposed Samplers](#proposed-samplers)
  - [New API](#new-api)
    - [GetSamplingIntent](#getsamplingintent)
    - [Required Arguments for GetSamplingIntent](#required-arguments-for-getsamplingintent)
    - [Return Value](#return-value)
    - [Requirements for the basic samplers](#requirements-for-the-basic-samplers)
    - [Constructing SamplingResult](#constructing-samplingresult)
  - [ConsistentRuleBased](#consistentrulebased)
    - [Predicate](#predicate)
      - [SpanMatches](#spanmatches)
      - [Required Arguments for Predicates](#required-arguments-for-predicates)
    - [Required Arguments for ConsistentRuleBased](#required-arguments-for-consistentrulebased)
  - [ConsistentParentBased](#consistentparentbased)
  - [ConsistentAnyOf](#consistentanyof)
  - [ConsistentRateLimiting](#consistentratelimiting)
    - [Required Arguments for ConsistentRateLimiting](#required-arguments-for-consistentratelimiting)
- [Summary](#summary)
  - [Example - sampling configuration](#example---sampling-configuration)
  - [Limitations](#limitations)
  - [Prototyping](#prototyping)
- [Prior Art](#prior-art)

38+
## Motivation
39+
40+
The need for configuring head sampling has been explicitly or implicitly indicated in several discussions, both within the [Sampling SIG](https://docs.google.com/document/d/1gASMhmxNt9qCa8czEMheGlUW2xpORiYoD7dBD7aNtbQ) and in the wider community.
41+
Some of the discussions are going back a number of years, see for example
42+
43+
- issue [173](https://github.com/open-telemetry/opentelemetry-specification/issues/173): Way to ignore healthcheck traces when using automatic tracer across all languages?
44+
- issue [1060](https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/1060): Exclude URLs from Tracing
45+
- issue [1844](https://github.com/open-telemetry/opentelemetry-specification/issues/1844): Composite Sampler
46+
47+
Unfortunately, some of the valuable ideas flowing at the sampling SIG meetings never got recorded at the time of their inception, but see [Sampling SIG Research Notes](https://github.com/open-telemetry/oteps/pull/213) or the comments under [OTEP 240: A Sampling Configuration proposal](https://github.com/open-telemetry/oteps/pull/240) for some examples.
48+
## The Goal

The goal of this proposal is to help create advanced sampling configurations from predefined building blocks. Consider the following example of sampling requirements. Many users are expected to have requirements that follow a similar pattern. The most notable elements are trace classification based on the target URL, special handling for some spans, and a sanity cap on the total volume of exported spans.

Since this is a new part of the sampler specification, it is expected to be an optional component for OpenTelemetry SDKs. However, if an SDK _opts in_, it SHOULD implement all samplers described herein.

### Example

Head-based sampling requirements:

- for root spans:
  - drop all `/healthcheck` requests
  - capture all `/checkout` requests
  - capture 25% of all other requests
- for non-root spans:
  - follow the parent sampling decision
  - however, capture all calls to service `/foo` (even if the trace will be incomplete)
- in any case, do not exceed 1000 spans/minute

_Note_: several proposed samplers call for calculating _unions_ of Attribute sets.
Whenever such a union is constructed and attribute keys conflict, the attribute definition from the last set that uses that key takes effect. Similarly, whenever modifications of `Tracestate` are performed in sequence and keys conflict, the last modification overwrites the previous values.
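
The "last set wins" rule for attribute unions can be illustrated with a minimal Python sketch. The helper name `union_attributes` and the attribute keys are hypothetical and are not part of the proposal.

```python
from typing import Any, Dict, Mapping


def union_attributes(*attribute_sets: Mapping[str, Any]) -> Dict[str, Any]:
    """Merge attribute sets in order; on a key conflict the last set wins."""
    merged: Dict[str, Any] = {}
    for attributes in attribute_sets:
        merged.update(attributes)
    return merged


# Hypothetical attribute sets produced by two delegate samplers.
first = {"sampling.rule": "root-default", "team": "checkout"}
second = {"sampling.rule": "client-foo"}

merged = union_attributes(first, second)
print(merged)  # {'sampling.rule': 'client-foo', 'team': 'checkout'}
```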

## Proposed Samplers

A principle of operation for all new samplers is that `ShouldSample` is invoked only once, on the root of the tree formed by composite samplers.
All the logic provided by the composition of samplers is handled by calculating the threshold values through `GetSamplingIntent`, delegating the calculation downstream as necessary.

### New API

To make this approach possible, all Consistent Probability Samplers which participate in the sampler composition need to implement the following API, in addition to the standard Sampler API. We will use the term _Composable_ sampler to denote Consistent Probability Samplers which provide the new API and conform to the rules described here.
All the samplers described below are _Composable_ samplers.

#### GetSamplingIntent

This is an operation for all `Composable` samplers. Its purpose is to query the sampler about the actions it would take if asked to make a sampling decision for a given span, without constructing the actual sampling `Decision`.

#### Required Arguments for GetSamplingIntent

The arguments are the same as for [`ShouldSample`](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/sdk.md#shouldsample), except for the `TraceId`.

- `Context` with parent `Span`.
- Name of the `Span` to be created.
- `SpanKind` of the `Span` to be created.
- Initial set of `Attributes` of the `Span` to be created.
- Collection of links that will be associated with the `Span` to be created.

#### Return value

The return value is a structure (`SamplingIntent`) with the following elements:

- The THRESHOLD value, represented as a 14-character hexadecimal string, with `null` representing a non-probabilistic `DROP` decision (implementations MAY use a different representation if it is more performant or convenient),
- A function (`IsAdjustedCountReliable`) that provides a `boolean` value indicating that the adjusted count (calculated as the reciprocal of the sampling probability) can be faithfully used to estimate span metrics,
- A function (`GetAttributes`) that provides a set of `Attributes` to be added to the `Span` in case of a positive final sampling decision,
- A function (`UpdateTraceState`) that, given an input `Tracestate` and sampling Decision, provides a `Tracestate` to be associated with the `Span`. The samplers SHOULD NOT add or modify the `th` value for the `ot` key within these functions. The root node of the tree of composite samplers is solely responsible for setting or clearing this value (see Constructing `SamplingResult` below).
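
One possible shape for the `SamplingIntent` structure is sketched below in Python. This is an illustration under stated assumptions, not the prototype's API: the threshold is held as a 56-bit integer (the proposal allows alternative representations), `Tracestate` is modeled as a plain `dict`, and all Python names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


def _no_attributes() -> Dict[str, object]:
    return {}


def _same_tracestate(tracestate: Dict[str, str], sampled: bool) -> Dict[str, str]:
    return tracestate


@dataclass(frozen=True)
class SamplingIntent:
    # Assumption: 56-bit integer threshold; None stands for a non-probabilistic DROP.
    threshold: Optional[int]
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], Dict[str, object]] = _no_attributes
    update_trace_state: Callable[[Dict[str, str], bool], Dict[str, str]] = _same_tracestate


def threshold_to_hex(threshold: int) -> str:
    """Render a threshold as the 14-character hexadecimal string from the text."""
    return format(threshold, "014x")


intent = SamplingIntent(threshold=0, is_adjusted_count_reliable=lambda: True)
print(threshold_to_hex(intent.threshold))  # 00000000000000
```

Returning functions rather than precomputed values lets a composite sampler defer attribute and `Tracestate` work until the final decision is known.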

#### Requirements for the basic samplers

The `ConsistentAlwaysOff` sampler MUST provide a `SamplingIntent` with

- The THRESHOLD value of `null` (or equivalent),
- `IsAdjustedCountReliable` returning `false`,
- `GetAttributes` returning an empty set,
- `UpdateTraceState` returning its argument, without any modifications.

The `ConsistentAlwaysOn` sampler MUST provide a `SamplingIntent` with

- The THRESHOLD value of `00000000000000` (or equivalent),
- `IsAdjustedCountReliable` returning `true`,
- `GetAttributes` returning an empty set,
- `UpdateTraceState` returning its argument, without any modifications.

The `ConsistentFixedThreshold` sampler, which is essentially the `TraceIdRatioBased` sampler implementing the `Composable` interface, MUST provide a `SamplingIntent` with

- The THRESHOLD value representing the threshold calculated according to the proposed [sampler requirements following OTEP 235](https://github.com/open-telemetry/opentelemetry-specification/pull/4166),
- `IsAdjustedCountReliable` returning `true`,
- `GetAttributes` returning an empty set,
- `UpdateTraceState` returning its argument, without any modifications.
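
The three basic samplers could be sketched as follows. This is a simplified Python illustration, assuming integer thresholds and the OTEP 235 rejection threshold `T = (1 - p) * 2^56`; the precision handling required by the referenced sampler requirements is omitted, and the class and helper names are hypothetical renderings of the proposal's names.

```python
from dataclasses import dataclass
from typing import Callable, Optional

MAX_THRESHOLD = 1 << 56  # thresholds and randomness values are 56 bits wide


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[int]  # None = non-probabilistic DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], dict]
    update_trace_state: Callable[[dict, bool], dict]


def basic_intent(threshold: Optional[int], reliable: bool) -> SamplingIntent:
    """Intent with no extra attributes and an unmodified Tracestate."""
    return SamplingIntent(
        threshold=threshold,
        is_adjusted_count_reliable=lambda: reliable,
        get_attributes=dict,
        update_trace_state=lambda tracestate, sampled: tracestate,
    )


class ConsistentAlwaysOff:
    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        return basic_intent(None, False)


class ConsistentAlwaysOn:
    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        return basic_intent(0, True)


class ConsistentFixedThreshold:
    def __init__(self, probability: float) -> None:
        if not 0.0 < probability <= 1.0:
            raise ValueError("probability must be in (0, 1]")
        # Simplified rejection threshold: T = (1 - p) * 2^56.
        threshold = round((1.0 - probability) * MAX_THRESHOLD)
        self._threshold = min(threshold, MAX_THRESHOLD - 1)

    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        return basic_intent(self._threshold, True)


def to_hex(threshold: int) -> str:
    return format(threshold, "014x")


print(to_hex(ConsistentFixedThreshold(0.25).get_sampling_intent().threshold))
# c0000000000000
```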

#### Constructing `SamplingResult`

The process of constructing the final `SamplingResult` in response to a call to `ShouldSample` on the root sampler of the composite samplers tree consists of the following steps.

- The sampler gets its own `SamplingIntent`; unless the sampler is a leaf, this is a recursive process as described below,
- The sampler compares the received THRESHOLD value with the trace Randomness value to arrive at the final sampling `Decision`,
- The sampler calls the received `UpdateTraceState` function, passing the parent `Tracestate` and the final sampling `Decision`, to get the new `Tracestate` to be associated with the `Span` - again, in most cases this is a recursive step,
- In case of a positive sampling decision:
  - the sampler calls the received `GetAttributes` function to determine the set of `Attributes` to be added to the `Span`; in most cases this is a recursive step,
  - the sampler calls the received `IsAdjustedCountReliable` function; if it returns `true`, the sampler modifies the `th` value for the `ot` key in the `Tracestate` according to the received THRESHOLD; if it returns `false`, the sampler removes the `th` value for the `ot` key from the `Tracestate`,
- In case of a negative sampling decision, it removes the `th` value for the `ot` key from the `Tracestate`.
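
The steps above can be sketched in Python as follows. This is an illustration only: the `ot` entry of `Tracestate` is modeled as a nested `dict`, thresholds are integers, the randomness value is supplied by the caller (for example the lowest 56 bits of the `TraceId`), and the `th` encoding with trailing zeros stripped follows OTEP 235 as an assumption. All function and type names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[int]  # None = non-probabilistic DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], dict]
    update_trace_state: Callable[[dict, bool], dict]


@dataclass(frozen=True)
class SamplingResult:
    sampled: bool
    attributes: dict
    tracestate: dict


def randomness_from_trace_id(trace_id: int) -> int:
    """Lowest 56 bits of the TraceId, used as the randomness value."""
    return trace_id & ((1 << 56) - 1)


def encode_threshold(threshold: int) -> str:
    """Hex threshold with trailing zeros removed; '0' means 100% sampling."""
    return format(threshold, "014x").rstrip("0") or "0"


def construct_sampling_result(intent: SamplingIntent, randomness: int,
                              parent_tracestate: dict) -> SamplingResult:
    # Compare THRESHOLD with randomness to reach the final decision.
    sampled = intent.threshold is not None and randomness >= intent.threshold
    # Let the sampler tree update the Tracestate.
    tracestate = dict(intent.update_trace_state(parent_tracestate, sampled))
    ot = dict(tracestate.get("ot", {}))
    attributes: dict = {}
    if sampled:
        attributes = intent.get_attributes()
        if intent.is_adjusted_count_reliable():
            ot["th"] = encode_threshold(intent.threshold)
        else:
            ot.pop("th", None)
    else:
        ot.pop("th", None)
    if ot:
        tracestate["ot"] = ot
    else:
        tracestate.pop("ot", None)
    return SamplingResult(sampled, attributes, tracestate)


intent = SamplingIntent(0xC0000000000000, lambda: True,
                        lambda: {"rule": "default"}, lambda ts, sampled: ts)
result = construct_sampling_result(intent, 0xD0000000000000, {})
print(result.sampled, result.tracestate)  # True {'ot': {'th': 'c'}}
```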

### ConsistentRuleBased

`ConsistentRuleBased` is a composite sampler which performs `Span` categorization (e.g. when the sampling decision depends on `Span` attributes) and sampling.
The Spans can be grouped into separate categories, and each category can use a different Sampler.
Categorization of Spans is aided by `Predicates`.

#### Predicate

The Predicates represent logical expressions which can access `Span` `Attributes` (or anything else available when the sampling decision is to be made), and perform tests on the accessible values.
For example, one can test if the target URL for a SERVER span matches a given pattern.
The `Predicate` interface allows users to create custom categories based on information that is available at the time of making the sampling decision.
To preserve the integrity of consistent probability sampling, the Predicates MUST NOT depend on the parent `sampled` flag or on the lowest 56 bits of the `TraceId` (which can represent the _randomness_ value).

##### SpanMatches

This is an operation for `Predicate`, which returns `true` if a given `Span` matches, i.e. belongs to the category described by the Predicate.

##### Required Arguments for Predicates

The arguments represent the values that are made available for `ShouldSample`.

- `Context` with parent `Span`.
- `TraceId` of the `Span` to be created.
- Name of the `Span` to be created.
- Initial set of `Attributes` of the `Span` to be created.
- Collection of links that will be associated with the `Span` to be created.
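
A `Predicate` could be modeled as shown in this Python sketch, using the argument list above. The `AttributeMatches` class, its regex-based matching, and the attribute key are hypothetical illustrations. Note that the predicate receives the `TraceId` but deliberately ignores it, in line with the restriction on randomness.

```python
import re
from typing import Any, Mapping, Protocol, Sequence


class Predicate(Protocol):
    def span_matches(self, parent_context: Any, trace_id: int, name: str,
                     attributes: Mapping[str, Any], links: Sequence[Any]) -> bool:
        ...


class AttributeMatches:
    """Hypothetical predicate: a string attribute fully matches a regex."""

    def __init__(self, key: str, pattern: str) -> None:
        self._key = key
        self._pattern = re.compile(pattern)

    def span_matches(self, parent_context, trace_id, name, attributes, links) -> bool:
        # Only span attributes are inspected; trace_id is not used.
        value = attributes.get(self._key)
        return isinstance(value, str) and self._pattern.fullmatch(value) is not None


is_healthcheck: Predicate = AttributeMatches("http.target", r"/healthcheck(/.*)?")
print(is_healthcheck.span_matches(None, 0x1234, "GET", {"http.target": "/healthcheck"}, []))
# True
```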

#### Required Arguments for ConsistentRuleBased

- optional `SpanKind`,
- list of pairs (`Predicate`, `Composable`)

For calculating the `SamplingIntent`, if the `Span` kind matches the specified kind, or the specified kind is not given, the sampler goes through the list in the provided order and calls `SpanMatches` on each `Predicate`, passing the same arguments as received.
If a call returns `true`, the result is as returned by `GetSamplingIntent` called on the corresponding `Composable` - no other `Predicate`s are evaluated.
If the `SpanKind` does not match, or none of the calls to `SpanMatches` yields `true`, the result is obtained by calling `GetSamplingIntent` on `ConsistentAlwaysOff`.
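
The first-match evaluation described above can be sketched in Python. The `SpanArgs` bundle, the `FixedIntentSampler` stand-in for the basic samplers, and the use of plain callables as predicates are simplifying assumptions made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class SpanArgs:
    """Hypothetical bundle of the GetSamplingIntent arguments."""
    parent_context: object
    name: str
    span_kind: str
    attributes: dict
    links: tuple = ()


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[int]  # None = non-probabilistic DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], dict]
    update_trace_state: Callable[[dict, bool], dict]


class FixedIntentSampler:
    """Stand-in for the basic Composable samplers."""

    def __init__(self, threshold: Optional[int], reliable: bool) -> None:
        self._intent = SamplingIntent(threshold, lambda: reliable, dict,
                                      lambda tracestate, sampled: tracestate)

    def get_sampling_intent(self, args: SpanArgs) -> SamplingIntent:
        return self._intent


ALWAYS_OFF = FixedIntentSampler(None, False)
ALWAYS_ON = FixedIntentSampler(0, True)

Rule = Tuple[Callable[[SpanArgs], bool], FixedIntentSampler]


class ConsistentRuleBased:
    def __init__(self, span_kind: Optional[str], rules: Sequence[Rule]) -> None:
        self._span_kind = span_kind
        self._rules = list(rules)

    def get_sampling_intent(self, args: SpanArgs) -> SamplingIntent:
        if self._span_kind is None or args.span_kind == self._span_kind:
            for predicate, sampler in self._rules:
                if predicate(args):  # first matching rule wins
                    return sampler.get_sampling_intent(args)
        # Kind mismatch or no matching rule: behave like ConsistentAlwaysOff.
        return ALWAYS_OFF.get_sampling_intent(args)


sampler = ConsistentRuleBased("SERVER", [
    (lambda a: a.attributes.get("http.target") == "/healthcheck", ALWAYS_OFF),
    (lambda a: a.attributes.get("http.target") == "/checkout", ALWAYS_ON),
    (lambda a: True, FixedIntentSampler(0xC0000000000000, True)),  # 25%
])
checkout = SpanArgs(None, "GET /checkout", "SERVER", {"http.target": "/checkout"})
print(sampler.get_sampling_intent(checkout).threshold)  # 0
```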

### ConsistentParentBased

The functionality of the `ConsistentParentBased` sampler corresponds to the standard `ParentBased` sampler.
It takes one `Composable` sampler delegate as the argument.
The delegate is used to make sampling decisions for ROOT spans.

The behavior of `ConsistentParentBased` caters to the case where a non-probabilistic sampler was used to sample the parent span.

Upon invocation of its `GetSamplingIntent` operation, the sampler checks if there's a valid parent span context. If there isn't, the sampler MUST return the result of calling `GetSamplingIntent` on the delegate.

Otherwise, the sampler attempts to extract the threshold value from the parent trace state.
The sampler MUST return a `SamplingIntent` as follows.

If the parent trace state has a valid threshold T:

- The resulting THRESHOLD value is T.
- The `IsAdjustedCountReliable` returns `true`.

If the parent trace state has no valid threshold, the sampler examines the `sampled` flag from the traceparent.

If the flag is set:

- The resulting THRESHOLD value is `00000000000000` (or equivalent).
- The `IsAdjustedCountReliable` returns `false`.

If the flag is not set:

- The resulting THRESHOLD value is `null` (or equivalent).
- The `IsAdjustedCountReliable` returns `false`.

By default, in all cases with a valid parent context:

- The `GetAttributes` function returns an empty set.
- The `UpdateTraceState` function returns its argument, without any modifications.

However, `ConsistentParentBased` implementations SHOULD allow users to customize added Attributes as well as modify the trace state depending on whether the parent Span is local or remote.
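
The decision table above can be sketched in Python. The `ParentInfo` type is a hypothetical summary of a valid parent span context (with the `th` value already parsed into an integer), and passing `None` stands for "no valid parent". The customization hooks for local and remote parents are not shown.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[int]  # None = non-probabilistic DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], dict]
    update_trace_state: Callable[[dict, bool], dict]


def plain_intent(threshold: Optional[int], reliable: bool) -> SamplingIntent:
    return SamplingIntent(threshold, lambda: reliable, dict,
                          lambda tracestate, sampled: tracestate)


@dataclass(frozen=True)
class ParentInfo:
    """Hypothetical summary of a valid parent span context."""
    sampled: bool              # `sampled` flag from traceparent
    threshold: Optional[int]   # valid `th` value from the parent tracestate, if any


class AlwaysOnRoot:
    """Stand-in root delegate."""

    def get_sampling_intent(self, parent, *span_args) -> SamplingIntent:
        return plain_intent(0, True)


class ConsistentParentBased:
    def __init__(self, root_delegate) -> None:
        self._root_delegate = root_delegate

    def get_sampling_intent(self, parent: Optional[ParentInfo], *span_args) -> SamplingIntent:
        if parent is None:
            # ROOT span: defer entirely to the delegate.
            return self._root_delegate.get_sampling_intent(parent, *span_args)
        if parent.threshold is not None:
            return plain_intent(parent.threshold, True)
        if parent.sampled:
            # Parent sampled non-probabilistically: keep, but count is unreliable.
            return plain_intent(0, False)
        return plain_intent(None, False)


sampler = ConsistentParentBased(AlwaysOnRoot())
child = sampler.get_sampling_intent(ParentInfo(sampled=True, threshold=None))
print(child.threshold, child.is_adjusted_count_reliable())  # 0 False
```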

### ConsistentAnyOf

`ConsistentAnyOf` is a composite sampler which takes a non-empty list of `Composable` samplers (delegates) as the argument. The intention is to make a positive sampling decision if _any of_ the delegates would make a positive decision.

Upon invocation of its `GetSamplingIntent` operation, it MUST go through the whole list and invoke the `GetSamplingIntent` operation on each delegate sampler, passing the same arguments as received.

The `ConsistentAnyOf` sampler MUST return a `SamplingIntent` which is constructed as follows:

- If any of the delegates returned a non-`null` threshold value, the resulting threshold is the lexicographical minimum value T from the set of those non-`null` values, otherwise `null`.
- The `IsAdjustedCountReliable` function returns `true` if any of the delegates that returned a threshold value equal to T returns `true` from its `IsAdjustedCountReliable` function; otherwise it returns `false`.
- The `GetAttributes` function calculates the union of `Attribute` sets as returned by the calls to the `GetAttributes` function for each delegate, in the declared order.
- The `UpdateTraceState` function makes a chain of calls to the `UpdateTraceState` functions as returned by the delegates, passing the received `Tracestate` as the argument to subsequent calls and returning the last value received.

Each delegate sampler MUST be given a chance to participate in calculating the `SamplingIntent` as described above and MUST see the same argument values. The order of the delegate samplers does not affect the final sampling `Decision`.
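
A Python sketch of this combination logic follows. It assumes integer thresholds, for which the numeric minimum coincides with the lexicographical minimum of fixed-width 14-character hex strings. `StubSampler` and `make_intent` are hypothetical test doubles, and `Tracestate` is modeled as a `dict`.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[int]  # None = non-probabilistic DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], dict]
    update_trace_state: Callable[[dict, bool], dict]


class ConsistentAnyOf:
    def __init__(self, delegates: Sequence) -> None:
        if not delegates:
            raise ValueError("at least one delegate is required")
        self._delegates = list(delegates)

    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        # Every delegate is consulted with the same arguments.
        intents = [d.get_sampling_intent(*span_args) for d in self._delegates]
        thresholds = [i.threshold for i in intents if i.threshold is not None]
        threshold = min(thresholds) if thresholds else None

        def is_reliable() -> bool:
            return threshold is not None and any(
                i.is_adjusted_count_reliable() for i in intents if i.threshold == threshold)

        def get_attributes() -> dict:
            merged: dict = {}
            for i in intents:  # declared order; last set wins on conflicts
                merged.update(i.get_attributes())
            return merged

        def update_trace_state(tracestate: dict, sampled: bool) -> dict:
            for i in intents:  # chain the delegates' updates
                tracestate = i.update_trace_state(tracestate, sampled)
            return tracestate

        return SamplingIntent(threshold, is_reliable, get_attributes, update_trace_state)


def make_intent(threshold, reliable, attributes, tag) -> SamplingIntent:
    return SamplingIntent(threshold, lambda: reliable, lambda: dict(attributes),
                          lambda ts, sampled: {**ts, tag: "seen"})


class StubSampler:
    def __init__(self, intent: SamplingIntent) -> None:
        self._intent = intent

    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        return self._intent


any_of = ConsistentAnyOf([
    StubSampler(make_intent(0xC0000000000000, True, {"rule": "a", "x": 1}, "a")),
    StubSampler(make_intent(None, False, {}, "b")),
    StubSampler(make_intent(0x80000000000000, False, {"rule": "c"}, "c")),
])
combined = any_of.get_sampling_intent()
print(format(combined.threshold, "014x"))  # 80000000000000
```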

### ConsistentRateLimiting

`ConsistentRateLimiting` is a composite sampler that helps control the average rate of sampled spans while allowing another sampler (the delegate) to provide sampling hints.

#### Required Arguments for ConsistentRateLimiting

- Composable (delegate)
- maximum sampling (throughput) target rate

The sampler SHOULD measure and keep track of the average rate of incoming spans, and therefore also of the ratio between the incoming span rate and the target span rate.
Upon invocation of its `GetSamplingIntent` operation, the composite sampler MUST get the `SamplingIntent` from the delegate sampler, passing the same arguments as received.

The returned `SamplingIntent` is constructed as follows.

- If using the obtained threshold value as the final threshold would entail sampling more spans than the declared target rate, the sampler SHOULD set the threshold to a value that would meet the target rate. Several algorithms can be used for threshold adjustment; the specification does not prescribe any particular behavior.
- The `IsAdjustedCountReliable` returns the result of calling this function on the `SamplingIntent` provided by the delegate.
- The `GetAttributes` function returns the result of calling this function on the `SamplingIntent` provided by the delegate.
- The `UpdateTraceState` function returns the `Tracestate` as returned by calling `UpdateTraceState` from the delegate's `SamplingIntent`.
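
Since no adjustment algorithm is prescribed, the following Python sketch shows just one simple possibility, and it is not the algorithm used by the prototypes. It assumes a fixed measurement window, a target expressed in spans per second, an injectable clock, and single-threaded use. It caps the sampling probability at `target / observed_rate`, which only approximates the target when delegate thresholds vary between spans.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

MAX_THRESHOLD = 1 << 56


@dataclass(frozen=True)
class SamplingIntent:
    threshold: Optional[int]  # None = non-probabilistic DROP
    is_adjusted_count_reliable: Callable[[], bool]
    get_attributes: Callable[[], dict]
    update_trace_state: Callable[[dict, bool], dict]


class AlwaysOnDelegate:
    """Stand-in delegate that wants to sample everything."""

    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        return SamplingIntent(0, lambda: True, dict, lambda ts, sampled: ts)


class ConsistentRateLimiting:
    def __init__(self, delegate, target_per_second: float,
                 clock: Callable[[], float] = time.monotonic,
                 window_seconds: float = 1.0) -> None:
        self._delegate = delegate
        self._target = target_per_second
        self._clock = clock
        self._window = window_seconds
        self._window_start = clock()
        self._count = 0
        self._rate = 0.0  # spans/second measured over the previous window

    def _observe_span(self) -> float:
        now = self._clock()
        elapsed = now - self._window_start
        if elapsed >= self._window:
            self._rate = self._count / elapsed
            self._window_start, self._count = now, 0
        self._count += 1
        return self._rate

    def get_sampling_intent(self, *span_args) -> SamplingIntent:
        intent = self._delegate.get_sampling_intent(*span_args)
        rate = self._observe_span()
        threshold = intent.threshold
        if threshold is not None and rate > self._target:
            # Cap the probability at target/rate, i.e. raise the threshold.
            cap = round((1.0 - self._target / rate) * MAX_THRESHOLD)
            threshold = max(threshold, min(cap, MAX_THRESHOLD - 1))
        # Everything except the threshold is passed through from the delegate.
        return SamplingIntent(threshold, intent.is_adjusted_count_reliable,
                              intent.get_attributes, intent.update_trace_state)


now = [0.0]  # fake clock for a deterministic demonstration
limiter = ConsistentRateLimiting(AlwaysOnDelegate(), 10.0, clock=lambda: now[0])
for i in range(100):  # 100 spans within the first second
    now[0] = i / 100
    first_window = limiter.get_sampling_intent()
now[0] = 1.0
capped = limiter.get_sampling_intent()
print(first_window.threshold, round(1 - capped.threshold / MAX_THRESHOLD, 3))
```

In this run the first window passes the delegate's threshold through unchanged, and the next call reduces the sampling probability to roughly 10/100.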

## Summary

### Example - sampling configuration

Going back to our [example](#example) of sampling requirements, we can now configure the head sampler to support this particular case, using an informal notation of samplers and their arguments.
First, let's express the requirements for the ROOT spans as follows.

```
S1 = ConsistentRuleBased(ROOT, {
    (http.target == /healthcheck) => ConsistentAlwaysOff,
    (http.target == /checkout) => ConsistentAlwaysOn,
    true => ConsistentFixedThreshold(0.25)
})
```

Note: technically, `ROOT` is not a `SpanKind`, but is a special token matching all Spans with invalid parent context (i.e. the ROOT spans, regardless of their kind).

In the next step, we can build the sampler to handle non-root spans as well:

```
S2 = ConsistentParentBased(S1)
```

The special case of calling service `/foo` can now be supported by:

```
S3 = ConsistentAnyOf(S2, ConsistentRuleBased(CLIENT, {
    (http.url == /foo) => ConsistentAlwaysOn
}))
```

Finally, the last step is to put a limit on the stream of exported spans:

```
S4 = ConsistentRateLimiting(S3, 1000)
```

This is the complete example:

```
S4 =
  ConsistentRateLimiting(
    ConsistentAnyOf(
      ConsistentParentBased(
        ConsistentRuleBased(ROOT, {
          (http.target == /healthcheck) => ConsistentAlwaysOff,
          (http.target == /checkout) => ConsistentAlwaysOn,
          true => ConsistentFixedThreshold(0.25),
        })
      ),
      ConsistentRuleBased(CLIENT, {
        (http.url == /foo) => ConsistentAlwaysOn,
      }),
    ),
    1000,
  )
```

### Limitations

Developers of `Composable` samplers should keep in mind that the sampling Decision they declare as their intent might differ from the final sampling Decision.

### Prototyping

A prototype implementation of Composable Samplers for Java is available, see [ConsistentSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/consistent-sampling/src/main/java/io/opentelemetry/contrib/sampler/consistent56/ConsistentSampler.java) and its subclasses.

## Prior art

A number of composite samplers are already available as independent contributions
([RuleBasedRoutingSampler](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/RuleBasedRoutingSampler.java),
[Stratified Sampling](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/stratified-sampling-example),
LinksBasedSampler [for Java](https://github.com/open-telemetry/opentelemetry-java-contrib/blob/main/samplers/src/main/java/io/opentelemetry/contrib/sampler/LinksBasedSampler.java)
and [for .NET](https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/docs/trace/links-based-sampler)).
Also, historically, some Span categorization was introduced by [JaegerRemoteSampler](https://www.jaegertracing.io/docs/1.54/sampling/#remote-sampling).

This proposal aims to generalize these ideas and to provide a more formal specification of the behavior of the composite samplers.
