Commit bcee9a4

dmehala and dgoffredo authored
Feature: Sampling delegation (#59)
* core implementation + simple test
* add tests
* wip: delegate example
* self code review
* Rework docker-compose
* code review
* fix compilation
* Apply suggestions from code review
  Co-authored-by: David Goffredo <[email protected]>
* Revising implementation
* documentation minutiae
* proxy-example -> http-proxy-example
* revise comments in tracingutil.hpp
* .hpp -> .h
* remove enum base type
* struct InjectionOptions in its own header
* document TraceSegment::{read,write}_sampling_delegation_response
* add explicit SIGTERM handler to examples/http-server/proxy
* revise sampling delegation
* make it easier to always use the two-argument overload of Span::inject
* don't interpret the delegation request header
* prevent delegation from overriding sampling
* address review comments:
  - Protect `struct SamplingDelegation` with a mutex.
  - Check the result of `finalize_config(config3)` in `test_tracer.cpp`.
* enable_sampling_delegation -> delegate_trace_sampling
* it IS implemented!
* add developer documentation for sampling delegation
* restore a comment in the example proxy
* mention the Span sampling delegation methods in the docs
* remove some unnecessary includes from span.h
* undo unnecessary inline
* assert what you assume
* DD_SAMPLING_DELEGATION_HEADER -> sampling_delegation_request_header
* revise the description of TracerConfig::delegate_trace_sampling
* compromise between compilers
* initialize POD member of struct ExtractedData
* remove TODO in CMakeLists.txt
* it's C++, not NodeJS
* Apply suggestions from code review

---------

Co-authored-by: David Goffredo <[email protected]>
1 parent 89a8e36 commit bcee9a4

41 files changed

+3223
-1307
lines changed

BUILD.bazel

Lines changed: 1 addition & 0 deletions
@@ -71,6 +71,7 @@ cc_library(
7171
"src/datadog/hex.h",
7272
"src/datadog/http_client.h",
7373
"src/datadog/id_generator.h",
74+
"src/datadog/injection_options.h",
7475
"src/datadog/json.hpp",
7576
"src/datadog/json_fwd.hpp",
7677
"src/datadog/limiter.h",

CMakeLists.txt

Lines changed: 6 additions & 5 deletions
@@ -5,7 +5,7 @@ cmake_minimum_required(VERSION 3.24)
55
project(dd-trace-cpp)
66

77
option(BUILD_COVERAGE "Build code with code coverage profiling instrumentation" OFF)
8-
option(BUILD_HASHER_EXAMPLE "Build the example program examples/hasher" OFF)
8+
option(BUILD_EXAMPLES "Build example programs" OFF)
99
option(BUILD_TESTING "Build the unit tests (test/)" OFF)
1010
option(BUILD_FUZZERS "Build fuzzers" OFF)
1111
option(BUILD_BENCHMARK "Build benchmark binaries" OFF)
@@ -120,8 +120,8 @@ target_sources(dd_trace_cpp-objects PRIVATE
120120
src/datadog/span_matcher.cpp
121121
src/datadog/span_sampler_config.cpp
122122
src/datadog/span_sampler.cpp
123-
src/datadog/tag_propagation.cpp
124123
src/datadog/tags.cpp
124+
src/datadog/tag_propagation.cpp
125125
src/datadog/threaded_event_scheduler.cpp
126126
src/datadog/tracer_config.cpp
127127
src/datadog/tracer_telemetry.cpp
@@ -163,6 +163,7 @@ target_sources(dd_trace_cpp-objects PUBLIC
163163
src/datadog/hex.h
164164
src/datadog/http_client.h
165165
src/datadog/id_generator.h
166+
src/datadog/injection_options.h
166167
src/datadog/json_fwd.hpp
167168
src/datadog/json.hpp
168169
src/datadog/limiter.h
@@ -212,7 +213,6 @@ find_package(Threads REQUIRED)
212213
target_link_libraries(dd_trace_cpp-objects
213214
PUBLIC
214215
libcurl
215-
PUBLIC
216216
Threads::Threads
217217
${COVERAGE_LIBRARIES}
218218
${COREFOUNDATION_LIBRARY}
@@ -239,8 +239,9 @@ if(BUILD_TESTING)
239239
add_subdirectory(test)
240240
endif()
241241

242-
# Each example has its own build flag.
243-
add_subdirectory(examples)
242+
if(BUILD_EXAMPLES)
243+
add_subdirectory(examples)
244+
endif()
244245

245246
if(BUILD_BENCHMARK)
246247
add_subdirectory(benchmark)

doc/sampling-delegation.md

Lines changed: 189 additions & 0 deletions
@@ -0,0 +1,189 @@
1+
# Sampling Delegation
2+
This document is a technical description of how sampling delegation works in
3+
this library. The intended audience is maintainers of the library.
4+
5+
Sampling delegation allows a tracer to use the trace sampling decision of a
6+
service that it calls. The purpose of sampling delegation is to allow reverse
7+
proxies at the ingress of a system (gateways) to use trace sampling decisions
8+
that are decided by the actual services, as opposed to having to decide the
9+
trace sampling decision at the proxy. The idea is that putting a reverse proxy
10+
in front of your service(s) should not change how you configure sampling.
11+
12+
See the `sampling-delegation` directory in Datadog's internal architecture
13+
repository for the specification of sampling delegation.
14+
15+
## Roles
16+
In sampling delegation, a tracer plays one or both of two roles:
17+
18+
- The _delegator_ is the tracer that is configured to delegate its trace
19+
sampling decision. The delegator will request a sampling decision from one of
20+
the services it calls.
21+
- It will send the `X-Datadog-Delegate-Trace-Sampling` request header.
22+
- If it is the root service, and if delegation succeeded, then it will set the
23+
`_dd.is_sampling_decider:0` tag to indicate that some other service made the
24+
sampling decision.
25+
- The _delegatee_ is the tracer that has received a request whose headers
26+
indicate that the client is delegating the sampling decision. The delegatee
27+
will make a trace sampling decision using its own configuration, and then
28+
convey that decision back to the client.
29+
- It will send the `X-Datadog-Trace-Sampling-Decision` response header.
30+
- If its sampling decision was made locally, as opposed to delegated to yet
31+
another service, then it will set the `_dd.is_sampling_decider:1` tag to
32+
indicate that it is the service that made the sampling decision.
33+
34+
For a given trace, the tracer might act as the delegator, the delegatee, both,
35+
or neither.
36+
37+
## Tracer Configuration
38+
Whether a tracer should act as a delegator is determined by its configuration.
39+
40+
`bool TracerConfig::delegate_trace_sampling` is defined in [tracer_config.h][1]
41+
and defaults to `false`. Its value is overridden by the
42+
`DD_TRACE_DELEGATE_SAMPLING` environment variable. If `delegate_trace_sampling`
43+
is `true`, then the tracer will act as delegator.
44+
45+
## Runtime State
46+
Whether a tracer should act as a delegatee is determined by whether the
47+
extracted trace context includes the `X-Datadog-Delegate-Trace-Sampling` request
48+
header. If trace context is extracted in the Datadog style, and if the
49+
extracted context includes the `X-Datadog-Delegate-Trace-Sampling` header, then
50+
the tracer will act as delegatee.
51+
52+
All logic relevant to sampling delegation happens in `TraceSegment`, defined in
53+
[trace_segment.h][2]. The `Tracer` that creates the `TraceSegment` passes
54+
two booleans into `TraceSegment`'s constructor:
55+
56+
- `bool sampling_delegation_enabled` indicates whether the `TraceSegment` will
57+
act as delegator.
58+
- `bool sampling_decision_was_delegated_to_me` indicates whether the
59+
`TraceSegment` will act as delegatee.
60+
61+
`TraceSegment` then keeps track of its sampling delegation relevant state in a
62+
private data structure, `struct SamplingDelegation` (also defined in
63+
[trace_segment.h][2]). `struct SamplingDelegation` contains the two booleans
64+
passed into `TraceSegment`'s constructor, and additional booleans used
65+
throughout the trace segment's lifetime.
66+
67+
### `bool TraceSegment::SamplingDelegation::sent_request_header`
68+
`sent_request_header` indicates that, as delegator, the trace segment included
69+
the `X-Datadog-Delegate-Trace-Sampling` request header as part of trace context
70+
sent to another service.
71+
72+
`sent_request_header` is used to prevent sampling delegation from being
73+
requested of two or more services. Once a trace segment has requested sampling
74+
delegation once, it will not request sampling delegation again, even if it never
75+
receives the delegated decision in response.
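
The "request at most once" rule above can be sketched as follows. This is a minimal, self-contained illustration and not the library's code: `DelegatorStateSketch` and `maybe_request_delegation` are hypothetical names, the header map stands in for a `DictWriter`, and the header value `"delegate"` is a placeholder assumption. The mutex mirrors the commit note that `struct SamplingDelegation` is protected by one.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the delegator-side state described above.
struct DelegatorStateSketch {
  std::mutex mutex;
  bool enabled = false;              // tracer is configured to delegate
  bool sent_request_header = false;  // delegation was already requested once
};

// Writes the delegation request header into `headers` at most once per trace
// segment. Returns whether this call wrote the header.
bool maybe_request_delegation(DelegatorStateSketch& state,
                              std::map<std::string, std::string>& headers) {
  std::lock_guard<std::mutex> lock(state.mutex);
  if (!state.enabled || state.sent_request_header) {
    return false;
  }
  // The header value is a placeholder; only the header's presence matters
  // in this sketch.
  headers["X-Datadog-Delegate-Trace-Sampling"] = "delegate";
  state.sent_request_header = true;
  // The flag stays set even if no response ever arrives.
  assert(state.sent_request_header);
  return true;
}
```

Calling `maybe_request_delegation` for a second outgoing request on the same state returns `false` and leaves that request's headers untouched.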
76+
77+
### `bool TraceSegment::SamplingDelegation::received_matching_response_header`
78+
`received_matching_response_header` indicates that, as delegator, the trace
79+
segment received a valid `X-Datadog-Trace-Sampling-Decision` response header
80+
from a service to which the trace segment had previously sent the
81+
`X-Datadog-Delegate-Trace-Sampling` request header.
82+
83+
The `X-Datadog-Trace-Sampling-Decision` response header is valid if it is valid
84+
JSON of the form `{"priority": int, "mechanism": int}`. See
85+
`parse_sampling_delegation_response`, defined in [trace_segment.cpp][3].
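
To make the expected shape concrete, here is a deliberately strict sketch of a validity check. It is not `parse_sampling_delegation_response` and it is not a JSON parser: it uses a regular expression that accepts only an object with integer `priority` followed by integer `mechanism`, in that order, which is narrower than "valid JSON of this form". The names `DelegatedDecisionSketch` and `parse_decision_sketch` are hypothetical.

```cpp
#include <cassert>
#include <optional>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical holder for the two integers carried by the response header.
struct DelegatedDecisionSketch {
  int priority;
  int mechanism;
};

// Returns the decision if `header_value` looks exactly like
// {"priority": <int>, "mechanism": <int>}, and nullopt otherwise.
// Simplification: key order is fixed and extra keys are rejected.
std::optional<DelegatedDecisionSketch> parse_decision_sketch(
    const std::string& header_value) {
  static const std::regex pattern(
      R"re(\s*\{\s*"priority"\s*:\s*(-?\d+)\s*,\s*"mechanism"\s*:\s*(-?\d+)\s*\}\s*)re");
  std::smatch match;
  if (!std::regex_match(header_value, match, pattern)) {
    return std::nullopt;
  }
  try {
    return DelegatedDecisionSketch{std::stoi(match[1].str()),
                                   std::stoi(match[2].str())};
  } catch (const std::out_of_range&) {
    // The digits do not fit in an int, so treat the header as invalid.
    return std::nullopt;
  }
}
```

A production implementation would use a real JSON parser so that key order, whitespace, and additional keys are handled according to the JSON grammar.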
86+
87+
`received_matching_response_header` is used as part of determining whether to
88+
set the `_dd.is_sampling_decider:1` tag as delegatee. If a trace segment is
89+
acting as delegatee, and if it made the sampling decision, then it sets the tag
90+
`_dd.is_sampling_decider:1` on its local root span. However, the trace segment
91+
might also be acting as delegator. `received_matching_response_header` allows
92+
the trace segment to determine whether it delegated its decision to another
93+
service, and thus is not the "sampling decider."
94+
95+
An alternative way to determine whether a trace segment delegated its sampling
96+
decision is to see whether its `SamplingDecision::origin` has the value
97+
`SamplingDecision::Origin::DELEGATED` (see [sampling_decision.h][4]). However,
98+
a trace segment's sampling decision might be overridden at any time by
99+
`TraceSegment::override_sampling_priority(int)`. So, to answer the question
100+
"did we delegate to another service?" it is better to keep track of whether the
101+
trace segment received a valid and expected `X-Datadog-Trace-Sampling-Decision`
102+
response header, which is what `received_matching_response_header` does.
103+
104+
### `bool TraceSegment::SamplingDelegation::sent_response_header`
105+
`sent_response_header` indicates that, as delegatee, the trace segment sent its trace sampling
106+
decision back to the client in the `X-Datadog-Trace-Sampling-Decision` response
107+
header.
108+
109+
`sent_response_header` is used as part of determining whether to set the
110+
`_dd.is_sampling_decider:1` tag as delegatee. The trace segment would not claim
111+
to be the "sampling decider" if the service that delegated to it does not know
112+
about the decision. If `sent_response_header` is true, then the trace segment
113+
can be fairly confident that the client will receive the sampling decision.
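
Taken together, the three booleans above suggest a decision rule for the `_dd.is_sampling_decider` tag. The sketch below is one reading of this document, not the library's implementation: `SamplingDelegationSketch`, `is_sampling_decider_tag`, and the `is_root_service` parameter are hypothetical, and the order in which the two cases are checked is an assumption.

```cpp
#include <cassert>
#include <optional>

// Hypothetical mirror of the flags described in this section.
struct SamplingDelegationSketch {
  bool decision_was_delegated_to_me = false;       // acting as delegatee
  bool received_matching_response_header = false;  // delegated onward and got a reply
  bool sent_response_header = false;               // told the client our decision
};

// Returns the value for the `_dd.is_sampling_decider` tag, or nullopt when
// no tag should be set.
std::optional<int> is_sampling_decider_tag(const SamplingDelegationSketch& state,
                                           bool is_root_service) {
  // Delegatee that decided locally and conveyed the decision to its client.
  if (state.decision_was_delegated_to_me && state.sent_response_header &&
      !state.received_matching_response_header) {
    return 1;
  }
  // Root service whose delegation succeeded: some other service decided.
  if (is_root_service && state.received_matching_response_header) {
    return 0;
  }
  return std::nullopt;
}
```

Keying the first case on `received_matching_response_header` rather than on the sampling decision's origin follows the reasoning above: the decision may be overridden later, but the fact that a matching response was received does not change.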
114+
115+
### `bool Span::expecting_delegated_sampling_decision_`
116+
In addition to the state maintained in `TraceSegment`, `Span` also has a
117+
sampling delegation related `bool`. See [span.h][5].
118+
119+
When sampling delegation is requested for an injected `Span`, that span
120+
remembers that it injected the `X-Datadog-Delegate-Trace-Sampling` header.
121+
122+
Later, when the corresponding response is examined, the `Span` knows whether to
123+
expect the `X-Datadog-Trace-Sampling-Decision` response header to be present.
124+
125+
`bool Span::expecting_delegated_sampling_decision_` prevents a `Span` from
126+
interpreting an `X-Datadog-Trace-Sampling-Decision` response header when none
127+
was requested.
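
The guard can be illustrated with a small stand-in for `Span`. This is a sketch under assumptions: `SpanSketch`, `inject_sketch`, and `read_response_sketch` are hypothetical, plain maps replace `DictWriter` and `DictReader`, and the request header value is a placeholder.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

using HeadersSketch = std::map<std::string, std::string>;

// Hypothetical stand-in for the per-span flag described above.
struct SpanSketch {
  bool expecting_delegated_sampling_decision = false;
};

// Injects the delegation request header when asked to, and remembers that
// this span did so.
void inject_sketch(SpanSketch& span, HeadersSketch& request_headers,
                   bool delegate) {
  if (delegate) {
    request_headers["X-Datadog-Delegate-Trace-Sampling"] = "delegate";
    span.expecting_delegated_sampling_decision = true;
  }
}

// Returns the response header value only if this span requested a decision.
// A span that did not request one ignores the header even when it is present.
std::optional<std::string> read_response_sketch(
    const SpanSketch& span, const HeadersSketch& response_headers) {
  if (!span.expecting_delegated_sampling_decision) {
    return std::nullopt;
  }
  const auto found = response_headers.find("X-Datadog-Trace-Sampling-Decision");
  if (found == response_headers.end()) {
    return std::nullopt;
  }
  return found->second;
}
```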
128+
129+
## Reading and Writing Responses
130+
Distributed tracing typically does not involve RPC _responses_. When a service
131+
X makes an HTTP/gRPC/etc. request to another service Y, X injects information
132+
about the trace in request metadata (e.g. HTTP request headers). Y then
133+
extracts that information from the request.
134+
135+
Responses aren't involved.
136+
137+
Now, with sampling delegation, responses _are_ involved.
138+
139+
Trace context injection and extraction are about _requests_ (sending and receiving,
140+
respectively). For _responses_ the tracing library needs a new notion.
141+
142+
`TraceSegment` has two member functions for producing and consuming
143+
response-related metadata (see [trace_segment.h][2]):
144+
145+
- `void TraceSegment::write_sampling_delegation_response(DictWriter&)` writes
146+
the `X-Datadog-Trace-Sampling-Decision` response header, if appropriate. This
147+
is something that a _delegatee_ does.
148+
- `void TraceSegment::read_sampling_delegation_response(const DictReader&)`
149+
reads the `X-Datadog-Trace-Sampling-Decision` response header, if present.
150+
This is something that a _delegator_ does.
151+
152+
`TraceSegment::read_sampling_delegation_response` is not called directly by an
153+
instrumented application.
154+
Instead, an instrumented application calls
155+
`Span::read_sampling_delegation_response` on the `Span` that performed the
156+
injection whose response is being examined.
157+
`Span::read_sampling_delegation_response` then might call
158+
`TraceSegment::read_sampling_delegation_response`.
159+
160+
`TraceSegment::write_sampling_delegation_response` is called directly by an
161+
instrumented application.
162+
163+
Just as `Tracer::extract_span` and `Span::inject` must be called by an
164+
instrumented application in order for trace context propagation to work,
165+
`Span::read_sampling_delegation_response` and
166+
`TraceSegment::write_sampling_delegation_response` must be called by an
167+
instrumented application in order for sampling delegation to work.
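
For the delegatee side, the write step might look like the following sketch. It is not the body of `TraceSegment::write_sampling_delegation_response`: `DelegateeSegmentSketch` and `write_response_sketch` are hypothetical, a plain map stands in for `DictWriter`, and the sampling decision is reduced to two integers so that the JSON form described earlier can be produced with the standard library alone.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the delegatee-relevant parts of a trace segment.
struct DelegateeSegmentSketch {
  bool sampling_decision_was_delegated_to_me = false;
  bool sent_response_header = false;
  int priority = 0;   // locally made sampling priority
  int mechanism = 0;  // how the decision was made (illustrative value)
};

// Writes the decision into the response headers only when the client asked
// for it, and records that the decision was conveyed.
void write_response_sketch(DelegateeSegmentSketch& segment,
                           std::map<std::string, std::string>& response_headers) {
  if (!segment.sampling_decision_was_delegated_to_me) {
    return;  // nothing to do: this segment is not acting as delegatee
  }
  response_headers["X-Datadog-Trace-Sampling-Decision"] =
      "{\"priority\": " + std::to_string(segment.priority) +
      ", \"mechanism\": " + std::to_string(segment.mechanism) + "}";
  segment.sent_response_header = true;
}
```

An instrumented server would call the real `TraceSegment::write_sampling_delegation_response` at the point where it assembles its response headers, before the response is sent.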
168+
169+
## Per-Trace Configuration
170+
In addition to the `Tracer`-wide configuration option `bool
171+
TracerConfig::delegate_trace_sampling`, there is also a per-injection option
172+
`Optional<bool> InjectionOptions::delegate_sampling_decision`.
173+
174+
`Span::inject` has an overload
175+
`void inject(DictWriter&, const InjectionOptions&) const`. The
176+
`InjectionOptions` can be used to specify sampling delegation (or its absence)
177+
for this particular injection site. If
178+
`InjectionOptions::delegate_sampling_decision` is null, which is the default,
179+
then the tracer-wide configuration option is used instead.
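
The fallback from the per-injection option to the tracer-wide option can be sketched with `std::optional`. The struct and function names below are hypothetical stand-ins for `InjectionOptions` and `TracerConfig`, and `std::optional` is assumed to behave like the library's `Optional`.

```cpp
#include <cassert>
#include <optional>

// Hypothetical stand-ins for the two configuration levels described above.
struct TracerConfigSketch {
  bool delegate_trace_sampling = false;  // tracer-wide default
};

struct InjectionOptionsSketch {
  std::optional<bool> delegate_sampling_decision;  // null means "not specified"
};

// The per-injection option wins when it is set; otherwise the tracer-wide
// option applies.
bool should_delegate_sketch(const TracerConfigSketch& config,
                            const InjectionOptionsSketch& options) {
  return options.delegate_sampling_decision.value_or(
      config.delegate_trace_sampling);
}
```

With this shape, an injection site can force delegation on or off by setting the option to `true` or `false`, and leave it null to inherit the tracer's setting.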
180+
181+
This granularity of control is useful in NGINX, where one `location` (i.e.
182+
upstream or backend) might be configured for sampling delegation, while another
183+
`location` might not.
184+
185+
[1]: ../src/datadog/tracer_config.h
186+
[2]: ../src/datadog/trace_segment.h
187+
[3]: ../src/datadog/trace_segment.cpp
188+
[4]: ../src/datadog/sampling_decision.h
189+
[5]: ../src/datadog/span.h

examples/CMakeLists.txt

Lines changed: 2 additions & 3 deletions
@@ -1,3 +1,2 @@
1-
if (BUILD_HASHER_EXAMPLE)
2-
add_subdirectory(hasher)
3-
endif()
1+
add_subdirectory(hasher)
2+
add_subdirectory(http-server)

examples/README.md

Lines changed: 2 additions & 2 deletions
@@ -5,6 +5,6 @@ can be used to add Datadog tracing to a C++ application.
55

66
- [hasher](hasher) is a command-line tool that creates a complete trace
77
involving only one service.
8-
- [http-server](http-server) is an ensemble of services, including one C++
9-
service traced using this library. The traces generated are distributed
8+
- [http-server](http-server) is an ensemble of services, including two C++
9+
services traced using this library. The traces generated are distributed
1010
across all of the services in the example.
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
add_subdirectory(proxy)
2+
add_subdirectory(server)

examples/http-server/Dockerfile

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
from ubuntu:22.04
2+
3+
WORKDIR /dd-trace-cpp
4+
5+
ARG DEBIAN_FRONTEND=noninteractive
6+
ARG BRANCH=v0.1.12
7+
8+
run apt update -y \
9+
&& apt install -y g++ make git wget sed \
10+
&& git clone --branch "${BRANCH}" 'https://github.com/datadog/dd-trace-cpp' . \
11+
&& bin/install-cmake \
12+
&& mkdir dist \
13+
&& cmake -B .build -DBUILD_EXAMPLES=1 . \
14+
&& cmake --build .build -j \
15+
&& cmake --install .build --prefix=dist

examples/http-server/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -58,7 +58,7 @@ Click one of the results to display a flame graph of the associated trace.
5858

5959
![screenshot of flame graph](diagrams/flame-graph.png)
6060

61-
At the top is the Node.js proxy that we called using `curl`. Below that is the
61+
At the top is the C++ proxy that we called using `curl`. Below that is the
6262
C++ server to which the proxy forwarded our request. Below that is the
6363
Python database service, including a span indicating its use of SQLite.
6464
