|
| 1 | +# Sampling Delegation |
| 2 | +This document is a technical description of how sampling delegation works in |
| 3 | +this library. The intended audience is maintainers of the library. |
| 4 | + |
| 5 | +Sampling delegation allows a tracer to use the trace sampling decision of a |
| 6 | +service that it calls. The purpose of sampling delegation is to allow reverse |
| 7 | +proxies at the ingress of a system (gateways) to use trace sampling decisions |
| 8 | +made by the services behind them, rather than having to make the trace
| 9 | +sampling decision at the proxy. The idea is that putting a reverse proxy
| 10 | +in front of your service(s) should not change how you configure sampling. |
| 11 | + |
| 12 | +See the `sampling-delegation` directory in Datadog's internal architecture |
| 13 | +repository for the specification of sampling delegation. |
| 14 | + |
| 15 | +## Roles |
| 16 | +In sampling delegation, a tracer plays one or both of two roles: |
| 17 | + |
| 18 | +- The _delegator_ is the tracer that is configured to delegate its trace |
| 19 | + sampling decision. The delegator will request a sampling decision from one of |
| 20 | + the services it calls. |
| 21 | + - It will send the `X-Datadog-Delegate-Trace-Sampling` request header. |
| 22 | + - If it is the root service, and if delegation succeeded, then it will set the |
| 23 | + `_dd.is_sampling_decider:0` tag to indicate that some other service made the |
| 24 | + sampling decision. |
| 25 | +- The _delegatee_ is the tracer that has received a request whose headers |
| 26 | + indicate that the client is delegating the sampling decision. The delegatee |
| 27 | + will make a trace sampling decision using its own configuration, and then |
| 28 | + convey that decision back to the client. |
| 29 | + - It will send the `X-Datadog-Trace-Sampling-Decision` response header. |
| 30 | + - If its sampling decision was made locally, as opposed to delegated to yet |
| 31 | + another service, then it will set the `_dd.is_sampling_decider:1` tag to |
| 32 | + indicate that it is the service that made the sampling decision. |
| 33 | + |
| 34 | +For a given trace, the tracer might act as the delegator, the delegatee, both, |
| 35 | +or neither. |
| 36 | + |
| 37 | +## Tracer Configuration |
| 38 | +Whether a tracer should act as a delegator is determined by its configuration. |
| 39 | + |
| 40 | +`bool TracerConfig::delegate_trace_sampling` is defined in [tracer_config.h][1] |
| 41 | +and defaults to `false`. Its value is overridden by the |
| 42 | +`DD_TRACE_DELEGATE_SAMPLING` environment variable. If `delegate_trace_sampling` |
| 43 | +is `true`, then the tracer will act as delegator. |
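The precedence described above can be sketched as follows. This is a minimal illustration, not the library's code: `parse_bool_sketch` and `resolve_delegate_trace_sampling` are hypothetical names, and the set of accepted boolean spellings is an assumption.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: interprets a few common boolean spellings.
// Which spellings the library actually accepts is an assumption here.
std::optional<bool> parse_bool_sketch(const std::string& text) {
  if (text == "1" || text == "true") return true;
  if (text == "0" || text == "false") return false;
  return std::nullopt;  // unrecognized: leave the configured value alone
}

// Returns the effective setting. The environment value, when present and
// recognized, takes precedence over the value set in code.
bool resolve_delegate_trace_sampling(bool configured, const char* env_value) {
  if (env_value != nullptr) {
    if (auto parsed = parse_bool_sketch(env_value)) return *parsed;
  }
  return configured;
}
```

A caller would pass the configured value together with `std::getenv("DD_TRACE_DELEGATE_SAMPLING")` (from `<cstdlib>`), which returns a null pointer when the variable is unset.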
| 44 | + |
| 45 | +## Runtime State |
| 46 | +Whether a tracer should act as a delegatee is determined by whether the |
| 47 | +extracted trace context includes the `X-Datadog-Delegate-Trace-Sampling` request |
| 48 | +header. If trace context is extracted in the Datadog style, and if the |
| 49 | +extracted context includes the `X-Datadog-Delegate-Trace-Sampling` header, then |
| 50 | +the tracer will act as delegatee. |
| 51 | + |
| 52 | +All logic relevant to sampling delegation happens in `TraceSegment`, defined in |
| 53 | +[trace_segment.h][2]. The `Tracer` that creates the `TraceSegment` passes |
| 54 | +two booleans into `TraceSegment`'s constructor: |
| 55 | + |
| 56 | +- `bool sampling_delegation_enabled` indicates whether the `TraceSegment` will |
| 57 | + act as delegator. |
| 58 | +- `bool sampling_decision_was_delegated_to_me` indicates whether the |
| 59 | + `TraceSegment` will act as delegatee. |
| 60 | + |
| 61 | +`TraceSegment` then keeps track of its sampling-delegation-related state in a
| 62 | +private data structure, `struct SamplingDelegation` (also defined in |
| 63 | +[trace_segment.h][2]). `struct SamplingDelegation` contains the two booleans |
| 64 | +passed into `TraceSegment`'s constructor, and additional booleans used |
| 65 | +throughout the trace segment's lifetime. |
| 66 | + |
| 67 | +### `bool TraceSegment::SamplingDelegation::sent_request_header` |
| 68 | +`sent_request_header` indicates that, as delegator, the trace segment included
| 69 | +the `X-Datadog-Delegate-Trace-Sampling` request header as part of trace context |
| 70 | +sent to another service. |
| 71 | + |
| 72 | +`sent_request_header` is used to prevent sampling delegation from being |
| 73 | +requested of two or more services. Once a trace segment has requested sampling |
| 74 | +delegation once, it will not request sampling delegation again, even if it never |
| 75 | +receives the delegated decision in response. |
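The "at most once" behavior can be illustrated with a small sketch. `DelegationStateSketch` and `request_delegation_once` are hypothetical names, and the `enabled` field name is an assumption; only `sent_request_header` comes from the text above.

```cpp
// Illustrative subset of the delegation state described above.
struct DelegationStateSketch {
  bool enabled = false;              // acting as delegator (assumed name)
  bool sent_request_header = false;  // delegation was already requested
};

// Decides whether this injection should carry the delegation request header,
// and records that it did. At most one call per state object returns true,
// regardless of whether a response ever arrives.
bool request_delegation_once(DelegationStateSketch& state) {
  if (!state.enabled || state.sent_request_header) return false;
  state.sent_request_header = true;
  return true;
}
```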
| 76 | + |
| 77 | +### `bool TraceSegment::SamplingDelegation::received_matching_response_header` |
| 78 | +`received_matching_response_header` indicates that, as delegator, the trace |
| 79 | +segment received a valid `X-Datadog-Trace-Sampling-Decision` response header |
| 80 | +from a service to which the trace segment had previously sent the |
| 81 | +`X-Datadog-Delegate-Trace-Sampling` request header. |
| 82 | + |
| 83 | +The `X-Datadog-Trace-Sampling-Decision` response header is valid if it is valid |
| 84 | +JSON of the form `{"priority": int, "mechanism": int}`. See |
| 85 | +`parse_sampling_delegation_response`, defined in [trace_segment.cpp][3]. |
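To make the expected shape concrete, here is a rough validity check. It is a stand-in for `parse_sampling_delegation_response`, not a copy of it: `DecisionSketch` and `parse_decision_sketch` are hypothetical names, and the regular expression is deliberately stricter than a real JSON parser (it rejects extra members and very large integers).

```cpp
#include <optional>
#include <regex>
#include <string>

struct DecisionSketch {
  int priority;
  int mechanism;
};

// Accepts only a flat object with exactly the two integer members, in either
// order. This is not a general JSON parser.
std::optional<DecisionSketch> parse_decision_sketch(const std::string& header) {
  static const std::regex pattern(
      R"re(\s*\{\s*"(priority|mechanism)"\s*:\s*(-?\d{1,9})\s*,\s*"(priority|mechanism)"\s*:\s*(-?\d{1,9})\s*\}\s*)re");
  std::smatch match;
  if (!std::regex_match(header, match, pattern)) return std::nullopt;
  if (match[1] == match[3]) return std::nullopt;  // same key given twice
  const int first = std::stoi(match[2].str());
  const int second = std::stoi(match[4].str());
  if (match[1] == "priority") return DecisionSketch{first, second};
  return DecisionSketch{second, first};
}
```

A header that fails this kind of check would be treated as if no decision had been received.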
| 86 | + |
| 87 | +`received_matching_response_header` is used as part of determining whether to |
| 88 | +set the `_dd.is_sampling_decider:1` tag as delegatee. If a trace segment is |
| 89 | +acting as delegatee, and if it made the sampling decision, then it sets the tag |
| 90 | +`_dd.is_sampling_decider:1` on its local root span. However, the trace segment |
| 91 | +might also be acting as delegator. `received_matching_response_header` allows |
| 92 | +the trace segment to determine whether it delegated its decision to another |
| 93 | +service, and thus is not the "sampling decider." |
| 94 | + |
| 95 | +An alternative way to determine whether a trace segment delegated its sampling |
| 96 | +decision is to see whether its `SamplingDecision::origin` has the value |
| 97 | +`SamplingDecision::Origin::DELEGATED` (see [sampling_decision.h][4]). However, |
| 98 | +a trace segment's sampling decision might be overridden at any time by |
| 99 | +`TraceSegment::override_sampling_priority(int)`. So, to answer the question |
| 100 | +"did we delegate to another service?" it is better to keep track of whether the |
| 101 | +trace segment received a valid and expected `X-Datadog-Trace-Sampling-Decision` |
| 102 | +response header, which is what `received_matching_response_header` does. |
| 103 | + |
| 104 | +### `bool TraceSegment::SamplingDelegation::sent_response_header` |
| 105 | +`sent_response_header` indicates that, as delegatee, the trace segment sent its trace sampling |
| 106 | +decision back to the client in the `X-Datadog-Trace-Sampling-Decision` response |
| 107 | +header. |
| 108 | + |
| 109 | +`sent_response_header` is used as part of determining whether to set the |
| 110 | +`_dd.is_sampling_decider:1` tag as delegatee. The trace segment should not claim
| 111 | +to be the "sampling decider" if the service that delegated to it does not know |
| 112 | +about the decision. If `sent_response_header` is true, then the trace segment |
| 113 | +can be fairly confident that the client will receive the sampling decision. |
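One plausible reading of how these flags combine when deciding whether to set `_dd.is_sampling_decider:1` is sketched below. The struct and function names are hypothetical, and the exact condition in [trace_segment.cpp][3] may differ.

```cpp
// Illustrative subset of the flags discussed in the preceding subsections.
struct DelegationFlagsSketch {
  bool sampling_decision_was_delegated_to_me = false;  // acting as delegatee
  bool received_matching_response_header = false;  // delegated onward, got an answer
  bool sent_response_header = false;  // decision was conveyed back to the client
};

// The segment claims to be the sampling decider only if it was asked to
// decide, it told the client what it decided, and it did not itself adopt a
// decision delegated to yet another service.
bool should_tag_as_sampling_decider(const DelegationFlagsSketch& flags) {
  return flags.sampling_decision_was_delegated_to_me &&
         flags.sent_response_header &&
         !flags.received_matching_response_header;
}
```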
| 114 | + |
| 115 | +### `bool Span::expecting_delegated_sampling_decision_` |
| 116 | +In addition to the state maintained in `TraceSegment`, `Span` also has a |
| 117 | +sampling delegation related `bool`. See [span.h][5]. |
| 118 | + |
| 119 | +When a `Span` requests sampling delegation as part of an injection, that span
| 120 | +remembers that it injected the `X-Datadog-Delegate-Trace-Sampling` header. |
| 121 | + |
| 122 | +Later, when the corresponding response is examined, the `Span` knows whether to |
| 123 | +expect the `X-Datadog-Trace-Sampling-Decision` response header to be present. |
| 124 | + |
| 125 | +`bool Span::expecting_delegated_sampling_decision_` prevents a `Span` from |
| 126 | +interpreting an `X-Datadog-Trace-Sampling-Decision` response header when none |
| 127 | +was requested. |
| 128 | + |
| 129 | +## Reading and Writing Responses |
| 130 | +Distributed tracing typically does not involve RPC _responses_. When a service |
| 131 | +X makes an HTTP/gRPC/etc. request to another service Y, X injects information |
| 132 | +about the trace in request metadata (e.g. HTTP request headers). Y then |
| 133 | +extracts that information from the request. |
| 134 | + |
| 135 | +Responses aren't involved. |
| 136 | + |
| 137 | +Now, with sampling delegation, responses _are_ involved. |
| 138 | + |
| 139 | +Trace context injection and extraction are about _requests_ (sending and receiving,
| 140 | +respectively). For _responses_ the tracing library needs a new notion. |
| 141 | + |
| 142 | +`TraceSegment` has two member functions for producing and consuming |
| 143 | +response-related metadata (see [trace_segment.h][2]): |
| 144 | + |
| 145 | +- `void TraceSegment::write_sampling_delegation_response(DictWriter&)` writes |
| 146 | + the `X-Datadog-Trace-Sampling-Decision` response header, if appropriate. This |
| 147 | + is something that a _delegatee_ does. |
| 148 | +- `void TraceSegment::read_sampling_delegation_response(const DictReader&)` |
| 149 | + reads the `X-Datadog-Trace-Sampling-Decision` response header, if present.
| 150 | + This is something that a _delegator_ does. |
| 151 | + |
| 152 | +`TraceSegment::read_sampling_delegation_response` is not called directly by an |
| 153 | +instrumented application. |
| 154 | +Instead, an instrumented application calls |
| 155 | +`Span::read_sampling_delegation_response` on the `Span` that performed the |
| 156 | +injection whose response is being examined. |
| 157 | +`Span::read_sampling_delegation_response` then might call |
| 158 | +`TraceSegment::read_sampling_delegation_response`. |
| 159 | + |
| 160 | +`TraceSegment::write_sampling_delegation_response` is called directly by an |
| 161 | +instrumented application. |
| 162 | + |
| 163 | +Just as `Tracer::extract_span` and `Span::inject` must be called by an |
| 164 | +instrumented application in order for trace context propagation to work, |
| 165 | +`Span::read_sampling_delegation_response` and |
| 166 | +`TraceSegment::write_sampling_delegation_response` must be called by an |
| 167 | +instrumented application in order for sampling delegation to work. |
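The request/response round trip can be modeled with a toy example. Everything here is a simplified stand-in: `Headers` replaces `DictWriter`/`DictReader`, `DelegatorSketch` and `DelegateeSketch` are hypothetical types, header names are stored in lowercase for simplicity, and the request header value and the decision values are placeholders.

```cpp
#include <map>
#include <string>

using Headers = std::map<std::string, std::string>;  // stand-in for DictWriter/DictReader

struct DelegatorSketch {
  bool expecting_decision = false;  // plays the role of the Span's flag
  bool received_matching_response_header = false;

  void inject(Headers& request) {
    // Placeholder value: the text above does not specify what the header contains.
    request["x-datadog-delegate-trace-sampling"] = "delegate";
    expecting_decision = true;
  }

  void read_sampling_delegation_response(const Headers& response) {
    if (!expecting_decision) return;  // nothing was requested: ignore the header
    if (response.count("x-datadog-trace-sampling-decision") == 0) return;
    received_matching_response_header = true;  // JSON validation omitted here
  }
};

struct DelegateeSketch {
  bool sampling_decision_was_delegated_to_me = false;
  bool sent_response_header = false;

  void extract(const Headers& request) {
    sampling_decision_was_delegated_to_me =
        request.count("x-datadog-delegate-trace-sampling") > 0;
  }

  void write_sampling_delegation_response(Headers& response) {
    if (!sampling_decision_was_delegated_to_me) return;
    // Illustrative decision values only.
    response["x-datadog-trace-sampling-decision"] =
        R"({"priority": 2, "mechanism": 4})";
    sent_response_header = true;
  }
};
```

In this model the application is responsible for all four calls, in order: `inject` on the client, `extract` and `write_sampling_delegation_response` on the server, then `read_sampling_delegation_response` back on the client. Omitting either response-side call leaves the delegator without a delegated decision.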
| 168 | + |
| 169 | +## Per-Trace Configuration |
| 170 | +In addition to the `Tracer`-wide configuration option `bool |
| 171 | +TracerConfig::delegate_trace_sampling`, there is also a per-injection option |
| 172 | +`Optional<bool> InjectionOptions::delegate_sampling_decision`. |
| 173 | + |
| 174 | +`Span::inject` has an overload |
| 175 | +`void inject(DictWriter&, const InjectionOptions&) const`. The |
| 176 | +`InjectionOptions` can be used to specify sampling delegation (or its absence) |
| 177 | +for this particular injection site. If |
| 178 | +`InjectionOptions::delegate_sampling_decision` is null, which is the default, |
| 179 | +then the tracer-wide configuration option is used instead. |
| 180 | + |
| 181 | +This granularity of control is useful in NGINX, where one `location` (i.e. |
| 182 | +upstream or backend) might be configured for sampling delegation, while another |
| 183 | +`location` might not. |
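The fallback from the per-injection option to the tracer-wide option can be sketched with `std::optional` standing in for the library's `Optional`. `InjectionOptionsSketch` and `effective_delegation` are hypothetical names used only for illustration.

```cpp
#include <optional>

struct InjectionOptionsSketch {
  // Null by default, meaning "use the tracer-wide setting".
  std::optional<bool> delegate_sampling_decision;
};

// A per-injection value, when present, wins over the tracer-wide setting.
bool effective_delegation(const InjectionOptionsSketch& options,
                          bool tracer_wide_setting) {
  return options.delegate_sampling_decision.value_or(tracer_wide_setting);
}
```

Because an explicit `false` is distinct from null, one injection site can opt out of delegation even when the tracer-wide option is `true`.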
| 184 | + |
| 185 | +[1]: ../src/datadog/tracer_config.h |
| 186 | +[2]: ../src/datadog/trace_segment.h |
| 187 | +[3]: ../src/datadog/trace_segment.cpp |
| 188 | +[4]: ../src/datadog/sampling_decision.h |
| 189 | +[5]: ../src/datadog/span.h