
Commit f8577b0

feat(tracing): support 128-bit trace ids [does not include support for b3 and w3c] (#5326)
## Description

Currently, Datadog spans are correlated using 64-bit trace ids. However, as industry standards have emerged around distributed tracing ([OpenTracing](https://github.com/opentracing/specification/blob/master/rfc/trace_identifiers.md#trace-context-http-headers), [OpenCensus](https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/Span.md#traceid), and now [OpenTelemetry](https://opentelemetry.io/docs/reference/specification/trace/api/#spancontext)), the accepted standard length for trace ids has settled at 128 bits. This PR introduces a configuration option, `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED`, that enables the generation and propagation of 128-bit trace ids. With this change, 128-bit trace ids are opt-in; they will become the default in a future PR.

## Components

### Format

Current format: trace ids are 64-bit integers with the following binary representation: `<64 random bits>`

Proposed format: trace ids are 128-bit integers with the following binary representation: `<32-bit unix seconds><32 bits of zero><64 random bits>`

### Encoding

128-bit integers are not compatible with the ddtrace trace encoders and agent endpoints. As a workaround, 128-bit trace ids are encoded in two fields:

- The 64 lowest-order bits are stored in the `trace_id` field by the JSON and MsgPack encoders.
- The 64 highest-order bits are encoded as hex and stored in a span tag with the key `_dd.p.tid`.
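The two-field encoding above can be illustrated in plain Python. This is a minimal sketch rather than the tracer's code: `split_trace_id` is a hypothetical helper, and the sample id is assembled from the `0x6414bafd` seconds prefix and the random lower-order value used in the example span.

```python
from typing import Optional, Tuple

MAX_UINT_64BITS = (1 << 64) - 1  # same value as the constant this PR adds to ddtrace.internal.constants


def split_trace_id(trace_id: int) -> Tuple[int, Optional[str]]:
    """Hypothetical helper: return (value for the `trace_id` field, hex for the `_dd.p.tid` tag)."""
    low = trace_id & MAX_UINT_64BITS  # 64 lowest-order bits, kept as an integer
    if trace_id <= MAX_UINT_64BITS:
        return low, None  # a 64-bit id needs no extra tag
    high_hex = "{:016x}".format(trace_id >> 64)  # 64 highest-order bits as 16 hex digits
    return low, high_hex


# <32-bit unix seconds = 0x6414bafd><32 bits of zero><64 random bits>
trace_id = (0x6414BAFD << 96) | 15703349996414972443
low, high_hex = split_trace_id(trace_id)
print(low, high_hex)  # 15703349996414972443 6414bafd00000000
```

Joining the two halves again (`int(high_hex, 16) << 64 | low`) returns the original id, which is the property the receiving side relies on.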
#### Example

Span: `<Span(id=3822442170818112150,trace_id=133030438088573679178230931306392465947,parent_id=None,name=bit128)>`

- trace_id as binary: `01100100000101001011101011111101000000000000000000000000000000001101100111101101100000011001111100101111111100100011001000011011`
- trace_id lower-order bits: `1101100111101101100000011001111100101111111100100011001000011011`
- trace_id lower-order bits as integer: `15703349996414972443`
- trace_id higher-order bits: `0110010000010100101110101111110100000000000000000000000000000000`
- trace_id higher-order bits as hex: `6414bafd00000000`

Encoded span: `span_id=3822442170818112150`, `trace_id=15703349996414972443`, `name=bit128`, `_meta={"_dd.p.tid": "6414bafd00000000"}`

### Distributed Tracing

This PR only adds support for propagating 128-bit trace ids with the Datadog propagation style. Although the b3 and w3c header formats support 128-bit trace ids, the Datadog tracer currently truncates all trace ids to 64 bits. This is not acceptable going forward; support for 128-bit trace ids in b3 and w3c headers will be added in a future PR.

#### Datadog Distributed Tracing Headers

As in the encoding example above, the `x-datadog-trace-id` header propagates the 64 lowest-order bits as a base-10 integer, and the `x-datadog-tags` header propagates the 64 highest-order bits as hex (using the `_dd.p.tid` tag).

##### Example

Span: `<Span(id=3822442170818112150,trace_id=133030438088573679178230931306392465947,parent_id=None,name=bit128)>`

Distributed tracing headers: `{"x-datadog-tags": "_dd.p.tid=6414bafd00000000", "x-datadog-trace-id": "15703349996414972443", "x-datadog-parent-id": "3822442170818112150"}`

### Sampling

The 64 lowest-order bits are random, but the 64 highest-order bits encode the unix time and are not random. When 128-bit trace ids are generated, only the 64 lowest-order bits (the random component) should be used to decide whether a span is sampled. This ensures that when trace ids are mapped to values between 0 and 1, the result is uniformly distributed.
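The Datadog-header round trip described above can be sketched as follows. The function names are hypothetical, `x-datadog-tags` parsing is simplified to comma-separated `key=value` pairs, and the `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED` check that gates reconstruction in the real extractor is left out.

```python
from typing import Dict

MAX_UINT_64BITS = (1 << 64) - 1
TID_TAG = "_dd.p.tid"


def inject_trace_headers(trace_id: int, span_id: int) -> Dict[str, str]:
    """Hypothetical injector: split a (possibly 128-bit) trace id across two headers."""
    headers = {"x-datadog-parent-id": str(span_id)}
    if trace_id > MAX_UINT_64BITS:
        # Lower 64 bits stay base-10 for backwards compatibility with 64-bit tracers.
        headers["x-datadog-trace-id"] = str(trace_id & MAX_UINT_64BITS)
        # Upper 64 bits travel as hex in the propagated tags.
        headers["x-datadog-tags"] = "{}={:016x}".format(TID_TAG, trace_id >> 64)
    else:
        headers["x-datadog-trace-id"] = str(trace_id)
    return headers


def extract_trace_id(headers: Dict[str, str]) -> int:
    """Hypothetical extractor: rebuild the full trace id from both headers."""
    trace_id = int(headers["x-datadog-trace-id"])
    raw_tags = headers.get("x-datadog-tags", "")
    tags = dict(pair.split("=", 1) for pair in raw_tags.split(",") if pair)
    high_hex = tags.get(TID_TAG)
    if high_hex is not None:
        # 16 hex digits (high) + 16 hex digits (low), parsed as one 128-bit integer
        trace_id = int(high_hex + "{:016x}".format(trace_id), 16)
    return trace_id


headers = inject_trace_headers((0x6414BAFD << 96) | 15703349996414972443, 3822442170818112150)
print(headers["x-datadog-tags"])  # _dd.p.tid=6414bafd00000000
```

A 64-bit id produces no `x-datadog-tags` entry at all, so tracers that only understand 64-bit ids keep working with the `x-datadog-trace-id` value alone.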
## Testing Strategy

- Run the tracer test suite with `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED=true`. This ensures all tracing operations (e.g. sampling, distributed tracing, encoding) work as expected when 128-bit trace ids are generated.
- Ensure all 128-bit trace id system tests pass.
- Add integration tests for the following scenarios:
  - 128-bit trace ids are encoded without data loss and without raising an `OverflowError`.
    - The `trace_id` field should contain only a 64-bit integer; the remaining bits are stored in the `_dd.p.tid` tag.
  - 128-bit trace ids are propagated by Datadog distributed tracing headers.
    - Ensure the full 128-bit trace id is propagated and reconstructed by downstream services.
  - Ensure spans with 128-bit trace ids are sampled at the expected rate.

### Performance Testing

1. There is no performance regression when 128-bit trace id generation is disabled (the default mode).
2. When 128-bit trace id generation is enabled, span creation (`Span.__init__(name, ......)`) is ~60ns slower (578ns -> 640ns, ~10%). This regression does not appear to be avoidable.
   - This overhead was measured on an M1 using Python 3.8. Results vary across platforms and Python versions.

## Next Steps

1. Support 128-bit trace id propagation in b3 and w3c headers.
   - The code change is straightforward (i.e. avoid truncating incoming trace ids), but it will require a significant refactor of existing tests.
2. Add support for the `DD_TRACE_128_BIT_TRACEID_LOGGING_ENABLED` environment variable.
   - This supports logging 64-bit trace ids even when `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED=true`.
3. Enable 128-bit trace ids on a sample application (e.g. internal services using the ddtrace library).
   - Ensure sampling rates are respected.
   - Ensure log correlation works for traces with 128-bit trace ids (a concern raised in the RFC).
   - Ensure distributed tracing works across tracers that only support 64-bit trace ids.
   - Ensure the full 128-bit trace id is reconstructed and viewable in the Datadog product.
4. Enable 128-bit trace ids by default and add public documentation.

## Checklist

- [x] Change(s) are motivated and described in the PR description.
- [x] Testing strategy is described if automated tests are not included in the PR.
- [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/contributing.html#Release-Note-Guidelines) are followed.
- [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)).
- [x] Author is aware of the performance implications of this PR as reported in the benchmarks PR comment.

## Reviewer Checklist

- [x] Title is accurate.
- [x] No unnecessary changes are introduced.
- [x] Description motivates each change.
- [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary.
- [x] Testing strategy adequately addresses listed risk(s).
- [x] Change is maintainable (easy to change, telemetry, documentation).
- [x] Release note makes sense to a user of the library.
- [x] Reviewer is aware of, and discussed the performance implications of this PR as reported in the benchmarks PR comment.

---------

Co-authored-by: Kyle Verhoog <[email protected]>
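The "sampled at the expected rate" scenario from the testing strategy could be approached empirically. The sketch below is an assumption, not the PR's test suite: it re-implements the sampler's Knuth-hash comparison in plain Python, hashes only the 64 lowest-order bits as the Sampling section prescribes, and uses an arbitrary sample size, seed, and tolerance.

```python
import random
import time

KNUTH_FACTOR = 1111111111111111111  # same factor the sampler module uses
MAX_UINT_64BITS = (1 << 64) - 1


def make_128bit_trace_id(rng: random.Random, now: int) -> int:
    # <32-bit unix seconds><32 bits of zero><64 random bits>
    return now << 96 | rng.getrandbits(64)


def is_sampled(trace_id: int, sample_rate: float) -> bool:
    low = trace_id & MAX_UINT_64BITS  # only the random component feeds the hash
    return (low * KNUTH_FACTOR) % MAX_UINT_64BITS <= sample_rate * MAX_UINT_64BITS


def observed_rate(sample_rate: float, n: int = 20000, seed: int = 0) -> float:
    rng = random.Random(seed)
    now = int(time.time())  # every id shares the same timestamp, as in a busy second
    kept = sum(is_sampled(make_128bit_trace_id(rng, now), sample_rate) for _ in range(n))
    return kept / n


for rate in (0.1, 0.5, 0.9):
    assert abs(observed_rate(rate) - rate) < 0.02, rate
```

With 20,000 ids, the chosen tolerance is several standard deviations wide, so a failure would point at a biased hash input rather than random noise.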
1 parent 1dd239c commit f8577b0

21 files changed, +316 / -35 lines changed

ddtrace/internal/_encoding.pyx

Lines changed: 2 additions & 2 deletions
@@ -604,7 +604,7 @@ cdef class MsgpackEncoderV03(MsgpackEncoderBase):
 if ret == 0:
 ret = pack_bytes(&self.pk, <char *> b"trace_id", 8)
 if ret != 0: return ret
-ret = pack_number(&self.pk, span.trace_id)
+ret = pack_number(&self.pk, span._trace_id_64bits)
 if ret != 0: return ret
 
 if has_parent_id:
@@ -718,7 +718,7 @@ cdef class MsgpackEncoderV05(MsgpackEncoderBase):
 ret = self._pack_string(span.resource)
 if ret != 0: return ret
 
-_ = span.trace_id
+_ = span._trace_id_64bits
 ret = msgpack_pack_uint64(&self.pk, _ if _ is not None else 0)
 if ret != 0: return ret
 
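To see why the encoders now pack `span._trace_id_64bits` instead of `span.trace_id`, recall that the largest MessagePack integer type is a 64-bit unsigned integer. The sketch below uses Python's `struct` module as a stand-in for `msgpack_pack_uint64` (an assumption for illustration; the Cython encoder does not use `struct`): packing the full 128-bit id fails, while packing its 64 lowest-order bits succeeds.

```python
import struct

MAX_UINT_64BITS = (1 << 64) - 1


def pack_uint64(value: int) -> bytes:
    """Pack an unsigned 64-bit integer big-endian, like a MsgPack uint64 payload."""
    return struct.pack(">Q", value)


trace_id = (0x6414BAFD << 96) | 15703349996414972443  # a 128-bit trace id

try:
    pack_uint64(trace_id)
except struct.error:
    print("full 128-bit trace id does not fit in uint64")

packed = pack_uint64(trace_id & MAX_UINT_64BITS)  # what `_trace_id_64bits` provides
print(len(packed), int.from_bytes(packed, "big"))  # 8 15703349996414972443
```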
ddtrace/internal/_rand.pyi

Lines changed: 1 addition & 0 deletions
@@ -1,2 +1,3 @@
 def seed() -> None: ...
 def rand64bits(check_pid: bool = True) -> int: ...
+def rand128bits(check_pid: bool = True) -> int: ...

ddtrace/internal/_rand.pyx

Lines changed: 7 additions & 0 deletions
@@ -54,6 +54,8 @@ test_rand64bits_pid_check 121.8156 (2.03) 168.9837 (1.71) 130.3854 (
 import os
 import random
 
+from libc.time cimport time
+
 from ddtrace.internal import compat
 from ddtrace.internal import forksafe
 
@@ -87,4 +89,9 @@ cpdef rand64bits():
 return <uint64_t>(state * <uint64_t>2685821657736338717)
 
 
+cpdef rand128bits():
+# Returns a 128bit integer with the following format -> <32-bit unix seconds><32 bits of zero><64 random bits>
+return int(time(NULL)) << 96 | rand64bits()
+
+
 seed()
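A pure-Python approximation of `rand128bits` is handy for checking the bit layout. It swaps in `random.getrandbits(64)` for ddtrace's own `rand64bits` generator and `time.time()` for the C `time(NULL)` call, so it is a sketch of the layout rather than a drop-in replacement; `describe` is a hypothetical helper that splits an id back into its three fields.

```python
import random
import time


def rand128bits_sketch() -> int:
    """<32-bit unix seconds><32 bits of zero><64 random bits>"""
    # `<<` binds tighter than `|`, so the seconds land in bits 96..127
    return int(time.time()) << 96 | random.getrandbits(64)


def describe(trace_id: int) -> dict:
    """Split a 128-bit id back into its three fields."""
    return {
        "unix_seconds": trace_id >> 96,
        "zero_bits": (trace_id >> 64) & 0xFFFFFFFF,
        "random_bits": trace_id & ((1 << 64) - 1),
    }
```

Because the top field is whole seconds, ids generated within the same second share their upper 64 bits, which is why only the lower half is treated as random.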

ddtrace/internal/constants.py

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@
 DEFAULT_SERVICE_NAME = "unnamed_python_service"
 # Used to set the name of an integration on a span
 COMPONENT = "component"
+HIGHER_ORDER_TRACE_ID_BITS = "_dd.p.tid"
+MAX_UINT_64BITS = (1 << 64) - 1
 
 APPSEC_BLOCKED_RESPONSE_HTML = """
 <!DOCTYPE html><html lang="en"><head> <meta charset="UTF-8"> <meta name="viewport"

ddtrace/internal/encoding.py

Lines changed: 2 additions & 2 deletions
@@ -52,9 +52,9 @@ def encode(self, obj):
 
 @staticmethod
 def _span_to_dict(span):
-# type: () -> Dict[str, Any]
+# type: (Span) -> Dict[str, Any]
 d = {
-"trace_id": span.trace_id,
+"trace_id": span._trace_id_64bits,
 "parent_id": span.parent_id,
 "span_id": span.span_id,
 "service": span.service,

ddtrace/internal/processor/trace.py

Lines changed: 7 additions & 0 deletions
@@ -12,13 +12,16 @@
 from ddtrace import config
 from ddtrace.constants import SAMPLING_PRIORITY_KEY
 from ddtrace.constants import USER_KEEP
+from ddtrace.internal.constants import HIGHER_ORDER_TRACE_ID_BITS
+from ddtrace.internal.constants import MAX_UINT_64BITS
 from ddtrace.internal.logger import get_logger
 from ddtrace.internal.processor import SpanProcessor
 from ddtrace.internal.sampling import SpanSamplingRule
 from ddtrace.internal.sampling import is_single_span_sampled
 from ddtrace.internal.service import ServiceStatusError
 from ddtrace.internal.writer import TraceWriter
 from ddtrace.span import Span
+from ddtrace.span import _get_64_highest_order_bits_as_hex
 from ddtrace.span import _is_top_level
 
 
@@ -127,6 +130,10 @@ def process_trace(self, trace):
 
 ctx._update_tags(chunk_root)
 chunk_root.set_tag_str("language", "python")
+# for 128 bit trace ids
+if chunk_root.trace_id > MAX_UINT_64BITS:
+trace_id_hob = _get_64_highest_order_bits_as_hex(chunk_root.trace_id)
+chunk_root.set_tag_str(HIGHER_ORDER_TRACE_ID_BITS, trace_id_hob)
 return trace
 
 
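The processor change tags only the chunk root, and only when its id exceeds 64 bits. A minimal sketch of that rule: `FakeSpan` and `tag_chunk_root` are hypothetical stand-ins for ddtrace's `Span` and `process_trace`, and the sketch assumes the chunk root is the first span in the list.

```python
from dataclasses import dataclass, field
from typing import Dict, List

MAX_UINT_64BITS = (1 << 64) - 1
HIGHER_ORDER_TRACE_ID_BITS = "_dd.p.tid"


@dataclass
class FakeSpan:
    """Hypothetical stand-in for ddtrace's Span: just an id and string tags."""

    trace_id: int
    meta: Dict[str, str] = field(default_factory=dict)


def tag_chunk_root(trace: List[FakeSpan]) -> List[FakeSpan]:
    chunk_root = trace[0]  # assumption: the first span is the chunk root
    if chunk_root.trace_id > MAX_UINT_64BITS:
        # Same result as _get_64_highest_order_bits_as_hex for ids below 2**128
        chunk_root.meta[HIGHER_ORDER_TRACE_ID_BITS] = "{:016x}".format(chunk_root.trace_id >> 64)
    return trace
```

Tagging a single span per chunk keeps the extra payload to one tag per chunk rather than one per span.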
ddtrace/propagation/http.py

Lines changed: 41 additions & 7 deletions
@@ -21,6 +21,8 @@
 from ..internal._tagset import encode_tagset_values
 from ..internal.compat import ensure_str
 from ..internal.compat import ensure_text
+from ..internal.constants import HIGHER_ORDER_TRACE_ID_BITS as _HIGHER_ORDER_TRACE_ID_BITS
+from ..internal.constants import MAX_UINT_64BITS as _MAX_UINT_64BITS
 from ..internal.constants import PROPAGATION_STYLE_B3
 from ..internal.constants import PROPAGATION_STYLE_B3_SINGLE_HEADER
 from ..internal.constants import PROPAGATION_STYLE_DATADOG
@@ -31,6 +33,8 @@
 from ..internal.logger import get_logger
 from ..internal.sampling import validate_sampling_decision
 from ..span import _MetaDictType
+from ..span import _get_64_highest_order_bits_as_hex
+from ..span import _get_64_lowest_order_bits_as_int
 from ._utils import get_wsgi_header
 
 
@@ -154,7 +158,16 @@ def _inject(span_context, headers):
 log.debug("tried to inject invalid context %r", span_context)
 return
 
-headers[HTTP_HEADER_TRACE_ID] = str(span_context.trace_id)
+if span_context.trace_id > _MAX_UINT_64BITS:
+# set lower order 64 bits in `x-datadog-trace-id` header. For backwards compatibility these
+# bits should be converted to a base 10 integer.
+headers[HTTP_HEADER_TRACE_ID] = str(_get_64_lowest_order_bits_as_int(span_context.trace_id))
+# set higher order 64 bits in `_dd.p.tid` to propagate the full 128 bit trace id.
+# Note - The higher order bits must be encoded in hex
+span_context._meta[_HIGHER_ORDER_TRACE_ID_BITS] = _get_64_highest_order_bits_as_hex(span_context.trace_id)
+else:
+headers[HTTP_HEADER_TRACE_ID] = str(span_context.trace_id)
+
 headers[HTTP_HEADER_PARENT_ID] = str(span_context.span_id)
 sampling_priority = span_context.sampling_priority
 # Propagate priority only if defined
@@ -197,11 +210,18 @@ def _inject(span_context, headers):
 @staticmethod
 def _extract(headers):
 # type: (Dict[str, str]) -> Optional[Context]
-trace_id = _extract_header_value(
-POSSIBLE_HTTP_HEADER_TRACE_IDS,
-headers,
-)
-if trace_id is None:
+trace_id_str = _extract_header_value(POSSIBLE_HTTP_HEADER_TRACE_IDS, headers)
+if trace_id_str is None:
+return None
+try:
+trace_id = int(trace_id_str)
+except ValueError:
+trace_id = 0
+
+if trace_id == 0 or trace_id > _MAX_UINT_64BITS:
+log.warning(
+"Invalid trace id: %r. `x-datadog-trace-id` must be greater than zero and less than 2**64", trace_id_str
+)
 return None
 
 parent_span_id = _extract_header_value(
@@ -246,6 +266,20 @@ def _extract(headers):
 }
 log.debug("failed to decode x-datadog-tags: %r", tags_value, exc_info=True)
 
+if meta is not None and config._128_bit_trace_id_enabled:
+# When 128 bit trace ids are propagated the 64 lowest order bits are encoded as an integer
+# and set in the `x-datadog-trace-id` header (this was done for backwards compatibility).
+# The 64 highest order bits are encoded in base 16 and store in the `_dd.p.tid` tag.
+# Here we reconstruct the full 128 bit trace_id.
+trace_id_hob_hex = meta.get(_HIGHER_ORDER_TRACE_ID_BITS) # type: Optional[str]
+if trace_id_hob_hex is not None:
+# convert lowest order bits in trace_id to base 16
+trace_id_lod_hex = "{:016x}".format(trace_id)
+# combine highest and lowest order hex values to create a 128 bit trace_id
+trace_id = int(trace_id_hob_hex + trace_id_lod_hex, 16)
+# After the full trace id is reconstructed this tag is no longer required
+del meta[_HIGHER_ORDER_TRACE_ID_BITS]
+
 # Try to parse values into their expected types
 try:
 if sampling_priority is not None:
@@ -258,7 +292,7 @@ def _extract(headers):
 
 return Context(
 # DEV: Do not allow `0` for trace id or span id, use None instead
-trace_id=int(trace_id) or None,
+trace_id=trace_id or None,
 span_id=int(parent_span_id) or None, # type: ignore[arg-type]
 sampling_priority=sampling_priority, # type: ignore[arg-type]
 dd_origin=origin,
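The new `x-datadog-trace-id` validation in `_extract` can be exercised in isolation. This sketch mirrors its checks with a hypothetical `parse_datadog_trace_id` helper. One deliberate difference: it rejects values `<= 0`, which is what the warning message ("must be greater than zero") describes, whereas the diff's `trace_id == 0` check would let a negative value such as `-5` through.

```python
from typing import Optional

MAX_UINT_64BITS = (1 << 64) - 1


def parse_datadog_trace_id(value: Optional[str]) -> Optional[int]:
    """Return a valid 64-bit `x-datadog-trace-id`, or None if the header is missing or invalid."""
    if value is None:
        return None  # header absent
    try:
        trace_id = int(value)
    except ValueError:
        return None  # not a base-10 integer
    if trace_id <= 0 or trace_id > MAX_UINT_64BITS:
        return None  # outside 1 .. 2**64 - 1
    return trace_id
```

Rejecting values above `2**64 - 1` keeps the header backwards compatible: any extra bits must arrive through `_dd.p.tid`, never through the integer header itself.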

ddtrace/sampler.py

Lines changed: 10 additions & 6 deletions
@@ -25,6 +25,7 @@
 from .constants import USER_REJECT
 from .internal.compat import iteritems
 from .internal.compat import pattern_type
+from .internal.constants import MAX_UINT_64BITS as _MAX_UINT_64BITS
 from .internal.logger import get_logger
 from .internal.rate_limiter import RateLimiter
 from .internal.sampling import SamplingMechanism
@@ -44,8 +45,11 @@
 
 log = get_logger(__name__)
 
-MAX_TRACE_ID = 2 ** 64
-
+# All references to MAX_TRACE_ID were replaced with _MAX_UINT_64BITS.
+# Now that ddtrace supports generating 128bit trace_ids,
+# the max trace id should be 2**128 - 1 (not 2**64 -1)
+# MAX_TRACE_ID is no longer used and should be removed.
+MAX_TRACE_ID = _MAX_UINT_64BITS
 # Has to be the same factor and key as the Agent to allow chained sampling
 KNUTH_FACTOR = 1111111111111111111
 
@@ -101,11 +105,11 @@ def __init__(self, sample_rate=1.0):
 def set_sample_rate(self, sample_rate):
 # type: (float) -> None
 self.sample_rate = float(sample_rate)
-self.sampling_id_threshold = self.sample_rate * MAX_TRACE_ID
+self.sampling_id_threshold = self.sample_rate * _MAX_UINT_64BITS
 
 def sample(self, span):
 # type: (Span) -> bool
-return ((span.trace_id * KNUTH_FACTOR) % MAX_TRACE_ID) <= self.sampling_id_threshold
+return ((span._trace_id_64bits * KNUTH_FACTOR) % _MAX_UINT_64BITS) <= self.sampling_id_threshold
 
 
 class RateByServiceSampler(BasePrioritySampler):
@@ -431,7 +435,7 @@ def sample_rate(self):
 def sample_rate(self, sample_rate):
 # type: (float) -> None
 self._sample_rate = sample_rate
-self._sampling_id_threshold = sample_rate * MAX_TRACE_ID
+self._sampling_id_threshold = sample_rate * _MAX_UINT_64BITS
 
 def _pattern_matches(self, prop, pattern):
 # If the rule is not set, then assume it matches
@@ -501,7 +505,7 @@ def sample(self, span):
 elif self.sample_rate == 0:
 return False
 
-return ((span.trace_id * KNUTH_FACTOR) % MAX_TRACE_ID) <= self._sampling_id_threshold
+return ((span._trace_id_64bits * KNUTH_FACTOR) % _MAX_UINT_64BITS) <= self._sampling_id_threshold
 
 def _no_rule_or_self(self, val):
 return "NO_RULE" if val is self.NO_RULE else val
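Because `sample()` now hashes `span._trace_id_64bits`, the sampling decision cannot depend on the timestamp in the upper half of a 128-bit id. A small sketch of that property, re-implementing the comparison from the diff as a free function (`keep` is a hypothetical name):

```python
KNUTH_FACTOR = 1111111111111111111
MAX_UINT_64BITS = (1 << 64) - 1


def keep(trace_id: int, sample_rate: float) -> bool:
    """Mirror of the diff's check: hash only the 64 lowest-order bits."""
    low = trace_id & MAX_UINT_64BITS
    return (low * KNUTH_FACTOR) % MAX_UINT_64BITS <= sample_rate * MAX_UINT_64BITS


low_bits = 15703349996414972443
earlier = (0x6414BAFD << 96) | low_bits  # same random bits...
later = (1700000000 << 96) | low_bits  # ...under a different timestamp

for rate in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert keep(earlier, rate) == keep(later, rate) == keep(low_bits, rate)
```

Hashing the full 128-bit value would not have this property: since `2**64` is congruent to 1 modulo `2**64 - 1`, the upper half would effectively be added to the lower half before hashing, so the decision would shift with the timestamp.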

ddtrace/settings/config.py

Lines changed: 2 additions & 0 deletions
@@ -225,6 +225,8 @@ def __init__(self):
 
 self._telemetry_metrics_enabled = asbool(os.getenv("_DD_TELEMETRY_METRICS_ENABLED", default=False))
 
+self._128_bit_trace_id_enabled = asbool(os.getenv("DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED", False))
+
 # Propagation styles
 self._propagation_style_extract = self._propagation_style_inject = _parse_propagation_styles(
 "DD_TRACE_PROPAGATION_STYLE", default=_PROPAGATION_STYLE_DEFAULT
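The flag is read from the environment when the global config object is created, so `DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED=true` has to be set before the process starts. A sketch of the same parsing; `_asbool` is a hypothetical stand-in for ddtrace's `asbool` helper and assumes that only "true" and "1" (case-insensitive) count as enabled.

```python
import os
from typing import Mapping, Optional, Union


def _asbool(value: Union[str, bool, None]) -> bool:
    """Hypothetical stand-in for ddtrace's asbool helper."""
    if value is None:
        return False
    if isinstance(value, bool):
        return value
    return value.lower() in ("true", "1")


def is_128_bit_trace_id_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read the opt-in flag from `environ` (defaults to the process environment)."""
    env = os.environ if environ is None else environ
    return _asbool(env.get("DD_TRACE_128_BIT_TRACEID_GENERATION_ENABLED", False))
```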

ddtrace/span.py

Lines changed: 26 additions & 3 deletions
@@ -28,7 +28,8 @@
 from .context import Context
 from .ext import http
 from .ext import net
-from .internal import _rand
+from .internal._rand import rand64bits as _rand64bits
+from .internal._rand import rand128bits as _rand128bits
 from .internal.compat import NumericType
 from .internal.compat import StringIO
 from .internal.compat import ensure_text
@@ -37,6 +38,7 @@
 from .internal.compat import numeric_types
 from .internal.compat import stringify
 from .internal.compat import time_ns
+from .internal.constants import MAX_UINT_64BITS as _MAX_UINT_64BITS
 from .internal.logger import get_logger
 from .internal.sampling import SamplingMechanism
 from .internal.sampling import update_sampling_decision
@@ -50,6 +52,18 @@
 log = get_logger(__name__)
 
 
+def _get_64_lowest_order_bits_as_int(large_int):
+# type: (int) -> int
+"""Get the 64 lowest order bits from a 128bit integer"""
+return _MAX_UINT_64BITS & large_int
+
+
+def _get_64_highest_order_bits_as_hex(large_int):
+# type: (int) -> str
+"""Get the 64 highest order bits from a 128bit integer"""
+return "{:032x}".format(large_int)[:16]
+
+
 class Span(object):
 
 __slots__ = [
@@ -137,8 +151,13 @@ def __init__(
 self.duration_ns = None # type: Optional[int]
 
 # tracing
-self.trace_id = trace_id or _rand.rand64bits() # type: int
-self.span_id = span_id or _rand.rand64bits() # type: int
+if trace_id is not None:
+self.trace_id = trace_id # type: int
+elif config._128_bit_trace_id_enabled:
+self.trace_id = _rand128bits()
+else:
+self.trace_id = _rand64bits()
+self.span_id = span_id or _rand64bits() # type: int
 self.parent_id = parent_id # type: Optional[int]
 self._on_finish_callbacks = [] if on_finish is None else on_finish
 
@@ -176,6 +195,10 @@ def _get_ctx_item(self, key):
 return None
 return self._store.get(key)
 
+@property
+def _trace_id_64bits(self):
+return _get_64_lowest_order_bits_as_int(self.trace_id)
+
 @property
 def start(self):
 # type: () -> float
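The two helpers added to `ddtrace/span.py` can be checked against a shift-based formulation. For any id in `[0, 2**128)`, slicing the 32-digit hex string gives the same result as formatting `large_int >> 64`; for larger values the slice returns the leading 16 digits instead, so both forms assume ids never exceed 128 bits. The `highest_64_as_hex_shift` function is an alternative written only for comparison, not part of the diff.

```python
MAX_UINT_64BITS = (1 << 64) - 1


def lowest_64_as_int(large_int: int) -> int:
    # same expression as _get_64_lowest_order_bits_as_int
    return MAX_UINT_64BITS & large_int


def highest_64_as_hex_slice(large_int: int) -> str:
    # same expression as _get_64_highest_order_bits_as_hex: pad to 32 hex digits, keep the first 16
    return "{:032x}".format(large_int)[:16]


def highest_64_as_hex_shift(large_int: int) -> str:
    # comparison only: drop the low half, pad the rest to 16 hex digits
    return "{:016x}".format(large_int >> 64)


samples = [0, 1, MAX_UINT_64BITS, 1 << 64, (0x6414BAFD << 96) | 15703349996414972443, (1 << 128) - 1]
for value in samples:
    assert highest_64_as_hex_slice(value) == highest_64_as_hex_shift(value)
```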
