Skip to content

Commit 0db7b10

Browse files
fix(v05): reset string table index map after rollbacks [backport 1.20] (#7879)
Backport 3945eb6 from #7874 to 1.20. Resolves: #7662 This PR resolves an encoding issue when `DD_TRACE_API_VERSION` is set to `v0.5` ## Background The python tracer always attempts to encode a trace. While encoding a trace if the payload/buffer size limit is reached the trace buffer and string table are rolled back to a previous state. However the rollback only resets the index in the string table and the trace buffer. It does not removing stale string->index pairs from `StringTable._table`. This causes problems when: - buffer limit is reached while encoding a trace -> then trace buffer and string table are rolled back -> then a smaller trace containing strings that map to stale/rolled back string table index is successfully encoded. This scenario is very rare and only occurs on a very small fraction of python traces. The scenario described above can mis-encode traces in two different ways resulting two types of decoding errors: - out of bounds access: The encoded trace references string table indices that have not been serialized. In python decoders this will raise an IndexError. - string swapping/overwriting: Two different strings map to the same string table index. When decoding the trace both strings will be mapped to the same value. This is the behavior we saw in #incident-22455 ## Fix Remove stale _id->string pairs from the `StringTable._table` after rolling back ## Checklist - [x] Change(s) are motivated and described in the PR description. - [x] Testing strategy is described if automated tests are not included in the PR. - [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc). - [x] Change is maintainable (easy to change, telemetry, documentation). - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed. If no release note is required, add label `changelog/no-changelog`. - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)). - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Title is accurate. - [x] No unnecessary changes are introduced. - [x] Description motivates each change. - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary. - [x] Testing strategy adequately addresses listed risk(s). - [x] Change is maintainable (easy to change, telemetry, documentation). - [x] Release note makes sense to a user of the library. - [x] Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment. - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) - [x] If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from `@DataDog/security-design-and-guidance`. - [x] This PR doesn't touch any of that. --------- Co-authored-by: Munir Abdinur <[email protected]>
1 parent 24c2632 commit 0db7b10

File tree

3 files changed

+54
-5
lines changed

3 files changed

+54
-5
lines changed

ddtrace/internal/_encoding.pyx

Lines changed: 18 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -286,13 +286,26 @@ cdef class MsgpackStringTable(StringTable):
286286
self.pk.length = self._sp_len
287287
self._next_id = self._sp_id
288288

289+
# After rolling back the string table next_id we must remove all stale string -> _id pairs
290+
# This will resolve two classes of encoding errors:
291+
# - multiple strings referencing the same string table index. In this scenario
292+
# two different strings in the encoded trace could be serialized with the same _id. In this scenario
293+
# two different strings could reference one string in the encoded trace (string swapping).
294+
# - when the string table references an index of the string table that is not serialized. The encoded
295+
# trace can not be decoded without accessing an invalid index. In this scenario the agent will
296+
# return a 400 status code.
297+
self._table = {s: idx for s, idx in self._table.items() if idx < self._next_id}
298+
289299
cdef get_bytes(self):
290300
cdef int ret
291-
cdef stdint.uint32_t table_size = self._next_id
292-
cdef int offset = MSGPACK_STRING_TABLE_LENGTH_PREFIX_SIZE - array_prefix_size(table_size)
293-
cdef int old_pos = self.pk.length
294-
301+
cdef stdint.uint32_t table_size
302+
cdef int offset
303+
cdef int old_pos
295304
with self._lock:
305+
table_size = self._next_id
306+
offset = MSGPACK_STRING_TABLE_LENGTH_PREFIX_SIZE - array_prefix_size(table_size)
307+
old_pos = self.pk.length
308+
296309
# Update table size prefix
297310
self.pk.length = offset
298311
ret = msgpack_pack_array(&self.pk, table_size)
@@ -305,7 +318,7 @@ cdef class MsgpackStringTable(StringTable):
305318
return None
306319
self.pk.length = old_pos
307320

308-
return PyBytes_FromStringAndSize(self.pk.buf + offset, self.pk.length - offset)
321+
return PyBytes_FromStringAndSize(self.pk.buf + offset, self.pk.length - offset)
309322

310323
@property
311324
def size(self):
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
---
2+
fixes:
3+
- |
4+
tracing: Resolves trace encoding errors raised when ``DD_TRACE_API_VERSION`` is set to ``v0.5`` and a BufferFull
5+
Exception is raised by the TraceWriter. This fix ensures span fields are not overwritten and reduces the frequency
6+
of 4XX errors in the trace agent.

tests/tracer/test_encoders.py

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -302,6 +302,36 @@ def allencodings(f):
302302
return pytest.mark.parametrize("encoding", MSGPACK_ENCODERS.keys())(f)
303303

304304

305+
def test_msgpack_encoding_after_an_exception_was_raised():
306+
"""Ensure that the encoder's state is consistent after an Exception is raised during encoding"""
307+
# Encode a trace after a rollback/BufferFull occurs exception
308+
rolledback_encoder = MsgpackEncoderV05(1 << 12, 1 << 12)
309+
trace = gen_trace(nspans=1, ntags=100, nmetrics=100, key_size=10, value_size=10)
310+
rand_string = rands(size=20, chars=string.ascii_letters)
311+
# trace only has one span
312+
trace[0].set_tag_str("some_tag", rand_string)
313+
try:
314+
# Encode a trace that will trigger a rollback/BufferItemTooLarge exception
315+
# BufferFull is not raised since only one span is being encoded
316+
rolledback_encoder.put(trace)
317+
except BufferItemTooLarge:
318+
pass
319+
else:
320+
pytest.fail("Encoding the trace did not overflow the trace buffer. We should increase the size of the span.")
321+
# Successfully encode a small trace
322+
small_trace = gen_trace(nspans=1, ntags=0, nmetrics=0)
323+
# Add a tag to the small trace that was previously encoded in the encoder's StringTable
324+
small_trace[0].set_tag_str("previously_encoded_string", rand_string)
325+
rolledback_encoder.put(small_trace)
326+
327+
# Encode a trace without triggering a rollback/BufferFull exception
328+
ref_encoder = MsgpackEncoderV05(1 << 20, 1 << 20)
329+
ref_encoder.put(small_trace)
330+
331+
# Ensure the two encoders have the same state
332+
assert rolledback_encoder.encode() == ref_encoder.encode()
333+
334+
305335
@allencodings
306336
def test_custom_msgpack_encode(encoding):
307337
encoder = MSGPACK_ENCODERS[encoding](1 << 20, 1 << 20)

0 commit comments

Comments
 (0)