Skip to content

Commit 3b5bf70

Browse files
fix(v05): reset string table index map after rollbacks [backport 2.3] (#7880)
Backport 3945eb6 from #7874 to 2.3. Resolves: #7662 This PR resolves an encoding issue when `DD_TRACE_API_VERSION` is set to `v0.5` ## Background The python tracer always attempts to encode a trace. While encoding a trace if the payload/buffer size limit is reached the trace buffer and string table are rolled back to a previous state. However the rollback only resets the index in the string table and the trace buffer. It does not removing stale string->index pairs from `StringTable._table`. This causes problems when: - buffer limit is reached while encoding a trace -> then trace buffer and string table are rolled back -> then a smaller trace containing strings that map to stale/rolled back string table index is successfully encoded. This scenario is very rare and only occurs on a very small fraction of python traces. The scenario described above can mis-encode traces in two different ways resulting two types of decoding errors: - out of bounds access: The encoded trace references string table indices that have not been serialized. In python decoders this will raise an IndexError. - string swapping/overwriting: Two different strings map to the same string table index. When decoding the trace both strings will be mapped to the same value. This is the behavior we saw in #incident-22455 ## Fix Remove stale _id->string pairs from the `StringTable._table` after rolling back ## Checklist - [x] Change(s) are motivated and described in the PR description. - [x] Testing strategy is described if automated tests are not included in the PR. - [x] Risk is outlined (performance impact, potential for breakage, maintainability, etc). - [x] Change is maintainable (easy to change, telemetry, documentation). - [x] [Library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html) are followed. If no release note is required, add label `changelog/no-changelog`. - [x] Documentation is included (in-code, generated user docs, [public corp docs](https://github.com/DataDog/documentation/)). - [x] Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)) ## Reviewer Checklist - [x] Title is accurate. - [x] No unnecessary changes are introduced. - [x] Description motivates each change. - [x] Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes unless absolutely necessary. - [x] Testing strategy adequately addresses listed risk(s). - [x] Change is maintainable (easy to change, telemetry, documentation). - [x] Release note makes sense to a user of the library. - [x] Reviewer has explicitly acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment. - [x] Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting) - [x] If this PR touches code that signs or publishes builds or packages, or handles credentials of any kind, I've requested a review from `@DataDog/security-design-and-guidance`. - [x] This PR doesn't touch any of that. --------- Co-authored-by: Munir Abdinur <[email protected]>
1 parent 0301f25 commit 3b5bf70

File tree

3 files changed

+54
-5
lines changed

3 files changed

+54
-5
lines changed

ddtrace/internal/_encoding.pyx

Lines changed: 18 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -278,13 +278,26 @@ cdef class MsgpackStringTable(StringTable):
278278
self.pk.length = self._sp_len
279279
self._next_id = self._sp_id
280280

281+
# After rolling back the string table next_id we must remove all stale string -> _id pairs
282+
# This will resolve two classes of encoding errors:
283+
# - multiple strings referencing the same string table index. In this scenario
284+
# two different strings in the encoded trace could be serialized with the same _id. In this scenario
285+
# two different strings could reference one string in the encoded trace (string swapping).
286+
# - when the string table references an index of the string table that is not serialized. The encoded
287+
# trace can not be decoded without accessing an invalid index. In this scenario the agent will
288+
# return a 400 status code.
289+
self._table = {s: idx for s, idx in self._table.items() if idx < self._next_id}
290+
281291
cdef get_bytes(self):
282292
cdef int ret
283-
cdef stdint.uint32_t table_size = self._next_id
284-
cdef int offset = MSGPACK_STRING_TABLE_LENGTH_PREFIX_SIZE - array_prefix_size(table_size)
285-
cdef int old_pos = self.pk.length
286-
293+
cdef stdint.uint32_t table_size
294+
cdef int offset
295+
cdef int old_pos
287296
with self._lock:
297+
table_size = self._next_id
298+
offset = MSGPACK_STRING_TABLE_LENGTH_PREFIX_SIZE - array_prefix_size(table_size)
299+
old_pos = self.pk.length
300+
288301
# Update table size prefix
289302
self.pk.length = offset
290303
ret = msgpack_pack_array(&self.pk, table_size)
@@ -297,7 +310,7 @@ cdef class MsgpackStringTable(StringTable):
297310
return None
298311
self.pk.length = old_pos
299312

300-
return PyBytes_FromStringAndSize(self.pk.buf + offset, self.pk.length - offset)
313+
return PyBytes_FromStringAndSize(self.pk.buf + offset, self.pk.length - offset)
301314

302315
@property
303316
def size(self):
Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,6 @@
1+
---
2+
fixes:
3+
- |
4+
tracing: Resolves trace encoding errors raised when ``DD_TRACE_API_VERSION`` is set to ``v0.5`` and a BufferFull
5+
Exception is raised by the TraceWriter. This fix ensures span fields are not overwritten and reduces the frequency
6+
of 4XX errors in the trace agent.

tests/tracer/test_encoders.py

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -299,6 +299,36 @@ def allencodings(f):
299299
return pytest.mark.parametrize("encoding", MSGPACK_ENCODERS.keys())(f)
300300

301301

302+
def test_msgpack_encoding_after_an_exception_was_raised():
303+
"""Ensure that the encoder's state is consistent after an Exception is raised during encoding"""
304+
# Encode a trace after a rollback/BufferFull occurs exception
305+
rolledback_encoder = MsgpackEncoderV05(1 << 12, 1 << 12)
306+
trace = gen_trace(nspans=1, ntags=100, nmetrics=100, key_size=10, value_size=10)
307+
rand_string = rands(size=20, chars=string.ascii_letters)
308+
# trace only has one span
309+
trace[0].set_tag_str("some_tag", rand_string)
310+
try:
311+
# Encode a trace that will trigger a rollback/BufferItemTooLarge exception
312+
# BufferFull is not raised since only one span is being encoded
313+
rolledback_encoder.put(trace)
314+
except BufferItemTooLarge:
315+
pass
316+
else:
317+
pytest.fail("Encoding the trace did not overflow the trace buffer. We should increase the size of the span.")
318+
# Successfully encode a small trace
319+
small_trace = gen_trace(nspans=1, ntags=0, nmetrics=0)
320+
# Add a tag to the small trace that was previously encoded in the encoder's StringTable
321+
small_trace[0].set_tag_str("previously_encoded_string", rand_string)
322+
rolledback_encoder.put(small_trace)
323+
324+
# Encode a trace without triggering a rollback/BufferFull exception
325+
ref_encoder = MsgpackEncoderV05(1 << 20, 1 << 20)
326+
ref_encoder.put(small_trace)
327+
328+
# Ensure the two encoders have the same state
329+
assert rolledback_encoder.encode() == ref_encoder.encode()
330+
331+
302332
@allencodings
303333
def test_custom_msgpack_encode(encoding):
304334
encoder = MSGPACK_ENCODERS[encoding](1 << 20, 1 << 20)

0 commit comments

Comments
 (0)