Commit d345f8c

fix(tracing): handle unicode when truncating long span attributes [backport 3.8] (#13478)
Backport c0fe465 from #13475 to 3.8.

PR #13270 introduced truncation of long span attributes. However, the truncation code works at the character level (e.g., it uses `len(text)` to count the string length), but `msgpack_pack_unicode()` expects a size in bytes as an argument. For a string with non-ASCII characters, the character length can be less than the byte length, so in some cases the string would not be truncated (because the number of characters would be below the limit), but `msgpack_pack_unicode()` would fail (because the number of bytes would be above the limit).

This PR changes the call to `msgpack_pack_unicode()` to use the old limit of `ITEM_LIMIT` (2**32 - 1).

## Checklist
- [x] PR author has checked that all the criteria below are met
- The PR description includes an overview of the change
- The PR description articulates the motivation for the change
- The change includes tests OR the PR description describes a testing strategy
- The PR description notes risks associated with the change, if any
- Newly-added code is easy to change
- The change follows the [library release note guidelines](https://ddtrace.readthedocs.io/en/stable/releasenotes.html)
- The change includes or references documentation updates if necessary
- Backport labels are set (if [applicable](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting))

## Reviewer Checklist
- [x] Reviewer has checked that all the criteria below are met
- Title is accurate
- All changes are related to the pull request's stated goal
- Avoids breaking [API](https://ddtrace.readthedocs.io/en/stable/versioning.html#interfaces) changes
- Testing strategy adequately addresses listed risks
- Newly-added code is easy to change
- Release note makes sense to a user of the library
- If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
- Backport labels are set in a manner that is consistent with the [release branch maintenance policy](https://ddtrace.readthedocs.io/en/latest/contributing.html#backporting)

Co-authored-by: Vítor De Araújo <[email protected]>
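
The character-versus-byte mismatch described above can be reproduced in plain Python. This is a minimal sketch, not code from the PR: `LIMIT` is a hypothetical stand-in set to the 25000 figure quoted in the release note, and `char_len` / `utf8_len` are illustrative helper names.

```python
# Hypothetical stand-in for the attribute limit; 25000 is the figure
# quoted in the release note, not a constant read from ddtrace.
LIMIT = 25000


def char_len(text: str) -> int:
    """Length in Unicode code points, which is what len() reports for str."""
    return len(text)


def utf8_len(text: str) -> int:
    """Length in bytes once the string is encoded as UTF-8."""
    return len(text.encode("utf-8"))


ascii_text = "a" * LIMIT
accented_text = "\u00e1" * LIMIT  # U+00E1 (a with acute) takes 2 bytes in UTF-8

print(char_len(ascii_text), utf8_len(ascii_text))        # 25000 25000
print(char_len(accented_text), utf8_len(accented_text))  # 25000 50000
```

Both strings pass a character-count check against `LIMIT`, but only the ASCII one also fits within `LIMIT` bytes.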
1 parent f79ce6f commit d345f8c

5 files changed: +68 -1 lines changed

ddtrace/internal/_encoding.pyx

Lines changed: 1 addition & 1 deletion
@@ -154,7 +154,7 @@ cdef inline int pack_text(msgpack_packer *pk, object text) except? -1:
     if len(text) > MAX_SPAN_META_VALUE_LEN:
         text = truncate_string(text)
     IF PY_MAJOR_VERSION >= 3:
-        ret = msgpack_pack_unicode(pk, text, MAX_SPAN_META_VALUE_LEN)
+        ret = msgpack_pack_unicode(pk, text, ITEM_LIMIT)
         if ret == -2:
             raise ValueError("unicode string is too large")
     ELSE:
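
To see why swapping the limit argument avoids the error, the control flow of this hunk can be modeled in pure Python. This is a rough sketch under stated assumptions, not the Cython implementation: `pack_unicode_sketch` is a hypothetical stand-in that only mimics the "return -2 when the UTF-8 byte length exceeds the limit" behavior implied by the commit message, `truncate_sketch` is a plain character slice rather than the real `truncate_string`, and the value 25000 for `MAX_SPAN_META_VALUE_LEN` is assumed from the release note.

```python
ITEM_LIMIT = 2**32 - 1           # value stated in the commit message
MAX_SPAN_META_VALUE_LEN = 25000  # assumed from the release note


def truncate_sketch(text: str) -> str:
    # Hypothetical: cut by characters; the real truncate_string may differ.
    return text[:MAX_SPAN_META_VALUE_LEN]


def pack_unicode_sketch(text: str, byte_limit: int) -> int:
    # Hypothetical model: the limit is compared against the UTF-8 byte length.
    if len(text.encode("utf-8")) > byte_limit:
        return -2
    return 0


def pack_text_sketch(text: str, byte_limit: int) -> int:
    # Character-level check, mirroring the `len(text)` test in the hunk.
    if len(text) > MAX_SPAN_META_VALUE_LEN:
        text = truncate_sketch(text)
    ret = pack_unicode_sketch(text, byte_limit)
    if ret == -2:
        raise ValueError("unicode string is too large")
    return ret


value = "\u00e1" * 25000  # 25000 characters, 50000 UTF-8 bytes

try:
    pack_text_sketch(value, MAX_SPAN_META_VALUE_LEN)  # limit before the change
except ValueError as exc:
    print("before:", exc)

print("after:", pack_text_sketch(value, ITEM_LIMIT))  # limit after the change
```

In this model the character check still bounds how much text is packed, while the byte limit passed to the packer goes back to being the format-level ceiling.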
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+fixes:
+  - |
+    tracing: Fixes an issue where truncation of span attributes longer than 25000 characters would not consistently
+    count the size of UTF-8 multibyte characters, leading to a ``unicode string is too large`` error.

tests/integration/test_integration_snapshots.py

Lines changed: 10 additions & 0 deletions
@@ -290,3 +290,13 @@ def test_encode_span_with_large_string_attributes(encoding):
     with override_global_config(dict(_trace_api=encoding)):
         with tracer.trace(name="a" * 25000, resource="b" * 25001) as span:
             span.set_tag(key="c" * 25001, value="d" * 2000)
+
+
+@pytest.mark.parametrize("encoding", ["v0.4", "v0.5"])
+@pytest.mark.snapshot()
+def test_encode_span_with_large_unicode_string_attributes(encoding):
+    from ddtrace import tracer
+
+    with override_global_config(dict(_trace_api=encoding)):
+        with tracer.trace(name="á" * 25000, resource="â" * 25001) as span:
+            span.set_tag(key="å" * 25001, value="ä" * 2000)

tests/snapshots/tests.integration.test_integration_snapshots.test_encode_span_with_large_unicode_string_attributes[v0.4].json

Lines changed: 26 additions & 0 deletions
Large diffs are not rendered by default.

tests/snapshots/tests.integration.test_integration_snapshots.test_encode_span_with_large_unicode_string_attributes[v0.5].json

Lines changed: 26 additions & 0 deletions
Large diffs are not rendered by default.