fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs#24314
fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs#24314xykong wants to merge 1 commit intoBerriAI:mainfrom
Conversation
… 22P05 errors Null bytes (\x00) in LLM request/response payloads cause PostgreSQL to raise '22P05: invalid byte sequence for encoding UTF8: 0x00' when spend logs are written to the database. Changes: - Add strip_null_bytes() helper to safe_json_dumps.py that recursively removes \x00 chars from strings, dicts, lists, tuples and sets - Inline null byte removal into safe_dumps() _serialize() for str paths so all JSON serialization through safe_dumps() is automatically safe - In spend_tracking_utils.py: replace json.dumps() with safe_dumps() for messages and request_body serialization; add strip_null_bytes() call in _sanitize_request_body_for_spend_logs_payload string handling Centralizing the fix in safe_dumps() is more robust than ad-hoc stripping at each call site. Fixes BerriAI#24310 Related: BerriAI#21290, BerriAI#15519
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
Greptile SummaryThis PR centralizes null-byte stripping into Key changes:
Issues found:
Confidence Score: 3/5
|
| Filename | Overview |
|---|---|
| litellm/litellm_core_utils/safe_json_dumps.py | Adds strip_null_bytes() helper and integrates null-byte stripping into _serialize(). Both functions correctly strip null bytes from string values and fallback str() conversions, but neither strips null bytes from dictionary keys, leaving a residual path that can still trigger PostgreSQL 22P05. No new tests cover the null-byte behavior. |
| litellm/proxy/spend_tracking/spend_tracking_utils.py | Replaces two ad-hoc json.dumps(..., default=str) calls with safe_dumps() and adds an early strip_null_bytes() call before the truncation length check in _sanitize_value. Changes are correct and well-scoped; the early strip ensures the truncation threshold is measured on the already-cleaned string. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[LLM Request/Response Payload] --> B{spend_tracking_utils}
B --> C[_get_messages_for_spend_logs_payload]
B --> D[_get_proxy_server_request_for_spend_logs_payload]
B --> E[_get_response_for_spend_logs_payload]
C -->|safe_dumps| F[safe_dumps]
D --> G[_sanitize_request_body]
E --> G
G -->|string values| H[strip_null_bytes - value]
H --> I{len gt max_string_length?}
I -->|yes| J[truncate string]
I -->|no| K[keep as-is]
J --> L[safe_dumps]
K --> L
F --> M[_serialize - strips str values]
L --> M
M -->|dict values| N[null bytes stripped in values]
M -->|dict keys| O[WARNING - keys NOT stripped]
M -->|fallback str| P[str obj - null bytes stripped]
N --> Q[json.dumps to PostgreSQL]
O --> Q
P --> Q
Comments Outside Diff (1)
-
litellm/litellm_core_utils/safe_json_dumps.py, line 45-51 (link)Null bytes in dict keys also unstripped in
_serializeMirrors the issue in
strip_null_bytes: the_serializehelper iterates over dict items but never strips null bytes from string keys. Sincesafe_dumpsgoes through_serialize, keys carrying\x00will survive into the final JSON string and can still trigger a PostgreSQL22P05error.
Last reviewed commit: "fix(core): add strip..."
| if isinstance(data, dict): | ||
| return {k: strip_null_bytes(v) for k, v in data.items()} |
There was a problem hiding this comment.
Dict keys with null bytes not stripped
strip_null_bytes() and _serialize() inside safe_dumps both iterate over dict.items() and strip null bytes only from values, not from keys. A dict key containing \x00 will be written to PostgreSQL as-is, which can still trigger a 22P05 error.
# Current (strip_null_bytes):
if isinstance(data, dict):
return {k: strip_null_bytes(v) for k, v in data.items()}
# Also _serialize() inside safe_dumps:
for k, v in obj.items():
if isinstance(k, str):
result[k] = _serialize(v, seen, depth + 1)
# ^^^^ k is never stripped of null bytesConsider stripping null bytes from the key as well:
| if isinstance(data, dict): | |
| return {k: strip_null_bytes(v) for k, v in data.items()} | |
| if isinstance(data, dict): | |
| return { | |
| (k.replace("\x00", "") if isinstance(k, str) else k): strip_null_bytes(v) | |
| for k, v in data.items() | |
| } |
And similarly in _serialize:
for k, v in obj.items():
if isinstance(k, str):
result[k.replace("\x00", "")] = _serialize(v, seen, depth + 1)| def strip_null_bytes(data: Any) -> Any: | ||
| """Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors.""" | ||
| if isinstance(data, str): | ||
| return data.replace("\x00", "") | ||
| if isinstance(data, dict): | ||
| return {k: strip_null_bytes(v) for k, v in data.items()} | ||
| if isinstance(data, list): | ||
| return [strip_null_bytes(item) for item in data] | ||
| if isinstance(data, tuple): | ||
| return tuple(strip_null_bytes(item) for item in data) | ||
| if isinstance(data, set): | ||
| return {strip_null_bytes(item) for item in data} | ||
| return data |
There was a problem hiding this comment.
No tests added for null-byte stripping behavior
The PR description states "The existing safe_json_dumps test suite covers the serialization path," but looking at tests/test_litellm/litellm_core_utils/test_safe_json_dumps.py, there are no new tests that assert \x00 bytes are actually removed. The existing tests cover circular references, max depth, and primitive types — not null-byte stripping.
Per the project's requirement that PRs claiming to fix an issue include evidence via passing tests, at minimum the following cases should be covered:
def test_strip_null_bytes_in_safe_dumps():
assert safe_dumps("hel\x00lo") == '"hello"'
assert json.loads(safe_dumps({"key": "val\x00ue"})) == {"key": "value"}
assert json.loads(safe_dumps(["a\x00b", "c\x00d"])) == ["ab", "cd"]Without these, a future refactor that accidentally removes the .replace("\x00", "") calls would go undetected.
Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)
|
@xykong can we please add relevant tests to ensure this behaviour? |
Summary
Fixes
PostgreSQL 22P05: invalid byte sequence for encoding "UTF8": 0x00errors that occur when LLM request/response payloads containing null bytes are written to spend log tables.Fixes #24310
Related: #21290, #15519
Problem
Null bytes (
\x00/\^@) can appear in LLM payloads — e.g., from multimodal requests, tool call responses, or certain model outputs. When these reach PostgreSQL text columns viajson.dumps(), the DB rejects them with:Changes
litellm/litellm_core_utils/safe_json_dumps.pyAdd
strip_null_bytes()helper and integrate null byte removal intosafe_dumps()at the string serialization level:Inside
_serialize():litellm/proxy/spend_tracking/spend_tracking_utils.pyReplace ad-hoc
json.dumps()withsafe_dumps()in two call sites:Also add early null byte stripping in
_sanitize_request_body_for_spend_logs_payloadstring handling:elif isinstance(value, str): + value = strip_null_bytes(value) if len(value) > max_string_length_prompt_in_db:Why centralize in safe_dumps vs. caller level
The current codebase has ad-hoc
_strip_null_bytes()inproxy/utils.pyfor some paths, butsafe_dumps()is the shared serialization utility. Centralizing here means any future caller ofsafe_dumps()is automatically protected without remembering to strip separately.Testing
The existing
safe_json_dumpstest suite covers the serialization path. New behavior:\x00pass throughsafe_dumps()with null bytes removedmessages,request_body) now usesafe_dumps()Impact
safe_dumps()signature unchanged; output may differ only when input contains\x00strip_null_bytes()exported as public function for reuse