Skip to content

fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs#24314

Open
xykong wants to merge 1 commit intoBerriAI:mainfrom
xykong:fix/strip-null-bytes-in-safe-dumps
Open

fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs#24314
xykong wants to merge 1 commit intoBerriAI:mainfrom
xykong:fix/strip-null-bytes-in-safe-dumps

Conversation

@xykong
Copy link
Contributor

@xykong xykong commented Mar 21, 2026

Summary

Fixes PostgreSQL 22P05: invalid byte sequence for encoding "UTF8": 0x00 errors that occur when LLM request/response payloads containing null bytes are written to spend log tables.

Fixes #24310
Related: #21290, #15519

Problem

Null bytes (\x00 / \^@) can appear in LLM payloads — e.g., from multimodal requests, tool call responses, or certain model outputs. When these reach PostgreSQL text columns via json.dumps(), the DB rejects them with:

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05

Changes

litellm/litellm_core_utils/safe_json_dumps.py

Add strip_null_bytes() helper and integrate null byte removal into safe_dumps() at the string serialization level:

def strip_null_bytes(data: Any) -> Any:
    """Recursively remove \x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        return {k: strip_null_bytes(v) for k, v in data.items()}
    if isinstance(data, list):
        return [strip_null_bytes(item) for item in data]
    ...

Inside _serialize():

- if isinstance(obj, (str, int, float, bool, type(None))):
-     return obj
+ if isinstance(obj, str):
+     return obj.replace("\x00", "")   # strip null bytes inline
+ if isinstance(obj, (int, float, bool, type(None))):
+     return obj
  ...
  try:
-     return str(obj)
+     return str(obj).replace("\x00", "")  # also strip fallback str()

litellm/proxy/spend_tracking/spend_tracking_utils.py

Replace ad-hoc json.dumps() with safe_dumps() in two call sites:

- return json.dumps(messages, default=str)
+ return safe_dumps(messages)

- _request_body_json_str = json.dumps(_request_body, default=str)
+ _request_body_json_str = safe_dumps(_request_body)

Also add early null byte stripping in _sanitize_request_body_for_spend_logs_payload string handling:

  elif isinstance(value, str):
+     value = strip_null_bytes(value)
      if len(value) > max_string_length_prompt_in_db:

Why centralize in safe_dumps vs. caller level

The current codebase has ad-hoc _strip_null_bytes() in proxy/utils.py for some paths, but safe_dumps() is the shared serialization utility. Centralizing here means any future caller of safe_dumps() is automatically protected without remembering to strip separately.

Testing

The existing safe_json_dumps test suite covers the serialization path. New behavior:

  • Strings with \x00 pass through safe_dumps() with null bytes removed
  • All spend log serialization paths (messages, request_body) now use safe_dumps()

Impact

  • Minimal scope: 2 files, ~25 lines
  • No breaking changes: safe_dumps() signature unchanged; output may differ only when input contains \x00
  • Backward compatible: strip_null_bytes() exported as public function for reuse

… 22P05 errors

Null bytes (\x00) in LLM request/response payloads cause PostgreSQL to
raise '22P05: invalid byte sequence for encoding UTF8: 0x00' when spend
logs are written to the database.

Changes:
- Add strip_null_bytes() helper to safe_json_dumps.py that recursively
  removes \x00 chars from strings, dicts, lists, tuples and sets
- Inline null byte removal into safe_dumps() _serialize() for str paths
  so all JSON serialization through safe_dumps() is automatically safe
- In spend_tracking_utils.py: replace json.dumps() with safe_dumps() for
  messages and request_body serialization; add strip_null_bytes() call
  in _sanitize_request_body_for_spend_logs_payload string handling

Centralizing the fix in safe_dumps() is more robust than ad-hoc
stripping at each call site.

Fixes BerriAI#24310
Related: BerriAI#21290, BerriAI#15519
@vercel
Copy link

vercel bot commented Mar 21, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 21, 2026 6:42pm

Request Review

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 21, 2026

Greptile Summary

This PR centralizes null-byte stripping into safe_dumps() and its two call sites in spend_tracking_utils.py to prevent PostgreSQL 22P05 (invalid byte sequence for encoding "UTF8": 0x00) errors when LLM payloads containing null bytes are written to spend log tables.

Key changes:

  • A new strip_null_bytes() helper is added to safe_json_dumps.py that recursively strips \x00 from strings in dicts, lists, tuples, and sets.
  • _serialize() inside safe_dumps now calls .replace("\x00", "") on string primitives and fallback str() conversions.
  • Two json.dumps(..., default=str) calls in spend_tracking_utils.py are replaced with safe_dumps().
  • An early strip_null_bytes(value) call is added inside _sanitize_value before the truncation length check, ensuring the threshold is measured against the already-cleaned string.

Issues found:

  • Both strip_null_bytes() and _serialize() strip null bytes only from dictionary values, not from dictionary keys. A \x00 in a key will survive into the final JSON string and can still trigger a PostgreSQL 22P05 error.
  • No new tests verify the null-byte stripping behavior. The PR description implies the existing test suite covers this, but the test file contains no assertions involving \x00, leaving the fix without a regression guard.

Confidence Score: 3/5

  • Mostly safe to merge — the fix correctly handles the most common null-byte paths — but two gaps remain: dict keys are not stripped and there are no regression tests for the new behavior.
  • The core fix is sound and all main spend-log serialization paths now go through safe_dumps. However, both strip_null_bytes() and _serialize() inside safe_dumps skip null-byte removal for dictionary keys, leaving a residual way to trigger the PostgreSQL 22P05 error. Additionally, no tests assert the new behavior, meaning a future refactor could silently reintroduce the bug.
  • litellm/litellm_core_utils/safe_json_dumps.py — dict-key stripping gap and missing test coverage

Important Files Changed

Filename Overview
litellm/litellm_core_utils/safe_json_dumps.py Adds strip_null_bytes() helper and integrates null-byte stripping into _serialize(). Both functions correctly strip null bytes from string values and fallback str() conversions, but neither strips null bytes from dictionary keys, leaving a residual path that can still trigger PostgreSQL 22P05. No new tests cover the null-byte behavior.
litellm/proxy/spend_tracking/spend_tracking_utils.py Replaces two ad-hoc json.dumps(..., default=str) calls with safe_dumps() and adds an early strip_null_bytes() call before the truncation length check in _sanitize_value. Changes are correct and well-scoped; the early strip ensures the truncation threshold is measured on the already-cleaned string.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM Request/Response Payload] --> B{spend_tracking_utils}
    B --> C[_get_messages_for_spend_logs_payload]
    B --> D[_get_proxy_server_request_for_spend_logs_payload]
    B --> E[_get_response_for_spend_logs_payload]

    C -->|safe_dumps| F[safe_dumps]
    D --> G[_sanitize_request_body]
    E --> G

    G -->|string values| H[strip_null_bytes - value]
    H --> I{len gt max_string_length?}
    I -->|yes| J[truncate string]
    I -->|no| K[keep as-is]
    J --> L[safe_dumps]
    K --> L

    F --> M[_serialize - strips str values]
    L --> M

    M -->|dict values| N[null bytes stripped in values]
    M -->|dict keys| O[WARNING - keys NOT stripped]
    M -->|fallback str| P[str obj - null bytes stripped]

    N --> Q[json.dumps to PostgreSQL]
    O --> Q
    P --> Q
Loading

Comments Outside Diff (1)

  1. litellm/litellm_core_utils/safe_json_dumps.py, line 45-51 (link)

    P1 Null bytes in dict keys also unstripped in _serialize

    Mirrors the issue in strip_null_bytes: the _serialize helper iterates over dict items but never strips null bytes from string keys. Since safe_dumps goes through _serialize, keys carrying \x00 will survive into the final JSON string and can still trigger a PostgreSQL 22P05 error.

Last reviewed commit: "fix(core): add strip..."

@codspeed-hq
Copy link
Contributor

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing xykong:fix/strip-null-bytes-in-safe-dumps (220f39b) with main (b64b0d4)

Open in CodSpeed

Comment on lines +13 to +14
if isinstance(data, dict):
return {k: strip_null_bytes(v) for k, v in data.items()}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Dict keys with null bytes not stripped

strip_null_bytes() and _serialize() inside safe_dumps both iterate over dict.items() and strip null bytes only from values, not from keys. A dict key containing \x00 will be written to PostgreSQL as-is, which can still trigger a 22P05 error.

# Current (strip_null_bytes):
if isinstance(data, dict):
    return {k: strip_null_bytes(v) for k, v in data.items()}

# Also _serialize() inside safe_dumps:
for k, v in obj.items():
    if isinstance(k, str):
        result[k] = _serialize(v, seen, depth + 1)
    # ^^^^ k is never stripped of null bytes

Consider stripping null bytes from the key as well:

Suggested change
if isinstance(data, dict):
return {k: strip_null_bytes(v) for k, v in data.items()}
if isinstance(data, dict):
return {
(k.replace("\x00", "") if isinstance(k, str) else k): strip_null_bytes(v)
for k, v in data.items()
}

And similarly in _serialize:

for k, v in obj.items():
    if isinstance(k, str):
        result[k.replace("\x00", "")] = _serialize(v, seen, depth + 1)

Comment on lines +9 to +21
def strip_null_bytes(data: Any) -> Any:
"""Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
if isinstance(data, str):
return data.replace("\x00", "")
if isinstance(data, dict):
return {k: strip_null_bytes(v) for k, v in data.items()}
if isinstance(data, list):
return [strip_null_bytes(item) for item in data]
if isinstance(data, tuple):
return tuple(strip_null_bytes(item) for item in data)
if isinstance(data, set):
return {strip_null_bytes(item) for item in data}
return data
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 No tests added for null-byte stripping behavior

The PR description states "The existing safe_json_dumps test suite covers the serialization path," but looking at tests/test_litellm/litellm_core_utils/test_safe_json_dumps.py, there are no new tests that assert \x00 bytes are actually removed. The existing tests cover circular references, max depth, and primitive types — not null-byte stripping.

Per the project's requirement that PRs claiming to fix an issue include evidence via passing tests, at minimum the following cases should be covered:

def test_strip_null_bytes_in_safe_dumps():
    assert safe_dumps("hel\x00lo") == '"hello"'
    assert json.loads(safe_dumps({"key": "val\x00ue"})) == {"key": "value"}
    assert json.loads(safe_dumps(["a\x00b", "c\x00d"])) == ["ab", "cd"]

Without these, a future refactor that accidentally removes the .replace("\x00", "") calls would go undetected.

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

@RheagalFire
Copy link
Collaborator

@xykong can we please add relevant tests to ensure this behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Null bytes (\x00) in LLM request/response payloads cause PostgreSQL 22P05 error in spend logs

2 participants