fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs by xykong · Pull Request #24314 · BerriAI/litellm

xykong · 2026-03-21T18:40:22Z

Summary

Fixes PostgreSQL 22P05: invalid byte sequence for encoding "UTF8": 0x00 errors that occur when LLM request/response payloads containing null bytes are written to spend log tables.

Fixes #24310
Related: #21290, #15519

Problem

Null bytes (\x00 / \^@) can appear in LLM payloads — e.g., from multimodal requests, tool call responses, or certain model outputs. When these reach PostgreSQL text columns via json.dumps(), the DB rejects them with:

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05

Changes

`litellm/litellm_core_utils/safe_json_dumps.py`

Add strip_null_bytes() helper and integrate null byte removal into safe_dumps() at the string serialization level:

def strip_null_bytes(data: Any) -> Any:
    """Recursively remove \x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        return {k: strip_null_bytes(v) for k, v in data.items()}
    if isinstance(data, list):
        return [strip_null_bytes(item) for item in data]
    ...

Inside _serialize():

- if isinstance(obj, (str, int, float, bool, type(None))):
-     return obj
+ if isinstance(obj, str):
+     return obj.replace("\x00", "")   # strip null bytes inline
+ if isinstance(obj, (int, float, bool, type(None))):
+     return obj
  ...
  try:
-     return str(obj)
+     return str(obj).replace("\x00", "")  # also strip fallback str()

`litellm/proxy/spend_tracking/spend_tracking_utils.py`

Replace ad-hoc json.dumps() with safe_dumps() in two call sites:

- return json.dumps(messages, default=str)
+ return safe_dumps(messages)

- _request_body_json_str = json.dumps(_request_body, default=str)
+ _request_body_json_str = safe_dumps(_request_body)

Also add early null byte stripping in _sanitize_request_body_for_spend_logs_payload string handling:

  elif isinstance(value, str):
+     value = strip_null_bytes(value)
      if len(value) > max_string_length_prompt_in_db:

Why centralize in safe_dumps vs. caller level

The current codebase has ad-hoc _strip_null_bytes() in proxy/utils.py for some paths, but safe_dumps() is the shared serialization utility. Centralizing here means any future caller of safe_dumps() is automatically protected without remembering to strip separately.

Testing

The existing safe_json_dumps test suite covers the serialization path. New behavior:

Strings with \x00 pass through safe_dumps() with null bytes removed
All spend log serialization paths (messages, request_body) now use safe_dumps()

Impact

Minimal scope: 2 files, ~25 lines
No breaking changes: safe_dumps() signature unchanged; output may differ only when input contains \x00
Backward compatible: strip_null_bytes() exported as public function for reuse

… 22P05 errors Null bytes (\x00) in LLM request/response payloads cause PostgreSQL to raise '22P05: invalid byte sequence for encoding UTF8: 0x00' when spend logs are written to the database. Changes: - Add strip_null_bytes() helper to safe_json_dumps.py that recursively removes \x00 chars from strings, dicts, lists, tuples and sets - Inline null byte removal into safe_dumps() _serialize() for str paths so all JSON serialization through safe_dumps() is automatically safe - In spend_tracking_utils.py: replace json.dumps() with safe_dumps() for messages and request_body serialization; add strip_null_bytes() call in _sanitize_request_body_for_spend_logs_payload string handling Centralizing the fix in safe_dumps() is more robust than ad-hoc stripping at each call site. Fixes BerriAI#24310 Related: BerriAI#21290, BerriAI#15519

vercel · 2026-03-21T18:40:29Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
litellm	Ready	Preview, Comment	Mar 21, 2026 6:42pm

greptile-apps · 2026-03-21T18:42:55Z

Greptile Summary

This PR centralizes null-byte stripping into safe_dumps() and its two call sites in spend_tracking_utils.py to prevent PostgreSQL 22P05 (invalid byte sequence for encoding "UTF8": 0x00) errors when LLM payloads containing null bytes are written to spend log tables.

Key changes:

A new strip_null_bytes() helper is added to safe_json_dumps.py that recursively strips \x00 from strings in dicts, lists, tuples, and sets.
_serialize() inside safe_dumps now calls .replace("\x00", "") on string primitives and fallback str() conversions.
Two json.dumps(..., default=str) calls in spend_tracking_utils.py are replaced with safe_dumps().
An early strip_null_bytes(value) call is added inside _sanitize_value before the truncation length check, ensuring the threshold is measured against the already-cleaned string.

Issues found:

Both strip_null_bytes() and _serialize() strip null bytes only from dictionary values, not from dictionary keys. A \x00 in a key will survive into the final JSON string and can still trigger a PostgreSQL 22P05 error.
No new tests verify the null-byte stripping behavior. The PR description implies the existing test suite covers this, but the test file contains no assertions involving \x00, leaving the fix without a regression guard.

Confidence Score: 3/5

Mostly safe to merge — the fix correctly handles the most common null-byte paths — but two gaps remain: dict keys are not stripped and there are no regression tests for the new behavior.
The core fix is sound and all main spend-log serialization paths now go through safe_dumps. However, both strip_null_bytes() and _serialize() inside safe_dumps skip null-byte removal for dictionary keys, leaving a residual way to trigger the PostgreSQL 22P05 error. Additionally, no tests assert the new behavior, meaning a future refactor could silently reintroduce the bug.
litellm/litellm_core_utils/safe_json_dumps.py — dict-key stripping gap and missing test coverage

Important Files Changed

Filename	Overview
litellm/litellm_core_utils/safe_json_dumps.py	Adds `strip_null_bytes()` helper and integrates null-byte stripping into `_serialize()`. Both functions correctly strip null bytes from string values and fallback `str()` conversions, but neither strips null bytes from dictionary keys, leaving a residual path that can still trigger PostgreSQL 22P05. No new tests cover the null-byte behavior.
litellm/proxy/spend_tracking/spend_tracking_utils.py	Replaces two ad-hoc `json.dumps(..., default=str)` calls with `safe_dumps()` and adds an early `strip_null_bytes()` call before the truncation length check in `_sanitize_value`. Changes are correct and well-scoped; the early strip ensures the truncation threshold is measured on the already-cleaned string.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[LLM Request/Response Payload] --> B{spend_tracking_utils}
    B --> C[_get_messages_for_spend_logs_payload]
    B --> D[_get_proxy_server_request_for_spend_logs_payload]
    B --> E[_get_response_for_spend_logs_payload]

    C -->|safe_dumps| F[safe_dumps]
    D --> G[_sanitize_request_body]
    E --> G

    G -->|string values| H[strip_null_bytes - value]
    H --> I{len gt max_string_length?}
    I -->|yes| J[truncate string]
    I -->|no| K[keep as-is]
    J --> L[safe_dumps]
    K --> L

    F --> M[_serialize - strips str values]
    L --> M

    M -->|dict values| N[null bytes stripped in values]
    M -->|dict keys| O[WARNING - keys NOT stripped]
    M -->|fallback str| P[str obj - null bytes stripped]

    N --> Q[json.dumps to PostgreSQL]
    O --> Q
    P --> Q

Comments Outside Diff (1)

litellm/litellm_core_utils/safe_json_dumps.py, line 45-51 (link)

Null bytes in dict keys also unstripped in _serialize

Mirrors the issue in strip_null_bytes: the _serialize helper iterates over dict items but never strips null bytes from string keys. Since safe_dumps goes through _serialize, keys carrying \x00 will survive into the final JSON string and can still trigger a PostgreSQL 22P05 error.

_{Last reviewed commit: "fix(core): add strip..."}

codspeed-hq · 2026-03-21T18:43:03Z

Merging this PR will not alter performance

✅ 16 untouched benchmarks

_{Comparing xykong:fix/strip-null-bytes-in-safe-dumps (220f39b) with main (b64b0d4)}

greptile-apps · 2026-03-21T18:42:59Z

litellm/litellm_core_utils/safe_json_dumps.py

+    if isinstance(data, dict):
+        return {k: strip_null_bytes(v) for k, v in data.items()}


Dict keys with null bytes not stripped

strip_null_bytes() and _serialize() inside safe_dumps both iterate over dict.items() and strip null bytes only from values, not from keys. A dict key containing \x00 will be written to PostgreSQL as-is, which can still trigger a 22P05 error.

# Current (strip_null_bytes): if isinstance(data, dict): return {k: strip_null_bytes(v) for k, v in data.items()} # Also _serialize() inside safe_dumps: for k, v in obj.items(): if isinstance(k, str): result[k] = _serialize(v, seen, depth + 1) # ^^^^ k is never stripped of null bytes

Consider stripping null bytes from the key as well:

Suggested change

if isinstance(data, dict):

return {k: strip_null_bytes(v) for k, v in data.items()}

if isinstance(data, dict):

return {

(k.replace("\x00", "") if isinstance(k, str) else k): strip_null_bytes(v)

for k, v in data.items()

}

And similarly in _serialize:

for k, v in obj.items(): if isinstance(k, str): result[k.replace("\x00", "")] = _serialize(v, seen, depth + 1)

greptile-apps · 2026-03-21T18:43:03Z

litellm/litellm_core_utils/safe_json_dumps.py

+def strip_null_bytes(data: Any) -> Any:
+    """Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
+    if isinstance(data, str):
+        return data.replace("\x00", "")
+    if isinstance(data, dict):
+        return {k: strip_null_bytes(v) for k, v in data.items()}
+    if isinstance(data, list):
+        return [strip_null_bytes(item) for item in data]
+    if isinstance(data, tuple):
+        return tuple(strip_null_bytes(item) for item in data)
+    if isinstance(data, set):
+        return {strip_null_bytes(item) for item in data}
+    return data


No tests added for null-byte stripping behavior

The PR description states "The existing safe_json_dumps test suite covers the serialization path," but looking at tests/test_litellm/litellm_core_utils/test_safe_json_dumps.py, there are no new tests that assert \x00 bytes are actually removed. The existing tests cover circular references, max depth, and primitive types — not null-byte stripping.

Per the project's requirement that PRs claiming to fix an issue include evidence via passing tests, at minimum the following cases should be covered:

def test_strip_null_bytes_in_safe_dumps(): assert safe_dumps("hel\x00lo") == '"hello"' assert json.loads(safe_dumps({"key": "val\x00ue"})) == {"key": "value"} assert json.loads(safe_dumps(["a\x00b", "c\x00d"])) == ["ab", "cd"]

Without these, a future refactor that accidentally removes the .replace("\x00", "") calls would go undetected.

Rule Used: What: Ensure that any PR claiming to fix an issue ... (source)

RheagalFire · 2026-03-22T09:23:37Z

@xykong can we please add relevant tests to ensure this behaviour?

vercel bot deployed to Preview March 21, 2026 18:42 View deployment

greptile-apps bot reviewed Mar 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs#24314

fix(core): add strip_null_bytes() to safe_dumps — prevents PostgreSQL 22P05 errors in spend logs#24314
xykong wants to merge 1 commit intoBerriAI:mainfrom
xykong:fix/strip-null-bytes-in-safe-dumps

xykong commented Mar 21, 2026

Uh oh!

vercel bot commented Mar 21, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Mar 21, 2026 •

edited

Loading

Important Files Changed

Comments Outside Diff (1)

Uh oh!

codspeed-hq bot commented Mar 21, 2026

Uh oh!

greptile-apps bot Mar 21, 2026

Uh oh!

greptile-apps bot Mar 21, 2026

Uh oh!

RheagalFire commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

		if isinstance(data, dict):
		return {k: strip_null_bytes(v) for k, v in data.items()}

Uh oh!

Conversation

xykong commented Mar 21, 2026

Summary

Problem

Changes

litellm/litellm_core_utils/safe_json_dumps.py

litellm/proxy/spend_tracking/spend_tracking_utils.py

Why centralize in safe_dumps vs. caller level

Testing

Impact

Uh oh!

vercel bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

greptile-apps bot commented Mar 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 3/5

Important Files Changed

Flowchart

Comments Outside Diff (1)

Uh oh!

codspeed-hq bot commented Mar 21, 2026

Merging this PR will not alter performance

Uh oh!

greptile-apps bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Mar 21, 2026

Choose a reason for hiding this comment

Uh oh!

RheagalFire commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

`litellm/litellm_core_utils/safe_json_dumps.py`

`litellm/proxy/spend_tracking/spend_tracking_utils.py`

vercel bot commented Mar 21, 2026 •

edited

Loading

greptile-apps bot commented Mar 21, 2026 •

edited

Loading