-
-
Notifications
You must be signed in to change notification settings - Fork 6.6k
Description
Description
When LLM request or response content contains null bytes (\x00 / \^@ characters), the spend log write to PostgreSQL fails with:
ERROR: invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05
This causes update_spend_logs to fail silently or raise an exception, losing spend tracking data.
Root Cause
The safe_dumps() function in litellm/litellm_core_utils/safe_json_dumps.py does not strip null bytes before serialization. When the resulting JSON string is written to PostgreSQL, the database rejects the \x00 character (which is invalid in PostgreSQL UTF-8 text columns).
The issue affects any field that passes through spend_tracking_utils.py:
messagesfield (from_get_messages_for_spend_logs_payload)request_bodyfield (from_get_proxy_server_request_for_spend_logs_payload)- Any string value in the spend log payload
Current Workaround Pattern
Some call sites in proxy/utils.py have ad-hoc _strip_null_bytes() helpers, but the core serialization path still allows null bytes to pass through.
Proposed Fix
Add null byte stripping directly into safe_dumps() so all serialization paths are protected:
# litellm/litellm_core_utils/safe_json_dumps.py
def strip_null_bytes(data: Any) -> Any:
"""Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
if isinstance(data, str):
return data.replace("\x00", "")
if isinstance(data, dict):
return {k: strip_null_bytes(v) for k, v in data.items()}
if isinstance(data, list):
return [strip_null_bytes(item) for item in data]
if isinstance(data, tuple):
return tuple(strip_null_bytes(item) for item in data)
if isinstance(data, set):
return {strip_null_bytes(item) for item in data}
return data
def safe_dumps(data: Any, max_depth: int = DEFAULT_MAX_RECURSE_DEPTH) -> str:
def _serialize(obj, depth, seen):
...
if isinstance(obj, str):
return strip_null_bytes(obj) # ← strip here
...
try:
return strip_null_bytes(str(obj)) # ← and here for fallback
except Exception:
return "Unserializable Object"Additionally, replace ad-hoc json.dumps() calls in spend_tracking_utils.py with safe_dumps():
# Before
return json.dumps(messages, default=str)
_request_body_json_str = json.dumps(_request_body, default=str)
# After
return safe_dumps(messages)
_request_body_json_str = safe_dumps(_request_body)Why in safe_dumps vs. caller level
Centralizing null byte stripping in safe_dumps() ensures all serialization paths are protected without requiring every call site to remember to strip. This is more robust than the current ad-hoc approach.
Related Issues
- [Bug]: update_spend_logs fails with PostgreSQL 22P05 when store_prompts_in_spend_logs contains \u0000 text #21290 (open) —
update_spend_logsfails with PostgreSQL 22P05 - [Bug]: DB exception in update_spend job #15519 (closed) — DB exception in update_spend caused by null bytes
I have a PR ready with tests.
Environment
- LiteLLM proxy with PostgreSQL backend
- Triggered by: multimodal requests, tool use responses, or any model that returns binary/null content in its output