Skip to content

[Bug]: Null bytes (\x00) in LLM request/response payloads cause PostgreSQL 22P05 error in spend logs #24310

@xykong

Description

@xykong

Description

When LLM request or response content contains null bytes (\x00 / \^@ characters), the spend log write to PostgreSQL fails with:

ERROR: invalid byte sequence for encoding "UTF8": 0x00
SQLSTATE: 22P05

This causes update_spend_logs to fail silently or raise an exception, losing spend tracking data.

Root Cause

The safe_dumps() function in litellm/litellm_core_utils/safe_json_dumps.py does not strip null bytes before serialization. When the resulting JSON string is written to PostgreSQL, the database rejects the \x00 character (which is invalid in PostgreSQL UTF-8 text columns).

The issue affects any field that passes through spend_tracking_utils.py:

  • messages field (from _get_messages_for_spend_logs_payload)
  • request_body field (from _get_proxy_server_request_for_spend_logs_payload)
  • Any string value in the spend log payload

Current Workaround Pattern

Some call sites in proxy/utils.py have ad-hoc _strip_null_bytes() helpers, but the core serialization path still allows null bytes to pass through.

Proposed Fix

Add null byte stripping directly into safe_dumps() so all serialization paths are protected:

# litellm/litellm_core_utils/safe_json_dumps.py

def strip_null_bytes(data: Any) -> Any:
    """Recursively remove \\x00 null bytes from strings to prevent PostgreSQL 22P05 errors."""
    if isinstance(data, str):
        return data.replace("\x00", "")
    if isinstance(data, dict):
        return {k: strip_null_bytes(v) for k, v in data.items()}
    if isinstance(data, list):
        return [strip_null_bytes(item) for item in data]
    if isinstance(data, tuple):
        return tuple(strip_null_bytes(item) for item in data)
    if isinstance(data, set):
        return {strip_null_bytes(item) for item in data}
    return data


def safe_dumps(data: Any, max_depth: int = DEFAULT_MAX_RECURSE_DEPTH) -> str:
    def _serialize(obj, depth, seen):
        ...
        if isinstance(obj, str):
            return strip_null_bytes(obj)   # ← strip here
        ...
        try:
            return strip_null_bytes(str(obj))   # ← and here for fallback
        except Exception:
            return "Unserializable Object"

Additionally, replace ad-hoc json.dumps() calls in spend_tracking_utils.py with safe_dumps():

# Before
return json.dumps(messages, default=str)
_request_body_json_str = json.dumps(_request_body, default=str)

# After  
return safe_dumps(messages)
_request_body_json_str = safe_dumps(_request_body)

Why in safe_dumps vs. caller level

Centralizing null byte stripping in safe_dumps() ensures all serialization paths are protected without requiring every call site to remember to strip. This is more robust than the current ad-hoc approach.

Related Issues

I have a PR ready with tests.

Environment

  • LiteLLM proxy with PostgreSQL backend
  • Triggered by: multimodal requests, tool use responses, or any model that returns binary/null content in its output

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions