Skip to content

fix: sanitize surrogate characters in messages before sending to LLM#4987

Open
graydeon wants to merge 1 commit intoAider-AI:mainfrom
graydeon:fix/unicode-surrogate-error
Open

fix: sanitize surrogate characters in messages before sending to LLM#4987
graydeon wants to merge 1 commit intoAider-AI:mainfrom
graydeon:fix/unicode-surrogate-error

Conversation

@graydeon
Copy link
Copy Markdown

@graydeon graydeon commented Apr 3, 2026

Summary

Fixes #3460 — On Windows with certain locales (e.g. Chinese edition), file content or console input can contain surrogate characters (\udcb0) that cause UnicodeEncodeError when httpx tries to JSON-encode the outgoing request to the LLM provider.

Adds sanitize_for_utf8() that recursively walks the message structure and replaces any surrogates with the Unicode replacement character, applied in send_completion() right before messages are passed to litellm.completion().

Test plan

  • New TestSanitizeForUtf8 test class with 4 tests covering surrogates in strings, nested message structures, non-surrogate Unicode preservation, and non-string passthrough
  • pytest tests/basic/test_models.py — all 27 tests pass

On Windows systems with certain locales (e.g. Chinese edition), file
content or console input can contain surrogate characters that cause
UnicodeEncodeError when httpx tries to JSON-encode the outgoing LLM
request.

Add sanitize_for_utf8() that recursively walks the message structure
and replaces surrogates with the Unicode replacement character before
passing messages to litellm.completion().

Fixes Aider-AI#3460
@CLAassistant
Copy link
Copy Markdown

CLAassistant commented Apr 3, 2026

CLA assistant check
All committers have signed the CLA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcb0' in position 5044: surrogates not allowed

2 participants