
Conversation

@grytrn commented Jun 21, 2025

Summary

Fixes #32 - JSON parsing fails at position 130 for large tool results

This PR replaces anyio.TextReceiveStream with raw byte stream reading to fix a critical bug where the SDK fails to parse JSON responses larger than ~10KB that contain multibyte UTF-8 characters.

The Problem

When Claude reads files or generates large tool results, the SDK consistently fails with:

json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 131 (char 130)

The issue occurs because TextReceiveStream appears to corrupt large lines containing multibyte UTF-8 characters, such as the arrow symbol (→) used in line numbers.

The Solution

This fix:

  • Removes the dependency on TextReceiveStream
  • Reads stdout/stderr directly as byte streams
  • Implements manual line buffering with proper UTF-8 decoding
  • Handles remaining buffer content after stream closure (see the sketch below)
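
A minimal sketch of this approach, assuming the transport exposes stdout as an async iterable of byte chunks (the helper name read_json_lines is illustrative, not the SDK's actual internals):

async def read_json_lines(stream):
    """Yield complete UTF-8 lines from an async iterable of byte chunks.

    Illustrative only; not the SDK's actual implementation.
    """
    buffer = b""
    async for chunk in stream:
        buffer += chunk
        # A single chunk may carry zero, one, or many newlines.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            if line.strip():
                yield line.decode("utf-8")
    # Handle whatever remains in the buffer after the stream closes.
    if buffer.strip():
        yield buffer.decode("utf-8")

Decoding only complete lines also sidesteps the multibyte-split hazard: a UTF-8 multibyte sequence never contains the newline byte 0x0A, so splitting the buffer on b"\n" can never cut a character in half.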

Testing

Tested with:

  • ✅ Large file reads (>30KB)
  • ✅ Files containing UTF-8 characters and emojis
  • ✅ Multiple concurrent tool uses
  • ✅ Various line ending formats
  • ✅ Incomplete JSON at stream end (exercised in the check below)
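
The trailing-buffer case, for example, can be exercised with a standalone check along these lines (hypothetical test code, reusing the illustrative read_json_lines helper sketched above):

import asyncio

class FakeStream:
    """Async-iterable stand-in for a subprocess stdout that closes mid-line."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)

    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return next(self._chunks)
        except StopIteration:
            raise StopAsyncIteration

async def check_tail_flush():
    # The second JSON object arrives without a trailing newline.
    stream = FakeStream([b'{"a": 1}\n{"b"', b': 2}'])
    lines = [line async for line in read_json_lines(stream)]
    assert lines == ['{"a": 1}', '{"b": 2}']

asyncio.run(check_tail_flush())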

Reproduction

The bug can be reproduced by asking Claude to read any Python file larger than 10KB:

import asyncio

from claude_code_sdk import query, ClaudeCodeOptions

async def main():
    async for msg in query(
        prompt="Read the file whatsapp_claude_chat.py",
        options=ClaudeCodeOptions(allowed_tools=["*"]),
    ):
        print(msg)  # Fails with JSONDecodeError on the large result

asyncio.run(main())

Technical Details

The root cause appears to be in how anyio.TextReceiveStream handles buffering for large lines. By reading raw bytes and manually handling line splitting and UTF-8 decoding, we avoid this issue entirely.
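
To illustrate the suspected failure mode (an illustration only; the PR does not pin down TextReceiveStream's internals): if a fixed-size read splits the three bytes of "→" across two chunks, decoding each chunk independently fails, while an incremental decoder, or the line-complete decoding used here, carries the partial bytes forward:

import codecs

data = "42→ some code\n".encode("utf-8")
first, second = data[:4], data[4:]  # cuts through "→" (bytes e2 86 92)

# Naive per-chunk decoding fails on the partial character:
#   first.decode("utf-8")  ->  UnicodeDecodeError: unexpected end of data

# An incremental decoder holds the partial bytes until the next chunk:
decoder = codecs.getincrementaldecoder("utf-8")()
text = decoder.decode(first) + decoder.decode(second)
assert text == "42→ some code\n"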

@tim-watcha

This does work for me! Thank you

@ltawfik (Collaborator) commented Jun 27, 2025

Thanks for the PR! This issue was already fixed in commit 97c651b, which removed TextReceiveStream and implemented raw byte stream reading for the same problem (issue #32).

@ltawfik (Collaborator) commented Jun 28, 2025

Thanks @grytrn. This issue was already fixed in PR #5; the current implementation correctly handles multiple JSON objects on one line.
