@reshab48 reshab48 commented Jun 28, 2025

This implementation addresses parsing failures that occur when large JSON responses from the Claude Code CLI are split across multiple lines by stream buffering. It buffers incomplete JSON, reconstructs it from the lines that follow, and defers yielding until the buffered content parses, so that messages are parsed reliably.

Implementation Details

Core Problem

When JSON objects exceed buffer boundaries, they arrive split across multiple lines:
Line 1: {"type": "message", "data": "very long content that gets cut off mid-
Line 2: way through the JSON object"}
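The failure mode can be reproduced with the standard library alone. This is a minimal sketch, assuming the two fragments below stand in for a payload cut by the stream buffer; the payload and the `try_parse` helper are made up for illustration and are not part of the PR.

```python
import json

# Hypothetical payload, split at an arbitrary offset as a stream buffer might.
full = json.dumps({"type": "message", "data": "x" * 50})
part1, part2 = full[:30], full[30:]


def try_parse(text):
    """Return the parsed object, or None if text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Neither fragment parses on its own...
first_alone = try_parse(part1)
second_alone = try_parse(part2)

# ...but the concatenation of the two does.
rejoined = try_parse(part1 + part2)
```

Only the rejoined text yields an object, which is why the fragments have to be buffered and concatenated before parsing.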

Solution Architecture

  1. Incomplete JSON Detection
  • Monitor json.JSONDecodeError exceptions
  • Distinguish between incomplete JSON (starts with { or [) and genuinely malformed JSON
  • Only treat as incomplete if the JSON appears to be cut off mid-stream
  2. Buffer Reconstruction System
incomplete_json_line_str = None  # Persistent buffer across iterations

if incomplete_json_line_str:
    # Reconstruct: previous incomplete + current line
    line_str = incomplete_json_line_str + line_str
  3. Deferred Yielding Pattern

Because the entire line is buffered, reconstruction re-parses objects that already parsed successfully in an earlier iteration. Yielding must therefore be deferred until the whole line parses:

parsed_json_outputs = []  # Collection phase

# Parse all JSON objects in the line
for json_line in line_str.split("\n"):
    try:
        data = json.loads(json_line)
        incomplete_json_line_str = None  # Clear buffer on success
        parsed_json_outputs.append(data)  # Collect, don't yield
    except json.JSONDecodeError:
        if json_line.startswith("{"):
            incomplete_json_line_str = line_str  # Buffer entire line
            break  # Stop processing this line

# Yielding phase: only when no incomplete JSON is pending

if not incomplete_json_line_str:
    for json_output_data in parsed_json_outputs:
        yield json_output_data
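The three steps can be combined into one self-contained generator. The following is a sketch of the pattern described above, not the PR's actual code: the function name `parse_stream_lines` and the sample chunks are hypothetical, and it re-raises `json.JSONDecodeError` where the PR describes raising `CLIJSONDecodeError`.

```python
import json
from typing import Any, Iterable, Iterator


def parse_stream_lines(lines: Iterable[str]) -> Iterator[Any]:
    """Yield parsed JSON values, rejoining objects split across lines.

    Hypothetical helper for illustration only.
    """
    incomplete = None  # Buffered line that ended in a partial JSON object

    for line_str in lines:
        if incomplete:
            # Reconstruct: previously buffered line + current line
            line_str = incomplete + line_str
        incomplete = None
        parsed = []  # Collection phase: nothing is yielded yet

        for json_line in line_str.split("\n"):
            if not json_line.strip():
                continue  # Skip blank segments between objects
            try:
                parsed.append(json.loads(json_line))
            except json.JSONDecodeError:
                if json_line.lstrip().startswith(("{", "[")):
                    incomplete = line_str  # Buffer the entire line
                    break
                raise  # Not a JSON object or array at all: surface the error

        # Yielding phase: only when no incomplete JSON is pending
        if incomplete is None:
            yield from parsed


# Sample input: the third object is cut in the middle of a string value.
chunks = ['{"a": 1}\n{"b": 2}\n{"c": "lo', 'ng"}', '{"d": 4}']
results = list(parse_stream_lines(chunks))
```

One limitation of this sketch is worth noting: a segment that starts with `{` but never becomes valid JSON stays buffered indefinitely, so nothing after it is yielded. Production code would likely need a buffer size cap or another way to give up on a fragment.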

Multi-Iteration Processing Flow

Iteration 1 (Line: valid_json1\nvalid_json2\nincomplete_large_json_part1):

  • Parse valid_json1 ✅ → collect
  • Parse valid_json2 ✅ → collect
  • Parse incomplete_large_json_part1 ❌ → buffer entire line
  • Skip yielding phase (incomplete JSON pending)

Iteration 2 (Line: large_json_part2):

  • Reconstruct: valid_json1\nvalid_json2\nincomplete_large_json_part1large_json_part2
  • Parse valid_json1 ✅ → collect
  • Parse valid_json2 ✅ → collect
  • Parse complete_large_json ✅ → collect, clear buffer
  • Yield all three messages
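The flow above also shows why yielding is deferred: with whole-line buffering, the first two messages are parsed in both iterations. The sketch below replays the same two-iteration scenario with and without deferral. The `run` helper and the sample chunks are hypothetical, and for brevity it buffers on any decode error rather than checking the leading character.

```python
import json


def run(chunks, defer):
    """Replay chunks with whole-line buffering; return everything emitted.

    defer=True emits only once no fragment is pending; defer=False emits
    each object as soon as it parses. Hypothetical helper for illustration.
    """
    emitted, pending = [], ""
    for chunk in chunks:
        text = pending + chunk
        pending = ""
        collected = []
        for piece in text.split("\n"):
            if not piece.strip():
                continue
            try:
                obj = json.loads(piece)
            except json.JSONDecodeError:
                pending = text  # Buffer the entire line, as in the flow above
                break
            collected.append(obj)
            if not defer:
                emitted.append(obj)  # Eager: emit immediately
        if defer and not pending:
            emitted.extend(collected)  # Deferred: emit once the line is whole
    return emitted


chunks = ['{"id": 1}\n{"id": 2}\n{"id": 3, "data": "par', 't1part2"}']
deferred_ids = [obj["id"] for obj in run(chunks, defer=True)]
eager_ids = [obj["id"] for obj in run(chunks, defer=False)]
```

With deferral each message is emitted once. With eager emission, messages 1 and 2 are emitted in iteration 1 and again in iteration 2, when the buffered line is re-parsed.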

Key Design Decisions

  1. Line-level buffering: Buffer the entire line (not just the incomplete JSON) to handle mixed
    valid/invalid content
  2. Break on incomplete: Stop processing the current line when incomplete JSON is detected
  3. Conditional yielding: Only yield when no incomplete JSON is pending reconstruction
  4. Buffer reset: Clear the incomplete buffer only after successful parsing

Error Handling

  • Malformed but complete JSON (ends with } or ]) still raises CLIJSONDecodeError
  • Only buffer JSON that appears genuinely incomplete due to stream splitting
  • Preserve existing error semantics for debugging
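One way to express these rules is a small classifier. This is a sketch under assumptions: `classify` is a hypothetical helper, and the `CLIJSONDecodeError` class defined here is a stand-in for the SDK exception named above, whose real definition may differ.

```python
import json


class CLIJSONDecodeError(Exception):
    """Stand-in for the SDK exception named above; the real class may differ."""


def classify(fragment: str) -> str:
    """Return 'ok' or 'incomplete'; raise CLIJSONDecodeError if malformed."""
    text = fragment.strip()
    try:
        json.loads(text)
        return "ok"
    except json.JSONDecodeError as exc:
        looks_open = text.startswith(("{", "["))
        looks_closed = text.endswith(("}", "]"))
        if looks_open and not looks_closed:
            return "incomplete"  # Probably cut off mid-stream: buffer it
        # Complete-looking but invalid: keep the original error as the cause
        raise CLIJSONDecodeError(f"malformed JSON: {exc.msg}") from exc
```

The end-of-text check is only a heuristic. A fragment cut immediately after a nested closing brace, such as `{"a": {"b": 1}`, ends with `}` and would be reported as malformed even though more data may follow.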

Impact

This implementation is intended to ensure that large JSON responses are not lost at buffer
boundaries, while maintaining the integrity of the message stream and preventing duplicate
outputs during reconstruction.

Collaborator

ltawfik commented Jul 1, 2025

Thanks, closing in favor of #53 which provides a cleaner solution without the critical bugs (infinite loop, silent data loss) found in this implementation.

@ltawfik ltawfik closed this Jul 1, 2025