Conversation

danielaskdd (Contributor)

Fix: Ollama chunk parser fails to handle complete reasoning blocks in single chunk

Problem

The original Ollama streaming chunk parser could not handle complete <think>...</think> blocks within a single chunk. This caused critical issues:

Primary Issue - Complete Reasoning Blocks:

# Input chunk: "Here is text. <think>thought content</think> More text."
# Expected: content="Here is text. More text.", reasoning_content="thought content"
# Actual (BEFORE): Parser failed to detect complete block, content was misprocessed

Secondary Issues - Mixed Content:

  1. Content before <think> tag: "Regular text <think>thought" → "Regular text" was lost or misclassified
  2. Content after </think> tag: "thought</think> more text" → "more text" was lost or misclassified
  3. Whitespace issues: Double spaces appeared after removing reasoning blocks (illustrated below)
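
For instance, stripping a complete block naively leaves a double space behind, while a regex pass collapses it (a minimal illustration; the exact regex used in the fix is an assumption here):

    import re

    chunk = "Here is text. <think>thought content</think> More text."

    # Naive removal leaves two adjacent spaces around the excised block.
    naive = chunk.replace("<think>thought content</think>", "")
    assert naive == "Here is text.  More text."

    # Regex removal plus whitespace normalization collapses them.
    cleaned = re.sub(r"\s*<think>.*?</think>\s*", " ", chunk, flags=re.DOTALL).strip()
    assert cleaned == "Here is text. More text."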

Root Cause

The original implementation only checked for individual tags (<think> OR </think>), never both simultaneously. It couldn't detect when a complete reasoning block existed in one chunk, leading to incorrect content classification.
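
For illustration, the pre-fix branching might have looked roughly like this (a hypothetical reconstruction, not the actual code):

    def old_chunk_parser(message_content: str) -> None:
        # Hypothetical reconstruction of the flawed logic, for illustration only.
        if "<think>" in message_content:
            # Everything after <think> was treated as reasoning; a closing
            # </think> in the SAME chunk was never checked for.
            ...
        elif "</think>" in message_content:
            # Everything before </think> was treated as reasoning.
            ...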

Solution

Modified litellm/llms/ollama/chat/transformation.py:

  1. Added complete block detection: Now checks for BOTH tags in the same chunk first

    has_start_tag = "<think>" in message_content
    has_end_tag = "</think>" in message_content
    
    if has_start_tag and has_end_tag:  # NEW: Handle complete block
        # Extract reasoning, remove tags, normalize whitespace
  2. Proper content splitting for partial tags:

    • <think> only: Split to separate content before tag from reasoning after
    • </think> only: Split to separate reasoning before tag from content after
  3. Regex-based whitespace normalization: Cleanly removes tags and extra spaces in complete blocks (see the sketch after this list)
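
Putting the three branches together, the fixed classification can be sketched as follows (an illustrative condensation: parse_chunk is a hypothetical helper, not the actual chunk_parser in transformation.py, and cross-chunk state is omitted for brevity):

    import re
    from typing import Tuple

    def parse_chunk(message_content: str) -> Tuple[str, str]:
        """Return (content, reasoning_content) for a single chunk (sketch only)."""
        has_start_tag = "<think>" in message_content
        has_end_tag = "</think>" in message_content

        if has_start_tag and has_end_tag:
            # Complete block: extract reasoning, strip tags, normalize whitespace.
            reasoning = re.search(
                r"<think>(.*?)</think>", message_content, re.DOTALL
            ).group(1)
            content = re.sub(
                r"\s*<think>.*?</think>\s*", " ", message_content, flags=re.DOTALL
            ).strip()
            return content, reasoning
        if has_start_tag:
            # Opening tag only: text before the tag is content, after it is reasoning.
            before, after = message_content.split("<think>", 1)
            return before, after
        if has_end_tag:
            # Closing tag only: text before the tag is reasoning, after it is content.
            before, after = message_content.split("</think>", 1)
            return after, before
        # No tags: classification depends on cross-chunk state (omitted here).
        return message_content, ""

With this split, the scenarios in the Before/After comparison below fall out directly.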

Added comprehensive test suite tests/llm_responses_api_testing/test_ollama_chunk_parser.py:

New test file with 7 test cases:

  • test_full_think_block_in_one_chunk - Main fix: Complete block handling (sketched after this list)
  • test_content_before_and_after_think_tag - Complete block with surrounding text
  • test_think_tags_split_across_chunks - Multi-chunk reasoning
  • test_content_before_think_tag - Mixed content before tag
  • test_content_after_think_end_tag - Mixed content after tag
  • test_mixed_content_across_multiple_chunks - Complex multi-chunk scenario
  • test_whitespace_chunks - Edge case handling
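
For reference, the primary case might be exercised roughly like this (a sketch reusing the hypothetical parse_chunk helper above; the real tests drive the actual chunk_parser):

    def test_full_think_block_in_one_chunk():
        content, reasoning = parse_chunk(
            "Here is text. <think>thought content</think> More text."
        )
        assert content == "Here is text. More text."
        assert reasoning == "thought content"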

All tests passing ✅ | Code quality checks passing ✅

Before/After Comparison

Scenario 1: Complete reasoning block (PRIMARY FIX)

Input: "Preamble. <think>This is a thought.</think> Postamble."

BEFORE (BROKEN):
- Failed to detect complete block
- Incorrect content classification
- Whitespace issues

AFTER (FIXED):
- content = "Preamble. Postamble."
- reasoning_content = "This is a thought."

Scenario 2: Content before tag

Input: "Hello <think>thinking"

BEFORE: reasoning_content = "Hello thinking"AFTER:  content = "Hello ", reasoning_content = "thinking"

Scenario 3: Content after tag

Input: "thought</think> world"

BEFORE: reasoning_content = "thought world"AFTER:  reasoning_content = "thought", content = " world"

Impact

  • Fixes: Critical bug preventing proper handling of complete reasoning blocks in Ollama streaming
  • Improves: Correct content/reasoning classification for all mixed content scenarios
  • Maintains: Full backward compatibility for multi-chunk reasoning (already working)
  • Scope: Only affects Ollama provider's streaming responses with <think> tags

Files Changed

  • litellm/llms/ollama/chat/transformation.py - Fixed chunk_parser to detect complete blocks
  • tests/llm_responses_api_testing/test_ollama_chunk_parser.py - New test suite (7 tests, 144 lines)

Testing

All 7 tests pass:

$ pytest tests/llm_responses_api_testing/test_ollama_chunk_parser.py -v
============== 7 passed in 8.27s ==============

Linter checks pass:

$ ruff check tests/llm_responses_api_testing/test_ollama_chunk_parser.py
# No errors

Commits

• Handle complete think blocks in one chunk
• Split content before/after think tags
• Preserve whitespace and mixed content
• Add comprehensive test coverage
• Fix content extraction logic

vercel bot commented Oct 3, 2025

@danielaskdd is attempting to deploy a commit to the CLERKIEAI Team on Vercel.

A member of the Team first needs to authorize it.
