Conversation

danielaskdd (Contributor)

Fix: Ollama chunk parser fails to handle complete reasoning blocks in single chunk

Problem

The original Ollama streaming chunk parser could not handle complete <think>...</think> blocks within a single chunk. This caused critical issues:

Primary Issue - Complete Reasoning Blocks:

# Input chunk: "Here is text. <think>thought content</think> More text."
# Expected: content="Here is text. More text.", reasoning_content="thought content"
# Actual (BEFORE): Parser failed to detect complete block, content was misprocessed

Secondary Issues - Mixed Content:

  1. Content before <think> tag: "Regular text <think>thought" → "Regular text" was lost or misclassified
  2. Content after </think> tag: "thought</think> more text" → "more text" was lost or misclassified
  3. Whitespace issues: Double spaces appeared after removing reasoning blocks (illustrated below)
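
For instance, stripping a complete block naively leaves a double space behind, while a regex pass collapses it (a minimal illustration; the exact regex used in the fix is an assumption here):

    import re

    chunk = "Here is text. <think>thought content</think> More text."

    # Naive removal leaves two adjacent spaces around the excised block.
    naive = chunk.replace("<think>thought content</think>", "")
    assert naive == "Here is text.  More text."

    # Regex removal plus whitespace normalization collapses them.
    cleaned = re.sub(r"\s*<think>.*?</think>\s*", " ", chunk, flags=re.DOTALL).strip()
    assert cleaned == "Here is text. More text."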

Root Cause

The original implementation only checked for individual tags (<think> OR </think>), never both simultaneously. It couldn't detect when a complete reasoning block existed in one chunk, leading to incorrect content classification.
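
For illustration, the pre-fix branching might have looked roughly like this (a hypothetical reconstruction, not the actual code):

    def old_chunk_parser(message_content: str) -> None:
        # Hypothetical reconstruction of the flawed logic, for illustration only.
        if "<think>" in message_content:
            # Everything after <think> was treated as reasoning; a closing
            # </think> in the SAME chunk was never checked for.
            ...
        elif "</think>" in message_content:
            # Everything before </think> was treated as reasoning.
            ...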

Solution

Modified litellm/llms/ollama/chat/transformation.py:

  1. Added complete block detection: Now checks for BOTH tags in the same chunk first

    has_start_tag = "<think>" in message_content
    has_end_tag = "</think>" in message_content
    
    if has_start_tag and has_end_tag:  # NEW: Handle complete block
        # Extract reasoning, remove tags, normalize whitespace
  2. Proper content splitting for partial tags:

    • <think> only: Split to separate content before tag from reasoning after
    • </think> only: Split to separate reasoning before tag from content after
  3. Regex-based whitespace normalization: Cleanly removes tags and extra spaces in complete blocks (see the sketch after this list)
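
Putting the three branches together, the fixed classification can be sketched as follows (an illustrative condensation: parse_chunk is a hypothetical helper, not the actual chunk_parser in transformation.py, and cross-chunk state is omitted for brevity):

    import re
    from typing import Tuple

    def parse_chunk(message_content: str) -> Tuple[str, str]:
        """Return (content, reasoning_content) for a single chunk (sketch only)."""
        has_start_tag = "<think>" in message_content
        has_end_tag = "</think>" in message_content

        if has_start_tag and has_end_tag:
            # Complete block: extract reasoning, strip tags, normalize whitespace.
            reasoning = re.search(
                r"<think>(.*?)</think>", message_content, re.DOTALL
            ).group(1)
            content = re.sub(
                r"\s*<think>.*?</think>\s*", " ", message_content, flags=re.DOTALL
            ).strip()
            return content, reasoning
        if has_start_tag:
            # Opening tag only: text before the tag is content, after it is reasoning.
            before, after = message_content.split("<think>", 1)
            return before, after
        if has_end_tag:
            # Closing tag only: text before the tag is reasoning, after it is content.
            before, after = message_content.split("</think>", 1)
            return after, before
        # No tags: classification depends on cross-chunk state (omitted here).
        return message_content, ""

With this split, the scenarios in the Before/After comparison below fall out directly.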

Added comprehensive test suite tests/llm_responses_api_testing/test_ollama_chunk_parser.py:

New test file with 7 test cases:

  • test_full_think_block_in_one_chunk - Main fix: Complete block handling (sketched after this list)
  • test_content_before_and_after_think_tag - Complete block with surrounding text
  • test_think_tags_split_across_chunks - Multi-chunk reasoning
  • test_content_before_think_tag - Mixed content before tag
  • test_content_after_think_end_tag - Mixed content after tag
  • test_mixed_content_across_multiple_chunks - Complex multi-chunk scenario
  • test_whitespace_chunks - Edge case handling
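
For reference, the primary case might be exercised roughly like this (a sketch reusing the hypothetical parse_chunk helper above; the real tests drive the actual chunk_parser):

    def test_full_think_block_in_one_chunk():
        content, reasoning = parse_chunk(
            "Here is text. <think>thought content</think> More text."
        )
        assert content == "Here is text. More text."
        assert reasoning == "thought content"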

All tests passing ✅ | Code quality checks passing ✅

Before/After Comparison

Scenario 1: Complete reasoning block (PRIMARY FIX)

Input: "Preamble. <think>This is a thought.</think> Postamble."

BEFORE (BROKEN):
- Failed to detect complete block
- Incorrect content classification
- Whitespace issues

AFTER (FIXED):
- content = "Preamble. Postamble."
- reasoning_content = "This is a thought."

Scenario 2: Content before tag

Input: "Hello <think>thinking"

BEFORE: reasoning_content = "Hello thinking"AFTER:  content = "Hello ", reasoning_content = "thinking"

Scenario 3: Content after tag

Input: "thought</think> world"

BEFORE: reasoning_content = "thought world"AFTER:  reasoning_content = "thought", content = " world"

Impact

  • Fixes: Critical bug preventing proper handling of complete reasoning blocks in Ollama streaming
  • Improves: Correct content/reasoning classification for all mixed content scenarios
  • Maintains: Full backward compatibility for multi-chunk reasoning (already working)
  • Scope: Only affects Ollama provider's streaming responses with <think> tags

Files Changed

  • litellm/llms/ollama/chat/transformation.py - Fixed chunk_parser to detect complete blocks
  • tests/llm_responses_api_testing/test_ollama_chunk_parser.py - New test suite (7 tests, 144 lines)

Testing

All 7 tests pass:

$ pytest tests/llm_responses_api_testing/test_ollama_chunk_parser.py -v
============== 7 passed in 8.27s ==============

Linter checks pass:

$ ruff check tests/llm_responses_api_testing/test_ollama_chunk_parser.py
# No errors

Commits

• Handle complete think blocks in one chunk
• Split content before/after think tags
• Preserve whitespace and mixed content
• Add comprehensive test coverage
• Fix content extraction logic

vercel bot commented Oct 3, 2025

@danielaskdd is attempting to deploy a commit to the CLERKIEAI Team on Vercel.

A member of the Team first needs to authorize it.
