
fix: extract <think> tags from streaming content in agent and team paths#6654

Open
hztBUAA wants to merge 7 commits into agno-agi:main from hztBUAA:fix/workflow-thinking-mode-content

Conversation


@hztBUAA hztBUAA commented Feb 19, 2026

Summary

Fixes workflow steps producing empty content when using thinking models (e.g., qwen3 with enable_thinking=True) served via OpenAI-compatible APIs.

Root cause: In the streaming path, models that embed reasoning in <think>...</think> tags (qwen3, deepseek-r1 via OpenAI-compatible endpoints like vLLM, DashScope) accumulate raw tags in run_response.content. The non-streaming path already extracted these tags at the provider level (_parse_provider_response), but the streaming path had no post-streaming extraction. This caused downstream workflow steps to receive content with embedded <think> tags, confusing the next model and producing empty output.

Fix: Add extract_thinking_content() after the streaming loop completes in both the agent and team response handlers (sync + async = 4 locations). A guard condition ("</think>" in content) ensures zero cost for non-thinking models.

Fixes #6305

Changes

| File | Change |
| --- | --- |
| `agno/agent/_response.py` | Extract `<think>` tags after streaming in `handle_model_response_stream` (sync) and `ahandle_model_response_stream` (async) |
| `agno/team/_response.py` | Same extraction in `_handle_model_response_stream` (sync) and `_ahandle_model_response_stream` (async) |
| `tests/unit/workflow/test_thinking_content.py` | 19 unit tests: `extract_thinking_content`, `_process_step_output`, `_prepare_message`, streaming extraction simulation, end-to-end workflow flow |
| `tests/unit/team/test_thinking_content.py` | 8 unit tests: team streaming extraction, propagation to `TeamRunOutput`, edge cases (None, non-string, incomplete tags, native reasoning preserved) |
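The tests themselves aren't shown in this thread; an agno-free sketch of what the streaming-extraction simulation exercises (all names here are illustrative, not the actual test code):

```python
import re

def simulate_stream(chunks: list[str]) -> str:
    """Accumulate streamed chunks the way run_response.content would,
    then apply post-stream tag extraction."""
    accumulated = "".join(chunks)
    if "</think>" not in accumulated:  # guard: zero cost for non-thinking models
        return accumulated
    return re.sub(r"<think>.*?</think>", "", accumulated, flags=re.DOTALL).strip()
```

Note the first chunk below splits the word "reasoning" across a boundary: extraction after accumulation handles this case, while per-chunk extraction would not.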

Type of change

  • Bug fix

Checklist

  • Code complies with style guidelines
  • Ran format/validation scripts (./scripts/format.sh and ./scripts/validate.sh)
  • Self-review completed
  • Documentation updated (comments, docstrings)
  • Tested in clean environment
  • Tests added/updated (if applicable)

Testing

  • Unit tests: 891 passed (agent + team + workflow), 0 failures
  • Integration tests: Verified with real Ollama models (qwen3:4b via OpenAILike, native Ollama, llama3.2)
  • Non-thinking models: Confirmed zero impact (guard condition skips entirely)
  • format.sh / validate.sh: All pass, mypy 0 issues

Additional Notes

  • Native Ollama API separates thinking into a dedicated thinking field (no tags in content). The fix targets OpenAI-compatible API endpoints where thinking is embedded as <think> tags.
  • Existing reasoning_content from native model fields is never overwritten by tag-based extraction.

@hztBUAA hztBUAA requested a review from a team as a code owner February 19, 2026 21:57


hztBUAA commented Feb 25, 2026

Thanks for the review and feedback. I am following up on this PR now and will either push the requested changes or reply point-by-point shortly.


hztBUAA commented Feb 25, 2026

Quick follow-up: I am reviewing the feedback and will update this PR shortly.

harshsinha03 and others added 3 commits March 5, 2026 18:37
Apply the same <think> tag extraction fix to the team streaming
handlers (_handle_model_response_stream and ahandle variant) that
was applied to the agent streaming handlers in 575d64b.

Without this, teams using thinking models (qwen3, deepseek-r1 via
OpenAI-compatible APIs) would leak <think> tags into streamed content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@harshsinha03 harshsinha03 changed the title [fix] preserve content output in workflow steps with thinking mode enabled fix: extract <think> tags from streaming content in agent and team paths Mar 5, 2026

@harshsinha03 harshsinha03 left a comment


LGTM! @hztBUAA Thanks for contributing


Qxiaobei commented Mar 5, 2026

I'm using Qwen3 with OpenAILike. When I enable thinking mode on Qwen3's thinking model, the Agno documentation doesn't describe any mechanism for automatically removing the <think> tags. The <think> content returned by the model is treated as a regular assistant message: in the agent loop, Agno places the content between <think> and </think> into the context as 'role': 'assistant' and sends it back to the model. This affects my model calls, sometimes causing function calls to get stuck in a loop of thinking and tool calls. How can I quickly resolve this issue?
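Until this fix is merged, one workaround is to strip the tags yourself before content re-enters the context. This is not an official agno API, just a plain post-processing sketch:

```python
import re

def strip_think_tags(text: str) -> str:
    """Remove complete <think>...</think> blocks; leave other text untouched."""
    if "</think>" not in text:
        return text
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

# Example: only the visible answer survives.
cleaned = strip_think_tags("<think>plan the call</think>Calling the tool now.")
```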



Development

Successfully merging this pull request may close these issues.

[Bug] Workflows mode outputs no content.
