fix: extract <think> tags from streaming content in agent and team paths #6654
hztBUAA wants to merge 7 commits into agno-agi:main from
Conversation
Thanks for the review and feedback. I am following up on this PR now and will either push the requested changes or reply point-by-point shortly. |
Quick follow-up: I am reviewing the feedback and will update this PR shortly. |
Apply the same `<think>` tag extraction fix to the team streaming handlers (`_handle_model_response_stream` and the `ahandle` variant) that was applied to the agent streaming handlers in 575d64b. Without this, teams using thinking models (qwen3, deepseek-r1 via OpenAI-compatible APIs) would leak `<think>` tags into streamed content.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
harshsinha03 left a comment:
LGTM! @hztBUAA Thanks for contributing
I'm using Qwen3 with OpenAILike. When I enable Qwen3's thinking mode, the Agno documentation doesn't provide a mechanism to automatically remove the `<think>` tags from the streamed content.
Summary
Fixes workflow steps producing empty content when using thinking models (e.g., qwen3 with `enable_thinking=True`) served via OpenAI-compatible APIs.

Root cause: In the streaming path, models that embed reasoning in `<think>...</think>` tags (qwen3, deepseek-r1 via OpenAI-compatible endpoints like vLLM, DashScope) accumulate the raw tags in `run_response.content`. The non-streaming path already extracted these tags at the provider level (`_parse_provider_response`), but the streaming path had no post-streaming extraction. This caused downstream workflow steps to receive content with embedded `<think>` tags, confusing the next model and producing empty output.

Fix: Add `extract_thinking_content()` after the streaming loop completes in both the agent and team response handlers (sync + async = 4 locations). A guard condition (`"</think>" in content`) ensures zero cost for non-thinking models.

Fixes #6305
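The post-streaming extraction described above can be sketched as follows. This is a hypothetical illustration of the behavior, not the actual Agno helper: the function name mirrors `extract_thinking_content()` from the PR, but the signature and return shape here are assumptions.

```python
import re

def extract_thinking_content(content):
    """Split <think>...</think> reasoning out of streamed content.

    Returns (visible_content, thinking). A cheap substring guard
    means non-thinking models never pay for the regex.
    """
    if not isinstance(content, str) or "</think>" not in content:
        # Guard: no closing tag means nothing to extract.
        return content, None
    match = re.search(r"<think>(.*?)</think>", content, re.DOTALL)
    if match is None:
        return content, None
    thinking = match.group(1).strip()
    visible = (content[: match.start()] + content[match.end():]).strip()
    return visible, thinking

# Example: qwen3-style streamed output accumulated into one string
text = "<think>User asks 2+2.</think>The answer is 4."
visible, thinking = extract_thinking_content(text)
# visible == "The answer is 4.", thinking == "User asks 2+2."
```

In the PR, a call like this would run once after the streaming loop completes, in each of the four sync/async handlers, rather than per-chunk.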
Changes
- `agno/agent/_response.py`: extract `<think>` tags after streaming in `handle_model_response_stream` (sync) and `ahandle_model_response_stream` (async)
- `agno/team/_response.py`: same fix in `_handle_model_response_stream` (sync) and `_ahandle_model_response_stream` (async)
- `tests/unit/workflow/test_thinking_content.py`: tests for `extract_thinking_content`, `_process_step_output`, `_prepare_message`, streaming extraction simulation, end-to-end workflow flow
- `tests/unit/team/test_thinking_content.py`: tests for `TeamRunOutput`, edge cases (None, non-string, incomplete tags, native reasoning preserved)

Type of change
Checklist
Code formatted and validated (`./scripts/format.sh` and `./scripts/validate.sh`)

Testing
Additional Notes
- Models with native reasoning support return a separate `thinking` field (no tags in content). The fix targets OpenAI-compatible API endpoints where thinking is embedded as `<think>` tags.
- `reasoning_content` from native model fields is never overwritten by tag-based extraction.
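The "never overwritten" guarantee above can be sketched as a simple precedence rule. This is a minimal illustration under assumed names: `merge_reasoning` and the `Run` stand-in are hypothetical, not part of the Agno codebase.

```python
def merge_reasoning(run_output, extracted_thinking):
    """Hypothetical sketch: tag-extracted thinking is only adopted
    when the model did not already populate a native reasoning field."""
    if getattr(run_output, "reasoning_content", None):
        return run_output.reasoning_content  # native field wins
    return extracted_thinking

class Run:
    """Minimal stand-in for a run output object."""
    reasoning_content = None

r = Run()
assert merge_reasoning(r, "from <think> tags") == "from <think> tags"

r.reasoning_content = "native reasoning"
assert merge_reasoning(r, "from <think> tags") == "native reasoning"
```

This matches the edge case listed in `tests/unit/team/test_thinking_content.py` ("native reasoning preserved"): extraction fills the gap for OpenAI-compatible endpoints without clobbering models that report reasoning natively.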