Scope check
Due diligence
What problem does this solve?
When using streaming with tools, Claude models (and sometimes others) generate intermediate "thinking" text before calling a tool. For example:
```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-6').with_tool(MyTool)
chat.ask("What's in this document?") do |chunk|
  print chunk.content if chunk.content
end
```
Output:

```
Let me retrieve the content of the document.   ← intermediate text (unwanted)
[tool executes]
Based on the document, here is the summary...  ← final answer (wanted)
```
The streaming block receives chunks from all rounds, including the intermediate text from rounds that end with tool calls. The Streaming with Tools docs describe this as expected behavior.
In a web application (Rails + Turbo Streams), this intermediate text gets broadcast to the browser. For chat UIs, users see "Let me retrieve..." flash on screen before the final answer.
Workaround limitations
Using the `on_tool_call` callback to detect a tool call and reset the UI:

```ruby
chat.on_tool_call { stream_handler.reset }
chat.ask(messages) do |chunk|
  stream_handler.call(chunk.content) if chunk.content
end
```
This has inherent limitations:

- Flicker: `on_tool_call` fires after streaming completes for that round, so the intermediate text has already been broadcast to the client before the reset
- No way to distinguish rounds: the streaming block has no context about which round it's in
- `on_end_message` is post-hoc: by the time it fires with `response.tool_call?`, the text has already been yielded through the block
LiteLLM has the same issue.
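The only flicker-free workaround I've found gives up incremental streaming entirely: buffer each round's chunks, drop the buffer when a tool call fires, and broadcast only the final round in one shot. A minimal sketch (`BufferedStreamHandler` is a hypothetical helper, not part of RubyLLM):

```ruby
# Hypothetical buffering workaround: trades live streaming for flicker-free
# output. Chunks accumulate per round; rounds that end in a tool call are
# discarded via #reset, and only the final round's text is broadcast by #flush.
class BufferedStreamHandler
  def initialize(&broadcast)
    @buffer = +""          # mutable string accumulator for the current round
    @broadcast = broadcast # e.g. a Turbo Streams broadcast
  end

  def call(text)
    @buffer << text
  end

  # Wire to chat.on_tool_call: the buffered text was intermediate, drop it.
  def reset
    @buffer.clear
  end

  # Call after chat.ask returns: broadcast the final answer, all at once.
  def flush
    @broadcast.call(@buffer.dup) unless @buffer.empty?
    @buffer.clear
  end
end
```

This avoids the flash of intermediate text, but the user now sees nothing until the final round completes, which defeats the point of streaming.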
Why this belongs in RubyLLM
- This is a streaming behavior concern, not application logic
- Every developer building a chat UI with tools faces this problem
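One possible direction, purely illustrative (none of these names exist in RubyLLM today): tag each chunk with the round it belongs to, so callers can suppress rounds that end in a tool call. A self-contained simulation of the idea:

```ruby
# Hypothetical: chunks carry round metadata, and a filter drops chunks from
# rounds that ended in a tool call. The Chunk struct, round numbers, and
# final_answer_chunks helper are invented here for illustration only.
Chunk = Struct.new(:content, :round)

def final_answer_chunks(chunks, tool_call_rounds)
  chunks.reject { |chunk| tool_call_rounds.include?(chunk.round) }
end

chunks = [
  Chunk.new("Let me retrieve the content. ", 1), # round 1 ends in a tool call
  Chunk.new("Based on the document, ", 2),
  Chunk.new("here is the summary.", 2)
]

final_answer_chunks(chunks, [1]).map(&:content).join
# => "Based on the document, here is the summary."
```

Whether this is exposed as chunk metadata, a filtering option on `ask`, or something else is an open design question; the point is that only the library knows which round a chunk belongs to.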