Scope check
Due diligence
What problem does this solve?
When using streaming with tools, Claude models (and sometimes others) generate intermediate "thinking" text before calling a tool. For example:
```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-6').with_tool(MyTool)
chat.ask("What's in this document?") do |chunk|
  print chunk.content if chunk.content
end
```
Output:

```
Let me retrieve the content of the document.   ← intermediate text (unwanted)
[tool executes]
Based on the document, here is the summary...  ← final answer (wanted)
```
The streaming block receives chunks from all rounds, including the intermediate text from rounds that end with tool calls. The Streaming with Tools docs describe this as expected behavior.
In a web application (Rails + Turbo Streams), this intermediate text gets broadcast to the browser. For chat UIs, users see "Let me retrieve..." flash on screen before the final answer.
Workaround limitations
Using the `on_tool_call` callback to detect a tool call and reset the UI:

```ruby
chat.on_tool_call { stream_handler.reset }
chat.ask(messages) do |chunk|
  stream_handler.call(chunk.content) if chunk.content
end
```
This has inherent limitations:

- Flicker: `on_tool_call` fires after streaming completes for that round, so the intermediate text has already been broadcast to the client before the reset
- No way to distinguish rounds: the streaming block has no context about which round it's in
- `on_end_message` is post-hoc: by the time it fires with `response.tool_call?`, the text has already been yielded through the block
LiteLLM has the same issue.
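The only flicker-free workaround I've found gives up incremental streaming entirely: buffer each round's chunks, drop the buffer when a tool call fires, and broadcast only the final round in one shot. A minimal sketch (`BufferedStreamHandler` is a hypothetical helper, not part of RubyLLM):

```ruby
# Hypothetical buffering workaround: trades live streaming for flicker-free
# output. Chunks accumulate per round; rounds that end in a tool call are
# discarded via #reset, and only the final round's text is broadcast by #flush.
class BufferedStreamHandler
  def initialize(&broadcast)
    @buffer = +""          # mutable string accumulator for the current round
    @broadcast = broadcast # e.g. a Turbo Streams broadcast
  end

  def call(text)
    @buffer << text
  end

  # Wire to chat.on_tool_call: the buffered text was intermediate, drop it.
  def reset
    @buffer.clear
  end

  # Call after chat.ask returns: broadcast the final answer, all at once.
  def flush
    @broadcast.call(@buffer.dup) unless @buffer.empty?
    @buffer.clear
  end
end
```

This avoids the flash of intermediate text, but the user now sees nothing until the final round completes, which defeats the point of streaming.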
Why this belongs in RubyLLM
- This is a streaming behavior concern, not application logic
- Every developer building a chat UI with tools faces this problem
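One possible direction, purely illustrative (none of these names exist in RubyLLM today): tag each chunk with the round it belongs to, so callers can suppress rounds that end in a tool call. A self-contained simulation of the idea:

```ruby
# Hypothetical: chunks carry round metadata, and a filter drops chunks from
# rounds that ended in a tool call. The Chunk struct, round numbers, and
# final_answer_chunks helper are invented here for illustration only.
Chunk = Struct.new(:content, :round)

def final_answer_chunks(chunks, tool_call_rounds)
  chunks.reject { |chunk| tool_call_rounds.include?(chunk.round) }
end

chunks = [
  Chunk.new("Let me retrieve the content. ", 1), # round 1 ends in a tool call
  Chunk.new("Based on the document, ", 2),
  Chunk.new("here is the summary.", 2)
]

final_answer_chunks(chunks, [1]).map(&:content).join
# => "Based on the document, here is the summary."
```

Whether this is exposed as chunk metadata, a filtering option on `ask`, or something else is an open design question; the point is that only the library knows which round a chunk belongs to.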