[FEATURE] Better Thinking/Thinking Streaming support #551

### Scope check

- [x] This is **core LLM communication** (not application logic)
- [x] This **benefits most users** (not just my use case)
- [x] This **can't be solved in application code** with current RubyLLM
- [x] I read the [Contributing Guide](https://github.com/crmne/ruby_llm/blob/main/CONTRIBUTING.md)

### Due diligence

- [x] I searched existing issues
- [x] I checked the documentation

### What problem does this solve?

When using Anthropic's extended thinking feature via RubyLLM:

1. **Thinking content is not captured during streaming** - The `build_chunk` method in `providers/anthropic/streaming.rb` only extracts `data.dig('delta', 'text')`, ignoring `thinking_delta` events.

2. **Thinking blocks are lost in response parsing** - The `extract_text_content` method in `providers/anthropic/chat.rb` only extracts blocks where `type == 'text'`, discarding thinking blocks.

3. **Conversation history breaks with thinking enabled** - When thinking is enabled, Anthropic requires previous assistant messages to include their thinking blocks. Since we don't store/replay thinking content, multi-turn conversations fail with:
   ```
   messages.3.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`.
   When `thinking` is enabled, a final `assistant` message must start with a thinking block.
   ```

4. **No unified API for thinking across providers** - Each provider has different thinking implementations, but RubyLLM doesn't abstract this.
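
To make items 1 and 2 concrete, here is a minimal sketch of what text-only extraction sees. The event hashes are hypothetical samples shaped after the `thinking_delta` and `text_delta` types named above (real payloads carry more fields), and the helper names are illustrative only, not RubyLLM methods.

```ruby
# Hypothetical sample events; real payloads carry more fields.
SAMPLE_EVENTS = [
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'thinking_delta', 'thinking' => 'Add 2 and 2. ' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'thinking_delta', 'thinking' => 'That gives 4.' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'text_delta', 'text' => 'The answer is 4.' } }
].freeze

# Mirrors the behaviour described in item 1: only `delta.text` is read.
def captured_text(events)
  events.filter_map { |event| event.dig('delta', 'text') }.join
end

# The reasoning that the text-only extraction never sees.
def dropped_thinking(events)
  events.filter_map { |event| event.dig('delta', 'thinking') }.join
end

puts captured_text(SAMPLE_EVENTS)    # visible answer only
puts dropped_thinking(SAMPLE_EVENTS) # reasoning that is currently lost
```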

### Proposed solution

### 1. Add `thinking` Attribute to Message Class

**File:** `lib/ruby_llm/message.rb`

```ruby
class Message
  attr_reader :role, :model_id, :tool_calls, :tool_call_id, :input_tokens, :output_tokens,
              :cached_tokens, :cache_creation_tokens, :raw, :thinking

  def initialize(options = {})
    # ... existing code ...
    @thinking = options[:thinking]
  end

  def to_h
    {
      # ... existing fields ...
      thinking: thinking
    }.compact
  end
end
```

### 2. Parse Thinking in Anthropic Streaming

**File:** `lib/ruby_llm/providers/anthropic/streaming.rb`

```ruby
def build_chunk(data)
  Chunk.new(
    role: :assistant,
    model_id: extract_model_id(data),
    content: extract_content_delta(data),
    thinking: extract_thinking_delta(data),
    # ... other fields ...
  )
end

def extract_content_delta(data)
  return data.dig('delta', 'text') if data.dig('delta', 'type') == 'text_delta'
  nil
end

def extract_thinking_delta(data)
  return data.dig('delta', 'thinking') if data.dig('delta', 'type') == 'thinking_delta'
  nil
end
```
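
Once chunks expose a `thinking` field, something still has to stitch the deltas back together for the final message. A rough sketch of that accumulation follows, assuming each chunk carries either `content` or `thinking` (or neither). `SketchChunk` and `ThinkingAccumulator` are hypothetical stand-ins, not existing RubyLLM classes.

```ruby
# Hypothetical stand-in for a streamed chunk; only the two fields used here.
SketchChunk = Struct.new(:content, :thinking, keyword_init: true)

# Collects text and thinking deltas separately as chunks arrive.
class ThinkingAccumulator
  attr_reader :content, :thinking

  def initialize
    @content = +''
    @thinking = +''
  end

  # Appends whichever delta the chunk carries; nil fields are skipped.
  def add(chunk)
    @content << chunk.content if chunk.content
    @thinking << chunk.thinking if chunk.thinking
    self
  end
end

accumulator = ThinkingAccumulator.new
[
  SketchChunk.new(thinking: 'Check the units. '),
  SketchChunk.new(thinking: 'Metres, not feet.'),
  SketchChunk.new(content: 'The bridge is 120 m long.')
].each { |chunk| accumulator.add(chunk) }

puts accumulator.thinking
puts accumulator.content
```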

### 3. Parse Thinking in Anthropic Response

**File:** `lib/ruby_llm/providers/anthropic/chat.rb`

```ruby
def parse_completion_response(response)
  data = response.body
  content_blocks = data['content'] || []

  text_content = extract_text_content(content_blocks)
  thinking_content = extract_thinking_content(content_blocks)
  tool_use_blocks = Tools.find_tool_uses(content_blocks)

  build_message(data, text_content, thinking_content, tool_use_blocks, response)
end

def extract_thinking_content(blocks)
  thinking_blocks = blocks.select { |c| c['type'] == 'thinking' }
  thinking_blocks.map { |c| c['thinking'] }.join
end

def build_message(data, content, thinking, tool_use_blocks, response)
  Message.new(
    # ... existing fields ...
    # Plain Ruby nil-if-empty; String#presence would require ActiveSupport
    thinking: (thinking unless thinking.empty?)
  )
end
```
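
As a quick check of the intended behaviour, the sketch below splits a hypothetical response body into text and thinking, returning `nil` when no block of a given type exists. The sample body and the helper names (`blocks_of`, `split_response`) are assumptions for illustration, and real responses contain more fields.

```ruby
# Hypothetical response body; real responses contain more fields.
SAMPLE_BODY = {
  'content' => [
    { 'type' => 'thinking', 'thinking' => 'Compare both options. ' },
    { 'type' => 'thinking', 'thinking' => 'A is cheaper.' },
    { 'type' => 'text', 'text' => 'Choose option A.' }
  ]
}.freeze

# Joins the `key` field of every block of the given type; nil if none matched.
def blocks_of(body, type, key)
  joined = (body['content'] || []).select { |block| block['type'] == type }
                                  .map { |block| block[key] }
                                  .join
  joined.empty? ? nil : joined
end

def split_response(body)
  {
    content: blocks_of(body, 'text', 'text'),
    thinking: blocks_of(body, 'thinking', 'thinking')
  }
end

p split_response(SAMPLE_BODY)
```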

### 4. Include Thinking in Message Formatting

**File:** `lib/ruby_llm/providers/anthropic/chat.rb`

When formatting assistant messages for the API, include thinking blocks:

```ruby
def format_basic_message(msg)
  content_blocks = []

  # Include thinking block if present (or redacted_thinking placeholder).
  # Plain Ruby check; String#present? would require ActiveSupport.
  if msg.thinking && !msg.thinking.empty?
    content_blocks << { type: 'thinking', thinking: msg.thinking }
  elsif msg.role == :assistant && thinking_enabled?
    # Placeholder for redacted thinking when original thinking is unavailable.
    # Open question: the API may reject an empty `data` payload, so verify this.
    content_blocks << { type: 'redacted_thinking', data: '' }
  end

  # Add text content
  content_blocks.concat(Media.format_content(msg.content))

  {
    role: convert_role(msg.role),
    content: content_blocks
  }
end
```
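
One caveat worth checking against Anthropic's extended thinking documentation: as far as I can tell, each `thinking` block in a response also carries a `signature`, and the API expects such blocks to be sent back unmodified. If that holds, storing only the thinking text would not be enough for replay. Below is a minimal sketch of the alternative, which keeps the raw blocks. The helper names and sample values are hypothetical.

```ruby
REPLAYABLE_TYPES = %w[thinking redacted_thinking].freeze

# Keep thinking-related blocks exactly as the API returned them.
def replayable_blocks(response_blocks)
  response_blocks.select { |block| REPLAYABLE_TYPES.include?(block['type']) }
end

# Build the assistant turn for the next request: thinking first, then text.
def assistant_turn(stored_blocks, text)
  { role: 'assistant', content: stored_blocks + [{ 'type' => 'text', 'text' => text }] }
end

# Sample blocks; the `signature` field is an assumption about the response shape.
received = [
  { 'type' => 'thinking', 'thinking' => 'Compare both options.', 'signature' => 'opaque-signature' },
  { 'type' => 'text', 'text' => 'Choose option A.' }
]

turn = assistant_turn(replayable_blocks(received), 'Choose option A.')
puts turn[:content].map { |block| block['type'] }.inspect
```

Storing raw blocks costs a little more space than a single string, but it avoids having to reconstruct provider-specific fields later.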

### 5. Add Gemini Thinking Support

**File:** `lib/ruby_llm/providers/gemini/streaming.rb`

Parse thought parts in streaming:

```ruby
def build_chunk(data)
  parts = data.dig('candidates', 0, 'content', 'parts') || []

  text_parts = parts.reject { |p| p['thought'] }.map { |p| p['text'] }.join
  thought_parts = parts.select { |p| p['thought'] }.map { |p| p['text'] }.join

  Chunk.new(
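    # Plain Ruby nil-if-empty; String#presence would require ActiveSupport
    content: (text_parts unless text_parts.empty?),
    thinking: (thought_parts unless thought_parts.empty?),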
    # ... other fields ...
  )
end
```

**File:** `lib/ruby_llm/providers/gemini/chat.rb`

Handle thought signatures for multi-turn conversations.
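
A rough sketch of what that handling could look like, assuming signatures arrive as a `thoughtSignature` field on individual parts (my reading of the Gemini docs, so treat the field name as an assumption to verify). The idea is to keep signed parts whole so they can be sent back unchanged on the next turn. `split_gemini_parts` is a hypothetical helper.

```ruby
# Hypothetical sample parts; `thoughtSignature` is an assumed field name.
SAMPLE_PARTS = [
  { 'text' => 'Weigh cost against speed.', 'thought' => true },
  { 'text' => 'Use option A.', 'thoughtSignature' => 'opaque-signature' }
].freeze

def split_gemini_parts(parts)
  thoughts, answers = parts.partition { |part| part['thought'] }
  {
    content: answers.filter_map { |part| part['text'] }.join,
    thinking: thoughts.filter_map { |part| part['text'] }.join,
    # Parts that carry a signature are kept whole so they can be replayed as-is.
    signed_parts: parts.select { |part| part.key?('thoughtSignature') }
  }
end

p split_gemini_parts(SAMPLE_PARTS)
```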

### 6. Add xAI Grok Reasoning Support

**File:** `lib/ruby_llm/providers/openai/streaming.rb` (xAI uses OpenAI-compatible API)

```ruby
def build_chunk(data)
  Chunk.new(
    content: data.dig('choices', 0, 'delta', 'content'),
    thinking: data.dig('choices', 0, 'delta', 'reasoning_content'),
    # ... other fields ...
  )
end
```
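
Because this file is shared with OpenAI, the extraction has to tolerate chunks that carry no `reasoning_content` at all. The sketch below shows the intended nil-safe behaviour. The chunk hashes are hypothetical samples, and `split_delta` is an illustrative helper, not an existing method.

```ruby
# Reads both fields from an OpenAI-style streaming delta; missing keys yield nil.
def split_delta(data)
  delta = data.dig('choices', 0, 'delta') || {}
  { content: delta['content'], thinking: delta['reasoning_content'] }
end

reasoning_chunk = { 'choices' => [{ 'delta' => { 'reasoning_content' => 'Try factoring first.' } }] }
plain_chunk     = { 'choices' => [{ 'delta' => { 'content' => 'x = 3' } }] }
empty_chunk     = { 'choices' => [] }

p split_delta(reasoning_chunk)
p split_delta(plain_chunk)
p split_delta(empty_chunk)
```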

### 7. Add Thinking Configuration Helpers

**File:** `lib/ruby_llm/chat.rb` (or new file `lib/ruby_llm/thinking.rb`)

```ruby
module RubyLLM
  class Chat
    def with_thinking(budget: 10_000, effort: nil)
      case detect_provider
      when :anthropic
        with_params(
          thinking: { type: "enabled", budget_tokens: budget },
          max_tokens: budget + 8000
        )
      when :gemini
        with_params(
          thinking_config: { include_thoughts: true, thinking_budget: budget }
        )
      when :openai
        with_params(
          reasoning: { effort: effort || "medium" },
          max_completion_tokens: budget
        )
      when :xai
        with_params(
          reasoning: { enabled: true },
          reasoning_effort: effort || "high"
        )
      end
    end
  end
end
```
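
One detail to settle: the `case` above has no `else`, so an unsupported provider would make `with_thinking` return `nil` and break method chaining. The sketch below isolates the mapping as a pure function so it can be tested without a `Chat` instance, and it raises for unknown providers. `thinking_params` is a hypothetical helper, and the parameter shapes are copied from the proposal rather than verified against each provider's API.

```ruby
# Hypothetical helper: maps a provider symbol to request params.
# Param shapes are copied from the proposal, not verified against each API.
def thinking_params(provider, budget: 10_000, effort: nil)
  case provider
  when :anthropic
    { thinking: { type: 'enabled', budget_tokens: budget }, max_tokens: budget + 8000 }
  when :gemini
    { thinking_config: { include_thoughts: true, thinking_budget: budget } }
  when :openai
    { reasoning: { effort: effort || 'medium' }, max_completion_tokens: budget }
  when :xai
    { reasoning: { enabled: true }, reasoning_effort: effort || 'high' }
  else
    raise ArgumentError, "thinking is not supported for provider #{provider.inspect}"
  end
end

p thinking_params(:anthropic, budget: 2_000)
p thinking_params(:openai, effort: 'low')
```

Whether an unsupported provider should raise or silently return `self` is a design choice. Raising surfaces misconfiguration early, while returning `self` keeps provider-agnostic code simpler.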

### Why this belongs in RubyLLM

This is core functionality (handling thinking from various providers). "Thinking" is a core feature of advanced models, and many RubyLLM users, particularly those building conversational frameworks, will want to use it. Currently it does not work reliably across providers and is undocumented.

This patch would both improve thinking support (including thinking streaming) and document it. The documentation currently contains no mention of the words "think" or "thought".
