[FEATURE] Better Thinking/Thinking Streaming support #551

@swombat

Description

Scope check

  • This is core LLM communication (not application logic)
  • This benefits most users (not just my use case)
  • This can't be solved in application code with current RubyLLM
  • I read the Contributing Guide

Due diligence

  • I searched existing issues
  • I checked the documentation

What problem does this solve?

When using Anthropic's extended thinking feature via RubyLLM:

  1. Thinking content is not captured during streaming - The build_chunk method in providers/anthropic/streaming.rb only extracts data.dig('delta', 'text'), ignoring thinking_delta events.

  2. Thinking blocks are lost in response parsing - The extract_text_content method in providers/anthropic/chat.rb only extracts blocks where type == 'text', discarding thinking blocks.

  3. Conversation history breaks with thinking enabled - When thinking is enabled, Anthropic requires previous assistant messages to include their thinking blocks. Since we don't store/replay thinking content, multi-turn conversations fail with:

    messages.3.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`.
    When `thinking` is enabled, a final `assistant` message must start with a thinking block.
    
  4. No unified API for thinking across providers - Each provider has different thinking implementations, but RubyLLM doesn't abstract this.

Proposed solution

1. Add thinking Attribute to Message Class

File: lib/ruby_llm/message.rb

class Message
  attr_reader :role, :model_id, :tool_calls, :tool_call_id, :input_tokens, :output_tokens,
              :cached_tokens, :cache_creation_tokens, :raw, :thinking

  def initialize(options = {})
    # ... existing code ...
    @thinking = options[:thinking]
  end

  def to_h
    {
      # ... existing fields ...
      thinking: thinking
    }.compact
  end
end
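A minimal, self-contained sketch of how the new attribute would behave. `SketchMessage` is a hypothetical stand-in for illustration, not the real `RubyLLM::Message`; it only shows that `thinking` is optional and that `compact` drops it from `to_h` when absent.

```ruby
# Hypothetical stand-in for RubyLLM::Message, reduced to three fields.
class SketchMessage
  attr_reader :role, :content, :thinking

  def initialize(options = {})
    @role = options.fetch(:role)
    @content = options[:content]
    @thinking = options[:thinking] # nil when the model produced no thinking
  end

  def to_h
    # compact removes nil values, so :thinking only appears when it was set
    { role: role, content: content, thinking: thinking }.compact
  end
end

with_thinking = SketchMessage.new(role: :assistant, content: 'Done.', thinking: 'First, check the input.')
without_thinking = SketchMessage.new(role: :assistant, content: 'Done.')

p with_thinking.to_h
p without_thinking.to_h
```

Existing callers that never pass `thinking` would see the same hash as before, which keeps the change backwards compatible.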

2. Parse Thinking in Anthropic Streaming

File: lib/ruby_llm/providers/anthropic/streaming.rb

def build_chunk(data)
  Chunk.new(
    role: :assistant,
    model_id: extract_model_id(data),
    content: extract_content_delta(data),
    thinking: extract_thinking_delta(data),
    # ... other fields ...
  )
end

def extract_content_delta(data)
  return data.dig('delta', 'text') if data.dig('delta', 'type') == 'text_delta'
  nil
end

def extract_thinking_delta(data)
  return data.dig('delta', 'thinking') if data.dig('delta', 'type') == 'thinking_delta'
  nil
end
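To illustrate how a consumer could accumulate the two streams, here is a sketch that runs the extractors above over a few hand-written events. The event payloads and the `sig-abc` value are made-up samples shaped like Anthropic's `content_block_delta` events, not captured API output.

```ruby
# Hand-written sample events (assumed shape, not real API output).
events = [
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'thinking_delta', 'thinking' => 'Two plus ' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'thinking_delta', 'thinking' => 'two is four.' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'signature_delta', 'signature' => 'sig-abc' } },
  { 'type' => 'content_block_delta', 'delta' => { 'type' => 'text_delta', 'text' => 'The answer is 4.' } }
]

def extract_content_delta(data)
  data.dig('delta', 'text') if data.dig('delta', 'type') == 'text_delta'
end

def extract_thinking_delta(data)
  data.dig('delta', 'thinking') if data.dig('delta', 'type') == 'thinking_delta'
end

thinking = +''
content = +''
events.each do |event|
  # nil.to_s is '', so events of other types contribute nothing
  thinking << extract_thinking_delta(event).to_s
  content << extract_content_delta(event).to_s
end

puts thinking
puts content
```

Note that the `signature_delta` event is ignored by both extractors in this sketch; a full implementation would probably need to capture it as well.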

3. Parse Thinking in Anthropic Response

File: lib/ruby_llm/providers/anthropic/chat.rb

def parse_completion_response(response)
  data = response.body
  content_blocks = data['content'] || []

  text_content = extract_text_content(content_blocks)
  thinking_content = extract_thinking_content(content_blocks)
  tool_use_blocks = Tools.find_tool_uses(content_blocks)

  build_message(data, text_content, thinking_content, tool_use_blocks, response)
end

def extract_thinking_content(blocks)
  thinking_blocks = blocks.select { |c| c['type'] == 'thinking' }
  text = thinking_blocks.map { |c| c['thinking'] }.join
  # Return nil rather than '' when there is no thinking (plain Ruby, no ActiveSupport needed)
  text.empty? ? nil : text
end

def build_message(data, content, thinking, tool_use_blocks, response)
  Message.new(
    # ... existing fields ...
    thinking: thinking
  )
end
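As a quick check of the extraction step, the sketch below applies it to a hand-written `content` array. The sample blocks are assumptions modelled on the response shape described above, and the helper is written in plain Ruby so it runs without Rails.

```ruby
# Hand-written sample of a response `content` array (assumed shape).
sample_blocks = [
  { 'type' => 'thinking', 'thinking' => 'The user wants a sum. ' },
  { 'type' => 'thinking', 'thinking' => '2 + 2 = 4.' },
  { 'type' => 'text', 'text' => 'The answer is 4.' }
]

def extract_thinking_content(blocks)
  text = blocks.select { |b| b['type'] == 'thinking' }.map { |b| b['thinking'] }.join
  text.empty? ? nil : text # nil signals "no thinking" to callers
end

thinking = extract_thinking_content(sample_blocks)
no_thinking = extract_thinking_content([{ 'type' => 'text', 'text' => 'Hi.' }])

p thinking
p no_thinking
```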

4. Include Thinking in Message Formatting

File: lib/ruby_llm/providers/anthropic/chat.rb

When formatting assistant messages for the API, include thinking blocks:

def format_basic_message(msg)
  content_blocks = []

  # Replay the stored thinking block first; Anthropic expects it ahead of the text.
  # Note: Anthropic thinking blocks also carry a `signature` that the API expects
  # back unchanged, so it would need to be stored alongside msg.thinking.
  if msg.thinking && !msg.thinking.empty?
    content_blocks << { type: 'thinking', thinking: msg.thinking }
  elsif msg.role == :assistant && thinking_enabled?
    # Placeholder for redacted thinking when original thinking is unavailable.
    # Untested assumption: the API may reject a redacted_thinking block with empty data.
    content_blocks << { type: 'redacted_thinking', data: '' }
  end

  # Add text content
  content_blocks.concat(Media.format_content(msg.content))

  {
    role: convert_role(msg.role),
    content: content_blocks
  }
end
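A standalone sketch of the block ordering this step aims for. `format_assistant_blocks` and its `signature:` parameter are hypothetical names introduced only for this example; the sketch assumes the signature returned with a thinking block is stored and replayed with it.

```ruby
# Hypothetical helper: builds the content array for a replayed assistant message.
def format_assistant_blocks(text:, thinking: nil, signature: nil)
  blocks = []

  if thinking && !thinking.empty?
    block = { type: 'thinking', thinking: thinking }
    block[:signature] = signature if signature # replayed unchanged when available
    blocks << block
  end

  # The text block always follows the thinking block.
  blocks << { type: 'text', text: text }
  blocks
end

replayed = format_assistant_blocks(text: 'The answer is 4.', thinking: '2 + 2 = 4.', signature: 'sig-abc')
plain = format_assistant_blocks(text: 'Hello.')

p replayed
p plain
```

Keeping the thinking block first matches the ordering the error message in the problem description asks for.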

5. Add Gemini Thinking Support

File: lib/ruby_llm/providers/gemini/streaming.rb

Parse thought parts in streaming:

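def build_chunk(data)
  parts = data.dig('candidates', 0, 'content', 'parts') || []

  # Parts flagged with `thought` carry reasoning; the rest carry the answer.
  thought_parts, text_parts = parts.partition { |p| p['thought'] }
  thinking = thought_parts.map { |p| p['text'] }.join
  content = text_parts.map { |p| p['text'] }.join

  Chunk.new(
    # Plain-Ruby nil-if-empty, so this does not depend on ActiveSupport's `presence`
    content: content.empty? ? nil : content,
    thinking: thinking.empty? ? nil : thinking,
    # ... other fields ...
  )
end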
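The split between thought parts and answer parts can be tried in isolation. The sample payload below is hand-written to match the `candidates[0].content.parts` path used above and is an assumption, not captured Gemini output.

```ruby
# Hand-written sample chunk (assumed shape).
sample = {
  'candidates' => [
    {
      'content' => {
        'parts' => [
          { 'text' => 'Comparing the options. ', 'thought' => true },
          { 'text' => 'B is cheaper.', 'thought' => true },
          { 'text' => 'Option B is best.' }
        ]
      }
    }
  ]
}

parts = sample.dig('candidates', 0, 'content', 'parts') || []

# partition returns [matching, non-matching] in a single pass
thought_parts, text_parts = parts.partition { |part| part['thought'] }
thinking = thought_parts.map { |part| part['text'] }.join
content = text_parts.map { |part| part['text'] }.join

puts thinking
puts content
```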

File: lib/ruby_llm/providers/gemini/chat.rb

Handle thought signatures for multi-turn conversations.

6. Add xAI Grok Reasoning Support

File: lib/ruby_llm/providers/openai/streaming.rb (xAI uses OpenAI-compatible API)

def build_chunk(data)
  Chunk.new(
    content: data.dig('choices', 0, 'delta', 'content'),
    thinking: data.dig('choices', 0, 'delta', 'reasoning_content'),
    # ... other fields ...
  )
end

7. Add Thinking Configuration Helpers

File: lib/ruby_llm/chat.rb (or new file lib/ruby_llm/thinking.rb)

module RubyLLM
  class Chat
    # Enables provider-specific thinking/reasoning parameters.
    # budget: token budget for thinking; effort: optional effort level.
    def with_thinking(budget: 10_000, effort: nil)
      case detect_provider
      when :anthropic
        with_params(
          thinking: { type: "enabled", budget_tokens: budget },
          max_tokens: budget + 8000
        )
      when :gemini
        with_params(
          thinking_config: { include_thoughts: true, thinking_budget: budget }
        )
      when :openai
        with_params(
          reasoning: { effort: effort || "medium" },
          max_completion_tokens: budget
        )
      when :xai
        with_params(
          reasoning: { enabled: true },
          reasoning_effort: effort || "high"
        )
      else
        self # unsupported provider: leave params unchanged so the call still chains
      end
    end
  end
end
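For the Anthropic branch, a small guard on the budget might be worth having. The sketch below is a hypothetical helper, not part of RubyLLM. It assumes, based on my reading of Anthropic's documentation, that the minimum thinking budget is 1,024 tokens and that `budget_tokens` must be smaller than `max_tokens`; both should be verified before relying on them.

```ruby
# Assumed minimum from Anthropic's documentation; verify before relying on it.
MIN_ANTHROPIC_THINKING_BUDGET = 1024

# Hypothetical helper: builds the Anthropic params used by with_thinking.
def anthropic_thinking_params(budget: 10_000, answer_tokens: 8000)
  if budget < MIN_ANTHROPIC_THINKING_BUDGET
    raise ArgumentError, "budget must be at least #{MIN_ANTHROPIC_THINKING_BUDGET}"
  end

  {
    thinking: { type: 'enabled', budget_tokens: budget },
    # max_tokens covers thinking plus the visible answer, so it stays above the budget
    max_tokens: budget + answer_tokens
  }
end

params = anthropic_thinking_params
p params
```

Failing early with an `ArgumentError` gives a clearer message than waiting for the API to reject the request.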

Why this belongs in RubyLLM

This is core functionality: handling thinking output from the various providers. Thinking is a core feature of advanced models, and many RubyLLM users, particularly those building conversational frameworks, will want to use it. It currently doesn't work reliably across providers and is undocumented.

This patch will both improve thinking support (including thinking streaming) and document it; the documentation currently makes no mention of the words "think" or "thought".
