Description
Scope check
- This is core LLM communication (not application logic)
- This benefits most users (not just my use case)
- This can't be solved in application code with current RubyLLM
- I read the Contributing Guide
Due diligence
- I searched existing issues
- I checked the documentation
What problem does this solve?
When using Anthropic's extended thinking feature via RubyLLM:
- Thinking content is not captured during streaming - the `build_chunk` method in `providers/anthropic/streaming.rb` only extracts `data.dig('delta', 'text')`, ignoring `thinking_delta` events.
- Thinking blocks are lost in response parsing - the `extract_text_content` method in `providers/anthropic/chat.rb` only extracts blocks where `type == 'text'`, discarding thinking blocks.
- Conversation history breaks with thinking enabled - when thinking is enabled, Anthropic requires previous assistant messages to include their thinking blocks. Since we don't store or replay thinking content, multi-turn conversations fail (a reproduction sketch follows this list) with:

  ```
  messages.3.content.0.type: Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled, a final `assistant` message must start with a thinking block.
  ```

- No unified API for thinking across providers - each provider has a different thinking implementation, but RubyLLM doesn't abstract this.
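To illustrate problem 3, here is a minimal reproduction sketch using RubyLLM's existing `with_params` escape hatch (the model id and budgets are illustrative):

```ruby
require "ruby_llm"

chat = RubyLLM.chat(model: "claude-sonnet-4-20250514")
              .with_params(thinking: { type: "enabled", budget_tokens: 10_000 },
                           max_tokens: 16_000)

chat.ask("What is 27 * 43?")     # first turn succeeds; there is no history to replay
chat.ask("Now divide that by 7") # second turn fails: the assistant turn we replay
                                 # has lost its thinking block, so the API rejects it
```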
Proposed solution
1. Add `thinking` Attribute to Message Class
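A sketch of the intended surface once the accessor lands as proposed (model id illustrative; `thinking` is `nil` unless thinking is enabled):

```ruby
chat = RubyLLM.chat(model: "claude-sonnet-4-20250514")
response = chat.ask("Why is the sky blue?")
response.thinking        # => "The user is asking about Rayleigh scattering..."
response.content         # => "The sky appears blue because..."
response.to_h[:thinking] # persisted alongside the other fields
```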
File: `lib/ruby_llm/message.rb`

```ruby
class Message
  attr_reader :role, :model_id, :tool_calls, :tool_call_id, :input_tokens, :output_tokens,
              :cached_tokens, :cache_creation_tokens, :raw, :thinking

  def initialize(options = {})
    # ... existing code ...
    @thinking = options[:thinking]
  end

  def to_h
    {
      # ... existing fields ...
      thinking: thinking
    }.compact
  end
end
```

2. Parse Thinking in Anthropic Streaming
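For reference, thinking arrives on the wire as `content_block_delta` events whose `delta.type` distinguishes thinking from text, which is why the extraction below branches on that field (payloads abridged):

```ruby
thinking_event = { "type" => "content_block_delta", "index" => 0,
                   "delta" => { "type" => "thinking_delta", "thinking" => "Let me check..." } }
text_event     = { "type" => "content_block_delta", "index" => 1,
                   "delta" => { "type" => "text_delta", "text" => "The answer is..." } }
```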
File: `lib/ruby_llm/providers/anthropic/streaming.rb`

```ruby
def build_chunk(data)
  Chunk.new(
    role: :assistant,
    model_id: extract_model_id(data),
    content: extract_content_delta(data),
    thinking: extract_thinking_delta(data),
    # ... other fields ...
  )
end

def extract_content_delta(data)
  return data.dig('delta', 'text') if data.dig('delta', 'type') == 'text_delta'

  nil
end

def extract_thinking_delta(data)
  return data.dig('delta', 'thinking') if data.dig('delta', 'type') == 'thinking_delta'

  nil
end
```

3. Parse Thinking in Anthropic Response
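In the non-streaming case, thinking shows up as separate blocks in the response's `content` array, ahead of the text blocks (abridged; the signature value is illustrative):

```ruby
body = {
  "content" => [
    { "type" => "thinking", "thinking" => "Let me work through this...", "signature" => "EuYB..." },
    { "type" => "text", "text" => "The answer is 1,161." }
  ]
}
```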
File: `lib/ruby_llm/providers/anthropic/chat.rb`

```ruby
def parse_completion_response(response)
  data = response.body
  content_blocks = data['content'] || []

  text_content = extract_text_content(content_blocks)
  thinking_content = extract_thinking_content(content_blocks)
  tool_use_blocks = Tools.find_tool_uses(content_blocks)

  build_message(data, text_content, thinking_content, tool_use_blocks, response)
end

def extract_thinking_content(blocks)
  thinking_blocks = blocks.select { |c| c['type'] == 'thinking' }
  thinking_blocks.map { |c| c['thinking'] }.join
end

def build_message(data, content, thinking, tool_use_blocks, response)
  Message.new(
    # ... existing fields ...
    thinking: thinking.presence
  )
end
```

4. Include Thinking in Message Formatting
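Note that Anthropic also returns a `signature` on each thinking block and, as far as I can tell from their docs, expects it back verbatim when history is replayed, so we may need to store it alongside the text. The payload we need to reproduce looks roughly like this (signature illustrative):

```ruby
formatted = {
  role: 'assistant',
  content: [
    { type: 'thinking', thinking: 'Let me work through this...', signature: 'EuYB...' },
    { type: 'text', text: 'The answer is 1,161.' }
  ]
}
```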
File: `lib/ruby_llm/providers/anthropic/chat.rb`

When formatting assistant messages for the API, include thinking blocks:

```ruby
def format_basic_message(msg)
  content_blocks = []

  # Include thinking block if present (or a redacted_thinking placeholder)
  if msg.thinking.present?
    content_blocks << { type: 'thinking', thinking: msg.thinking }
  elsif msg.role == :assistant && thinking_enabled?
    # Placeholder for redacted thinking when original thinking is unavailable
    content_blocks << { type: 'redacted_thinking', data: '' }
  end

  # Add text content
  content_blocks.concat(Media.format_content(msg.content))

  {
    role: convert_role(msg.role),
    content: content_blocks
  }
end
```

5. Add Gemini Thinking Support
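When `thinkingConfig.includeThoughts` is enabled, Gemini marks thought parts with a boolean `thought` flag rather than a separate block type (abridged):

```ruby
chunk = {
  "candidates" => [
    { "content" => { "parts" => [
      { "text" => "Considering the options...", "thought" => true },
      { "text" => "Here is the itinerary." }
    ] } }
  ]
}
```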
File: `lib/ruby_llm/providers/gemini/streaming.rb`

Parse thought parts in streaming:

```ruby
def build_chunk(data)
  parts = data.dig('candidates', 0, 'content', 'parts') || []
  text_parts = parts.reject { |p| p['thought'] }.map { |p| p['text'] }.join
  thought_parts = parts.select { |p| p['thought'] }.map { |p| p['text'] }.join

  Chunk.new(
    content: text_parts.presence,
    thinking: thought_parts.presence,
    # ... other fields ...
  )
end
```

File: `lib/ruby_llm/providers/gemini/chat.rb`
Handle thought signatures for multi-turn conversations.
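One hedged sketch of what that could look like; the `thought_signature` storage key and the `format_part` helper are hypothetical, while `thoughtSignature` is the field Gemini returns on parts:

```ruby
# Sketch: when rebuilding history for Gemini, echo back any stored
# thoughtSignature so multi-turn thinking context survives the round trip.
def format_part(part)
  formatted = { text: part[:text] }
  formatted[:thoughtSignature] = part[:thought_signature] if part[:thought_signature]
  formatted
end
```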
6. Add xAI Grok Reasoning Support
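Grok's OpenAI-compatible stream carries reasoning in a `reasoning_content` field on the delta, alongside the usual `content` (abridged, as I understand the xAI API):

```ruby
chunk = { "choices" => [{ "delta" => { "reasoning_content" => "First, consider..." } }] }
```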
File: `lib/ruby_llm/providers/openai/streaming.rb` (xAI uses an OpenAI-compatible API)

```ruby
def build_chunk(data)
  Chunk.new(
    content: data.dig('choices', 0, 'delta', 'content'),
    thinking: data.dig('choices', 0, 'delta', 'reasoning_content'),
    # ... other fields ...
  )
end
```

7. Add Thinking Configuration Helpers
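With the helper in place, enabling thinking becomes a single call regardless of provider, and combines naturally with the streaming support from steps 1-2 (model id illustrative):

```ruby
chat = RubyLLM.chat(model: "claude-sonnet-4-20250514")
              .with_thinking(budget: 8_000)

chat.ask("Plan a three-day trip to Kyoto") do |chunk|
  print chunk.thinking if chunk.thinking # streamed reasoning
  print chunk.content if chunk.content   # streamed answer
end
```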
File: `lib/ruby_llm/chat.rb` (or a new file, `lib/ruby_llm/thinking.rb`)

```ruby
module RubyLLM
  class Chat
    def with_thinking(budget: 10_000, effort: nil)
      case detect_provider
      when :anthropic
        with_params(
          thinking: { type: "enabled", budget_tokens: budget },
          max_tokens: budget + 8_000 # Anthropic requires max_tokens > budget_tokens
        )
      when :gemini
        with_params(
          thinking_config: { include_thoughts: true, thinking_budget: budget }
        )
      when :openai
        with_params(
          reasoning: { effort: effort || "medium" },
          max_completion_tokens: budget
        )
      when :xai
        with_params(
          reasoning: { enabled: true },
          reasoning_effort: effort || "high"
        )
      end
    end
  end
end
```

Why this belongs in RubyLLM
This is core LLM communication: parsing and replaying thinking content from the providers' wire formats. Thinking is a core feature of advanced models, and many RubyLLM users, particularly those building conversational applications, will want to use it. Today it doesn't work reliably across providers, and it is undocumented.
This patch would both improve thinking support (including thinking during streaming) and document it; currently the words "think" and "thought" do not appear anywhere in the documentation.