
Conversation

swombat commented on Jan 3, 2026

Summary

Adds support for Extended Thinking (also known as reasoning) across Anthropic, Gemini, and OpenAI/Grok providers. This feature exposes the model's internal reasoning process, allowing applications to access both the thinking content and the final response.

Usage

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')
  .with_thinking(budget: :medium)  # or :low, :high, or Integer

response = chat.ask('What is 15 * 23?')
response.thinking  # => 'Let me break this down step by step...'
response.content   # => 'The answer is 345.'

# Streaming with thinking
chat.ask('Solve this') do |chunk|
  print chunk.thinking if chunk.thinking
  print chunk.content
end
```

Provider Support

| Provider | Models | Implementation |
| --- | --- | --- |
| Anthropic | `claude-opus-4-*`, `claude-sonnet-4-*` | `thinking` block with `budget_tokens` |
| Gemini | `gemini-2.5-*`, `gemini-3-*` | `thinkingConfig` with budget or effort level |
| OpenAI/Grok | `grok-*` models | `reasoning_effort` parameter |

Budget symbols (:low, :medium, :high) are translated to appropriate provider-specific values. Integer budgets specify token counts directly.
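As a rough sketch of how that translation could work (the token values and payload shapes below are illustrative assumptions drawn from the provider notes in this PR, not its exact code):

```ruby
# Illustrative mapping only; the PR's actual token values may differ.
THINKING_BUDGETS = { low: 1_024, medium: 8_192, high: 24_576 }.freeze

def resolve_budget(budget)
  budget.is_a?(Integer) ? budget : THINKING_BUDGETS.fetch(budget)
end

resolve_budget(:medium)  # => 8192
resolve_budget(4_096)    # => 4096

# Roughly how each provider receives the setting:
#   Anthropic   => thinking: { type: 'enabled', budget_tokens: 8192 }
#   Gemini 2.5  => generationConfig: { thinkingConfig: { thinkingBudget: 8192 } }
#   Gemini 3    => generationConfig: { thinkingConfig: { thinkingLevel: 'medium' } }
#   OpenAI/Grok => reasoning_effort: 'medium'
```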

Changes

Core:

  • Message: Added thinking and protected thinking_signature attributes
  • Chat: Added with_thinking(budget:) and thinking_enabled? methods
  • StreamAccumulator: Accumulates thinking content during streaming
  • UnsupportedFeatureError: New error for unsupported feature requests (usage sketch after this list)
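
A minimal usage sketch for the new error, assuming it is namespaced under RubyLLM and raised at request time; the model name is only an example of one without thinking support:

```ruby
begin
  chat = RubyLLM.chat(model: 'gpt-4o').with_thinking(budget: :high)
  chat.ask('Prove it step by step')
rescue RubyLLM::UnsupportedFeatureError => e
  # Fall back gracefully when the model cannot expose its reasoning.
  warn "Thinking not supported: #{e.message}"
end
```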

Providers:

  • Anthropic: Full thinking support with signature for multi-turn (payload sketch after this list)
  • Gemini: Supports both 2.5 (budget) and 3.0 (effort level) APIs
  • OpenAI: Supports Grok models via reasoning_effort
  • Bedrock/Mistral: Accept thinking parameter (no-op for compatibility)
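
For the Anthropic signature point above: the API expects prior thinking blocks (and their signatures) to be replayed on later turns. A minimal sketch of that request shape, with placeholder values:

```ruby
# Sketch: the assistant turn replays its thinking block and signature so
# Anthropic can verify the reasoning was not modified between turns.
{
  model: 'claude-opus-4-5-20251101',
  thinking: { type: 'enabled', budget_tokens: 8_192 },
  messages: [
    { role: 'user', content: 'What is 15 * 23?' },
    { role: 'assistant', content: [
      { type: 'thinking', thinking: 'Let me break this down...', signature: '<opaque>' },
      { type: 'text', text: 'The answer is 345.' }
    ] },
    { role: 'user', content: 'Now divide that by 5.' }
  ]
}
```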

ActiveRecord:

  • Migration template includes thinking and thinking_signature columns (see the sketch after this list)
  • ChatMethods: Added with_thinking delegation and persistence
  • MessageMethods: Extracts thinking attributes in to_llm
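
A sketch of what those new columns might look like in a generated migration (the column types are an assumption; the PR does not quote the template):

```ruby
class AddThinkingToMessages < ActiveRecord::Migration[7.1]
  def change
    add_column :messages, :thinking, :text
    add_column :messages, :thinking_signature, :text
  end
end
```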

Documentation:

  • New guide: docs/_core_features/thinking.md

Tests:

  • 82 examples covering unit and integration tests
  • VCR cassettes for claude-sonnet-4, claude-opus-4, claude-opus-4-5, and gemini-2.5-flash

Type of change

- [ ] Bug fix
- [x] New feature
- [ ] Breaking change

Scope check

- [x] I read the Contributing Guide
- [x] This aligns with RubyLLM's focus on LLM communication
- [x] This isn't application-specific logic that belongs in user code
- [x] This benefits most users, not just my specific use case

Quality check

- [x] I ran overcommit --install and all hooks pass
- [x] I tested my changes thoroughly
- [x] For provider changes: Re-recorded VCR cassettes
- [x] All tests pass: bundle exec rspec (736 examples, 0 failures)
- [x] I updated documentation
- [x] I didn't modify auto-generated files manually (except adding 3 models for testing)

API changes

- [ ] Breaking change
- [x] New public methods/classes (with_thinking, thinking_enabled?, UnsupportedFeatureError)
- [ ] Changed method signatures
- [ ] No API changes

Related issues

Closes #551 ([FEATURE] Better Thinking/Thinking Streaming support)

A contributor commented on openai/chat.rb:

```ruby
  payload
end

def grok_model?(model)
```

I'm a bit confused by this. Does OpenAI provide a model called "grok"?

swombat (author) replied:
No, but RubyLLM routes to Grok via OpenRouter, which uses openai/chat.rb.


AI explanation:

  1. OpenRouter inherits from OpenAI: `class OpenRouter < OpenAI` (line 6 in openrouter.rb)
  2. Grok models are served via OpenRouter: `"provider": "openrouter"` in models.json
  3. No dedicated xAI provider exists

So yes, Grok API calls via OpenRouter do use openai/chat.rb because OpenRouter inherits all of OpenAI's chat logic.

The grok_model? method in openai/chat.rb is there because:

  • OpenRouter uses an OpenAI-compatible API format
  • When a Grok model is detected, the reasoning_effort parameter is added for thinking support

The naming is technically correct but could be confusing.


We added a clarifying comment to the method (sketched below).
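
A plausible sketch of the method with that comment; the thread does not show the actual body, so the prefix check below is an assumption:

```ruby
# Grok models are accessed via OpenRouter, which inherits this provider's
# chat logic (class OpenRouter < OpenAI); there is no dedicated xAI
# provider, so Grok detection lives here.
def grok_model?(model)
  model.to_s.start_with?('grok')
end
```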

swombat and others added 2 commits on January 8, 2026 at 22:24:

1. Fix an incorrect comment that said "OpenAI" instead of "Anthropic", and replace `.present?` with `&& !.empty?` to avoid an ActiveSupport dependency in core library code.

2. Add a comment explaining why Grok model detection exists in the OpenAI provider: Grok models are accessed via OpenRouter, which inherits from OpenAI.

Both commits: Generated with Claude Code (https://claude.com/claude-code). Co-Authored-By: Claude Opus 4.5 <[email protected]>