Add Extended Thinking support for reasoning models #552
Conversation
## Summary
Adds support for Extended Thinking (also known as reasoning) across
Anthropic, Gemini, and OpenAI/Grok providers. This feature exposes the
model's internal reasoning process, allowing applications to access both
the thinking content and the final response.
## Usage
```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')
.with_thinking(budget: :medium) # or :low, :high, or Integer
response = chat.ask('What is 15 * 23?')
response.thinking # => 'Let me break this down step by step...'
response.content # => 'The answer is 345.'
# Streaming with thinking
chat.ask('Solve this') do |chunk|
print chunk.thinking if chunk.thinking
print chunk.content
end
```
## Provider Support
- Anthropic: Uses thinking block with budget_tokens parameter
- Gemini 2.5: Uses thinkingConfig with thinkingBudget (tokens)
- Gemini 3: Uses thinkingConfig with thinkingLevel (low/medium/high)
- OpenAI/Grok: Uses reasoning_effort parameter (low/high)
Budget symbols (:low, :medium, :high) are translated to appropriate
provider-specific values. Integer budgets specify token counts directly.
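As a rough illustration of that translation, a sketch is below; the method name, token counts, and the Integer fallback are all assumptions for this example, not values taken from the PR:
```ruby
# Illustrative sketch: the real translation lives in each provider class,
# and these token budgets are assumptions, not values from the PR.
def translate_budget(budget, provider:)
  case provider
  when :anthropic, :gemini_2_5
    # Token-budget APIs: map symbols to assumed token counts,
    # pass explicit Integer budgets straight through.
    { low: 1_024, medium: 8_192, high: 32_768 }.fetch(budget, budget)
  when :gemini_3, :openai
    # Effort-level APIs: symbols pass through as effort levels;
    # the fallback for Integer budgets here is an arbitrary assumption.
    budget.is_a?(Symbol) ? budget : :high
  end
end
```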
## Changes
Core:
- Message: Added thinking and protected thinking_signature attributes
- Chat: Added with_thinking(budget:) and thinking_enabled? methods
- StreamAccumulator: Accumulates thinking content during streaming
- UnsupportedFeatureError: New error for unsupported feature requests
Providers:
- Anthropic: Full thinking support with signature for multi-turn
- Gemini: Supports both 2.5 (budget) and 3.0 (effort level) APIs
- OpenAI: Supports Grok models via reasoning_effort
- Bedrock/Mistral: Accept thinking parameter (no-op for compatibility)
ActiveRecord:
- Migration template includes thinking and thinking_signature columns (see the sketch after this list)
- ChatMethods: Added with_thinking delegation and persistence
- MessageMethods: Extracts thinking attributes in to_llm
Documentation:
- New docs page: docs/_core_features/thinking.md
Tests:
- 82 examples covering unit and integration tests
- VCR cassettes for claude-sonnet-4, claude-opus-4, claude-opus-4-5, and gemini-2.5-flash
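A minimal sketch of what the migration template's additions could look like, assuming a `messages` table and conventional column types (the class name and Rails version are illustrative):
```ruby
# Sketch only: table name, column types, and migration class are assumed;
# the PR's actual template may differ.
class AddThinkingToMessages < ActiveRecord::Migration[7.1]
  def change
    add_column :messages, :thinking, :text
    add_column :messages, :thinking_signature, :string
  end
end
```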
## Type of change
- [ ] Bug fix
- [x] New feature
- [ ] Breaking change
## Quality check
- [x] overcommit --install and all hooks pass
- [x] bundle exec rspec (736 examples, 0 failures)
## API changes
- New public API: with_thinking, thinking_enabled?, UnsupportedFeatureError
## Related issues
Closes #551
Review comment on the following lines in openai/chat.rb:
```ruby
  payload
end

def grok_model?(model)
```
I'm a bit confused by this. Does OpenAI provide a model called "grok"?
No, but RubyLLM routes to Grok via OpenRouter, which uses openai/chat.rb.
AI explanation:
- OpenRouter inherits from OpenAI: class OpenRouter < OpenAI (line 6 in openrouter.rb)
- Grok models are routed through OpenRouter: "provider": "openrouter" in models.json
- No dedicated xAI provider exists
So yes, Grok API calls via OpenRouter do use openai/chat.rb because OpenRouter inherits all of OpenAI's chat logic.
The grok_model? method in openai/chat.rb is there because:
- OpenRouter uses an OpenAI-compatible API format
- When a Grok model is detected, the reasoning_effort parameter is added for thinking support
The naming is technically correct but could be confusing.
We added a clarifying comment to the method.
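For illustration, the helper under discussion might look something like the following sketch; the matching rule is an assumption, not the PR's actual implementation:
```ruby
# Hypothetical sketch of grok_model? in openai/chat.rb; the match on the
# model id is assumed, not copied from the PR.
def grok_model?(model)
  model.to_s.downcase.include?('grok')
end
```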
Commits:
- Fix incorrect comment that said "OpenAI" instead of "Anthropic"; replace .present? with && !.empty? to avoid an ActiveSupport dependency in core library code
- Add a comment explaining why Grok model detection exists in the OpenAI provider: Grok models are accessed via OpenRouter, which inherits from OpenAI