234 changes: 234 additions & 0 deletions docs/_core_features/thinking.md
@@ -0,0 +1,234 @@
---
layout: default
title: Extended Thinking
nav_order: 8
description: Access the model's internal reasoning process with Extended Thinking
redirect_from:
- /guides/thinking
- /guides/reasoning
---

# {{ page.title }}
{: .no_toc }

{{ page.description }}
{: .fs-6 .fw-300 }

## Table of contents
{: .no_toc .text-delta }

1. TOC
{:toc}

---

After reading this guide, you will know:

* How to enable Extended Thinking for supported models.
* How to access thinking content in responses.
* How to stream thinking content in real time.
* How thinking works across different providers.
* How to persist thinking content with ActiveRecord.

## What is Extended Thinking?

Extended Thinking (also known as "reasoning") is a feature that exposes the model's internal reasoning process. When enabled, models will "think through" problems step-by-step before providing their final response. This is particularly useful for:

* Complex mathematical or logical problems
* Multi-step reasoning tasks
* Debugging and understanding model behavior
* Applications where transparency in reasoning is valuable

## Enabling Extended Thinking

Use the `with_thinking` method to enable Extended Thinking on a chat:

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')
.with_thinking(budget: :medium)

response = chat.ask("What is 15 * 23? Show your reasoning.")

puts "Thinking: #{response.thinking}"
# => "Let me break this down step by step. 15 * 23 = 15 * 20 + 15 * 3..."

puts "Answer: #{response.content}"
# => "The answer is 345."
```

### Budget Options

The `budget` parameter controls how much "thinking" the model should do:

| Budget | Description |
|--------|-------------|
| `:low` | Minimal thinking, faster responses |
| `:medium` | Balanced thinking (default) |
| `:high` | Maximum thinking, most thorough |
| Integer | Specific token budget (provider-dependent) |

```ruby
# Symbol budgets
chat.with_thinking(budget: :low)
chat.with_thinking(budget: :medium)
chat.with_thinking(budget: :high)

# Integer budget (tokens)
chat.with_thinking(budget: 10_000)
```

### Checking if Thinking is Enabled

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')

chat.thinking_enabled? # => false

chat.with_thinking(budget: :medium)

chat.thinking_enabled? # => true
```

## Streaming with Thinking

When streaming, thinking content is available on each chunk:

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')
.with_thinking(budget: :medium)

chat.ask("Solve this step by step: What is 127 * 43?") do |chunk|
# Print thinking content as it streams
if chunk.thinking
print "[Thinking] #{chunk.thinking}"
end

# Print response content
if chunk.content
print chunk.content
end
end
```

### Separating Thinking from Response

For UI applications, you may want to display thinking separately:

```ruby
thinking_content = ""
response_content = ""

chat.ask("Complex question here...") do |chunk|
thinking_content << chunk.thinking if chunk.thinking
response_content << chunk.content if chunk.content

# Update UI with separated content
update_thinking_panel(thinking_content)
update_response_panel(response_content)
end
```

## Supported Models

Extended Thinking requires models with the `reasoning` capability. Use `with_thinking` only on supported models:

```ruby
# Check if a model supports thinking
model = RubyLLM::Models.find('claude-opus-4-5-20251101')
model.supports?('reasoning') # => true

# Using with_thinking on unsupported models raises an error
chat = RubyLLM.chat(model: 'gpt-4o')
chat.with_thinking(budget: :medium)
# => raises RubyLLM::UnsupportedFeatureError
```
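
To see which models you can enable thinking on, you can filter the model registry. This is a minimal sketch; it assumes `RubyLLM.models.all` returns the registered model entries, and only the `supports?('reasoning')` check is documented above.

```ruby
# Hypothetical sketch: list models that advertise the reasoning capability.
# Assumes RubyLLM.models.all returns the registered model entries.
reasoning_models = RubyLLM.models.all.select { |model| model.supports?('reasoning') }
reasoning_models.each { |model| puts model.id }
```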

### Provider-Specific Behavior

| Provider | Models | Implementation |
|----------|--------|----------------|
| Anthropic | claude-opus-4-*, claude-sonnet-4-* | `thinking` block with `budget_tokens` |
| Gemini | gemini-2.5-*, gemini-3-* | `thinkingConfig` with budget or effort level |
| OpenAI/Grok | grok-* models | `reasoning_effort` parameter |

Budget symbols are automatically translated to provider-specific values:

| Symbol | Anthropic | Gemini 2.5 | Gemini 3 | Grok |
|--------|-----------|------------|----------|------|
| `:low` | 1,024 tokens | 1,024 tokens | "low" | "low" |
| `:medium` | 10,000 tokens | 8,192 tokens | "medium" | "high" |
| `:high` | 32,000 tokens | 24,576 tokens | "high" | "high" |
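
Conceptually, the translation behaves like a per-provider lookup. The hash below is an illustrative sketch that simply restates the table; the constant name and keys are hypothetical, not part of the library's API.

```ruby
# Illustrative mapping only -- it mirrors the table above, not the library's internals.
THINKING_BUDGETS = {
  anthropic:  { low: 1_024, medium: 10_000, high: 32_000 },
  gemini_2_5: { low: 1_024, medium: 8_192,  high: 24_576 },
  gemini_3:   { low: 'low', medium: 'medium', high: 'high' },
  grok:       { low: 'low', medium: 'high',   high: 'high' }
}.freeze

THINKING_BUDGETS[:anthropic][:medium] # => 10000
```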

## Multi-Turn Conversations

Extended Thinking works seamlessly in multi-turn conversations. The model maintains context of its previous reasoning:

```ruby
chat = RubyLLM.chat(model: 'claude-opus-4-5-20251101')
.with_thinking(budget: :medium)

response1 = chat.ask("What is 15 * 23?")
puts response1.thinking # Shows step-by-step calculation

response2 = chat.ask("Now multiply that result by 2")
puts response2.thinking # References previous calculation
puts response2.content # => "690"
```

## ActiveRecord Integration

When using `acts_as_chat` and `acts_as_message`, thinking content is automatically persisted:

```ruby
# Migration (generated automatically with new installs)
# t.text :thinking
# t.text :thinking_signature

# Using thinking with persisted chats
chat_record = Chat.create!
chat_record.with_thinking(budget: :medium)

response = chat_record.ask("Explain quantum entanglement")

# Thinking is saved to the message record
last_message = chat_record.messages.last
last_message.thinking # => "Let me break down quantum entanglement..."
```
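
Because thinking is stored in an ordinary text column, you can query it with standard ActiveRecord. A small sketch, assuming the default `messages` association and the column names from the migration above:

```ruby
# Plain ActiveRecord over the thinking column added by the migration.
chat_record.messages.where.not(thinking: nil).find_each do |message|
  puts "#{message.role}: #{message.thinking.truncate(80)}"
end
```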

### Upgrading Existing Installations

If you have an existing RubyLLM installation, add the thinking columns:

```ruby
class AddThinkingToMessages < ActiveRecord::Migration[7.0]
def change
add_column :messages, :thinking, :text
add_column :messages, :thinking_signature, :text
end
end
```

## Error Handling

```ruby
begin
chat = RubyLLM.chat(model: 'gpt-4o') # Doesn't support thinking
chat.with_thinking(budget: :medium)
rescue RubyLLM::UnsupportedFeatureError => e
puts "This model doesn't support Extended Thinking"
puts e.message # => "Model 'gpt-4o' does not support extended thinking"
end
```
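
If you prefer not to rescue, you can check the capability up front with the same `supports?` call shown earlier:

```ruby
model_id = 'gpt-4o'
chat = RubyLLM.chat(model: model_id)

# Only enable thinking when the model advertises the reasoning capability.
chat.with_thinking(budget: :medium) if RubyLLM::Models.find(model_id).supports?('reasoning')
```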

## Best Practices

1. **Choose appropriate budgets**: Use `:low` for simple tasks, `:high` for complex reasoning
2. **Stream for long responses**: Thinking can be lengthy; streaming provides better UX
3. **Don't always display thinking**: Consider whether users need to see the reasoning
4. **Handle gracefully**: Check `thinking_enabled?` before relying on thinking content (see the sketch below)
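
For the last point, here is a minimal sketch of guarding UI code against missing thinking content; `render_thinking_panel` and `render_answer` are placeholder helpers, not part of RubyLLM:

```ruby
response = chat.ask("Summarize the trade-offs")

if chat.thinking_enabled? && response.thinking
  render_thinking_panel(response.thinking) # placeholder UI helper
end
render_answer(response.content)            # placeholder UI helper
```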

## Next Steps

* [Streaming Responses]({% link _core_features/streaming.md %})
* [Rails Integration]({% link _advanced/rails.md %})
* [Error Handling]({% link _advanced/error-handling.md %})
@@ -4,6 +4,8 @@ class Create<%= message_model_name.gsub('::', '').pluralize %> < ActiveRecord::M
t.string :role, null: false
t.text :content
t.json :content_raw
t.text :thinking
t.text :thinking_signature
t.integer :input_tokens
t.integer :output_tokens
t.integer :cached_tokens
7 changes: 7 additions & 0 deletions lib/ruby_llm/active_record/chat_methods.rb
@@ -124,6 +124,11 @@ def with_temperature(...)
self
end

def with_thinking(...)
to_llm.with_thinking(...)
self
end

def with_params(...)
to_llm.with_params(...)
self
@@ -262,6 +267,8 @@ def persist_message_completion(message)
if @message.has_attribute?(:cache_creation_tokens)
attrs[:cache_creation_tokens] = message.cache_creation_tokens
end
attrs[:thinking] = message.thinking if @message.has_attribute?(:thinking)
attrs[:thinking_signature] = message.thinking_signature if @message.has_attribute?(:thinking_signature)

# Add model association dynamically
attrs[self.class.model_association_name] = model_association
25 changes: 20 additions & 5 deletions lib/ruby_llm/active_record/message_methods.rb
@@ -11,24 +11,39 @@ module MessageMethods
end

def to_llm
cached = has_attribute?(:cached_tokens) ? self[:cached_tokens] : nil
cache_creation = has_attribute?(:cache_creation_tokens) ? self[:cache_creation_tokens] : nil

RubyLLM::Message.new(
role: role.to_sym,
content: extract_content,
thinking: thinking_value,
thinking_signature: thinking_signature_value,
tool_calls: extract_tool_calls,
tool_call_id: extract_tool_call_id,
input_tokens: input_tokens,
output_tokens: output_tokens,
cached_tokens: cached,
cache_creation_tokens: cache_creation,
cached_tokens: cached_value,
cache_creation_tokens: cache_creation_value,
model_id: model_association&.model_id
)
end

private

def thinking_value
has_attribute?(:thinking) ? self[:thinking] : nil
end

def thinking_signature_value
has_attribute?(:thinking_signature) ? self[:thinking_signature] : nil
end

def cached_value
has_attribute?(:cached_tokens) ? self[:cached_tokens] : nil
end

def cache_creation_value
has_attribute?(:cache_creation_tokens) ? self[:cache_creation_tokens] : nil
end

def extract_tool_calls
tool_calls_association.to_h do |tool_call|
[
24 changes: 24 additions & 0 deletions lib/ruby_llm/chat.rb
@@ -22,6 +22,7 @@ def initialize(model: nil, provider: nil, assume_model_exists: false, context: n
@params = {}
@headers = {}
@schema = nil
@thinking_budget = nil
@on = {
new_message: nil,
end_message: nil,
@@ -67,6 +68,16 @@ def with_temperature(temperature)
self
end

def with_thinking(budget: :medium)
validate_thinking_support!
@thinking_budget = budget
self
end

def thinking_enabled?
!@thinking_budget.nil?
end

def with_context(context)
@context = context
@config = context.config
@@ -130,6 +141,7 @@ def complete(&) # rubocop:disable Metrics/PerceivedComplexity
params: @params,
headers: @headers,
schema: @schema,
thinking: @thinking_budget,
&wrap_streaming_block(&)
)

@@ -169,6 +181,18 @@ def instance_variables

private

def validate_thinking_support!
return if @model.supports?('reasoning')
return if gemini_thinking_model?

raise UnsupportedFeatureError,
"Model '#{@model.id}' does not support extended thinking"
end

def gemini_thinking_model?
@model.id.to_s.match?(/gemini-[23]|gemini-2\.\d-.*thinking/)
end

def wrap_streaming_block(&block)
return nil unless block_given?

7 changes: 7 additions & 0 deletions lib/ruby_llm/error.rb
@@ -18,6 +18,13 @@ class InvalidRoleError < StandardError; end
class ModelNotFoundError < StandardError; end
class UnsupportedAttachmentError < StandardError; end

# Error raised when a feature is not supported by a model
class UnsupportedFeatureError < Error
def initialize(message)
super(nil, message)
end
end

# Error classes for different HTTP status codes
class BadRequestError < Error; end
class ForbiddenError < Error; end