Commit 869a755 (parent c5c0027)

Raw Content Blocks, Anthropic Prompt Caching, and Cached Token Tracking

RubyLLM formats your messages properly for each provider. Sometimes, however, you may want to replace the content payload with a custom one, e.g., to enable Anthropic Prompt Caching. Raw Content Blocks (`RubyLLM::Content::Raw` and `RubyLLM::Providers::Anthropic::Content`) let you do exactly that. This commit also adds cached token tracking.

File tree: 38 files changed (+727 −85 lines)

docs/_advanced/rails.md

Lines changed: 25 additions & 0 deletions

@@ -28,6 +28,7 @@ After reading this guide, you will know:
 * How to use `acts_as_chat` and `acts_as_message` with your models
 * How to persist AI model metadata in your database with `acts_as_model`
 * How to send file attachments to AI models with ActiveStorage
+* How to store raw provider payloads (Anthropic prompt caching, etc.)
 * How to integrate streaming responses with Hotwire/Turbo Streams
 * How to customize the persistence behavior for validation-focused scenarios

@@ -87,6 +88,7 @@ rails db:migrate

 Your Rails app is now AI-ready!
+
 ### Adding a Chat UI

 Want a ready-to-use chat interface? Run the chat UI generator:

@@ -148,6 +150,29 @@ class Message < ApplicationRecord
 end
 ```

+### Working with Raw Provider Payloads and Anthropic Prompt Caching
+{: .d-inline-block }
+
+v1.9.0+
+{: .label .label-green }
+
+Providers like Anthropic expose advanced features (prompt caching, fine-grained metadata) by embedding rich structures inside each prompt block. Use `RubyLLM::Content::Raw` to persist those blocks alongside your conversation history:
+
+```ruby
+raw_block = RubyLLM::Content::Raw.new([
+  { type: 'text', text: 'Reusable analysis prompt', cache_control: { type: 'ephemeral' } },
+  { type: 'text', text: "Today's request: #{summary}" }
+])
+
+chat = Chat.create!(model: 'claude-sonnet-4-5')
+chat.ask(raw_block)
+```
+
+The v1.9 schema adds a `content_raw` column so raw payloads live alongside the plain-text `content` field. When you load messages via `acts_as_message`, RubyLLM reconstructs the original `Content::Raw` automatically.
+
+> Existing apps: run `rails generate ruby_llm:upgrade_to_v1_9` to add the cached-token tracking and raw content storage columns introduced in v1.9.0. New apps get the proper columns from the install generator.
+{: .note }
 ### Configuring RubyLLM

 Set up your API keys and other configuration in the initializer:

docs/_core_features/chat.md

Lines changed: 90 additions & 0 deletions

@@ -49,6 +49,8 @@ puts response.content
 # The response object contains metadata
 puts "Model Used: #{response.model_id}"
 puts "Tokens Used: #{response.input_tokens} input, #{response.output_tokens} output"
+puts "Cached Prompt Tokens: #{response.cached_tokens}" # v1.9.0+
+puts "Cache Writes: #{response.cache_creation_tokens}" # v1.9.0+
 ```

 The `ask` method adds your message to the conversation history with the `:user` role, sends the entire conversation history to the AI provider, and returns a `RubyLLM::Message` object containing the assistant's response.
@@ -307,6 +309,88 @@ puts JSON.parse(response.content)
 > Available parameters vary by provider and model. Always consult the provider's documentation for supported features. RubyLLM passes these parameters through without validation, so incorrect parameters may cause API errors. Parameters from `with_params` take precedence over RubyLLM's defaults, allowing you to override any aspect of the request payload.
 {: .warning }

+## Raw Content Blocks
+{: .d-inline-block }
+
+v1.9.0+
+{: .label .label-green }
+
+Most of the time you can rely on RubyLLM to format messages for each provider. When you need to send a custom payload as content, wrap it in `RubyLLM::Content::Raw`. The block is forwarded verbatim, with no additional processing.
+
+```ruby
+raw_block = RubyLLM::Content::Raw.new([
+  { type: 'text', text: 'Reusable analysis prompt' },
+  { type: 'text', text: "Today's request: #{summary}" }
+])
+
+chat = RubyLLM.chat
+chat.add_message(role: :system, content: raw_block)
+chat.ask(raw_block)
+```
+
+Use raw blocks sparingly: they bypass cross-provider safeguards, so it is your responsibility to ensure the payload matches the provider's expectations. `Chat#ask`, `Chat#add_message`, tool results, and streaming accumulators all understand `Content::Raw` values.
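Conceptually, `Content::Raw` is a thin value object whose payload is emitted verbatim. A minimal self-contained sketch of that idea (illustrative only — class and method names here are not the gem's actual implementation):

```ruby
# Illustrative sketch: a raw-content wrapper that forwards its payload untouched.
# RubyLLM::Content::Raw plays this role in the real library.
class RawContentSketch
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Serialization returns the wrapped payload exactly as given,
  # with no provider-specific formatting applied.
  def to_payload
    value
  end
end

blocks = RawContentSketch.new([
  { type: 'text', text: 'Reusable analysis prompt' }
])
```

Because the payload passes through untouched, anything you wrap must already match the target provider's content-block schema.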
+### Anthropic Prompt Caching
+{: .d-inline-block }
+
+v1.9.0+
+{: .label .label-green }
+
+One use case for Raw Content Blocks is Anthropic Prompt Caching.
+
+Anthropic lets you mark individual prompt blocks for caching, which can dramatically reduce costs in long conversations. RubyLLM provides a convenience builder that returns a `Content::Raw` instance with the proper structure:
+
+```ruby
+system_block = RubyLLM::Providers::Anthropic::Content.new(
+  "You are a release-notes assistant. Always group changes by subsystem.",
+  cache: true # shorthand for cache_control: { type: 'ephemeral' }
+)
+
+chat = RubyLLM.chat(model: '{{ site.models.anthropic_latest }}')
+chat.add_message(role: :system, content: system_block)
+
+response = chat.ask(
+  RubyLLM::Providers::Anthropic::Content.new(
+    "Summarize the API changes in this diff.",
+    cache_control: { type: 'ephemeral', ttl: '1h' }
+  )
+)
+```
+
+Need something even more custom? Build the payload manually and wrap it in `Content::Raw`:
+
+```ruby
+raw_prompt = RubyLLM::Content::Raw.new([
+  { type: 'text', text: File.read('/a/large/file'), cache_control: { type: 'ephemeral' } },
+  { type: 'text', text: "Today's request: #{summary}" }
+])
+
+chat.ask(raw_prompt)
+```
+
+The same idea applies to tool definitions:
+
+```ruby
+class ChangelogTool < RubyLLM::Tool
+  description "Formats commits into human-readable changelog entries."
+  param :commits, type: :array, desc: "List of commits to summarize"
+
+  with_params cache_control: { type: 'ephemeral' }
+
+  def execute(commits:)
+    # ...
+  end
+end
+```
+
+Providers that do not understand these extra fields silently ignore them, so you can reuse the same tools across models.
+See the [Tool Provider Parameters]({% link _core_features/tools.md %}#provider-specific-parameters) section for more detail.
 ### Custom HTTP Headers

 Some providers offer beta features or special capabilities through custom HTTP headers. The `with_headers` method lets you add these headers to your API requests while maintaining RubyLLM's security model.

@@ -502,9 +586,13 @@ response = chat.ask "Explain the Ruby Global Interpreter Lock (GIL)."

 input_tokens = response.input_tokens # Tokens in the prompt sent TO the model
 output_tokens = response.output_tokens # Tokens in the response FROM the model
+cached_tokens = response.cached_tokens # Tokens served from the provider's prompt cache (if supported) - v1.9.0+
+cache_creation_tokens = response.cache_creation_tokens # Tokens written to the cache (Anthropic/Bedrock) - v1.9.0+

 puts "Input Tokens: #{input_tokens}"
 puts "Output Tokens: #{output_tokens}"
+puts "Cached Prompt Tokens: #{cached_tokens}" # v1.9.0+
+puts "Cache Creation Tokens: #{cache_creation_tokens}" # v1.9.0+
 puts "Total Tokens for this turn: #{input_tokens + output_tokens}"

 # Estimate cost for this turn
@@ -523,6 +611,8 @@ total_conversation_tokens = chat.messages.sum { |msg| (msg.input_tokens || 0) +
 puts "Total Conversation Tokens: #{total_conversation_tokens}"
 ```

+`cached_tokens` captures the portion of the prompt served from the provider's cache (v1.9.0+). OpenAI reports this value automatically for prompts over 1024 tokens, while Anthropic and Bedrock/Claude expose both cache hits and cache writes. When the provider does not send cache data, these attributes remain `nil`, so guard them with `|| 0` before displaying or summing.
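Since the cache fields may be `nil`, cost estimates should guard them. A rough, nil-safe sketch — the method name and per-million-token prices below are hypothetical, not part of RubyLLM, and it assumes cached tokens are counted inside `input_tokens` as OpenAI reports them:

```ruby
# Hypothetical prices (USD per million tokens) - for illustration only.
INPUT_PRICE  = 3.0
CACHED_PRICE = 0.3   # cached prompt tokens are typically billed at a steep discount
OUTPUT_PRICE = 15.0

def estimate_turn_cost(input_tokens:, output_tokens:, cached_tokens: nil)
  cached   = cached_tokens || 0     # nil when the provider sends no cache data
  uncached = input_tokens - cached  # assumes cached tokens are a subset of input_tokens
  (uncached * INPUT_PRICE + cached * CACHED_PRICE + output_tokens * OUTPUT_PRICE) / 1_000_000.0
end

puts estimate_turn_cost(input_tokens: 2000, output_tokens: 500, cached_tokens: 1500)
```

Check your provider's own accounting before relying on this: some providers report cache reads inside the input-token count, others alongside it.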
 Refer to the [Working with Models Guide]({% link _advanced/models.md %}) for details on accessing model-specific pricing.

 ## Chat Event Handlers

docs/_core_features/tools.md

Lines changed: 24 additions & 0 deletions

@@ -86,6 +86,30 @@ end
 > ```
 {: .note }

+### Provider-Specific Parameters
+{: .d-inline-block }
+
+v1.9.0+
+{: .label .label-green }
+
+Some providers let you attach extra metadata to tool definitions (for example, Anthropic's `cache_control` directive for prompt caching). Use `with_params` on your tool class to declare the metadata once, and RubyLLM will merge it into the API payload when the provider understands it.
+
+```ruby
+class TodoTool < RubyLLM::Tool
+  description "Adds a task to the shared TODO list"
+  param :title, desc: "Human-friendly task description"
+
+  with_params cache_control: { type: 'ephemeral' }
+
+  def execute(title:)
+    Todo.create!(title:)
+    "Added “#{title}” to the list."
+  end
+end
+```
+
+Provider-specific tool parameters are passed through verbatim. They are currently implemented only for the Anthropic provider; other providers ignore `with_params` for now. Set `RUBYLLM_DEBUG=true` and keep an eye on your logs when rolling out new metadata.
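Under the hood, the effect is essentially a hash merge into the serialized tool definition. A self-contained sketch of that merge (illustrative; the real serialization lives inside RubyLLM's Anthropic provider code, and the method name here is invented):

```ruby
# Illustrative: merge provider-specific params into a tool-definition payload.
def tool_definition(name:, description:, input_schema:, provider_params: {})
  base = { name: name, description: description, input_schema: input_schema }
  base.merge(provider_params) # extra keys such as cache_control pass through verbatim
end

payload = tool_definition(
  name: 'todo_tool',
  description: 'Adds a task to the shared TODO list',
  input_schema: { type: 'object', properties: { title: { type: 'string' } } },
  provider_params: { cache_control: { type: 'ephemeral' } }
)
```

Because the merge happens last, provider params can in principle override the base fields, so keep the keys disjoint from `name`, `description`, and `input_schema`.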
 ## Returning Rich Content from Tools

 Tools can return `RubyLLM::Content` objects with file attachments, allowing you to pass images, documents, or other files from your tools to the AI model:

lib/generators/ruby_llm/install/templates/create_messages_migration.rb.tt

Lines changed: 3 additions & 0 deletions

@@ -3,8 +3,11 @@ class Create<%= message_model_name.gsub('::', '').pluralize %> < ActiveRecord::M
   create_table :<%= message_table_name %> do |t|
     t.string :role, null: false
     t.text :content
+    t.json :content_raw
     t.integer :input_tokens
     t.integer :output_tokens
+    t.integer :cached_tokens
+    t.integer :cache_creation_tokens
     t.timestamps
   end

Lines changed: 15 additions & 0 deletions

@@ -0,0 +1,15 @@
+class AddRubyLlmV19Columns < ActiveRecord::Migration<%= migration_version %>
+  def change
+    unless column_exists?(:<%= message_table_name %>, :cached_tokens)
+      add_column :<%= message_table_name %>, :cached_tokens, :integer
+    end
+
+    unless column_exists?(:<%= message_table_name %>, :cache_creation_tokens)
+      add_column :<%= message_table_name %>, :cache_creation_tokens, :integer
+    end
+
+    unless column_exists?(:<%= message_table_name %>, :content_raw)
+      add_column :<%= message_table_name %>, :content_raw, :json
+    end
+  end
+end
Lines changed: 49 additions & 0 deletions

@@ -0,0 +1,49 @@
+# frozen_string_literal: true
+
+require 'rails/generators'
+require 'rails/generators/active_record'
+require_relative '../generator_helpers'
+
+module RubyLLM
+  module Generators
+    # Generator to add v1.9 columns (cached tokens + raw content support) to existing apps.
+    class UpgradeToV19Generator < Rails::Generators::Base
+      include Rails::Generators::Migration
+      include RubyLLM::GeneratorHelpers
+
+      namespace 'ruby_llm:upgrade_to_v1_9'
+      source_root File.expand_path('templates', __dir__)
+
+      argument :model_mappings, type: :array, default: [], banner: 'message:MessageName'
+
+      desc 'Adds cached token columns and raw content storage fields introduced in v1.9.0'
+
+      def self.next_migration_number(dirname)
+        ::ActiveRecord::Generators::Base.next_migration_number(dirname)
+      end
+
+      def create_migration_file
+        parse_model_mappings
+
+        migration_template 'add_v1_9_message_columns.rb.tt',
+                           'db/migrate/add_ruby_llm_v1_9_columns.rb',
+                           migration_version: migration_version,
+                           message_table_name: message_table_name
+      end
+
+      def show_next_steps
+        say_status :success, 'Upgrade prepared!', :green
+        say <<~INSTRUCTIONS
+
+          Next steps:
+            1. Review the generated migration
+            2. Run: rails db:migrate
+            3. Restart your application server
+
+          📚 See the v1.9.0 release notes for details on cached token tracking and raw content support.
+
+        INSTRUCTIONS
+      end
+    end
+  end
+end

lib/ruby_llm/active_record/chat_methods.rb

Lines changed: 41 additions & 13 deletions

@@ -174,8 +174,16 @@ def on_tool_result(...)
 end

 def create_user_message(content, with: nil)
-  message_record = messages_association.create!(role: :user, content: content)
+  content_text, attachments, content_raw = prepare_content_for_storage(content)
+
+  message_record = messages_association.build(role: :user)
+  message_record.content = content_text
+  message_record.content_raw = content_raw if message_record.respond_to?(:content_raw=)
+  message_record.save!
+
   persist_content(message_record, with) if with.present?
+  persist_content(message_record, attachments) if attachments.present?
+
   message_record
 end

@@ -235,28 +243,25 @@ def persist_new_message
   @message = messages_association.create!(role: :assistant, content: '')
 end

-def persist_message_completion(message) # rubocop:disable Metrics/PerceivedComplexity
+# rubocop:disable Metrics/PerceivedComplexity
+def persist_message_completion(message)
   return unless message

   tool_call_id = find_tool_call_id(message.tool_call_id) if message.tool_call_id

   transaction do
-    content = message.content
-    attachments_to_persist = nil
-
-    if content.is_a?(RubyLLM::Content)
-      attachments_to_persist = content.attachments if content.attachments.any?
-      content = content.text
-    elsif content.is_a?(Hash) || content.is_a?(Array)
-      content = content.to_json
-    end
+    content_text, attachments_to_persist, content_raw = prepare_content_for_storage(message.content)

     attrs = {
       role: message.role,
-      content: content,
+      content: content_text,
       input_tokens: message.input_tokens,
       output_tokens: message.output_tokens
     }
+    attrs[:cached_tokens] = message.cached_tokens if @message.has_attribute?(:cached_tokens)
+    if @message.has_attribute?(:cache_creation_tokens)
+      attrs[:cache_creation_tokens] = message.cache_creation_tokens
+    end

     # Add model association dynamically
     attrs[self.class.model_association_name] = model_association

@@ -266,12 +271,15 @@ def persist_message_completion(message) # rubocop:disable Metrics/PerceivedCompl
       attrs[parent_tool_call_assoc.foreign_key] = tool_call_id
     end

-    @message.update!(attrs)
+    @message.assign_attributes(attrs)
+    @message.content_raw = content_raw if @message.respond_to?(:content_raw=)
+    @message.save!

     persist_content(@message, attachments_to_persist) if attachments_to_persist
     persist_tool_calls(message.tool_calls) if message.tool_calls.present?
   end
 end
+# rubocop:enable Metrics/PerceivedComplexity

 def persist_tool_calls(tool_calls)
   tool_calls.each_value do |tool_call|

@@ -331,6 +339,26 @@ def convert_to_active_storage_format(source)
   RubyLLM.logger.warn "Failed to process attachment #{source}: #{e.message}"
   nil
 end
+
+def prepare_content_for_storage(content)
+  attachments = nil
+  content_raw = nil
+  content_text = content
+
+  case content
+  when RubyLLM::Content::Raw
+    content_raw = content.value
+    content_text = nil
+  when RubyLLM::Content
+    attachments = content.attachments if content.attachments.any?
+    content_text = content.text
+  when Hash, Array
+    content_raw = content
+    content_text = nil
+  end
+
+  [content_text, attachments, content_raw]
+end
