Description
The `HeliconeLanguageModel` message-conversion function strips `cache_control` and `providerOptions` from all messages when building the request body for the Helicone gateway. As a result, Anthropic prompt caching never works through the AI SDK provider, even though the Helicone gateway itself supports it (documented at https://docs.helicone.ai/gateway/concepts/prompt-caching).
Reproduction
```ts
import { createHelicone } from '@helicone/ai-sdk-provider';
import { generateText } from 'ai';

const helicone = createHelicone({ apiKey: process.env.HELICONE_API_KEY });

const result = await generateText({
  model: helicone('claude-4.6-sonnet/anthropic'),
  messages: [
    {
      role: 'system',
      content: 'You are a helpful assistant.',
      providerOptions: {
        anthropic: { cacheControl: { type: 'ephemeral' } },
      },
    },
    {
      role: 'user',
      content: [
        {
          type: 'text',
          text: 'Very long context that should be cached...',
          providerOptions: {
            anthropic: { cacheControl: { type: 'ephemeral' } },
          },
        },
        { type: 'text', text: 'What is this about?' },
      ],
    },
  ],
});
```

Expected: The gateway receives `cache_control` directives on messages/content blocks and forwards them to Anthropic, enabling prompt caching.
Actual: The message converter drops all `providerOptions` and `cache_control`. The gateway receives plain messages with no caching directives. Every request pays full input-token cost; no cache reads ever occur.
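One way to observe the failure is to inspect the usage metadata the AI SDK returns for the call above. A diagnostic sketch (the exact keys under `providerMetadata` depend on what the gateway forwards from Anthropic, so treat them as assumptions):

```ts
// Diagnostic sketch: inspect usage metadata after the generateText call above.
// The field names under providerMetadata are assumptions -- they depend on
// what the Helicone gateway forwards from Anthropic's usage block.
console.log(result.usage); // token counts reported by the AI SDK
console.log(JSON.stringify(result.providerMetadata, null, 2));
// With working caching you would expect a non-zero cache-read counter
// (e.g. Anthropic's cache_read_input_tokens) on repeat requests;
// with this bug it is absent or zero on every call.
```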
Root Cause
The message-conversion function (minified as `Bt` in the built output) only copies a subset of fields:
System messages: only `{ role, content }` is copied, dropping the message-level `providerOptions`:

```js
case "system":
  e.push({ role: "system", content: t.content });
  break;
```

User text content blocks: only `{ type, text }` is copied, dropping the block-level `providerOptions` and its `cache_control`:

```js
case "text":
  return { type: "text", text: o.text };
```

Fix
The converter should preserve `cache_control` and/or `providerOptions` on both messages and content blocks. Since the Helicone gateway translates to the OpenAI-compatible format with `cache_control` extensions (via `@helicone/helpers`), the fix should map `providerOptions.anthropic.cacheControl` to the `cache_control` field that the gateway already supports:
For system messages:

```js
case "system": {
  const msg = { role: "system", content: t.content };
  if (t.providerOptions?.anthropic?.cacheControl) {
    msg.cache_control = t.providerOptions.anthropic.cacheControl;
  }
  e.push(msg);
  break;
}
```

For user text content blocks:
case "text": {
const block = { type: "text", text: o.text };
if (o.providerOptions?.anthropic?.cacheControl) {
block.cache_control = o.providerOptions.anthropic.cacheControl;
}
return block;
}This would align the AI SDK provider with the gateway's existing cache_control support documented at https://docs.helicone.ai/gateway/concepts/prompt-caching.
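A regression test for the fix could assert the mapping directly. A minimal sketch, assuming the converter were exported under a name like `convertToHeliconeMessages` (hypothetical; today it is only visible as the minified `Bt`):

```ts
// Hypothetical test sketch -- convertToHeliconeMessages is an assumed
// export name for the converter (minified as Bt in the built output).
import { convertToHeliconeMessages } from '@helicone/ai-sdk-provider';

const converted = convertToHeliconeMessages([
  {
    role: 'system',
    content: 'You are a helpful assistant.',
    providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },
  },
]);

// After the fix, the directive should survive conversion:
// { role: 'system', content: '...', cache_control: { type: 'ephemeral' } }
console.assert(
  converted[0].cache_control?.type === 'ephemeral',
  'cache_control should be preserved on system messages'
);
```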
Impact
Without this fix, all Anthropic requests through the AI SDK provider pay full input token cost on every request. For applications making sequential requests with shared context (agents, pipelines, multi-turn conversations), this can result in 5-10x higher costs than necessary.
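For a rough sense of the numbers, here is an illustrative calculation using Anthropic's published cache pricing (cache writes at roughly 1.25x base input, cache reads at roughly 0.1x); the workload shape is an assumption:

```ts
// Illustrative cost comparison for an agent that re-sends a 50k-token
// shared context across 10 sequential requests. Multipliers are Anthropic's
// documented cache rates (write ~1.25x, read ~0.1x base input price).
const contextTokens = 50_000;
const requests = 10;

const withoutCaching = contextTokens * requests; // 500k full-price tokens

const withCaching =
  contextTokens * 1.25 +                // one cache write
  contextTokens * 0.1 * (requests - 1); // nine cache reads
// = 62.5k + 45k = 107.5k token-equivalents

console.log((withoutCaching / withCaching).toFixed(1)); // "4.7"
```

Longer request chains approach the 10x ceiling implied by the 0.1x read rate, which is where the 5-10x range comes from.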
Environment
- `@helicone/ai-sdk-provider`: 1.0.12
- `ai` (Vercel AI SDK): 6.0.6
- Model: `claude-4.6-sonnet/anthropic` through `ai-gateway.helicone.ai`