
Commit 59927b8

Merge pull request #295 from Portkey-AI/prompt-caching-bedrock
2 parents 249a11d + e0857b2

File tree

3 files changed: +217, -1 lines changed

docs.json

Lines changed: 2 additions & 1 deletion
@@ -257,7 +257,8 @@
       "integrations/llms/bedrock/aws-bedrock",
       "integrations/llms/bedrock/files",
       "integrations/llms/bedrock/batches",
-      "integrations/llms/bedrock/fine-tuning"
+      "integrations/llms/bedrock/fine-tuning",
+      "integrations/llms/bedrock/prompt-caching"
     ]
   },
   "integrations/llms/aws-sagemaker",

integrations/llms/anthropic/prompt-caching.mdx

Lines changed: 2 additions & 0 deletions
@@ -179,6 +179,8 @@ Anthropic currently has certain restrictions on prompt caching, like:
 
 For more, refer to Anthropic's prompt caching documentation [here](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching).
 
+
+
 ## Seeing Cache Results in Portkey
 
 Portkey automatically calculates the correct pricing for your prompt caching requests & responses based on Anthropic's calculations here:
integrations/llms/bedrock/prompt-caching.mdx

Lines changed: 213 additions & 0 deletions
@@ -0,0 +1,213 @@
---
title: 'Prompt Caching on Bedrock'
---

Prompt caching on Amazon Bedrock lets you cache specific portions of your requests for repeated use. This feature significantly reduces inference response latency and input token costs by allowing the model to skip recomputation of previously processed content.

With Portkey, you can easily implement Amazon Bedrock's prompt caching through our OpenAI-compliant unified API and prompt templates.

## Model Support

Amazon Bedrock prompt caching is generally available with the following models:

<Info>
**Currently Supported Models:**
- Claude 3.7 Sonnet
- Claude 3.5 Haiku
- Amazon Nova Micro
- Amazon Nova Lite
- Amazon Nova Pro

Customers who were given access to Claude 3.5 Sonnet v2 during the prompt caching preview will retain their access, but no additional customers will be granted access to prompt caching on the Claude 3.5 Sonnet v2 model.
</Info>

## How Bedrock Prompt Caching Works

When using prompt caching, you define **cache checkpoints**: markers that indicate the parts of your prompt to cache. These cached sections must remain identical between requests; any alteration results in a cache miss.

<Note>
You can also use Bedrock's prompt caching feature with Portkey's Prompt Templates, as shown in the sketch below.
</Note>
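
For instance, here's a minimal sketch of calling a saved prompt template through the Portkey SDK. It assumes you have already created a Bedrock prompt template with cache checkpoints configured in the Portkey UI; the `prompt_id` and the variable name below are hypothetical placeholders.

```python
# A minimal sketch, assuming a Bedrock prompt template with cache
# checkpoints has already been saved in Portkey. The prompt_id and the
# variable name are hypothetical placeholders.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY")

completion = portkey.prompts.completions.create(
    prompt_id="YOUR_PROMPT_ID",
    variables={"user_query": "Summarize the cached document in 20 words"},
)
print(completion)
```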

## Implementation Examples

Here's how to implement prompt caching with Portkey:

<CodeGroup>

```javascript NodeJS
import Portkey from 'portkey-ai'

const portkey = new Portkey({
    apiKey: "PORTKEY_API_KEY", // defaults to process.env["PORTKEY_API_KEY"]
    virtualKey: "VIRTUAL_KEY" // Your Bedrock Virtual Key
})

const chatCompletion = await portkey.chat.completions.create({
    messages: [
        { "role": "system", "content": [
            {
                "type": "text", "text": "You are a helpful assistant"
            },
            {
                "type": "text", "text": "This is a large document I want to cache...",
                "cache_control": { "type": "ephemeral" }
            }
        ]},
        { "role": "user", "content": "Summarize the above document for me in 20 words" }
    ],
    model: 'anthropic.claude-3-7-sonnet-20250219-v1:0'
});

console.log(chatCompletion.choices[0].message.content);
```

```python Python
from portkey_ai import Portkey

portkey = Portkey(
    api_key="PORTKEY_API_KEY",
    virtual_key="BEDROCK_VIRTUAL_KEY",
)

chat_completion = portkey.chat.completions.create(
    messages=[
        { "role": "system", "content": [
            {
                "type": "text", "text": "You are a helpful assistant"
            },
            {
                "type": "text", "text": "This is a large document I want to cache...",
                "cache_control": {"type": "ephemeral"}
            }
        ]},
        { "role": "user", "content": "Summarize the above document in 20 words" }
    ],
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
)

print(chat_completion.choices[0].message.content)
```

```javascript OpenAI NodeJS
import OpenAI from "openai";
import { PORTKEY_GATEWAY_URL, createHeaders } from "portkey-ai";

const openai = new OpenAI({
    apiKey: "BEDROCK_API_KEY",
    baseURL: PORTKEY_GATEWAY_URL,
    defaultHeaders: createHeaders({
        provider: "bedrock",
        apiKey: "PORTKEY_API_KEY",
    }),
});

const chatCompletion = await openai.chat.completions.create({
    messages: [
        { "role": "system", "content": [
            {
                "type": "text", "text": "You are a helpful assistant"
            },
            {
                "type": "text", "text": "This is a large document I want to cache...",
                "cache_control": { "type": "ephemeral" }
            }
        ]},
        { "role": "user", "content": "Summarize the above document for me in 20 words" }
    ],
    model: 'anthropic.claude-3-7-sonnet-20250219-v1:0',
});

console.log(chatCompletion.choices[0].message.content);
```

```python OpenAI Python
from openai import OpenAI
from portkey_ai import PORTKEY_GATEWAY_URL, createHeaders

client = OpenAI(
    api_key="BEDROCK_API_KEY",
    base_url=PORTKEY_GATEWAY_URL,
    default_headers=createHeaders(
        api_key="PORTKEY_API_KEY",
        provider="bedrock",
    )
)

chat_completion = client.chat.completions.create(
    messages=[
        { "role": "system", "content": [
            {
                "type": "text", "text": "You are a helpful assistant"
            },
            {
                "type": "text", "text": "This is a large document I want to cache...",
                "cache_control": {"type": "ephemeral"}
            }
        ]},
        { "role": "user", "content": "Summarize the above document in 20 words" }
    ],
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
)

print(chat_completion.choices[0].message.content)
```

```sh REST API
curl https://api.portkey.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "x-portkey-virtual-key: $BEDROCK_VIRTUAL_KEY" \
  -H "x-portkey-api-key: $PORTKEY_API_KEY" \
  -d '{
    "model": "anthropic.claude-3-7-sonnet-20250219-v1:0",
    "messages": [
      { "role": "system", "content": [
        {
          "type": "text", "text": "You are a helpful assistant"
        },
        {
          "type": "text", "text": "This is a large document I want to cache...",
          "cache_control": {"type": "ephemeral"}
        }
      ]},
      { "role": "user", "content": "Summarize the above document for me in 20 words" }
    ]
  }'
```

</CodeGroup>
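
Because cached sections must stay identical between requests, a simple way to sanity-check a checkpoint is to send the same request twice and compare the returned usage. The snippet below is a sketch under that assumption; cache-related usage field names can vary by model and gateway version, so it prints the whole usage object rather than assuming specific keys.

```python
# Sketch: the first call should write the cache checkpoint, the second
# should read from it, provided the marked content stays byte-identical.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="BEDROCK_VIRTUAL_KEY")

cached_system = [
    {"type": "text", "text": "You are a helpful assistant"},
    {"type": "text", "text": "This is a large document I want to cache...",
     "cache_control": {"type": "ephemeral"}},
]

for attempt in ("first call (cache write)", "second call (cache read)"):
    response = portkey.chat.completions.create(
        messages=[
            {"role": "system", "content": cached_system},
            {"role": "user", "content": "Summarize the above document in 20 words"},
        ],
        model="anthropic.claude-3-7-sonnet-20250219-v1:0",
    )
    # Print the full usage object to inspect cache read/write token counts.
    print(attempt, "->", response.usage)
```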

## Supported Features and Limitations

<Info>
**Supported Features**
- Text prompts and images embedded within text prompts
- Multiple cache checkpoints per request
- Caching in system prompts, messages, and tools fields (model-dependent)
</Info>

### Supported Models and Limits

Below is a detailed table of supported models, their minimum token requirements, maximum cache checkpoints, and the fields that support caching:

| Model | Model ID | Min tokens per checkpoint | Max checkpoints per request | Cacheable fields |
|-------|----------|---------------------------|-----------------------------|------------------|
| Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | 1,024 | 4 | system, messages, tools |
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | 2,048 | 4 | system, messages, tools |
| Amazon Nova Micro | amazon.nova-micro-v1:0 | 1,000 | 4 | system, messages |
| Amazon Nova Lite | amazon.nova-lite-v1:0 | 1,000 | 4 | system, messages |
| Amazon Nova Pro | amazon.nova-pro-v1:0 | 1,000 | 4 | system, messages |

<Note>
- The Amazon Nova models support a maximum of 32k tokens for prompt caching.
- Tools caching is fully supported for Claude models but is not supported for Amazon Nova models.
</Note>
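
Since the models above allow up to 4 checkpoints per request, you can mark several stable blocks independently. Here's a sketch using the same unified API as the examples above; the block contents are placeholders, and each marked block must meet the model's minimum token count (1,024 for Claude 3.7 Sonnet) to actually be cached.

```python
# Sketch: two cache checkpoints in a single request. Each block marked
# with cache_control becomes its own checkpoint; the placeholder text
# stands in for content long enough to meet the per-model token minimum.
from portkey_ai import Portkey

portkey = Portkey(api_key="PORTKEY_API_KEY", virtual_key="BEDROCK_VIRTUAL_KEY")

response = portkey.chat.completions.create(
    messages=[
        {"role": "system", "content": [
            # Checkpoint 1: long, stable system instructions
            {"type": "text", "text": "Detailed style guide and instructions...",
             "cache_control": {"type": "ephemeral"}},
            # Checkpoint 2: a large reference document appended after them
            {"type": "text", "text": "Full reference document text...",
             "cache_control": {"type": "ephemeral"}},
        ]},
        {"role": "user", "content": "Answer using the style guide and the document."},
    ],
    model="anthropic.claude-3-7-sonnet-20250219-v1:0",
)
print(response.choices[0].message.content)
```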

## Related Resources

<Card title="AWS Bedrock Prompt Caching Docs" href="https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html">
For more detailed information on Bedrock prompt caching, refer to the official AWS documentation.
</Card>
