Commit b4800c5

feat: add prompt caching support for Claude and Nova models
Add comprehensive prompt caching support with flexible control options:

Features:
- ENV variable control (ENABLE_PROMPT_CACHING, default: false)
- Per-request control via extra_body.prompt_caching
- Pattern-based model detection (Claude, Nova)
- Token limit warnings (Nova 20K limit)
- OpenAI-compatible response format (prompt_tokens_details.cached_tokens)

Supported models:
- Claude 3+ models (anthropic.claude-*)
- Nova models (amazon.nova-*)
- Auto-detection prevents breaking unsupported models

Implementation:
- System prompts caching via extra_body.prompt_caching.system
- Messages caching via extra_body.prompt_caching.messages
- Non-streaming and streaming modes
- Compatible with reasoning, thinking, and tool calls
1 parent 7756532 commit b4800c5
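
Note: the application code behind the control flow described in the commit message is not among the files shown on this page. As a rough sketch only, the interplay between the environment variable, the per-request `extra_body.prompt_caching` setting, and pattern-based model detection might look like the following. The function name `should_cache`, the regular expression, and the rule that an explicit per-request value overrides the global default are assumptions made for illustration, not the commit's actual implementation.

```python
import re

# Hypothetical pattern: matches Bedrock model IDs that contain
# "anthropic.claude-" or "amazon.nova-", with or without a region
# prefix such as "us.".
SUPPORTED_MODEL_PATTERN = re.compile(r"(anthropic\.claude-|amazon\.nova-)")


def should_cache(model_id, env_enabled, extra_body, target):
    """Decide whether to cache `target` ("system" or "messages") for a request.

    Assumed precedence: unsupported models never cache, an explicit
    per-request value wins, otherwise the global default applies.
    """
    if not SUPPORTED_MODEL_PATTERN.search(model_id):
        return False  # auto-detection: leave unsupported models untouched
    per_request = (extra_body or {}).get("prompt_caching") or {}
    if target in per_request:
        return bool(per_request[target])
    return env_enabled
```

Under these assumptions, `should_cache("amazon.nova-pro-v1:0", True, None, "messages")` returns `True`, while any model ID outside the two patterns returns `False` regardless of the other settings.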

6 files changed: +377 -40 lines changed

README.md

Lines changed: 73 additions & 0 deletions
@@ -29,6 +29,7 @@ If you find this GitHub repository useful, please consider giving it a free star
 - [x] Support Application Inference Profiles (**new**)
 - [x] Support Reasoning (**new**)
 - [x] Support Interleaved thinking (**new**)
+- [x] Support Prompt Caching (**new**)
 
 Please check [Usage Guide](./docs/Usage.md) for more details about how to use the new APIs.
 
@@ -221,6 +222,78 @@ print(completion.choices[0].message.content)
 
 For more information about creating and managing application inference profiles, see the [Amazon Bedrock User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/inference-profiles-create.html).
 
+### Prompt Caching
+
+This proxy now supports **Prompt Caching** for Claude and Nova models, which can reduce costs by up to 90% and latency by up to 85% for workloads with repeated prompts.
+
+**Supported Models:**
+- Claude 3+ models (Claude 3.5 Haiku, Claude 3.7 Sonnet, Claude 4, Claude 4.5, etc.)
+- Nova models (Nova Micro, Nova Lite, Nova Pro, Nova Premier)
+
+**Enabling Prompt Caching:**
+
+You can enable prompt caching in two ways:
+
+1. **Globally via Environment Variable** (set in ECS Task Definition or Lambda):
+```bash
+ENABLE_PROMPT_CACHING=true
+```
+
+2. **Per-request via `extra_body`** :
+
+**Python SDK:**
+```python
+from openai import OpenAI
+
+client = OpenAI()
+
+# Cache system prompts
+response = client.chat.completions.create(
+    model="us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+    messages=[
+        {"role": "system", "content": "You are an expert assistant with knowledge of..."},
+        {"role": "user", "content": "Help me with this task"}
+    ],
+    extra_body={
+        "prompt_caching": {"system": True}
+    }
+)
+
+# Check cache hit
+if response.usage.prompt_tokens_details:
+    cached_tokens = response.usage.prompt_tokens_details.cached_tokens
+    print(f"Cached tokens: {cached_tokens}")
+```
+
+**cURL:**
+```bash
+curl $OPENAI_BASE_URL/chat/completions \
+  -H "Content-Type: application/json" \
+  -H "Authorization: Bearer $OPENAI_API_KEY" \
+  -d '{
+    "model": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
+    "messages": [
+      {"role": "system", "content": "Long system prompt..."},
+      {"role": "user", "content": "Question"}
+    ],
+    "extra_body": {
+      "prompt_caching": {"system": true}
+    }
+  }'
+```
+
+**Cache Options:**
+- `"prompt_caching": {"system": true}` - Cache system prompts
+- `"prompt_caching": {"messages": true}` - Cache user messages
+- `"prompt_caching": {"system": true, "messages": true}` - Cache both
+
+**Requirements:**
+- Prompt must be ≥1,024 tokens to enable caching
+- Cache TTL is 5 minutes (resets on each cache hit)
+- Nova models have a 20,000 token caching limit
+
+For more information, see the [Amazon Bedrock Prompt Caching Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html).
+
 ## Other Examples
 
 ### LangChain
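
Note on the README hunk above: the Python example reads `response.usage.prompt_tokens_details.cached_tokens`, and the commit message describes this as an OpenAI-compatible response format. The code that builds this field is not shown on this page. One plausible mapping from a Bedrock Converse `usage` object is sketched below. The field names `cacheReadInputTokens` and `cacheWriteInputTokens` come from the Converse API; the function name `to_openai_usage`, the choice to add cache reads and writes to `prompt_tokens`, and the choice to omit `prompt_tokens_details` when nothing was read from the cache are assumptions.

```python
def to_openai_usage(bedrock_usage):
    """Map a Bedrock Converse `usage` dict to an OpenAI-style usage dict."""
    cache_read = bedrock_usage.get("cacheReadInputTokens", 0)
    cache_write = bedrock_usage.get("cacheWriteInputTokens", 0)
    # Assumption: `inputTokens` excludes cached tokens, so add them back
    # to report the full prompt size.
    prompt_tokens = bedrock_usage.get("inputTokens", 0) + cache_read + cache_write
    completion_tokens = bedrock_usage.get("outputTokens", 0)
    usage = {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }
    if cache_read:
        # Only report details when the cache was actually hit.
        usage["prompt_tokens_details"] = {"cached_tokens": cache_read}
    return usage
```

With this shape, a client check such as `if response.usage.prompt_tokens_details:` stays falsy on a cache miss, which matches how the README example guards the read.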

deployment/BedrockProxy.template

Lines changed: 9 additions & 0 deletions
@@ -11,6 +11,13 @@ Parameters:
     Type: String
     Default: anthropic.claude-3-sonnet-20240229-v1:0
     Description: The default model ID, please make sure the model ID is supported in the current region
+  EnablePromptCaching:
+    Type: String
+    Default: "false"
+    AllowedValues:
+      - "true"
+      - "false"
+    Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
 Resources:
   VPCB9E5F0B4:
     Type: AWS::EC2::VPC
@@ -184,6 +191,8 @@ Resources:
           DEFAULT_EMBEDDING_MODEL: cohere.embed-multilingual-v3
           ENABLE_CROSS_REGION_INFERENCE: "true"
           ENABLE_APPLICATION_INFERENCE_PROFILES: "true"
+          ENABLE_PROMPT_CACHING:
+            Ref: EnablePromptCaching
       MemorySize: 1024
       PackageType: Image
       Role:
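
Note on the `EnablePromptCaching` description above: it says the proxy "adds cachePoint to system prompts and messages". In the Bedrock Converse API a cache checkpoint is a content block of the form `{"cachePoint": {"type": "default"}}`. The sketch below shows one way such blocks could be appended to a Converse-style request. The helper name `add_cache_points` and the choice to mark only the most recent user message are assumptions, since the proxy's own code is not shown here.

```python
import copy

# Cache checkpoint block as defined by the Bedrock Converse API.
CACHE_POINT = {"cachePoint": {"type": "default"}}


def add_cache_points(system, messages, cache_system, cache_messages):
    """Return copies of Converse-style `system` and `messages` with checkpoints added."""
    system = copy.deepcopy(system)
    messages = copy.deepcopy(messages)
    if cache_system and system:
        # Everything before this block in the system prompt becomes cacheable.
        system.append(copy.deepcopy(CACHE_POINT))
    if cache_messages:
        # Assumption: place the checkpoint at the end of the latest user message.
        for message in reversed(messages):
            if message["role"] == "user":
                message["content"].append(copy.deepcopy(CACHE_POINT))
                break
    return system, messages
```

The inputs are deep-copied so that the caller's request objects are not mutated, which keeps retries and logging free of surprise checkpoint blocks.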

deployment/BedrockProxyFargate.template

Lines changed: 10 additions & 0 deletions
@@ -11,6 +11,13 @@ Parameters:
     Type: String
     Default: anthropic.claude-3-sonnet-20240229-v1:0
     Description: The default model ID, please make sure the model ID is supported in the current region
+  EnablePromptCaching:
+    Type: String
+    Default: "false"
+    AllowedValues:
+      - "true"
+      - "false"
+    Description: Enable prompt caching for supported models (Claude, Nova). When enabled, adds cachePoint to system prompts and messages for cost savings.
 Resources:
   VPCB9E5F0B4:
     Type: AWS::EC2::VPC
@@ -251,6 +258,9 @@ Resources:
               Value: "true"
             - Name: ENABLE_APPLICATION_INFERENCE_PROFILES
               Value: "true"
+            - Name: ENABLE_PROMPT_CACHING
+              Value:
+                Ref: EnablePromptCaching
           Essential: true
           Image:
             Ref: ContainerImageUri
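
Note on both templates: the `EnablePromptCaching` parameter is a String restricted to `"true"` or `"false"`, so the container or Lambda receives `ENABLE_PROMPT_CACHING` as text rather than as a boolean. A minimal sketch of reading it on the application side follows. The helper name `prompt_caching_enabled` and the tolerance for surrounding whitespace and mixed case are assumptions, not the proxy's confirmed behaviour.

```python
import os


def prompt_caching_enabled(environ=None):
    """Read ENABLE_PROMPT_CACHING as a boolean; defaults to False when unset."""
    environ = os.environ if environ is None else environ
    value = environ.get("ENABLE_PROMPT_CACHING", "false")
    # Compare text explicitly: bool("false") would be True in Python.
    return value.strip().lower() == "true"
```

Passing a plain dict, for example `prompt_caching_enabled({"ENABLE_PROMPT_CACHING": "true"})`, makes the helper easy to exercise without touching the real process environment.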
