
Commit 8bf1d94

use anthropic_cache_messages
1 parent 0f0dd76 commit 8bf1d94

File tree

3 files changed: +132 −107 lines changed

docs/models/anthropic.md

Lines changed: 58 additions & 16 deletions
````diff
@@ -85,20 +85,20 @@ Anthropic supports [prompt caching](https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching)
 1. **Cache User Messages with [`CachePoint`][pydantic_ai.messages.CachePoint]**: Insert a `CachePoint` marker in your user messages to cache everything before it
 2. **Cache System Instructions**: Set [`AnthropicModelSettings.anthropic_cache_instructions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_instructions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
 3. **Cache Tool Definitions**: Set [`AnthropicModelSettings.anthropic_cache_tool_definitions`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_tool_definitions] to `True` (uses 5m TTL by default) or specify `'5m'` / `'1h'` directly
-4. **Cache All (Convenience)**: Set [`AnthropicModelSettings.anthropic_cache_all`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_all] to `True` to automatically cache both system instructions and the last user message
+4. **Cache Last Message (Convenience)**: Set [`AnthropicModelSettings.anthropic_cache_messages`][pydantic_ai.models.anthropic.AnthropicModelSettings.anthropic_cache_messages] to `True` to automatically cache the last user message
 
 You can combine multiple strategies for maximum savings:
 
 ```python {test="skip"}
 from pydantic_ai import Agent, CachePoint, RunContext
 from pydantic_ai.models.anthropic import AnthropicModelSettings
 
-# Option 1: Use anthropic_cache_all for convenience (caches system + last message)
+# Option 1: Use anthropic_cache_messages for convenience (caches last message only)
 agent = Agent(
     'anthropic:claude-sonnet-4-5',
     system_prompt='Detailed instructions...',
     model_settings=AnthropicModelSettings(
-        anthropic_cache_all=True,  # Caches both system prompt and last message
+        anthropic_cache_messages=True,  # Caches the last user message
     ),
 )
 
@@ -159,35 +159,77 @@ async def main():
 
 ### Cache Point Limits
 
-Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit:
+Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit to ensure your requests always comply without errors.
 
-- **`anthropic_cache_all`**: Uses 2 cache points (system instructions + last message)
-- **`anthropic_cache_instructions`**: Uses 1 cache point
-- **`anthropic_cache_tool_definitions`**: Uses 1 cache point
-- **`CachePoint` markers**: Use remaining available cache points
+#### How Cache Points Are Allocated
 
-When the total exceeds 4 cache points, Pydantic AI automatically removes cache points from **older messages** (keeping the most recent ones), ensuring your requests always comply with Anthropic's limits without errors.
+Cache points can be placed in three locations:
+
+1. **System Prompt**: Via `anthropic_cache_instructions` setting (adds cache point to last system prompt block)
+2. **Tool Definitions**: Via `anthropic_cache_tool_definitions` setting (adds cache point to last tool definition)
+3. **Messages**: Via `CachePoint` markers or `anthropic_cache_messages` setting (adds cache points to message content)
+
+Each setting uses **at most 1 cache point**, but you can combine them:
 
 ```python {test="skip"}
 from pydantic_ai import Agent, CachePoint
 from pydantic_ai.models.anthropic import AnthropicModelSettings
 
+# Example: Using all 3 cache point sources
+agent = Agent(
+    'anthropic:claude-sonnet-4-5',
+    system_prompt='Detailed instructions...',
+    model_settings=AnthropicModelSettings(
+        anthropic_cache_instructions=True,  # 1 cache point
+        anthropic_cache_tool_definitions=True,  # 1 cache point
+        anthropic_cache_messages=True,  # 1 cache point
+    ),
+)
+
+@agent.tool_plain
+def my_tool() -> str:
+    return 'result'
+
+async def main():
+    # This uses 3 cache points (instructions + tools + last message)
+    # You can add 1 more CachePoint marker before hitting the limit
+    result = await agent.run([
+        'Context', CachePoint(),  # 4th cache point - OK
+        'Question'
+    ])
+```
+
+#### Automatic Cache Point Limiting
+
+When cache points from all sources (settings + `CachePoint` markers) exceed 4, Pydantic AI automatically removes excess cache points from **older message content** (keeping the most recent ones):
+
+```python {test="skip"}
 agent = Agent(
     'anthropic:claude-sonnet-4-5',
     system_prompt='Instructions...',
     model_settings=AnthropicModelSettings(
-        anthropic_cache_all=True,  # Uses 2 cache points
+        anthropic_cache_instructions=True,  # 1 cache point
+        anthropic_cache_tool_definitions=True,  # 1 cache point
     ),
 )
 
+@agent.tool_plain
+def search() -> str:
+    return 'data'
+
 async def main():
-    # Even with multiple CachePoint markers, only 2 more will be kept
-    # (4 total limit - 2 from cache_all = 2 available)
+    # Already using 2 cache points (instructions + tools)
+    # Can add 2 more CachePoint markers (4 total limit)
     result = await agent.run([
-        'Context 1', CachePoint(),  # Will be kept
-        'Context 2', CachePoint(),  # Will be kept
-        'Context 3', CachePoint(),  # Automatically removed (oldest)
+        'Context 1', CachePoint(),  # Oldest - will be removed
+        'Context 2', CachePoint(),  # Will be kept (3rd point)
+        'Context 3', CachePoint(),  # Will be kept (4th point)
         'Question'
     ])
-    print(result.output)
+    # Final cache points: instructions + tools + Context 2 + Context 3 = 4
 ```
+
+**Key Points**:
+- System and tool cache points are **always preserved**
+- Message cache points are removed from oldest to newest when limit is exceeded
+- This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching
````
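The documented keep-newest/drop-oldest rule can be sketched standalone with plain dicts standing in for Anthropic content blocks. This is an illustration, not the library's code: the helper name `limit_message_cache_points` and the `reserved` parameter are hypothetical.

```python
# Hypothetical sketch of the documented rule: with `reserved` cache points
# already spent on settings (instructions/tools), only the newest message
# cache points fit in the remaining budget; older ones are stripped.

MAX_CACHE_POINTS = 4

def limit_message_cache_points(messages: list[list[dict]], reserved: int) -> None:
    """Drop 'cache_control' from the oldest blocks once the budget is spent.

    `messages` is a list of message contents, each a list of content blocks.
    """
    budget = MAX_CACHE_POINTS - reserved
    for content in reversed(messages):          # newest message first
        for block in reversed(content):         # newest block first
            if 'cache_control' in block:
                if budget > 0:
                    budget -= 1                 # keep this cache point
                else:
                    del block['cache_control']  # over the limit: remove

# Three markers, 2 points reserved (instructions + tools): the oldest is dropped.
msgs = [
    [{'text': 'Context 1', 'cache_control': {'type': 'ephemeral'}}],
    [{'text': 'Context 2', 'cache_control': {'type': 'ephemeral'}}],
    [{'text': 'Context 3', 'cache_control': {'type': 'ephemeral'}}],
    [{'text': 'Question'}],
]
limit_message_cache_points(msgs, reserved=2)
kept = [b['text'] for m in msgs for b in m if 'cache_control' in b]
print(kept)  # ['Context 2', 'Context 3']
```

This mirrors the example in the docs above: `Context 1` loses its cache point while `Context 2` and `Context 3` keep theirs.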

pydantic_ai_slim/pydantic_ai/models/anthropic.py

Lines changed: 53 additions & 67 deletions
````diff
@@ -169,18 +169,15 @@ class AnthropicModelSettings(ModelSettings, total=False):
     See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
     """
 
-    anthropic_cache_all: bool | Literal['5m', '1h']
-    """Convenience setting to enable caching for both system instructions and the last user message.
+    anthropic_cache_messages: bool | Literal['5m', '1h']
+    """Convenience setting to enable caching for the last user message.
 
-    When enabled, this automatically adds cache points to:
-    1. The last system prompt block (system instructions)
-    2. The last content block in the final user message
-
-    This is equivalent to setting both `anthropic_cache_instructions` and adding a cache point
-    to the last message, but more convenient for common use cases.
+    When enabled, this automatically adds a cache point to the last content block
+    in the final user message, which is useful for caching conversation history
+    or context in multi-turn conversations.
     If `True`, uses TTL='5m'. You can also specify '5m' or '1h' directly.
 
-    Note: Uses 2 of Anthropic's 4 available cache points per request. Any additional CachePoint
+    Note: Uses 1 of Anthropic's 4 available cache points per request. Any additional CachePoint
     markers in messages will be automatically limited to respect the 4-cache-point maximum.
     See https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching for more information.
     """
@@ -349,7 +346,7 @@ async def _messages_create(
         tool_choice = self._infer_tool_choice(tools, model_settings, model_request_parameters)
 
         system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
-
+        self._limit_cache_points(system_prompt, anthropic_messages, tools)
         try:
             extra_headers = self._map_extra_headers(beta_features, model_settings)
 
@@ -392,7 +389,7 @@ async def _messages_count_tokens(
         tool_choice = self._infer_tool_choice(tools, model_settings, model_request_parameters)
 
         system_prompt, anthropic_messages = await self._map_message(messages, model_request_parameters, model_settings)
-
+        self._limit_cache_points(system_prompt, anthropic_messages, tools)
        try:
             extra_headers = self._map_extra_headers(beta_features, model_settings)
 
@@ -494,10 +491,7 @@ def _get_tools(
         ]
 
         # Add cache_control to the last tool if enabled
-        if tools and (
-            cache_tool_defs := model_settings.get('anthropic_cache_tool_definitions')
-            or model_settings.get('anthropic_cache_all')
-        ):
+        if tools and (cache_tool_defs := model_settings.get('anthropic_cache_tool_definitions')):
             # If True, use '5m'; otherwise use the specified ttl value
             ttl: Literal['5m', '1h'] = '5m' if cache_tool_defs is True else cache_tool_defs
             last_tool = tools[-1]
@@ -766,9 +760,9 @@ async def _map_message(  # noqa: C901
             system_prompt_parts.insert(0, instructions)
         system_prompt = '\n\n'.join(system_prompt_parts)
 
-        # Add cache_control to the last message content if anthropic_cache_all is enabled
-        if anthropic_messages and (cache_all := model_settings.get('anthropic_cache_all')):
-            ttl: Literal['5m', '1h'] = '5m' if cache_all is True else cache_all
+        # Add cache_control to the last message content if anthropic_cache_messages is enabled
+        if anthropic_messages and (cache_messages := model_settings.get('anthropic_cache_messages')):
+            ttl: Literal['5m', '1h'] = '5m' if cache_messages is True else cache_messages
             m = anthropic_messages[-1]
             content = m['content']
             if isinstance(content, str):
@@ -785,13 +779,8 @@ async def _map_message(  # noqa: C901
             content = cast(list[BetaContentBlockParam], content)
             self._add_cache_control_to_last_param(content, ttl)
 
-        # Ensure total cache points don't exceed Anthropic's limit of 4
-        self._limit_cache_points(anthropic_messages, model_settings)
         # If anthropic_cache_instructions is enabled, return system prompt as a list with cache_control
-        if system_prompt and (
-            cache_instructions := model_settings.get('anthropic_cache_instructions')
-            or model_settings.get('anthropic_cache_all')
-        ):
+        if system_prompt and (cache_instructions := model_settings.get('anthropic_cache_instructions')):
             # If True, use '5m'; otherwise use the specified ttl value
             ttl: Literal['5m', '1h'] = '5m' if cache_instructions is True else cache_instructions
             system_prompt_blocks = [
@@ -806,60 +795,57 @@ async def _map_message(  # noqa: C901
         return system_prompt, anthropic_messages
 
     @staticmethod
-    def _limit_cache_points(messages: list[BetaMessageParam], model_settings: AnthropicModelSettings) -> None:
-        """Limit the number of cache points in messages to comply with Anthropic's 4-cache-point maximum.
-
-        Anthropic allows a maximum of 4 cache points per request. This method ensures compliance by:
-        1. Calculating how many cache points are already used by system-level settings
-           (anthropic_cache_instructions, anthropic_cache_tool_definitions, anthropic_cache_all)
-        2. Determining how many cache points remain available for message-level caching
-        3. Traversing messages from newest to oldest, keeping only the allowed number of cache points
-        4. Removing cache_control from older cache points that exceed the limit
+    def _limit_cache_points(
+        system_prompt: str | list[BetaTextBlockParam],
+        anthropic_messages: list[BetaMessageParam],
+        tools: list[BetaToolUnionParam],
+    ) -> None:
+        """Limit the number of cache points in the request to Anthropic's maximum.
+
+        Strategy:
+        1. Keep the last cache point in system_prompt and tools (if present)
+        2. Count cache points already used in system_prompt and tools
+        3. Traverse messages from newest to oldest, keeping the most recent cache points
+           until the maximum limit is reached
+        """
+        MAX_CACHE_POINTS = 4
 
-        This prioritizes recent cache points, which are typically more valuable for conversation continuity.
+        # Count existing cache points in system prompt
+        used_cache_points = (
+            sum(1 for block in system_prompt if 'cache_control' in cast(dict[str, Any], block))
+            if isinstance(system_prompt, list)
+            else 0
+        )
 
-        Args:
-            messages: List of message parameters to limit cache points in.
-            model_settings: Model settings containing cache configuration.
-        """
-        # Anthropic's maximum cache points per request
-        max_cache_points = 4
-        used_cache_points = 0
-
-        # Calculate cache points used by system-level settings
-        if model_settings.get('anthropic_cache_all'):
-            # anthropic_cache_all adds cache points for both system instructions and last message
-            used_cache_points += 2
-        else:
-            if model_settings.get('anthropic_cache_instructions'):
-                used_cache_points += 1
-            if model_settings.get('anthropic_cache_tool_definitions'):
-                # Assume used one cache point for tool definitions
+        # Count existing cache points in tools (any tool may have cache_control)
+        # Note: cache_control can be in the middle of tools list if builtin tools are added after
+        for tool in tools:
+            if 'cache_control' in tool:
                 used_cache_points += 1
 
-        # Calculate remaining cache points available for message content
-        keep_cache_points = max_cache_points - used_cache_points
-
-        # Traverse messages from back to front (newest to oldest)
-        remaining_cache_points = keep_cache_points
-        for message in reversed(messages):
+        # Calculate remaining cache points budget for messages
+        remaining_budget = MAX_CACHE_POINTS - used_cache_points
+        if remaining_budget < 0:  # pragma: no cover
+            raise UserError(
+                f'Too many cache points for Anthropic request. '
+                f'System prompt and tool definitions already use {used_cache_points} cache points, '
+                f'which exceeds the maximum of {MAX_CACHE_POINTS}.'
+            )
+        # Remove excess cache points from messages (newest to oldest)
+        for message in reversed(anthropic_messages):
             content = message['content']
-            # Skip if content is a string or None
             if isinstance(content, str):  # pragma: no cover
                 continue
-            content = cast(list[BetaContentBlockParam], content)
-            # Traverse content blocks from back to front within each message
-            for block in reversed(content):
-                # Cast to dict for TypedDict manipulation
+
+            # Process content blocks in reverse order (newest first)
+            for block in reversed(cast(list[BetaContentBlockParam], content)):
                 block_dict = cast(dict[str, Any], block)
 
-                # Check if this block has cache_control
                 if 'cache_control' in block_dict:
-                    if remaining_cache_points > 0:
-                        # Keep this cache point (within limit)
-                        remaining_cache_points -= 1
+                    if remaining_budget > 0:
+                        remaining_budget -= 1
                    else:
-                        # Remove cache_control as we've exceeded the limit
+                        # Exceeded limit, remove this cache point
                        del block_dict['cache_control']
 
     @staticmethod
````
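The budget arithmetic in the reworked `_limit_cache_points` can be sketched outside the class with plain dicts standing in for the SDK's `BetaTextBlockParam` and `BetaToolUnionParam` TypedDicts. The helper name `count_reserved_cache_points` is hypothetical, not part of the library:

```python
# Hedged sketch of the new budget computation: count cache points already
# spent on the system prompt and tool definitions, then derive how many
# remain for message-level CachePoint markers.

MAX_CACHE_POINTS = 4  # Anthropic's per-request limit

def count_reserved_cache_points(system_prompt, tools) -> int:
    """Count cache points already used by the system prompt and tools."""
    used = 0
    if isinstance(system_prompt, list):  # a plain str carries no cache points
        used += sum(1 for block in system_prompt if 'cache_control' in block)
    # cache_control may sit on any tool, not just the last one,
    # e.g. when builtin tools are appended after user-defined ones
    used += sum(1 for tool in tools if 'cache_control' in tool)
    return used

system = [{'type': 'text', 'text': 'Instructions...', 'cache_control': {'type': 'ephemeral'}}]
tools = [{'name': 'search', 'cache_control': {'type': 'ephemeral'}}]

used = count_reserved_cache_points(system, tools)
budget = MAX_CACHE_POINTS - used
if budget < 0:
    # corresponds to the UserError raised in the committed code
    raise ValueError(f'{used} cache points exceed the maximum of {MAX_CACHE_POINTS}')
print(used, budget)  # 2 2
```

With `anthropic_cache_instructions` and `anthropic_cache_tool_definitions` both enabled, two of the four points are reserved, leaving a budget of two for message content, which the newest-to-oldest traversal then enforces.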
