Add Bedrock/Anthropic prompt caching #2560

Status: Draft. larryhudson wants to merge 1 commit into main from larryhudson/add-bedrock-prompt-caching.

Conversation

@larryhudson commented Aug 14, 2025

Closes #1041

@larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch from accb936 to 84c745c on August 14, 2025 19:40
@DouweM self-assigned this Aug 14, 2025
@DouweM (Collaborator) left a comment:

Thanks @larryhudson, I've left some notes on the implementation. I'll review the tests and examples once the implementation itself has stabilized.

```diff
@@ -449,7 +461,7 @@ class ToolReturn:

 # Ideally this would be a Union of types, but Python 3.9 requires it to be a string, and strings don't work with `isinstance`.
-MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent)
+MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent, CachePoint)
```
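For context on that comment: a runtime tuple of classes can be passed straight to `isinstance`, which a string-based `Union` alias cannot on Python 3.9. A minimal sketch with stand-in classes (not the real `pydantic_ai.messages` types):

```python
from dataclasses import dataclass

# Stand-in content types; the real ones live in pydantic_ai.messages.
@dataclass
class ImageUrl:
    url: str

@dataclass
class BinaryContent:
    data: bytes

# A tuple of classes works directly with isinstance(), which is why the
# module keeps a runtime tuple alongside the type alias.
MultiModalContentTypes = (ImageUrl, BinaryContent)

print(isinstance(ImageUrl(url='https://example.com/a.png'), MultiModalContentTypes))
```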
Collaborator (@DouweM):

I don't think this should be added here, as cache points are not multi-modal content

Author (@larryhudson):

Done, thanks

```python
if isinstance(item, CachePoint):
    if last_block is not None:
        # Add cache_control to the previous content block
        last_block['cache_control'] = {'type': 'ephemeral'}
```
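The approach above can be sketched in isolation; `CachePoint` and the block dicts here are simplified stand-ins, not the library's actual types:

```python
from dataclasses import dataclass

@dataclass
class CachePoint:
    """Marker: cache everything up to (and including) the previous block."""

def map_content(items: list) -> list[dict]:
    """Map prompt items to content blocks, folding each CachePoint into a
    cache_control entry on the block that precedes it."""
    blocks: list[dict] = []
    for item in items:
        if isinstance(item, CachePoint):
            if not blocks:
                raise ValueError('CachePoint must follow at least one content block')
            blocks[-1]['cache_control'] = {'type': 'ephemeral'}
        else:
            blocks.append({'text': item})
    return blocks
```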
Collaborator (@DouweM):

If a ModelRequest has two UserPromptParts, or a SystemPromptPart followed by a UserPromptPart, and the cache point is the first item of the second part, I think we should apply it to the last content block of the first part. That means we should probably do this one level up, in _map_message. We can allow _map_user_prompt to yield a CachePoint directly and handle it in a special way.

That will also allow putting this after a ToolReturnPart, and having it be added to the tool return part itself.

This would also simplify the example because you can always just add a new UserPromptPart with a CachePoint instead of adding it to the existing one and having to check whether it's a str etc.

Author (@larryhudson):

Done, I've implemented this pattern in models/bedrock.py and will apply the same logic in models/anthropic.py. Handling the case where the UserPromptPart starts with a CachePoint immediately after the SystemPromptPart makes the code a little hard to follow, so I'm open to any ideas to make it clearer and simpler for the user.

```python
cache_creation_tokens = details.get('cache_creation_input_tokens', 0)
cache_read_tokens = details.get('cache_read_input_tokens', 0)

request_tokens = input_tokens + cache_creation_tokens + cache_read_tokens

return usage.Usage(
```
Collaborator (@DouweM):

Can we leave the Usage changes off for now and do them in a follow-up PR? We just did some refactoring there in #2378, so I'll have to check how this corresponds to the other places we're going to be using the usage data.

Author (@larryhudson):

Done, thanks

```python
profile = self._provider.model_profile(self.model_name)
if isinstance(profile, BedrockModelProfile):
    return profile.bedrock_supports_prompt_caching
return False
```
Collaborator (@DouweM):

We should use `BedrockModelProfile.from_profile(self.profile).bedrock_supports_prompt_caching`. We can inline that where we use it and drop this helper method.
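The `from_profile` pattern being suggested can be sketched like this (stand-in dataclasses, assuming `from_profile` upgrades a base profile to the subclass with default values for the Bedrock-only fields):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelProfile:
    supports_tools: bool = True

@dataclass
class BedrockModelProfile(ModelProfile):
    bedrock_supports_prompt_caching: bool = False

    @classmethod
    def from_profile(cls, profile: Optional[ModelProfile]) -> 'BedrockModelProfile':
        # Already the right subclass: use it as-is; otherwise fall back
        # to defaults so callers can read Bedrock-only fields safely.
        return profile if isinstance(profile, cls) else cls()

# Inline usage, replacing the helper method:
caching = BedrockModelProfile.from_profile(ModelProfile()).bedrock_supports_prompt_caching
```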

Author (@larryhudson):

Done, have cleaned it up

```python
provider_to_profile: dict[str, Callable[[str], ModelProfile | None]] = {
    'anthropic': lambda model_name: BedrockModelProfile(
        bedrock_supports_tool_choice=False, bedrock_send_back_thinking_parts=True
    ).update(anthropic_model_profile(model_name)),
```
Collaborator (@DouweM):

Can we restore this method to how it originally looked, and then move the new logic into new bedrock_anthropic_model_profile and bedrock_amazon_model_profile functions defined further up? We can do the same for bedrock_mistral_model_profile for consistency
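The suggested refactor could look roughly like this; the profile classes, `update`, and `anthropic_model_profile` below are simplified stand-ins so the structure is runnable in isolation:

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

@dataclass
class ModelProfile:
    supports_json_schema_output: bool = False

@dataclass
class BedrockModelProfile(ModelProfile):
    bedrock_supports_tool_choice: bool = True
    bedrock_send_back_thinking_parts: bool = False

    def update(self, base: Optional[ModelProfile]) -> 'BedrockModelProfile':
        # Stand-in for the real update(): copy base-profile fields over.
        if base is None:
            return self
        return replace(self, supports_json_schema_output=base.supports_json_schema_output)

def anthropic_model_profile(model_name: str) -> Optional[ModelProfile]:
    return ModelProfile(supports_json_schema_output=True)

def bedrock_anthropic_model_profile(model_name: str) -> Optional[ModelProfile]:
    # Bedrock-specific logic lives in a named function defined further up,
    # instead of a lambda inside provider_to_profile.
    return BedrockModelProfile(
        bedrock_supports_tool_choice=False,
        bedrock_send_back_thinking_parts=True,
    ).update(anthropic_model_profile(model_name))

provider_to_profile: dict[str, Callable[[str], Optional[ModelProfile]]] = {
    'anthropic': bedrock_anthropic_model_profile,
}
```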

Author (@larryhudson):

Done, good idea

@DouweM commented Aug 16, 2025

@larryhudson Just a heads-up that I'll be out this coming week and will be back the 25th. Assuming this is not urgent I'll review it then. If it is, please ping Kludex! Appreciate the patience :)

@larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch 10 times, most recently from ff79d68 to ef70980 on August 16, 2025 17:59
@larryhudson force-pushed the larryhudson/add-bedrock-prompt-caching branch from ef70980 to ac6a93c on August 16, 2025 17:59
Author (@larryhudson):

> @larryhudson Just a heads-up that I'll be out this coming week and will be back the 25th. Assuming this is not urgent I'll review it then. If it is, please ping Kludex! Appreciate the patience :)

Thanks @DouweM, I've spent some time on this today and will spend a little more time cleaning it up. All good to wait until you're back, have a good time off!

Successfully merging this pull request may close these issues.

Anthropic prompt caching (inc. Anthropic on Bedrock)
2 participants