Add Bedrock/Anthropic prompt caching #2560
Conversation
Force-pushed from accb936 to 84c745c
Thanks @larryhudson, I've left some notes on the implementation. I'll review the tests and examples once the implementation itself has stabilized.
@@ -449,7 +461,7 @@ class ToolReturn:
  # Ideally this would be a Union of types, but Python 3.9 requires it to be a string, and strings don't work with `isinstance`.
- MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent)
+ MultiModalContentTypes = (ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent, CachePoint)
I don't think this should be added here, as cache points are not multi-modal content
Done, thanks
if isinstance(item, CachePoint):
    if last_block is not None:
        # Add cache_control to the previous content block
        last_block['cache_control'] = {'type': 'ephemeral'}
If a ModelRequest has two UserPromptParts, or a SystemPromptPart followed by a UserPromptPart, and the cache point is the first item on the second part, I think we should apply it to the last block of the first part. That means we should probably do this one level up, in _map_message. We can allow _map_user_prompt to yield a CachePoint directly and handle it in a special way.

That will also allow putting it after a ToolReturnPart and having it be added to the tool return part itself.

This would also simplify the example, because you can always just add a new UserPromptPart with a CachePoint instead of adding it to the existing one and having to check whether it's a str etc.
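A minimal sketch of the pattern being suggested, using simplified stand-ins rather than the actual pydantic-ai types; the block dicts and the _map_user_prompt/_map_message signatures here are illustrative assumptions, not the real implementation:

```python
# Sketch only: simplified stand-ins for pydantic-ai's parts and content blocks.
from dataclasses import dataclass
from typing import Iterator, Union


@dataclass
class CachePoint:
    """Marker the user places in a prompt to cache everything that precedes it."""


def _map_user_prompt(items: list[Union[str, CachePoint]]) -> Iterator[Union[dict, CachePoint]]:
    """Yield content blocks, passing CachePoint markers through for the caller to handle."""
    for item in items:
        if isinstance(item, CachePoint):
            yield item
        else:
            yield {'text': item}


def _map_message(parts: list[list[Union[str, CachePoint]]]) -> list[dict]:
    """Flatten parts into blocks, attaching each cache point to the most recent block."""
    blocks: list[dict] = []
    for part in parts:
        for block_or_marker in _map_user_prompt(part):
            if isinstance(block_or_marker, CachePoint):
                if blocks:
                    # The previous block may belong to an earlier part (e.g. an earlier
                    # UserPromptPart or a tool return), which is exactly the point.
                    blocks[-1]['cache_control'] = {'type': 'ephemeral'}
                # A marker with nothing before it is dropped here; real code might raise.
            else:
                blocks.append(block_or_marker)
    return blocks


# A CachePoint at the start of the second part marks the first part's content for caching.
print(_map_message([['long shared context'], [CachePoint(), 'the actual question']]))
# [{'text': 'long shared context', 'cache_control': {'type': 'ephemeral'}}, {'text': 'the actual question'}]
```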
Done, I've implemented this pattern in models/bedrock.py and will apply the same logic in models/anthropic.py. I think handling the case where the UserPromptPart starts with a CachePoint immediately following the SystemPromptPart makes the code a little hard to follow, so I'm open to any ideas to make it clearer / simpler for the user.
cache_creation_tokens = details.get('cache_creation_input_tokens', 0)
cache_read_tokens = details.get('cache_read_input_tokens', 0)

request_tokens = input_tokens + cache_creation_tokens + cache_read_tokens

return usage.Usage(
Can we leave the Usage changes out for now and do them in a follow-up PR? We just did some refactoring there (#2378), so I'll have to check how this corresponds to the other places we're going to be using the usage data.
Done, thanks
profile = self._provider.model_profile(self.model_name)
if isinstance(profile, BedrockModelProfile):
    return profile.bedrock_supports_prompt_caching
return False
We should use BedrockModelProfile.from_profile(self.profile).bedrock_supports_prompt_caching. We can inline that where we use it and drop this helper method.
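A self-contained illustration of why from_profile makes the helper unnecessary; the classes below are simplified stand-ins, not pydantic-ai's real ModelProfile/BedrockModelProfile:

```python
# Simplified stand-ins, not pydantic-ai's real classes; the real from_profile also
# copies matching fields from the source profile, which is omitted here for brevity.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ModelProfile:
    @classmethod
    def from_profile(cls, profile: ModelProfile | None) -> ModelProfile:
        # Return the profile unchanged if it's already the requested subclass,
        # otherwise fall back to that subclass's defaults.
        if isinstance(profile, cls):
            return profile
        return cls()


@dataclass
class BedrockModelProfile(ModelProfile):
    bedrock_supports_prompt_caching: bool = False


generic = ModelProfile()
bedrock = BedrockModelProfile(bedrock_supports_prompt_caching=True)

# Both reads are safe at the call site; no isinstance() check or helper method needed.
print(BedrockModelProfile.from_profile(generic).bedrock_supports_prompt_caching)  # False
print(BedrockModelProfile.from_profile(bedrock).bedrock_supports_prompt_caching)  # True
```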
Done, have cleaned it up
provider_to_profile: dict[str, Callable[[str], ModelProfile | None]] = {
    'anthropic': lambda model_name: BedrockModelProfile(
        bedrock_supports_tool_choice=False, bedrock_send_back_thinking_parts=True
    ).update(anthropic_model_profile(model_name)),
Can we restore this method to how it originally looked, and then move the new logic into new bedrock_anthropic_model_profile and bedrock_amazon_model_profile functions defined further up? We can do the same for bedrock_mistral_model_profile for consistency.
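A rough sketch of the requested refactor; the import paths below are assumptions (the classes may live elsewhere in the repo), and the flags simply mirror the diff above rather than the final code:

```python
# Sketch of the suggested refactor; import paths are assumptions, not verified.
from __future__ import annotations

from pydantic_ai.profiles import ModelProfile
from pydantic_ai.profiles.anthropic import anthropic_model_profile
from pydantic_ai.providers.bedrock import BedrockModelProfile  # assumed location


def bedrock_anthropic_model_profile(model_name: str) -> ModelProfile | None:
    """Bedrock-specific defaults for Anthropic models, layered over the shared Anthropic profile."""
    return BedrockModelProfile(
        bedrock_supports_tool_choice=False, bedrock_send_back_thinking_parts=True
    ).update(anthropic_model_profile(model_name))


# provider_to_profile can then stay a flat mapping of named functions:
# provider_to_profile = {'anthropic': bedrock_anthropic_model_profile, ...}
```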
Done, good idea
@larryhudson Just a heads-up that I'll be out this coming week and will be back the 25th. Assuming this is not urgent I'll review it then. If it is, please ping.
Force-pushed from ff79d68 to ef70980
Force-pushed from ef70980 to ac6a93c
Thanks @DouweM, I've spent some time on this today and will spend a little more time cleaning it up. All good to wait until you're back, have a good time off!
Closes #1041