
feat(anthropic): add support for prompt caching #2669


Open · wants to merge 1 commit into main

Conversation

@tzolov (Contributor) commented Apr 8, 2025

NOTE: This is a rebased version of the original #1413 PR by @Claudio-code.

@Claudio-code is the original author of this PR.

Implements Anthropic's prompt caching feature to improve token efficiency.

  • Adds cache control support in AnthropicApi and AnthropicChatModel
  • Creates AnthropicCacheType enum with EPHEMERAL cache type
  • Extends AbstractMessage and UserMessage to support cache parameters
  • Updates Usage tracking to include cache-related token metrics
  • Adds integration test to verify prompt caching functionality

This implementation follows Anthropic's prompt caching API (beta-2024-07-31) which allows for more efficient token usage by caching frequently used prompts.
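Conceptually, the cache marker described above attaches a `cache_control` object to a content block in the messages request. A minimal, self-contained sketch of that shape; the class and method names here (`CacheControlSketch`, `contentBlock`, `cacheControlJson`) are illustrative, not Spring AI's actual API:

```java
// Hypothetical sketch of the cache_control marker this PR adds.
// The enum mirrors the PR's AnthropicCacheType with its single EPHEMERAL value.
public class CacheControlSketch {

    enum AnthropicCacheType {
        EPHEMERAL("ephemeral");

        private final String value;

        AnthropicCacheType(String value) {
            this.value = value;
        }

        // Renders the cache_control object attached to a content block, as in
        // Anthropic's prompt caching beta (anthropic-beta: prompt-caching-2024-07-31).
        String cacheControlJson() {
            return "{\"type\":\"" + value + "\"}";
        }
    }

    // Builds a single text content block, optionally tagged with cache_control.
    static String contentBlock(String text, AnthropicCacheType cacheType) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"type\":\"text\",\"text\":\"").append(text).append("\"");
        if (cacheType != null) {
            sb.append(",\"cache_control\":").append(cacheType.cacheControlJson());
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        System.out.println(contentBlock("Long reusable context...", AnthropicCacheType.EPHEMERAL));
        System.out.println(contentBlock("Uncached question", null));
    }
}
```

On the response side, Anthropic then reports `cache_creation_input_tokens` and `cache_read_input_tokens` in the usage object, which is what the Usage-tracking change above surfaces.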

@tzolov tzolov added this to the 1.0.0-M7 milestone Apr 8, 2025
@tzolov tzolov self-assigned this Apr 8, 2025
@ilayaperumalg ilayaperumalg modified the milestones: 1.0.0-M7, 1.0.0-RC1 Apr 10, 2025
@Claudio-code (Contributor):

Hello, it seems right to me, thank you for updating the branch.

@markpollack markpollack modified the milestones: 1.0.0-RC1-triage, 1.0.x May 9, 2025
@zhahaoyu:

@tzolov @Claudio-code Can we merge this?

@Claudio-code (Contributor):

This implementation has a bug: the cache marker must be added only to the last message/tool in the list. You would need to make this change before merging. See langchain4j/langchain4j#3337.

@Claudio-code (Contributor):

Without this fix, the user will reach the cache limit faster than they should, and the Anthropic API will return an error.
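The fix described in the comment above amounts to tagging only the final content block rather than every one, since each `cache_control` marker consumes one of the limited cache breakpoints Anthropic allows per request. A hypothetical sketch of that adjustment, working on raw JSON block strings (the names `LastBlockCache` and `markLastBlockOnly` are illustrative, not the PR's actual code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: attach the ephemeral cache_control marker only to the
// last content block, so a request uses a single cache breakpoint instead of
// one per block.
public class LastBlockCache {

    static List<String> markLastBlockOnly(List<String> blocks) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < blocks.size(); i++) {
            String json = blocks.get(i);
            if (i == blocks.size() - 1) {
                // Splice cache_control in before the closing brace of the final block.
                json = json.substring(0, json.length() - 1)
                        + ",\"cache_control\":{\"type\":\"ephemeral\"}}";
            }
            out.add(json);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> blocks = List.of(
                "{\"type\":\"text\",\"text\":\"system rules\"}",
                "{\"type\":\"text\",\"text\":\"long reference doc\"}");
        markLastBlockOnly(blocks).forEach(System.out::println);
    }
}
```

Because a cache breakpoint covers the entire prompt prefix up to that point, marking only the last block still caches everything before it.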

@sobychacko (Contributor):

@Claudio-code Thanks for the ping. I am reviewing the PR right now and trying to merge. Is the bug you mentioned applicable to user messages? The LangChain4j code you pointed to adds the cache marker to the last system message. Thanks!

@sobychacko (Contributor):

@Claudio-code Are you planning to address this fix in this PR? In that case, feel free to take this PR and rebase against the latest main and then add your fix. Either way, please let us know here. Thanks!

@Claudio-code (Contributor):

There, they chose not to add support for caching user messages because of the limitation of at most 4 cached items. The bug applies to all messages and tools that can carry a cache marker.

@Claudio-code (Contributor):

I made fix #4100.

However, that implementation doesn't support caching system messages. Adding that support would require a structural change, because the system message is currently just a string and would need to become a ContentBlock instance.

6 participants