
[Feature request] - Support for prompt caching for AWS bedrock #8669

@dempsey-ryan

Description


AWS Bedrock has a prompt caching feature, similar to Anthropic's, that has the potential to significantly reduce the cost of using Roo-Code. Is it possible to use this feature today? If not, is it something that could be added? Here is some quick research / background (generated with Perplexity):


  1. Enable prompt caching when making API calls to Amazon Bedrock. This can be done by adding the explicitPromptCaching='enabled' parameter to your invoke_model request[2].

  2. Structure your prompts to take advantage of caching:

    • Ensure your prompts meet the minimum token requirement for creating cache checkpoints. For example, the Anthropic Claude 3.5 Sonnet v2 model requires at least 1,024 tokens[2].
    • Use the cache_control property in your request body to specify which parts of the prompt should be cached[2].
  3. Here's an example of how to structure your API call:

import json

import boto3

# Runtime client for model invocation; region/credentials come from your AWS config.
bedrock_client = boto3.client("bedrock-runtime")

# Example values; adjust the model ID for your account/region.
modelId = "anthropic.claude-3-5-sonnet-20241022-v2:0"
accept = "application/json"
contentType = "application/json"

response = bedrock_client.invoke_model(
    # invoke_model expects the body as a JSON string, not a dict
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "system": "Reply concisely",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Your main prompt here"
                    },
                    {
                        "type": "text",
                        "text": "Additional context to meet minimum token requirement",
                        # marks the end of the prompt prefix to cache
                        "cache_control": {"type": "ephemeral"}
                    }
                ]
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.5
    }),
    modelId=modelId,
    accept=accept,
    contentType=contentType,
    # enables prompt caching, as described in the cited AWS docs[2]
    explicitPromptCaching='enabled'
)
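
If the call succeeds, the response body should report how much of the prompt was written to or read from the cache. The snippet below is a sketch that assumes the Bedrock response follows the Anthropic Messages format and exposes Anthropic-style cache usage counters (cache_creation_input_tokens / cache_read_input_tokens); treat those field names as an assumption to verify against the AWS docs[2]:

# Sketch: read the response body and print token/cache usage.
result = json.loads(response["body"].read())
usage = result.get("usage", {})
print("input tokens:      ", usage.get("input_tokens"))
print("cache write tokens:", usage.get("cache_creation_input_tokens"))
print("cache read tokens: ", usage.get("cache_read_input_tokens"))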
  4. Be aware that cache checkpoints have a five-minute Time To Live (TTL), which resets with each successful cache hit[2] (see the sketch after this list).

  5. For more complex implementations, consider using Amazon Bedrock's features like Agents, which automatically handle prompt caching when enabled[2].
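
A minimal sketch of what the five-minute TTL means in practice for a client like Roo-Code: any follow-up request that reuses the same cached prefix within the window is a cache hit and restarts the clock. The wrapper and timing below are illustrative only, not part of any existing Roo-Code or AWS API:

import time

CACHE_TTL_SECONDS = 5 * 60  # five-minute TTL per the cited AWS docs[2]

def send_with_cached_prefix(client, body, model_id):
    # Illustrative wrapper: every request that reuses the cached prefix
    # both benefits from the cache and resets its five-minute TTL.
    return client.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
        explicitPromptCaching="enabled",
    )

# first = send_with_cached_prefix(bedrock_client, body, modelId)   # writes the cache
# time.sleep(60)                                                   # well inside the TTL
# second = send_with_cached_prefix(bedrock_client, body, modelId)  # cache hit, TTL resets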

By implementing these steps, Roo Code can benefit from reduced costs (up to 90%) and improved latency (up to 85%) when using Bedrock models with prompt caching[1].
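
To put a rough number on the cost side: the cited AWS page advertises up to 90% savings on cached tokens[1], which implies cache reads are billed at roughly a tenth of the normal input rate. The per-token prices and token counts below are placeholders for illustration, not actual Bedrock pricing:

# Placeholder prices and token counts; check the Bedrock pricing page for real numbers.
base_input_price_per_1k = 0.003                              # assumed $ per 1K regular input tokens
cached_input_price_per_1k = base_input_price_per_1k * 0.10   # ~90% discount on cache reads[1]

prompt_tokens = 12_000   # large, mostly static system prompt + context
cached_tokens = 10_000   # portion served from the cache on repeat requests
fresh_tokens = prompt_tokens - cached_tokens

without_cache = prompt_tokens / 1000 * base_input_price_per_1k
with_cache = (fresh_tokens / 1000 * base_input_price_per_1k
              + cached_tokens / 1000 * cached_input_price_per_1k)

print(f"without caching: ${without_cache:.4f} per request")
print(f"with caching:    ${with_cache:.4f} per request")
# For this made-up token mix, caching cuts input-token cost by roughly 75%.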

Citations:
[1] https://aws.amazon.com/bedrock/prompt-caching/
[2] https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
[3] langchain-ai/langchain#25610
[4] https://www.reddit.com/r/ClaudeAI/comments/1esto2i/anthropic_just_released_prompt_caching_making/
[5] https://api.python.langchain.com/en/latest/aws/llms/langchain_aws.llms.bedrock.Bedrock.html
[6] https://opentools.ai/news/aws-supercharges-bedrock-llm-service-with-prompt-routing-and-caching
[7] https://www.youtube.com/watch?v=2mNXSv7cTLI
[8] https://news.ycombinator.com/item?id=41284639

Originally posted by @mdlmarkham in #482

Metadata

Assignees

No one assigned

    Labels

    Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.), feature request (Feature request, not a bug)

    Type

    No type

    Projects

    Status

    Triage

    Milestone

    No milestone
