
[Feature request] - Support for prompt caching for AWS bedrock #8669

@dempsey-ryan

Description


AWS Bedrock has a prompt caching feature, similar to Anthropic's, that has the potential to significantly reduce the cost of using Roo-Code. Is it possible to use this feature today? If not, is it something that could be added? Here is some quick research / background (generated with Perplexity):


  1. Enable prompt caching when making API calls to Amazon Bedrock. This can be done by adding the explicitPromptCaching='enabled' parameter to your invoke_model request[2].

  2. Structure your prompts to take advantage of caching:

    • Ensure your prompts meet the minimum token requirement for creating cache checkpoints. For example, the Anthropic Claude 3.5 Sonnet v2 model requires at least 1,024 tokens[2].
    • Use the cache_control property in your request body to specify which parts of the prompt should be cached[2].
  3. Here's an example of how to structure your API call:

import json

import boto3

# Runtime client for model invocation; region/credentials come from your AWS config.
bedrock_client = boto3.client("bedrock-runtime")

# Example values; adjust the model ID for your account/region.
modelId = "anthropic.claude-3-5-sonnet-20241022-v2:0"
accept = "application/json"
contentType = "application/json"

response = bedrock_client.invoke_model(
    # invoke_model expects the body as a JSON string, not a dict
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "system": "Reply concisely",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Your main prompt here"
                    },
                    {
                        "type": "text",
                        "text": "Additional context to meet minimum token requirement",
                        # marks the end of the prompt prefix to cache
                        "cache_control": {"type": "ephemeral"}
                    }
                ]
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.5
    }),
    modelId=modelId,
    accept=accept,
    contentType=contentType,
    # enables prompt caching, as described in the cited AWS docs[2]
    explicitPromptCaching='enabled'
)
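
If the call succeeds, the response body should report how much of the prompt was written to or read from the cache. The snippet below is a sketch that assumes the Bedrock response follows the Anthropic Messages format and exposes Anthropic-style cache usage counters (cache_creation_input_tokens / cache_read_input_tokens); treat those field names as an assumption to verify against the AWS docs[2]:

# Sketch: read the response body and print token/cache usage.
result = json.loads(response["body"].read())
usage = result.get("usage", {})
print("input tokens:      ", usage.get("input_tokens"))
print("cache write tokens:", usage.get("cache_creation_input_tokens"))
print("cache read tokens: ", usage.get("cache_read_input_tokens"))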
  4. Be aware that cache checkpoints have a five-minute Time To Live (TTL), which resets with each successful cache hit[2] (see the sketch after this list).

  5. For more complex implementations, consider using Amazon Bedrock's features like Agents, which automatically handle prompt caching when enabled[2].
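
A minimal sketch of what the five-minute TTL means in practice for a client like Roo-Code: any follow-up request that reuses the same cached prefix within the window is a cache hit and restarts the clock. The wrapper and timing below are illustrative only, not part of any existing Roo-Code or AWS API:

import time

CACHE_TTL_SECONDS = 5 * 60  # five-minute TTL per the cited AWS docs[2]

def send_with_cached_prefix(client, body, model_id):
    # Illustrative wrapper: every request that reuses the cached prefix
    # both benefits from the cache and resets its five-minute TTL.
    return client.invoke_model(
        body=body,
        modelId=model_id,
        accept="application/json",
        contentType="application/json",
        explicitPromptCaching="enabled",
    )

# first = send_with_cached_prefix(bedrock_client, body, modelId)   # writes the cache
# time.sleep(60)                                                   # well inside the TTL
# second = send_with_cached_prefix(bedrock_client, body, modelId)  # cache hit, TTL resets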

By implementing these steps, Roo Code can benefit from reduced costs (up to 90%) and improved latency (up to 85%) when using Bedrock models with prompt caching[1].
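
To put a rough number on the cost side: the cited AWS page advertises up to 90% savings on cached tokens[1], which implies cache reads are billed at roughly a tenth of the normal input rate. The per-token prices and token counts below are placeholders for illustration, not actual Bedrock pricing:

# Placeholder prices and token counts; check the Bedrock pricing page for real numbers.
base_input_price_per_1k = 0.003                              # assumed $ per 1K regular input tokens
cached_input_price_per_1k = base_input_price_per_1k * 0.10   # ~90% discount on cache reads[1]

prompt_tokens = 12_000   # large, mostly static system prompt + context
cached_tokens = 10_000   # portion served from the cache on repeat requests
fresh_tokens = prompt_tokens - cached_tokens

without_cache = prompt_tokens / 1000 * base_input_price_per_1k
with_cache = (fresh_tokens / 1000 * base_input_price_per_1k
              + cached_tokens / 1000 * cached_input_price_per_1k)

print(f"without caching: ${without_cache:.4f} per request")
print(f"with caching:    ${with_cache:.4f} per request")
# For this made-up token mix, caching cuts input-token cost by roughly 75%.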

Citations:
[1] https://aws.amazon.com/bedrock/prompt-caching/
[2] https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html
[3] langchain-ai/langchain#25610
[4] https://www.reddit.com/r/ClaudeAI/comments/1esto2i/anthropic_just_released_prompt_caching_making/
[5] https://api.python.langchain.com/en/latest/aws/llms/langchain_aws.llms.bedrock.Bedrock.html
[6] https://opentools.ai/news/aws-supercharges-bedrock-llm-service-with-prompt-routing-and-caching
[7] https://www.youtube.com/watch?v=2mNXSv7cTLI
[8] https://news.ycombinator.com/item?id=41284639

Originally posted by @mdlmarkham in #482

Metadata

Assignees

No one assigned

    Labels

    Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.), feature request (Feature request, not a bug)

    Type

    No type

    Projects

    Status

    Triage

    Milestone

    No milestone
