@Smartsheet-JB-Brown
Contributor

@Smartsheet-JB-Brown Smartsheet-JB-Brown commented Mar 28, 2025

-- Don't merge until AWS SDK is updated to have cacheReadTokens and cacheWriteTokens in the Bedrock response's usage object. At that point the package.json change can be reverted to a normal SDK reference.

Add AWS Bedrock Cache Strategy Implementation

Overview

This PR implements a cache strategy system for AWS Bedrock API requests, optimizing token usage and improving response times by strategically placing cache points throughout conversations.

AWS Prompt Caching is currently a private preview feature, but it is anticipated to become public soon. My org has had the opportunity to use it with this branch of Roo-Code for a couple of weeks with good results.

Detailed Bedrock Cache Strategy Documentation.

Key Features

  • Implemented MultiPointStrategy for optimal cache point placement in conversations
  • Added support for system prompt caching when applicable
  • Created an adaptive algorithm that preserves cache points across growing conversations
  • Implemented token comparison optimization to ensure efficient cache point reallocation
  • Added comprehensive documentation with class diagrams and sequence diagrams

Implementation Details

The cache strategy system is designed to optimize the placement of cache points in AWS Bedrock API requests. Cache points allow the service to reuse previously processed parts of the prompt, reducing token usage and improving response times.
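
For readers unfamiliar with cache points, here is a minimal sketch of what one looks like in a request. It assumes the Bedrock Converse API shape, where a cache point is a content block of the form `{ cachePoint: { type: "default" } }`. The `ContentBlock` and `Message` types and the `withCachePoint` helper are simplified stand-ins written for this illustration; they are not the SDK's types or the code in this PR.

```typescript
// Simplified local stand-ins for Converse-style content blocks (assumption:
// not the AWS SDK's own type definitions).
type ContentBlock = { text: string } | { cachePoint: { type: "default" } }

interface Message {
    role: "user" | "assistant"
    content: ContentBlock[]
}

// Hypothetical helper: returns a copy of the message with a cache point
// marker appended after its existing content blocks.
function withCachePoint(message: Message): Message {
    return {
        ...message,
        content: [...message.content, { cachePoint: { type: "default" } }],
    }
}

const message: Message = {
    role: "user",
    content: [{ text: "Here is a large file to analyse ..." }],
}

// Everything up to and including this message becomes eligible for reuse
// on a later request that repeats the same prefix.
const cached = withCachePoint(message)
```

The helper copies rather than mutates, so the original conversation history stays untouched while the outgoing request carries the marker.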

Cache Strategy Components

  • Abstract CacheStrategy class: Provides the base functionality for all cache strategies
  • MultiPointStrategy: Distributes cache points throughout the conversation to maximize caching efficiency
  • Integration with AwsBedrockHandler: Seamlessly integrates with the existing AWS Bedrock handler

Placement Logic

  • System Prompt Caching: Places a cache point after the system prompt if it exceeds the minimum token threshold
  • Message Caching: Places cache points after user messages, ensuring each cache point covers at least the minimum token threshold
  • Cache Point Preservation: Preserves previous cache points when possible in growing conversations
  • Adaptive Reallocation: Only reallocates cache points when the benefit outweighs the cost
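
The message caching rule above can be sketched roughly as follows. This is a simplified greedy pass, not the MultiPointStrategy implementation: it covers only the minimum-token threshold and the cap on cache points, and leaves out preservation and reallocation. The `ChatMessage` and `PlacementConfig` types and the `placeCachePoints` function are hypothetical names, and token counts are assumed to be pre-computed.

```typescript
interface ChatMessage {
    role: "user" | "assistant"
    tokens: number // assumed: a pre-computed token estimate for this message
}

interface PlacementConfig {
    minTokensPerCachePoint: number
    maxCachePoints: number
}

// Returns the indices of the user messages after which a cache point
// would be placed, scanning the conversation from oldest to newest.
function placeCachePoints(messages: ChatMessage[], config: PlacementConfig): number[] {
    const placements: number[] = []
    let tokensSinceLastPoint = 0

    for (let i = 0; i < messages.length; i++) {
        if (placements.length >= config.maxCachePoints) break
        tokensSinceLastPoint += messages[i].tokens
        // Only user messages are candidates, and only once enough tokens
        // have accumulated since the previous cache point.
        if (messages[i].role === "user" && tokensSinceLastPoint >= config.minTokensPerCachePoint) {
            placements.push(i)
            tokensSinceLastPoint = 0
        }
    }
    return placements
}

const conversation: ChatMessage[] = [
    { role: "user", tokens: 600 },
    { role: "assistant", tokens: 500 },
    { role: "user", tokens: 100 },
    { role: "assistant", tokens: 900 },
    { role: "user", tokens: 300 },
]

const placements = placeCachePoints(conversation, { minTokensPerCachePoint: 1000, maxCachePoints: 2 })
// placements is [2, 4]
```

A greedy scan like this tends to spend cache points early in long conversations, which is presumably one reason the PR adds preservation and reallocation on top.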

Documentation

For detailed information about the implementation, including class relationships, sequence diagrams, and examples, please refer to the Bedrock Cache Strategy Documentation.

Testing

  • Added unit tests for the MultiPointStrategy class
  • Added integration tests with the AwsBedrockHandler
  • Tested with various conversation scenarios to ensure optimal cache point placement
  • Verified token usage reduction and response time improvements

Performance Impact

Initial testing shows:

  • Up to 30% reduction in token usage for repeated content
  • Up to 20% improvement in response times for conversations with multiple exchanges
  • Minimal overhead for cache point calculation and placement

Next Steps

  • Monitor performance in production
  • Consider additional optimization strategies based on real-world usage patterns
  • Explore potential for additional cache strategies tailored to specific use cases

Important

Implements a caching strategy for AWS Bedrock API requests, optimizing token usage and response times with a new MultiPointStrategy and updates to AwsBedrockHandler.

  • Behavior:
    • Implements MultiPointStrategy in multi-point-strategy.ts for optimal cache point placement in AWS Bedrock API requests.
    • Integrates caching strategy with AwsBedrockHandler in bedrock.ts.
    • Adds support for system prompt caching and adaptive cache point reallocation.
  • Models:
    • Updates bedrockModels in api.ts to include caching capabilities and pricing details.
    • Adds minTokensPerCachePoint, maxCachePoints, and cachableFields to model configurations.
  • Testing:
    • Adds unit tests for MultiPointStrategy in cache-strategy.test.ts.
    • Adds integration tests for AwsBedrockHandler in bedrock.test.ts and bedrock-custom-arn.test.ts.
  • Documentation:
    • Adds bedrock-cache-strategy-documentation.md detailing the cache strategy implementation.
  • Misc:
    • Updates package.json to use a local AWS SDK package for testing.
    • Adds UI elements in ApiOptions.tsx for enabling prompt caching.
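
As a rough sketch of how the model configuration fields listed above might fit together with the user-facing toggle: only the field names `supportsPromptCache`, `minTokensPerCachePoint`, `maxCachePoints`, and `cachableFields` come from this PR. The interface name, the field types, the example values, and the `canUseCachePoints` helper are assumptions made for illustration.

```typescript
// Hypothetical shape; the field types are assumed, not taken from api.ts.
interface CacheableModelInfo {
    supportsPromptCache: boolean
    minTokensPerCachePoint?: number
    maxCachePoints?: number
    cachableFields?: string[]
}

// Illustrative values only; real limits depend on the model.
const exampleModel: CacheableModelInfo = {
    supportsPromptCache: true,
    minTokensPerCachePoint: 1024,
    maxCachePoints: 4,
    cachableFields: ["system", "messages"],
}

// Hypothetical gate: caching is attempted only when the user has enabled it
// and the model declares both support and usable limits.
function canUseCachePoints(model: CacheableModelInfo, userEnabled: boolean): boolean {
    return (
        userEnabled &&
        model.supportsPromptCache &&
        (model.maxCachePoints ?? 0) > 0 &&
        (model.minTokensPerCachePoint ?? 0) > 0
    )
}

const enabled = canUseCachePoints(exampleModel, true)
```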

This description was created by Ellipsis for 34220ee. It will automatically update as commits are pushed.

@changeset-bot

changeset-bot bot commented Mar 28, 2025

⚠️ No Changeset found

Latest commit: a2275ed

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@dosubot dosubot bot added the size:XXL This PR changes 1000+ lines, ignoring generated files. label Mar 28, 2025
@ellipsis-dev
Contributor

ellipsis-dev bot commented Mar 28, 2025

This pull request is quite large, with 20 files changed and over 4500 lines added. It includes a variety of changes such as documentation updates, caching strategy implementations, and test enhancements. To make the review process more manageable, could you consider splitting this pull request into smaller, more focused ones? For example:

  • Documentation Updates: Separate the changes related to the cache strategy documentation into its own pull request.
  • Caching Strategy Implementation: Group all changes related to the new caching strategies and their integration into a single pull request.
  • Test Enhancements: Create a separate pull request for the updates and enhancements to the test files.

This will help reviewers focus on specific areas and ensure a more thorough review process. Thank you!

@dosubot dosubot bot added documentation Improvements or additions to documentation enhancement New feature or request labels Mar 28, 2025
…raction from Arns. Change example of ARN use from the foundational model ARN to an inference profile ARN which is what is needed.
@mrubens
Collaborator

mrubens commented Apr 2, 2025

@Premshay do you have any time to look at this? I know you were looking at Bedrock caching previously. Thank you!

Comment on lines +81 to +82
awsUsePromptCache?: boolean
setAwsUsePromptCache: (value: boolean) => void
Collaborator

Do we need to add this to extension state, or can it just be a field in the apioptions?

Collaborator

Oh I guess I'm confused - can you explain what the change is here?

Contributor Author

Honestly, I don't know what "setAwsUsePromptCache" is either; it seemed to be a pattern for input checkboxes. awsUsePromptCache captures the user's decision on whether to use prompt caching when it's available for that model. It seemed reasonable as a user option, but I also don't have a specific use case in mind for not using caching. So from a product design standpoint I would be OK with an opinionated answer that it's simply driven off model support. Let me know what the Roo-Code team decides.

Contributor Author

After review: setAwsUsePromptCache is needed to maintain the extension state between views, assuming we want to keep a user choice for enabling or disabling the prompt cache.

See: webview-ui/src/context/ExtensionStateContext.tsx

Collaborator

Thanks!

Contributor

@Atlogit Atlogit left a comment

It looks very extensive and well thought out. And right on time, since caching has only just gone to general availability. Looking forward to trying it out.

@Atlogit
Contributor

Atlogit commented Apr 2, 2025

@Smartsheet-JB-Brown well done. This is greatly appreciated.
I wonder - would you be able to take a look at reasoning content blocks integration for AWS Bedrock?
I tried handling it when it came out and failed, and haven't had the chance to come back to it. If I'm not mistaken, this hasn't been added to Roo-Code yet.

*
*************************************************************************************/

private static readonly REGION_INFO: Record<
Collaborator

@mrubens mrubens Apr 2, 2025

Is this duplicative of the AWS_REGIONS constant? Could we align on adding any necessary info to that and using it?

https://github.com/RooVetGit/Roo-Code/blob/eb807643d408aed3bda9ca788cb4e75dacf4c29d/webview-ui/src/components/settings/constants.ts#L42-L66

Collaborator

I guess that's in the webview-UI - could we have a shared constant somewhere though?

Contributor Author

With some work we could. The one in bedrock.ts is more extensive than the UI list because it has AZs with Bedrock support and we'd need to add properties on a combined constant that would allow webview-UI to filter to the right list.

Contributor Author

Sorry.. not AZs, but alternative region prefixes. I'll take a shot at combining though.

Contributor Author

I've consolidated AWS Region constants into src/shared/aws_regions.ts and set the UI Region drop down options to be a filtered, sorted list of the full data set for bedrock ARN abbreviations.

review please @mrubens

Comment on lines +85 to +94
/**
 * Check if a token count meets the minimum threshold for caching
 */
protected meetsMinTokenThreshold(tokenCount: number): boolean {
    const minTokens = this.config.modelInfo.minTokensPerCachePoint
    if (!minTokens) {
        return false
    }
    return tokenCount >= minTokens
}
Collaborator

Are we only using this for the system prompt? Seems unlikely that the Roo system prompt would ever dip below the minimum, but maybe I lack imagination.

Contributor Author

yep you are right about the system prompt. I think I was using this method originally in other places but ended up removing those calls. Who knows about future model minimums though.

  supportsImages: true,
  supportsComputerUse: false,
- supportsPromptCache: false,
+ supportsPromptCache: true,
Collaborator

I noticed that some of these models are still in Preview, at least according to https://docs.aws.amazon.com/bedrock/latest/userguide/prompt-caching.html#prompt-caching-models. Will this break anything for people who are not in the preview?

Contributor Author

Bedrock ignores the cachePoint nodes that are invalid. There's an extension state option to use or not use caching as well, so there shouldn't be any blocking issue even if that weren't the case, right?

Collaborator

I'm not sure if there really is an extension state option for this - I don't see it in the UI

Contributor Author

@Smartsheet-JB-Brown Smartsheet-JB-Brown Apr 2, 2025

It only displays after you've selected a model that supports prompt caching. It's above the model drop down and directly under the cross-region inference check box. Maybe it should be located somewhere else?
[Image: Screenshot 2025-04-02 at 3 07 42 PM]

Collaborator

Ahh I see!

@Premshay
Contributor

Premshay commented Apr 2, 2025 via email

@Smartsheet-JB-Brown
Contributor Author

@Atlogit - Re: Content Blocks - Maybe in a few weeks. In my personal Roo-Code use they are not as common, and this was as far as I could get during the private preview of prompt caching. I also have a bit of OOO time starting in 2 days, so I wanted to get this out, handle any issues that might arise, and then determine the next most valuable thing.

Collaborator

@mrubens mrubens left a comment

🚀

@mrubens mrubens merged commit 919bb12 into RooCodeInc:main Apr 3, 2025
12 checks passed
@github-project-automation github-project-automation bot moved this from PR [Pre Approval Review] to Done in Roo Code Roadmap Apr 3, 2025
Labels

documentation Improvements or additions to documentation enhancement New feature or request size:XXL This PR changes 1000+ lines, ignoring generated files.

5 participants