feat(inferenceprofiles): add inference and cross-region inference pro… (#35048)
### Issue # (if applicable)
Closes #<issue number here>.
### Reason for this change
This PR introduces comprehensive support for Amazon Bedrock Inference Profiles in the AWS CDK Bedrock Alpha construct library, addressing the need for better cost tracking, model usage optimization, and cross-region inference capabilities.
### Description of changes
1. **Application Inference Profiles**: Added support for user-defined inference profiles that enable cost tracking and model usage monitoring
   - Single-region application profiles for basic cost tracking
   - Multi-region application profiles using cross-region inference profiles
2. **Cross-Region Inference Profiles**: Implemented system-defined profiles that enable seamless traffic distribution across multiple AWS Regions
   - Support for handling unplanned traffic bursts
   - Enhanced resilience during peak demand periods
   - Geographic region-based routing (US, EU regions)
3. **Prompt Routers**: Added intelligent prompt routing capabilities
### Describe any new or updated permissions being added
Implemented the `grantProfileUsage()` method for proper IAM permission handling:
- Support for granting inference profile usage to other AWS resources
- Proper IAM policy generation for profile access
### Description of how you validated changes
- Added unit tests
- Added integration tests
- Tested with a CDK app deployment
### Checklist
- [x] My code adheres to the [CONTRIBUTING GUIDE](https://github.com/aws/aws-cdk/blob/main/CONTRIBUTING.md) and [DESIGN GUIDELINES](https://github.com/aws/aws-cdk/blob/main/docs/DESIGN_GUIDELINES.md)
----
*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
  promptVersion: '1', // optional, defaults to 'DRAFT'
});
```
## Inference Profiles

Amazon Bedrock Inference Profiles provide a way to manage and optimize inference configurations for your foundation models. They allow you to define reusable configurations that can be applied across different prompts and agents.

### Using Inference Profiles

Inference profiles can be used with prompts and agents to maintain consistent inference configurations across your application. Amazon Bedrock offers two types of inference profiles:
#### Application Inference Profiles

Application inference profiles are user-defined profiles that help you track costs and model usage. They can be created for a single region or for multiple regions using a cross-region inference profile.
##### Single Region Application Profile

```ts fixture=default
// Create an application inference profile for one Region
const appInfProfile = new bedrock.ApplicationInferenceProfile(this, 'myapplicationprofile', {
  inferenceProfileName: 'claude 3 sonnet v1',
  modelSource: bedrock.BedrockFoundationModel.ANTHROPIC_CLAUDE_3_5_SONNET_V1_0,
  description: 'Application profile for cost tracking',
});
```
#### System Defined Inference Profiles

Cross-region inference enables you to seamlessly manage unplanned traffic bursts by utilizing compute across different AWS Regions. With cross-region inference, you can distribute traffic across multiple AWS Regions, enabling higher throughput and enhanced resilience during periods of peak demand.

Before using a CrossRegionInferenceProfile, ensure that you have access to the models and Regions defined in the inference profile. For instance, if you use the system-defined inference profile "us.anthropic.claude-3-5-sonnet-20241022-v2:0", inference requests will be routed to US East (N. Virginia) us-east-1, US East (Ohio) us-east-2, and US West (Oregon) us-west-2. You therefore need model access enabled in those Regions for the model anthropic.claude-3-5-sonnet-20241022-v2:0.
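To make the naming convention concrete, here is a minimal TypeScript sketch (not the CDK API) of how a system-defined cross-region profile ID combines a geographic prefix with a model ID, and which Regions that implies you need model access in. The US Region list mirrors the example above; the EU list is an illustrative assumption.

```ts
// Map of geographic prefixes to the Regions a system-defined profile
// may route requests to. The 'us' list matches the example in the text;
// the 'eu' list is illustrative only.
const GEO_REGIONS: Record<string, string[]> = {
  us: ['us-east-1', 'us-east-2', 'us-west-2'],
  eu: ['eu-central-1', 'eu-west-1', 'eu-west-3'], // illustrative
};

// A system-defined cross-region profile ID is the geographic prefix
// joined to the underlying model ID.
function crossRegionProfileId(geo: string, modelId: string): string {
  return `${geo}.${modelId}`;
}

// Regions where model access must be enabled before using the profile.
function requiredRegions(profileId: string): string[] {
  const geo = profileId.split('.')[0];
  return GEO_REGIONS[geo] ?? [];
}

const id = crossRegionProfileId('us', 'anthropic.claude-3-5-sonnet-20241022-v2:0');
console.log(id); // us.anthropic.claude-3-5-sonnet-20241022-v2:0
console.log(requiredRegions(id)); // us-east-1, us-east-2, us-west-2
```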
#### Prompt Routers

Amazon Bedrock intelligent prompt routing provides a single serverless endpoint for efficiently routing requests between different foundation models within the same model family, helping you optimize for both response quality and cost. Intelligent prompt routing predicts the performance of each model for each request and dynamically routes the request to the model that is most likely to give the desired response at the lowest cost.
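The routing decision described above can be sketched as follows. This is a conceptual illustration, not the Bedrock implementation: the model IDs, per-token costs, and quality scores are hypothetical placeholders, and the real router's performance predictor is internal to the service.

```ts
// A candidate model in the same family, with an illustrative cost and a
// per-request quality score that a router would predict.
interface CandidateModel {
  id: string;
  costPer1kTokens: number; // USD, hypothetical
  predictedQuality: number; // 0..1, predicted for this specific request
}

// Route to the cheapest model predicted to clear the quality bar;
// fall back to the highest-quality model if none qualifies.
function route(candidates: CandidateModel[], qualityBar: number): string {
  const qualified = candidates
    .filter((m) => m.predictedQuality >= qualityBar)
    .sort((a, b) => a.costPer1kTokens - b.costPer1kTokens);
  if (qualified.length > 0) return qualified[0].id;
  return candidates.reduce((best, m) =>
    m.predictedQuality > best.predictedQuality ? m : best,
  ).id;
}

const models: CandidateModel[] = [
  { id: 'family.small', costPer1kTokens: 0.25, predictedQuality: 0.72 },
  { id: 'family.large', costPer1kTokens: 3.0, predictedQuality: 0.95 },
];
console.log(route(models, 0.7)); // family.small — the cheaper model clears the bar
console.log(route(models, 0.9)); // family.large — only the larger model qualifies
```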
The `grantProfileUsage` method adds the necessary IAM permissions to the resource, allowing it to use the inference profile. This includes permissions to call `bedrock:GetInferenceProfile` and `bedrock:ListInferenceProfiles` actions on the inference profile resource.
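As a rough sketch of the statement this produces, here is the policy shape built by hand in plain TypeScript (not via the CDK grant API); the account ID and profile ARN are placeholders:

```ts
// Shape of an IAM policy statement as it appears in a policy document.
interface PolicyStatement {
  Effect: 'Allow';
  Action: string[];
  Resource: string;
}

// The actions named in the text that grantProfileUsage() allows on the
// inference profile resource.
function profileUsageStatement(inferenceProfileArn: string): PolicyStatement {
  return {
    Effect: 'Allow',
    Action: ['bedrock:GetInferenceProfile', 'bedrock:ListInferenceProfiles'],
    Resource: inferenceProfileArn,
  };
}

// Placeholder ARN for illustration only.
const stmt = profileUsageStatement(
  'arn:aws:bedrock:us-east-1:111122223333:application-inference-profile/example-id',
);
console.log(JSON.stringify(stmt, null, 2));
```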
### Inference Profiles Import Methods

You can import existing application inference profiles using the following methods: