| 1 | +# Amazon Bedrock Service Tiers |
| 2 | + |
| 3 | +The GenAI IDP solution supports Amazon Bedrock service tiers, so you can balance performance against cost by choosing a tier for each model inference operation. |
| 4 | + |
| 5 | +## Overview |
| 6 | + |
| 7 | +Amazon Bedrock offers three service tiers for on-demand inference: |
| 8 | + |
| 9 | +| Tier | Performance | Cost | Best For | |
| 10 | +|------|-------------|------|----------| |
| 11 | +| **Priority** | Fastest response times | Premium pricing (~25% more) | Customer-facing workflows, real-time interactions | |
| 12 | +| **Standard** | Consistent performance | Regular pricing | Everyday AI tasks, content generation | |
| 13 | +| **Flex** | Variable latency | Discounted pricing | Batch processing, evaluations, non-urgent workloads | |
| 14 | + |
| 15 | +## Configuration |
| 16 | + |
| 17 | +### Global Service Tier |
| 18 | + |
| 19 | +Set a default service tier for all operations in your configuration: |
| 20 | + |
| 21 | +```yaml |
| 22 | +# Global default applies to all operations |
| 23 | +service_tier: "standard" |
| 24 | +``` |
| 25 | + |
| 26 | +### Operation-Specific Overrides |
| 27 | + |
| 28 | +Override the global setting for specific operations: |
| 29 | + |
| 30 | +```yaml |
| 31 | +# Global default |
| 32 | +service_tier: "standard" |
| 33 | + |
| 34 | +# Operation-specific overrides |
| 35 | +classification: |
| 36 | + service_tier: "priority" # Fast classification for real-time workflows |
| 37 | + model: "us.amazon.nova-pro-v1:0" |
| 38 | + # ... other settings |
| 39 | + |
| 40 | +extraction: |
| 41 | + service_tier: "flex" # Cost-effective extraction for batch processing |
| 42 | + model: "us.amazon.nova-pro-v1:0" |
| 43 | + # ... other settings |
| 44 | + |
| 45 | +assessment: |
| 46 | + service_tier: null # null = use global default (standard) |
| 47 | + # ... other settings |
| 48 | + |
| 49 | +summarization: |
| 50 | + service_tier: "flex" # Summarization can tolerate longer latency |
| 51 | + # ... other settings |
| 52 | +``` |
| 53 | + |
| 54 | +### Valid Values |
| 55 | + |
| 56 | +- `"priority"` - Fastest response times, premium pricing |
| 57 | +- `"standard"` - Default tier, consistent performance (also accepts `"default"`) |
| 58 | +- `"flex"` - Cost-effective, longer latency |
| 59 | +- `null` or omitted - Uses the global default, or `"standard"` if no global default is set |
| 60 | + |
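The precedence described above (operation override, then global default, then standard) can be sketched as follows. This is a minimal illustration, not the solution's actual code: the function name `resolve_service_tier` and the choice to return `"default"` for the standard tier (mirroring the `Using service tier: default` log line shown under Monitoring) are assumptions.

```python
from typing import Any, Mapping, Optional

# Values from the "Valid Values" list; "standard" and "default" name the same tier.
# Matching is exact here (no trimming or lowercasing), which is an assumption.
_ALIASES = {
    "priority": "priority",
    "standard": "default",
    "default": "default",
    "flex": "flex",
}


def resolve_service_tier(config: Mapping[str, Any], operation: Optional[str] = None) -> str:
    """Hypothetical helper: operation override first, then global, then standard."""
    candidates = []
    if operation is not None:
        section = config.get(operation) or {}
        candidates.append(section.get("service_tier"))
    candidates.append(config.get("service_tier"))

    for value in candidates:
        if value is None:
            # null or omitted: fall through to the next level
            continue
        tier = _ALIASES.get(value)
        if tier is not None:
            return tier
        # Unrecognized value: skipped in this sketch; real code might log a warning.
    return "default"


config = {
    "service_tier": "standard",
    "classification": {"service_tier": "priority"},
    "assessment": {"service_tier": None},
}
print(resolve_service_tier(config, "classification"))
print(resolve_service_tier(config, "assessment"))
```

With this configuration, classification resolves to its own override while assessment inherits the global setting.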
| 61 | +## Web UI Configuration |
| 62 | + |
| 63 | +### Global Service Tier |
| 64 | + |
| 65 | +1. Navigate to the Configuration page |
| 66 | +2. Find the "Service Tier (Global Default)" dropdown near the top |
| 67 | +3. Select your preferred tier: |
| 68 | + - **Standard (Default)** - Consistent performance |
| 69 | + - **Priority (Fastest)** - Premium speed |
| 70 | + - **Flex (Cost-Effective)** - Budget-friendly |
| 71 | +4. Changes save automatically |
| 72 | + |
| 73 | +### Operation-Specific Overrides |
| 74 | + |
| 75 | +Within each operation section (Classification, Extraction, Assessment, Summarization): |
| 76 | + |
| 77 | +1. Find the "Service Tier Override" dropdown |
| 78 | +2. Select an option: |
| 79 | + - **Use Global Default** - Inherit global setting |
| 80 | + - **Priority (Fastest)** - Override with priority |
| 81 | + - **Standard** - Override with standard |
| 82 | + - **Flex (Cost-Effective)** - Override with flex |
| 83 | +3. The UI shows the current effective tier |
| 84 | + |
| 85 | +## CLI Usage |
| 86 | + |
| 87 | +### Deployment |
| 88 | + |
| 89 | +Specify service tier during stack deployment: |
| 90 | + |
| 91 | +```bash |
| 92 | +idp-cli deploy \ |
| 93 | + --stack-name my-idp-stack \ |
| 94 | + --pattern pattern-2 \ |
| 96 | + --service-tier flex |
| 97 | +``` |
| 98 | + |
| 99 | +### Batch Processing |
| 100 | + |
| 101 | +Override service tier for a specific batch: |
| 102 | + |
| 103 | +```bash |
| 104 | +idp-cli run-inference \ |
| 105 | + --stack-name my-idp-stack \ |
| 106 | + --dir ./documents/ \ |
| 107 | + --service-tier priority \ |
| 108 | + --monitor |
| 109 | +``` |
| 110 | + |
| 111 | +**Note:** The CLI `--service-tier` parameter sets the global default in the configuration. For operation-specific control, use configuration files or the Web UI. |
| 112 | + |
| 113 | +## Use Case Recommendations |
| 114 | + |
| 115 | +### Priority Tier |
| 116 | + |
| 117 | +**When to use:** |
| 118 | +- Customer-facing chat assistants |
| 119 | +- Real-time document processing |
| 120 | +- Interactive AI applications |
| 121 | +- Time-sensitive workflows |
| 122 | + |
| 123 | +**Example configuration:** |
| 124 | +```yaml |
| 125 | +service_tier: "priority" # All operations use priority |
| 126 | +``` |
| 127 | + |
| 128 | +### Standard Tier |
| 129 | + |
| 130 | +**When to use:** |
| 131 | +- General document processing |
| 132 | +- Content generation |
| 133 | +- Text analysis |
| 134 | +- Routine workflows |
| 135 | + |
| 136 | +**Example configuration:** |
| 137 | +```yaml |
| 138 | +service_tier: "standard" # Default, no configuration needed |
| 139 | +``` |
| 140 | + |
| 141 | +### Flex Tier |
| 142 | + |
| 143 | +**When to use:** |
| 144 | +- Batch document processing |
| 145 | +- Model evaluations |
| 146 | +- Content summarization |
| 147 | +- Non-urgent workflows |
| 148 | +- Cost optimization |
| 149 | + |
| 150 | +**Example configuration:** |
| 151 | +```yaml |
| 152 | +service_tier: "flex" # All operations use flex |
| 153 | + |
| 154 | +# Or a mixed approach (a separate configuration; do not combine with the line above) |
| 155 | +service_tier: "standard" # Global default |
| 156 | +classification: |
| 157 | + service_tier: "priority" # Fast classification |
| 158 | +extraction: |
| 159 | + service_tier: "flex" # Cost-effective extraction |
| 160 | +``` |
| 161 | + |
| 162 | +## Mixed Tier Strategy |
| 163 | + |
| 164 | +Optimize cost and performance by using different tiers for different operations: |
| 165 | + |
| 166 | +```yaml |
| 167 | +# Global default for most operations |
| 168 | +service_tier: "standard" |
| 169 | + |
| 170 | +# Fast classification for real-time user experience |
| 171 | +classification: |
| 172 | + service_tier: "priority" |
| 173 | + model: "us.amazon.nova-pro-v1:0" |
| 174 | + |
| 175 | +# Standard extraction (inherit global) |
| 176 | +extraction: |
| 177 | + service_tier: null # Uses global "standard" |
| 178 | + model: "us.amazon.nova-pro-v1:0" |
| 179 | + |
| 180 | +# Cost-effective assessment (can tolerate latency) |
| 181 | +assessment: |
| 182 | + service_tier: "flex" |
| 183 | + model: "us.amazon.nova-lite-v1:0" |
| 184 | + |
| 185 | +# Cost-effective summarization (non-critical) |
| 186 | +summarization: |
| 187 | + service_tier: "flex" |
| 188 | + model: "us.amazon.nova-premier-v1:0" |
| 189 | +``` |
| 190 | + |
| 191 | +## Performance Expectations |
| 192 | + |
| 193 | +### Priority Tier |
| 194 | +- Up to 25% better output tokens per second (OTPS) than standard |
| 195 | +- Requests prioritized over other tiers |
| 196 | +- Best for latency-sensitive applications |
| 197 | + |
| 198 | +### Standard Tier |
| 199 | +- Consistent baseline performance |
| 200 | +- Suitable for most workloads |
| 201 | +- Balanced cost and performance |
| 202 | + |
| 203 | +### Flex Tier |
| 204 | +- Variable latency (longer than standard) |
| 205 | +- Pricing discount over standard |
| 206 | +- Suitable for batch and background processing |
| 207 | + |
| 208 | +## Cost Implications |
| 209 | + |
| 210 | +- **Priority**: ~25% premium over standard pricing |
| 211 | +- **Standard**: Regular on-demand pricing (baseline) |
| 212 | +- **Flex**: Discounted pricing (varies by model) |
| 213 | + |
| 214 | +Use the [AWS Pricing Calculator](https://calculator.aws/#/createCalculator/bedrock) to estimate costs for different service tiers. |
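For a rough comparison before opening the calculator, the relative pricing above can be turned into simple arithmetic. The sketch below is illustrative only: `estimate_tier_costs` is a hypothetical helper, the 25% premium is the approximate figure quoted in this document, and the Flex discount must be supplied by the caller because it varies by model (the `0.5` in the example is a placeholder, not a published rate).

```python
def estimate_tier_costs(
    standard_cost: float,
    flex_discount: float,
    priority_premium: float = 0.25,
) -> dict:
    """Scale a known standard-tier cost to the other tiers.

    standard_cost:    cost of the workload at standard on-demand pricing
    flex_discount:    fractional discount for Flex (model-dependent, caller-supplied)
    priority_premium: fractional premium for Priority (~25% per this document)
    """
    return {
        "priority": standard_cost * (1 + priority_premium),
        "standard": standard_cost,
        "flex": standard_cost * (1 - flex_discount),
    }


# Example: a workload costing 100.00 at standard pricing, with a placeholder
# Flex discount of 50% chosen purely for illustration.
costs = estimate_tier_costs(100.0, flex_discount=0.5)
for tier, cost in costs.items():
    print(f"{tier}: {cost:.2f}")
```

Check the current Amazon Bedrock pricing page for the actual premium and discount that apply to your model before relying on these numbers.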
| 215 | + |
| 216 | +## Monitoring |
| 217 | + |
| 218 | +### CloudWatch Metrics |
| 219 | + |
| 220 | +Service tier usage is tracked in CloudWatch metrics: |
| 221 | +- Dimension: `ServiceTier` shows the requested tier |
| 222 | +- Dimension: `ResolvedServiceTier` shows the actual tier that served the request |
| 223 | + |
| 224 | +### CloudWatch Logs |
| 225 | + |
| 226 | +Service tier information appears in Lambda function logs: |
| 227 | +``` |
| 228 | +Using service tier: default |
| 229 | +``` |
| 230 | + |
| 231 | +Look for this log message in: |
| 232 | +- OCR function logs |
| 233 | +- Classification function logs |
| 234 | +- Extraction function logs |
| 235 | +- Assessment function logs |
| 236 | +- Summarization function logs |
| 237 | + |
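To confirm which tiers are in use across these functions, the log message can be tallied from exported log lines. The following is a small sketch that assumes the lines have already been retrieved as plain strings (for example from a CloudWatch Logs export); `count_service_tiers` is a hypothetical helper, and the only format it relies on is the `Using service tier: <tier>` message shown above.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the log message documented above and captures the tier name.
TIER_LINE = re.compile(r"Using service tier:\s*(\S+)")


def count_service_tiers(log_lines: Iterable[str]) -> Counter:
    """Count how often each tier appears in the given log lines."""
    counts: Counter = Counter()
    for line in log_lines:
        match = TIER_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "[INFO] Using service tier: default",
    "[INFO] Processing document",
    "[INFO] Using service tier: flex",
    "[INFO] Using service tier: default",
]
print(count_service_tiers(sample))
```

Lines that do not contain the message are ignored, so the helper can be run over unfiltered log output.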
| 238 | +## Model Support |
| 239 | + |
| 240 | +Not all models support all service tiers. Check the [Amazon Bedrock documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/service-tiers-inference.html) for current model support. |
| 241 | + |
| 242 | +**Supported models include:** |
| 243 | +- Amazon Nova models (Pro, Lite, Premier) |
| 244 | +- Anthropic Claude models |
| 245 | +- OpenAI models |
| 246 | +- Qwen models |
| 247 | +- DeepSeek models |
| 248 | + |
| 249 | +## Troubleshooting |
| 250 | + |
| 251 | +### Service Tier Not Applied |
| 252 | + |
| 253 | +**Symptom:** Logs don't show service tier being used |
| 254 | + |
| 255 | +**Solutions:** |
| 256 | +1. Verify `service_tier` is set in the configuration |
| 257 | +2. Check for typos in the tier name (must be `priority`, `standard`, or `flex`) |
| 258 | +3. Ensure configuration is saved and loaded correctly |
| 259 | +4. Check CloudWatch logs for validation warnings |
| 260 | + |
| 261 | +### Invalid Service Tier Warning |
| 262 | + |
| 263 | +**Symptom:** Log shows "Invalid service_tier value" |
| 264 | + |
| 265 | +**Solutions:** |
| 266 | +1. Use only valid values: priority, standard, flex |
| 267 | +2. Check for extra spaces or incorrect casing |
| 268 | +3. Verify YAML syntax is correct |
| 269 | + |
| 270 | +### Model Not Supported |
| 271 | + |
| 272 | +**Symptom:** Bedrock API returns error about unsupported service tier |
| 273 | + |
| 274 | +**Solutions:** |
| 275 | +1. Check that the model supports the selected tier |
| 276 | +2. Refer to the AWS documentation for the model support matrix |
| 277 | +3. Fall back to the standard tier for unsupported models |
| 278 | + |
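The fallback in step 3 can also be automated. The sketch below shows one possible shape under stated assumptions: `invoke_with_tier_fallback` and `UnsupportedTierError` are hypothetical names, and the `invoke` callable stands in for whatever code actually calls Bedrock and translates its "unsupported service tier" error into that exception.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


class UnsupportedTierError(Exception):
    """Hypothetical stand-in for the error raised when a model rejects a tier."""


def invoke_with_tier_fallback(
    invoke: Callable[[str], T],
    tier: str,
    fallback: str = "standard",
) -> T:
    """Call invoke(tier); if the tier is rejected, retry once with the fallback tier."""
    try:
        return invoke(tier)
    except UnsupportedTierError:
        if tier == fallback:
            # Nothing left to fall back to, so surface the original error.
            raise
        return invoke(fallback)


# Example with a fake model call that only accepts the standard tier.
def fake_invoke(tier: str) -> str:
    if tier != "standard":
        raise UnsupportedTierError(f"tier {tier!r} is not supported by this model")
    return f"served with {tier}"


print(invoke_with_tier_fallback(fake_invoke, "flex"))
```

Retrying only once keeps the behavior predictable; a request that fails on the fallback tier still raises, so genuine errors are not hidden.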
| 279 | +## Best Practices |
| 280 | + |
| 281 | +1. **Start with Standard**: Use the standard tier as a baseline, then optimize |
| 282 | +2. **Monitor Costs**: Track usage by tier in CloudWatch and AWS Cost Explorer |
| 283 | +3. **Test Performance**: Compare latency across tiers for your workload |
| 284 | +4. **Mixed Strategy**: Use priority for critical paths, flex for batch operations |
| 285 | +5. **Document Decisions**: Note why specific tiers were chosen for each operation |
| 286 | + |
| 287 | +## Additional Resources |
| 288 | + |
| 289 | +- [Amazon Bedrock Service Tiers User Guide](https://docs.aws.amazon.com/bedrock/latest/userguide/service-tiers-inference.html) |
| 290 | +- [Service Tiers API Reference](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ServiceTier.html) |
| 291 | +- [AWS Blog: Service Tiers Announcement](https://aws.amazon.com/blogs/aws/new-amazon-bedrock-service-tiers-help-you-match-ai-workload-performance-with-cost/) |
| 292 | +- [AWS Pricing Calculator](https://calculator.aws/#/createCalculator/bedrock) |