articles/ai-services/openai/how-to/deployment-types.md
1 addition & 1 deletion
@@ -33,7 +33,7 @@ Azure OpenAI offers three types of deployments. These provide a varied level of
|**Best suited for**| Applications that don’t require data residency. Recommended starting place for customers. | For customers with data residency requirements. Optimized for low to medium volume. | Real-time scoring for large consistent volume. Includes the highest commitments and limits.|
|**How it works**| Traffic may be routed anywhere in the world |||
-|**Cost**|[Baseline](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)|[Regional Pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| May experience cost savings for consistent usage |
+|**Cost**|[Global deployment pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)|[Regional pricing](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/)| May experience cost savings for consistent usage |
|**What you get**| Easy access to all new models with highest default pay-per-call limits.<br><br> Customers with high volume usage may see higher latency variability | Easy access with [SLA on availability](https://azure.microsoft.com/support/legal/sla/). Optimized for low to medium volume workloads with high burstiness. <br><br>Customers with high consistent volume may experience greater latency variability. | Regional access with very high & predictable throughput. Determine throughput per PTU using the provided [capacity calculator](./provisioned-throughput-onboarding.md#estimate-provisioned-throughput-and-cost)|
|**Per-call Latency**| Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time calling & low to medium volume usage. Customers with high volume usage may see higher latency variability. Threshold set per model | Optimized for real-time. |