
Commit c11962e

Merge pull request #276480 from mrbullwinkle/mrb_05_28_2024_global

[Azure OpenAI] Update global standard

2 parents: 1c057bd + ce94335

File tree

1 file changed: +12 additions, 0 deletions


articles/ai-services/openai/quotas-limits.md

Lines changed: 12 additions & 0 deletions
@@ -75,6 +75,18 @@ M = million | K = thousand
 
 M = million | K = thousand
 
+#### Usage tiers
+
+Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage may see more variability in response latency.
+
+The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer’s usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
+
+#### GPT-4o global standard & standard
+
+|Model| Usage Tiers per month |
+|----|----|
+|`GPT-4o` |1.5 Billion tokens |
+
 ### General best practices to remain within rate limits
 
 To minimize issues related to rate limits, it's a good idea to use the following techniques:
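One widely used technique for staying within rate limits is retrying throttled requests with exponential backoff and jitter. Below is a minimal, generic sketch of that pattern; the `RateLimited` exception and `call_with_backoff` helper are illustrative placeholders, not part of any Azure OpenAI SDK.

```python
import random
import time


class RateLimited(Exception):
    """Placeholder for an HTTP 429 'too many requests' response (hypothetical)."""


def call_with_backoff(fn, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_retries):
        try:
            return fn()
        except RateLimited:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            # Wait 1s, 2s, 4s, ... plus a little random jitter so that many
            # throttled clients don't all retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.25))
```

The `sleep` parameter is injected only so the waiting behavior can be replaced (for example, with a no-op) in tests.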
