Description
When deploying in a production environment, it's important to be aware of potential rate limits. Azure OpenAI enforces specific limits: GPT-3.5 models have a maximum capacity of 240,000 tokens per minute (TPM), while GPT-4 models are limited to 60,000 TPM. A viable strategy to work around these limits is to deploy multiple Azure OpenAI instances across different regions and place them behind a load balancer, which distributes incoming requests across the instances.
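The multi-instance strategy above could be sketched as a simple client-side round-robin rotator over several regional endpoints. This is a minimal illustration, not the proposed implementation: the endpoint URLs are hypothetical, and a real deployment would more likely use Azure API Management or a dedicated load balancer in front of the instances.

```python
import itertools


class EndpointRotator:
    """Rotate requests across multiple Azure OpenAI instances (round-robin)."""

    def __init__(self, endpoints):
        # itertools.cycle yields endpoints in order, repeating indefinitely
        self._cycle = itertools.cycle(endpoints)

    def next_endpoint(self):
        """Return the endpoint that should serve the next request."""
        return next(self._cycle)


# Hypothetical regional instances (placeholder URLs, not real resources)
endpoints = [
    "https://eastus-openai.openai.azure.com",
    "https://westeurope-openai.openai.azure.com",
    "https://japaneast-openai.openai.azure.com",
]

rotator = EndpointRotator(endpoints)

# Four consecutive requests cycle through the three instances and wrap around
picked = [rotator.next_endpoint() for _ in range(4)]
```

Spreading each instance's per-minute quota this way multiplies the effective throughput by the number of instances, at the cost of managing several deployments.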
This issue is for a: (mark with an x)
- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Any log messages given by the failure
Expected/desired behavior
OS and Version?
Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
azd version?
run `azd version` and copy-paste the output here.
Versions
Mention any other details that might be useful
Thanks! We'll be in touch soon.