Load Balancing Azure OpenAI using Application Gateway  #546

@vrajroutu

Description

When deploying in a production environment, it's important to be aware of potential rate limits. Azure OpenAI enforces specific quotas: GPT-3.5 models have a maximum capacity of 240,000 tokens per minute (TPM), while GPT-4 models are limited to 60,000 TPM. To work within these limits, a viable strategy is to deploy multiple Azure OpenAI instances across different regions and access them through a load balancer, which distributes the incoming requests effectively.
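A minimal sketch of the idea in Python: a round-robin selector that rotates requests across several Azure OpenAI endpoints so no single deployment absorbs all the TPM load. The endpoint URLs and class name here are hypothetical illustrations, not part of the issue; a production setup would put Application Gateway in front instead of client-side rotation.

```python
from itertools import cycle

# Hypothetical endpoint URLs -- placeholders, not real deployments.
ENDPOINTS = [
    "https://aoai-eastus.openai.azure.com",
    "https://aoai-westeurope.openai.azure.com",
    "https://aoai-uksouth.openai.azure.com",
]

class RoundRobinBalancer:
    """Rotate through Azure OpenAI instances in order,
    wrapping back to the first after the last."""

    def __init__(self, endpoints):
        self._pool = cycle(endpoints)

    def next_endpoint(self):
        # Each call returns the next endpoint in the rotation.
        return next(self._pool)

balancer = RoundRobinBalancer(ENDPOINTS)
```

With Application Gateway, the same effect is achieved server-side: each regional instance becomes a backend-pool member, and the gateway spreads traffic across them.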

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

Expected/desired behavior

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

azd version?

run azd version and copy paste here.

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.
