
Commit f23a9b1

andredewes and pamelafox authored
Updating load balancing instructions (#1598)
* Adding support for custom Azure deployments
* Fixing formatting
* Fixing using Pamela's suggestions
* Fixing format
* Better clarity
* Updating load balancing instructions
* Apply suggestions from code review
* Update productionizing.md

Co-authored-by: andredewes <[email protected]>
Co-authored-by: Pamela Fox <[email protected]>
1 parent 0c4c55c commit f23a9b1

File tree

1 file changed (+3 lines, -1 line)


docs/productionizing.md

Lines changed: 3 additions & 1 deletion
@@ -24,7 +24,9 @@ If the maximum TPM isn't enough for your expected load, you have a few options:
 
 * Use a backoff mechanism to retry the request. This is helpful if you're running into a short-term quota due to bursts of activity but aren't over long-term quota. The [tenacity](https://tenacity.readthedocs.io/en/latest/) library is a good option for this, and this [pull request](https://github.com/Azure-Samples/azure-search-openai-demo/pull/500) shows how to apply it to this app.
 
-* If you are consistently going over the TPM, then consider implementing a load balancer between OpenAI instances. Most developers implement that using Azure API Management using [the openai-apim-lb repo](https://github.com/Azure-Samples/openai-apim-lb) or with Azure Container Apps using [the openai-aca-lb repo](https://github.com/Azure-Samples/openai-aca-lb). Another approach is to use [LiteLLM's load balancer](https://docs.litellm.ai/docs/providers/azure#azure-api-load-balancing) with Azure Cache for Redis.
+* If you are consistently going over the TPM, then consider implementing a load balancer between OpenAI instances. Most developers implement that using Azure API Management or container-based load balancers. For seamless integration instructions with this sample, please check:
+  * [Scale Azure OpenAI for Python with Azure API Management](https://learn.microsoft.com/azure/developer/python/get-started-app-chat-scaling-with-azure-api-management)
+  * [Scale Azure OpenAI for Python chat using RAG with Azure Container Apps](https://learn.microsoft.com/azure/developer/python/get-started-app-chat-scaling-with-azure-container-apps)
 
 ### Azure Storage
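For context on the backoff bullet kept as context in the diff above, a minimal sketch of a tenacity-based retry around an Azure OpenAI chat call might look like the following. The endpoint, API key, API version, and deployment name are placeholders, not values from this repo, and this is an illustration of the approach rather than the code from the linked pull request.

```python
# Minimal sketch: retry an Azure OpenAI chat call on 429s with exponential backoff.
# Assumes the openai>=1.x Python SDK; endpoint/key/deployment values are placeholders.
from openai import AzureOpenAI, RateLimitError
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_random_exponential

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder endpoint
    api_key="YOUR-API-KEY",                                   # placeholder credential
    api_version="2024-02-01",
)

@retry(
    retry=retry_if_exception_type(RateLimitError),   # only retry rate-limit errors
    wait=wait_random_exponential(min=1, max=30),     # exponential backoff with jitter
    stop=stop_after_attempt(5),                      # give up after 5 attempts
)
def chat(question: str) -> str:
    response = client.chat.completions.create(
        model="YOUR-DEPLOYMENT-NAME",  # placeholder Azure OpenAI deployment name
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content
```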

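As a rough client-side illustration of the load-balancing idea in the changed bullet, the sketch below fails over to a second Azure OpenAI instance when the first returns a 429. The articles linked in the diff describe the recommended infrastructure-level approach (Azure API Management or Azure Container Apps); all endpoint and deployment names here are placeholders.

```python
# Simplified illustration only: try each Azure OpenAI instance until one is under its TPM limit.
# The docs recommend an infrastructure-level load balancer instead of client-side failover.
from openai import AzureOpenAI, RateLimitError

ENDPOINTS = [
    "https://openai-primary.openai.azure.com",    # placeholder instance 1
    "https://openai-secondary.openai.azure.com",  # placeholder instance 2
]

def chat_with_failover(question: str) -> str:
    last_error = None
    for endpoint in ENDPOINTS:
        client = AzureOpenAI(
            azure_endpoint=endpoint,
            api_key="YOUR-API-KEY",       # placeholder credential
            api_version="2024-02-01",
        )
        try:
            response = client.chat.completions.create(
                model="YOUR-DEPLOYMENT-NAME",  # placeholder deployment name
                messages=[{"role": "user", "content": question}],
            )
            return response.choices[0].message.content
        except RateLimitError as err:  # over TPM on this instance, try the next one
            last_error = err
    if last_error is not None:
        raise last_error
    raise RuntimeError("No endpoints configured")
```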