Commit 5212e31

Merge pull request #262785 from msakande/TPM-increase-for-Llama-MaaS
TPM increase for Llama Maas
2 parents fad3ff8 + 6ecb7c1 commit 5212e31

File tree

1 file changed: +1 −1 lines changed


articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -440,7 +440,7 @@ Each time a project subscribes to a given offer from the Azure Marketplace, a ne
 :::image type="content" source="../media/cost-management/marketplace/costs-model-as-service-cost-details.png" alt-text="A screenshot showing different resources corresponding to different model offers and their associated meters." lightbox="../media/cost-management/marketplace/costs-model-as-service-cost-details.png":::
 
-Quota is managed per deployment. Each deployment has a rate limit of 20,000 tokens per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits dont suffice your scenarios.
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits don't suffice your scenarios.
 
 ### Considerations for Llama 2 models deployed as real-time endpoints
```
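The quotas in the updated text (200,000 tokens per minute and 1,000 API requests per minute per deployment) are enforced server-side by the service. As a minimal sketch of how a caller might pre-check both limits client-side with a rolling one-minute window (the class and its interface are illustrative assumptions, not part of any Azure SDK):

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side guard for per-deployment quotas, e.g. 200,000 tokens
    and 1,000 API requests per minute. Illustrative only; the service
    still enforces the real limits (typically via HTTP 429 responses).
    """

    def __init__(self, max_tokens=200_000, max_requests=1_000):
        self.max_tokens = max_tokens
        self.max_requests = max_requests
        self.events = deque()  # (timestamp, token_cost) per recorded request

    def _prune(self, now):
        # Drop requests that fell out of the rolling 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def allow(self, tokens, now=None):
        """Return True and record the request if a call costing `tokens`
        fits within both per-minute limits; otherwise return False."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(cost for _, cost in self.events)
        if len(self.events) + 1 > self.max_requests:
            return False  # would exceed requests-per-minute
        if used + tokens > self.max_tokens:
            return False  # would exceed tokens-per-minute
        self.events.append((now, tokens))
        return True
```

A throttle like this only reduces the chance of hitting the service limit; callers should still handle throttling responses from the endpoint itself.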

Comments (0)