
Commit 4e5db2d

Update how-to-deploy-models-llama.md
Correcting the TPM limit
1 parent d170b3e commit 4e5db2d


1 file changed: +1 −1 lines changed


articles/machine-learning/how-to-deploy-models-llama.md

Lines changed: 1 addition & 1 deletion
@@ -609,7 +609,7 @@ For more information on how to track costs, see [Monitor costs for models offere
 
 :::image type="content" source="media/how-to-deploy-models-llama/costs-model-as-service-cost-details.png" alt-text="A screenshot showing different resources corresponding to different model offerings and their associated meters." lightbox="media/how-to-deploy-models-llama/costs-model-as-service-cost-details.png":::
 
-Quota is managed per deployment. Each deployment has a rate limit of 400,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
 ### Cost and quota considerations for Meta Llama 3.1 models deployed managed compute
 
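With the corrected limit, a single deployment caps out at 200,000 tokens per minute and 1,000 API requests per minute, so client code that bursts past either ceiling should expect HTTP 429 responses. Below is a minimal retry-with-backoff sketch in Python; the endpoint URL, key, and chat-completions payload shape are illustrative placeholders for a serverless deployment, not values taken from this commit.

```python
import os
import time

import requests

# Hypothetical values -- substitute your own serverless endpoint and key.
ENDPOINT = os.environ.get(
    "AZUREAI_ENDPOINT", "https://<deployment-name>.<region>.models.ai.azure.com"
)
API_KEY = os.environ.get("AZUREAI_API_KEY", "<your-api-key>")


def chat_completion_with_retry(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat-completions request, backing off when the per-deployment
    rate limit (200,000 tokens/min, 1,000 requests/min) returns HTTP 429."""
    url = f"{ENDPOINT}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor the Retry-After header if the service sends one; otherwise
        # fall back to exponential backoff.
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")


if __name__ == "__main__":
    result = chat_completion_with_retry(
        {
            "messages": [{"role": "user", "content": "Say hello."}],
            "max_tokens": 64,
        }
    )
    print(result["choices"][0]["message"]["content"])
```

Honoring Retry-After when present keeps the client aligned with whatever window the service actually enforces, while the exponential fallback covers responses that omit the header.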