
Commit 4e5db2d

Update how-to-deploy-models-llama.md
Correcting the TPM limit
1 parent d170b3e commit 4e5db2d


1 file changed: +1 −1 lines changed


articles/machine-learning/how-to-deploy-models-llama.md

Lines changed: 1 addition & 1 deletion
@@ -609,7 +609,7 @@ For more information on how to track costs, see [Monitor costs for models offere
 
 :::image type="content" source="media/how-to-deploy-models-llama/costs-model-as-service-cost-details.png" alt-text="A screenshot showing different resources corresponding to different model offerings and their associated meters." lightbox="media/how-to-deploy-models-llama/costs-model-as-service-cost-details.png":::
 
-Quota is managed per deployment. Each deployment has a rate limit of 400,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits aren't sufficient for your scenarios.
 
 ### Cost and quota considerations for Meta Llama 3.1 models deployed managed compute
 
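With the corrected limit, a single deployment caps out at 200,000 tokens per minute and 1,000 API requests per minute, so client code that bursts past either ceiling should expect HTTP 429 responses. Below is a minimal retry-with-backoff sketch in Python; the endpoint URL, key, and chat-completions payload shape are illustrative placeholders for a serverless deployment, not values taken from this commit.

```python
import os
import time

import requests

# Hypothetical values -- substitute your own serverless endpoint and key.
ENDPOINT = os.environ.get(
    "AZUREAI_ENDPOINT", "https://<deployment-name>.<region>.models.ai.azure.com"
)
API_KEY = os.environ.get("AZUREAI_API_KEY", "<your-api-key>")


def chat_completion_with_retry(payload: dict, max_retries: int = 5) -> dict:
    """POST a chat-completions request, backing off when the per-deployment
    rate limit (200,000 tokens/min, 1,000 requests/min) returns HTTP 429."""
    url = f"{ENDPOINT}/v1/chat/completions"
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor the Retry-After header if the service sends one; otherwise
        # fall back to exponential backoff.
        retry_after = response.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")


if __name__ == "__main__":
    result = chat_completion_with_retry(
        {
            "messages": [{"role": "user", "content": "Say hello."}],
            "max_tokens": 64,
        }
    )
    print(result["choices"][0]["message"]["content"])
```

Honoring Retry-After when present keeps the client aligned with whatever window the service actually enforces, while the exponential fallback covers responses that omit the header.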