Commit 5212e31

Merge pull request #262785 from msakande/TPM-increase-for-Llama-MaaS
TPM increase for Llama Maas
2 parents fad3ff8 + 6ecb7c1 commit 5212e31

File tree

1 file changed: +1 −1 lines changed


articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -440,7 +440,7 @@ Each time a project subscribes to a given offer from the Azure Marketplace, a ne
 :::image type="content" source="../media/cost-management/marketplace/costs-model-as-service-cost-details.png" alt-text="A screenshot showing different resources corresponding to different model offers and their associated meters." lightbox="../media/cost-management/marketplace/costs-model-as-service-cost-details.png":::
 
-Quota is managed per deployment. Each deployment has a rate limit of 20,000 tokens per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits dont suffice your scenarios.
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits don't suffice your scenarios.
 
 ### Considerations for Llama 2 models deployed as real-time endpoints
```
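The quotas in the updated text (200,000 tokens per minute and 1,000 API requests per minute per deployment) are enforced server-side by the service. As a minimal sketch of how a caller might pre-check both limits client-side with a rolling one-minute window (the class and its interface are illustrative assumptions, not part of any Azure SDK):

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side guard for per-deployment quotas, e.g. 200,000 tokens
    and 1,000 API requests per minute. Illustrative only; the service
    still enforces the real limits (typically via HTTP 429 responses).
    """

    def __init__(self, max_tokens=200_000, max_requests=1_000):
        self.max_tokens = max_tokens
        self.max_requests = max_requests
        self.events = deque()  # (timestamp, token_cost) per recorded request

    def _prune(self, now):
        # Drop requests that fell out of the rolling 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def allow(self, tokens, now=None):
        """Return True and record the request if a call costing `tokens`
        fits within both per-minute limits; otherwise return False."""
        now = time.monotonic() if now is None else now
        self._prune(now)
        used = sum(cost for _, cost in self.events)
        if len(self.events) + 1 > self.max_requests:
            return False  # would exceed requests-per-minute
        if used + tokens > self.max_tokens:
            return False  # would exceed tokens-per-minute
        self.events.append((now, tokens))
        return True
```

A throttle like this only reduces the chance of hitting the service limit; callers should still handle throttling responses from the endpoint itself.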

Comments (0)