
Commit 6ecb7c1

TPM increase for Llama Maas
1 parent c63db17 commit 6ecb7c1

1 file changed: +1 -1 lines changed

articles/ai-studio/how-to/deploy-models-llama.md

Lines changed: 1 addition & 1 deletion
@@ -440,7 +440,7 @@ Each time a project subscribes to a given offer from the Azure Marketplace, a ne

 :::image type="content" source="../media/cost-management/marketplace/costs-model-as-service-cost-details.png" alt-text="A screenshot showing different resources corresponding to different model offers and their associated meters." lightbox="../media/cost-management/marketplace/costs-model-as-service-cost-details.png":::

-Quota is managed per deployment. Each deployment has a rate limit of 20,000 tokens per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits dont suffice your scenarios.
+Quota is managed per deployment. Each deployment has a rate limit of 200,000 tokens per minute and 1,000 API requests per minute. However, we currently limit one deployment per model per project. Contact Microsoft Azure Support if the current rate limits don't suffice your scenarios.

 ### Considerations for Llama 2 models deployed as real-time endpoints

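The per-deployment limits in the added line (200,000 tokens per minute and 1,000 API requests per minute) can be illustrated with a small client-side budget check. This is only a sketch under stated assumptions: the changed text does not say whether the service uses a sliding or a fixed window, or how tokens are counted per request, so the 60-second sliding window below is an assumption, and `MinuteBudget` and `try_acquire` are hypothetical names that are not part of any Azure SDK.

```python
from collections import deque

# Limits taken from the added line in the diff.
TOKENS_PER_MINUTE = 200_000
REQUESTS_PER_MINUTE = 1_000


class MinuteBudget:
    """Hypothetical client-side tracker for one deployment.

    Assumes a 60-second sliding window; the real service may count differently.
    """

    def __init__(self, max_tokens=TOKENS_PER_MINUTE,
                 max_requests=REQUESTS_PER_MINUTE, window_s=60.0):
        self.max_tokens = max_tokens
        self.max_requests = max_requests
        self.window_s = window_s
        self._events = deque()       # (timestamp, tokens) for each admitted request
        self._tokens_in_window = 0   # running sum of tokens still inside the window

    def _expire(self, now):
        # Drop requests that are at least window_s seconds old.
        while self._events and now - self._events[0][0] >= self.window_s:
            _, tokens = self._events.popleft()
            self._tokens_in_window -= tokens

    def try_acquire(self, tokens, now):
        """Record the request and return True only if it fits both limits."""
        self._expire(now)
        if len(self._events) + 1 > self.max_requests:
            return False
        if self._tokens_in_window + tokens > self.max_tokens:
            return False
        self._events.append((now, tokens))
        self._tokens_in_window += tokens
        return True
```

A caller would pass its own estimate of the request's token count and a monotonic timestamp (for example `time.monotonic()`), then delay or queue the request whenever `try_acquire` returns `False`. Taking the clock as a parameter keeps the sketch deterministic and easy to test.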