Description
Describe the bug
- DeploymentGuide.md says "By default, the GPT model capacity in deployment is set to 140k tokens. To adjust quota settings, follow these steps (link to AzureGPTQuotaSettings.md)."
- Right below that is a statement that "We recommend increasing the capacity to 100k for optimal performance," which contradicts the 140k default stated just before it.
- There's also a table above this that says that the default value for model capacity is 100k.
- quota_check.md says that the capacity should be at least 50k.
- CustomizingAzdParameters.md says that the model capacity used can be adjusted using the AZURE_ENV_MODEL_Capacity environment variable. It cannot.
- Running azd up in an environment that has 50k tokens for gpt-4o returns an error saying it needs 140k. Poking around, this is hard-coded in the main.bicep file: `param capacity int = 140`.
Expected behavior
- That we consistently call out the default/required token capacity.
- That when we say it can be modified by setting an environment variable, it actually can be.
Debugging information
Steps to reproduce
- Use a subscription that has only 50k tokens for the model
- Run azd up. Get error.
- Read through docs, notice inconsistencies.
- Use the quota_check to verify it sees 50. It does.
- Use the environment variable to set the quota I'd like it to use. It does not.
- Poke around in the code, notice that main.bicep has `param capacity int = 140`, and change that to 50. Deployment completes.
Logs
Packaging services (azd package)
Provisioning Azure resources (azd provision)
Provisioning Azure resources can take some time.
Subscription: larrysub (4d5a5064-e89b-4a64-b706-5c858d02f015)
Location: East US 2
| ===| Comparing deployment state
ERROR: error executing step command 'provision': deployment failed: error deploying infrastructure: validating deployment to resource group:
Validation Error Details:
InvalidTemplateDeployment: The template deployment 'ldf2-1746466999' is not valid according to the validation procedure. The tracking id is '512db1ce-16c9-45da-aad8-ad9da79a59fe'. See inner errors for details.
InsufficientQuota: This operation require 140 new capacity in quota Tokens Per Minute (thousands) - gpt-4o - GlobalStandard, which is bigger than the current available capacity 50. The current quota usage is 0 and the quota limit is 50 for quota Tokens Per Minute (thousands) - gpt-4o - GlobalStandard.
TraceID: ac84707be52ace9bc2a5c2fab8a161ed
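
For anyone triaging similar failures, here is a minimal Python sketch that pulls the required and available capacity out of the InsufficientQuota line above. The function name and the regular expression are my own and are based only on the wording of this one log line; the message format may differ in other Azure API versions, so treat the pattern as an assumption.

```python
import re

# Pattern derived from the single InsufficientQuota message in this issue.
# "require[s]?" tolerates both the logged wording and a grammatical variant.
QUOTA_RE = re.compile(
    r"require[s]? (?P<required>\d+) new capacity in quota (?P<quota>.+?), "
    r"which is bigger than the current available capacity (?P<available>\d+)"
)


def parse_insufficient_quota(message):
    """Return (quota_name, required, available), or None if the text does not match."""
    match = QUOTA_RE.search(message)
    if match is None:
        return None
    return match["quota"], int(match["required"]), int(match["available"])


message = (
    "InsufficientQuota: This operation require 140 new capacity in quota "
    "Tokens Per Minute (thousands) - gpt-4o - GlobalStandard, which is bigger "
    "than the current available capacity 50."
)

result = parse_insufficient_quota(message)
if result is not None:
    quota_name, required, available = result
    print(f"{quota_name}: template asks for {required}, only {available} available")
```

On the message from this issue, the sketch reports that the template asks for 140 while only 50 is available, which matches the hard-coded value in main.bicep rather than anything set through the environment variable.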