Inconsistent token requirements in docs. #177

@Blackmist

Description

Describe the bug

  • DeploymentGuide.md says "By default, the GPT model capacity in deployment is set to 140k tokens. To adjust quota settings, follow these steps (link to AzureGPTQuotaSettings.md)."
  • Right below that is the statement "We recommend increasing the capacity to 100k for optimal performance," which contradicts the 140k stated just above it.
  • There's also a table above this that says that the default value for model capacity is 100k.
  • quota_check.md says that the capacity should be at least 50k.
  • CustomizingAzdParameters.md says that the model capacity used can be adjusted using the AZURE_ENV_MODEL_Capacity environment variable. It cannot.
  • Running `azd up` in an environment that has 50k tokens for gpt-4o returns an error saying it needs 140k. Poking around, the value is hard coded in main.bicep: `param capacity int = 140`.
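For reference, the env-var override appears not to work because the Bicep parameter only has a hard-coded default and (as far as I can tell) nothing maps the environment variable onto it. A sketch of what main.bicep effectively declares today (the exact file path and decorator are assumptions on my part):

```bicep
// infra/main.bicep (path assumed) -- capacity is fixed at deploy time,
// so AZURE_ENV_MODEL_Capacity is never consulted:
@description('GPT model capacity in thousands of tokens per minute')
param capacity int = 140
```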

Expected behavior

  • That we consistently call out the default/required token capacity.
  • That when we say the value can be modified by setting an environment variable, it actually can be.

How does this bug make you feel?

Share a GIF from Giphy to tell us how you'd feel.


Debugging information

Steps to reproduce

Steps to reproduce the behavior:

  1. Use a subscription that has only 50k tokens for the model
  2. Run azd up. Get error.
  3. Read through docs, notice inconsistencies.
  4. Use the quota_check to verify it sees 50. It does.
  5. Set the documented environment variable to the quota I'd like it to use. It has no effect.
  6. Poke around in the code, notice that main.bicep has `param capacity int = 140`, and change it to 50. Deployment completes.
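For the documented variable to take effect (step 5 above), the parameter would presumably need to be wired through the azd parameters file using azd's `${VAR}` substitution. A sketch, assuming the file lives at infra/main.parameters.json and using the variable name as documented in CustomizingAzdParameters.md:

```json
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "capacity": {
      "value": "${AZURE_ENV_MODEL_Capacity}"
    }
  }
}
```

Without a mapping like this, `azd env set AZURE_ENV_MODEL_Capacity 50` is silently ignored, which matches the behavior I saw.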

Screenshots

If applicable, add screenshots to help explain your problem.

Logs

```
Packaging services (azd package)

Provisioning Azure resources (azd provision)
Provisioning Azure resources can take some time.

Subscription: larrysub (4d5a5064-e89b-4a64-b706-5c858d02f015)
Location: East US 2

| ===| Comparing deployment state
ERROR: error executing step command 'provision': deployment failed: error deploying infrastructure: validating deployment to resource group:

Validation Error Details:
InvalidTemplateDeployment: The template deployment 'ldf2-1746466999' is not valid according to the validation procedure. The tracking id is '512db1ce-16c9-45da-aad8-ad9da79a59fe'. See inner errors for details.
InsufficientQuota: This operation require 140 new capacity in quota Tokens Per Minute (thousands) - gpt-4o - GlobalStandard, which is bigger than the current available capacity 50. The current quota usage is 0 and the quota limit is 50 for quota Tokens Per Minute (thousands) - gpt-4o - GlobalStandard.

TraceID: ac84707be52ace9bc2a5c2fab8a161ed
```


Tasks

To be filled in by the engineer picking up the issue

  • Task 1
  • Task 2
  • ...

Metadata

Labels: bug (Something isn't working)