
[Bug]: Virtual Key Budget tracking does not work properly for routing #15223

@mxrcooo

Description

What happened?

In my application, I want to use model names keyed by feature, with tiered users. Free users should be able to use a premium model (gpt-5) up to a certain budget; once the budget is exceeded, requests should be routed to a cheaper model (gpt-5-mini). Pro users have unlimited usage.

To implement this, I am using virtual keys with a model-specific budget. My config is as follows:

model_list:
  - model_name: feedback
    litellm_params:
      # free users can use the premium model with a certain budget
      model: openai/gpt-5
      api_key: os.environ/OPENAI_API_KEY
      tags: ["free"]
  - model_name: feedback
    litellm_params:
      # pro users have unlimited usage of the premium model
      model: openai/gpt-5
      api_key: os.environ/OPENAI_API_KEY
      tags: ["pro"]
  - model_name: feedback-fallback
    litellm_params:
      # fallback model using the cheap model. if tags=["free"] exhaust their budget, they get routed here via fallback mechanism
      model: openai/gpt-5-mini
      api_key: os.environ/OPENAI_API_KEY
      tags: ["free", "pro"]

router_settings:
  num_retries: 2
  enable_tag_filtering: True
  fallbacks:
    - {"feedback": ["feedback-fallback"]}
  max_fallbacks: 1


litellm_settings:
  callbacks: ["prometheus"]

I am not quite sure whether the tags are necessary, since the virtual key has the budget attached and requests should be routed to feedback-fallback via the fallbacks configuration anyway.
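To make the expected routing concrete, here is a rough sketch of the decision I want the proxy to make (the function and parameter names are mine for illustration, not LiteLLM internals):

```python
# Sketch of the routing behavior I expect from the config above.
# Names are illustrative only, not part of LiteLLM's API.
def expected_model(tier: str, gpt5_spend: float, gpt5_budget: float) -> str:
    if tier == "pro":
        return "openai/gpt-5"       # pro users: unlimited premium usage
    if gpt5_spend >= gpt5_budget:
        return "openai/gpt-5-mini"  # free users over budget: fallback model
    return "openai/gpt-5"           # free users within budget: premium model

print(expected_model("free", 0.0, 0.000000001))     # openai/gpt-5
print(expected_model("free", 0.0003, 0.000000001))  # openai/gpt-5-mini
print(expected_model("pro", 10.0, 0.000000001))     # openai/gpt-5
```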

I wrote a quick test script to test this:

import asyncio
import re
import time
import uuid

import openai
from dotenv import load_dotenv

from app.ai.litellm.admin import litellm_admin_api
from app.core.config import settings

load_dotenv()


async def main():
    random_user_id = str(uuid.uuid4())

    user = await litellm_admin_api.create_user(random_user_id, user_role="customer")
    key = await litellm_admin_api.create_user_virtual_key(
        user_id=user.user_id,
        model_max_budget={
            "openai/gpt-5": {
                "budget_limit": 0.000000001,
                "time_period": "30d",
            }
        },
        tags=["free"],
    )
    print("key:", key)
    key = key.key

    print(f"User ID: {user.user_id}")
    print(f"User Key: {key}")

    client = openai.OpenAI(api_key=key, base_url=settings.LITELLM_BASE_URL)

    # this should return model=gpt-5-yyyy-mm-dd
    response = client.chat.completions.create(
        model="feedback",
        messages=[{"role": "user", "content": "Hello, world!"}],
        user=user.user_id,
        extra_body={"metadata": {"tags": ["free"]}},
    )
    print(f"First request used model {response.model}")
    assert re.match(r"gpt-5-\d{4}-\d{2}-\d{2}", response.model)

    await asyncio.sleep(5)  # non-blocking pause so spend tracking can update

    # should get routed to mini now
    response = client.chat.completions.create(
        model="feedback",
        messages=[{"role": "user", "content": "Hello, world!"}],
        user=user.user_id,
        extra_body={"metadata": {"tags": ["free"]}},
    )
    print(f"Second request used model {response.model}")
    assert re.match(r"gpt-5-mini-\d{4}-\d{2}-\d{2}", response.model)


if __name__ == "__main__":
    asyncio.run(main())

When I run this on a fresh LiteLLM installation, I get routed to gpt-5 twice, even though the second request should be routed to gpt-5-mini:

First request used model gpt-5-2025-08-07
Second request used model gpt-5-2025-08-07

I suspect this is because the per-model budget spend is not properly tracked for routing decisions:

budget_config = BudgetConfig(time_period="1d", budget_limit=0.1)
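To spell out the check I would expect the proxy to perform before routing: compare the key's tracked spend for a model against that model's budget_limit from model_max_budget, and fall back when it is exceeded. A rough sketch (helper name and dict shapes are mine for illustration, not LiteLLM internals):

```python
# Hypothetical per-model budget check, mirroring the model_max_budget
# dict used when creating the virtual key. Illustrative only.
def should_fallback(model_spend: dict, model_max_budget: dict, model: str) -> bool:
    cfg = model_max_budget.get(model)
    if cfg is None:
        return False  # no budget configured for this model (e.g. pro users)
    return model_spend.get(model, 0.0) >= cfg["budget_limit"]

budget = {"openai/gpt-5": {"budget_limit": 0.000000001, "time_period": "30d"}}
# after one successful request, any nonzero spend exceeds the tiny budget
print(should_fallback({"openai/gpt-5": 0.0003}, budget, "openai/gpt-5"))  # True
```

In my test above, the first request should make this check return True, so the second request ought to land on feedback-fallback; instead both requests hit gpt-5.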

Relevant log output

Are you a ML Ops Team?

No

What LiteLLM version are you on?

v1.77.3-stable

Twitter / LinkedIn details

No response
