
[llm-gateway] Support Tiered Pricing for Gemini Models Based on Prompt Length #464

@jolestar

Description

Background

The Google Gemini API has introduced tiered pricing for its newer models (Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite). Pricing depends on prompt length:

  • Standard pricing: Applied when prompt tokens ≤ 200k tokens
  • Long context pricing: Applied when prompt tokens > 200k tokens

Reference: https://ai.google.dev/gemini-api/docs/pricing

Current Status

Our current implementation (the ModelPricing interface and the PricingRegistry class) supports only a single pricing tier with two fields:

  • promptPerMTokUsd: Price per 1M prompt tokens
  • completionPerMTokUsd: Price per 1M completion tokens

As a conservative approach, we are currently using the long context pricing (higher tier) for all requests to avoid underestimating costs.

Example Pricing (Current Implementation)

  • gemini-3-pro-preview: $4.00 / $18.00 (always using long context price)
  • gemini-2.5-pro: $2.50 / $15.00 (always using long context price)
  • gemini-2.5-flash: $0.30 / $0.90 (always using long context price)

Problem

Using long context pricing for all requests leads to:

  1. Overestimation of costs for requests with ≤ 200k prompt tokens
  2. Less competitive pricing compared to actual API costs
  3. Potential customer dissatisfaction due to higher than necessary charges
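To quantify the overestimation, here is a small sketch (illustrative only, not gateway code) comparing the two tiers for a hypothetical request of 100k prompt tokens and 5k completion tokens against gemini-3-pro-preview, using the rates from the configuration example below ($2.00 / $12.00 standard, $4.00 / $18.00 long context):

```typescript
// Illustrative only: compares always-long-context billing (current behavior)
// with tiered billing for a request well under the 200k-token threshold.
const TOKENS_PER_MILLION = 1_000_000;

function costUsd(
  promptTokens: number,
  completionTokens: number,
  promptPerMTok: number,
  completionPerMTok: number,
): number {
  return (promptTokens / TOKENS_PER_MILLION) * promptPerMTok
    + (completionTokens / TOKENS_PER_MILLION) * completionPerMTok;
}

// gemini-3-pro-preview, 100k prompt + 5k completion tokens
const current = costUsd(100_000, 5_000, 4.00, 18.00); // long context tier
const tiered = costUsd(100_000, 5_000, 2.00, 12.00);  // standard tier
```

Here `current` comes to $0.49 while `tiered` comes to $0.26, so the conservative workaround bills roughly 88% more than the actual API cost for this request.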

Proposed Solution

Implement dynamic tiered pricing based on actual prompt token count.

Implementation Steps

  1. Extend the ModelPricing interface to support tiered pricing:

```typescript
export interface ModelPricing {
  /** Price per 1M prompt tokens in USD */
  promptPerMTokUsd: number;
  /** Price per 1M completion tokens in USD */
  completionPerMTokUsd: number;
  /** Optional: price for long context prompts (>200k tokens) */
  promptPerMTokUsdLongContext?: number;
  /** Optional: completion price for long context prompts */
  completionPerMTokUsdLongContext?: number;
  /** Optional: threshold for long context pricing (default: 200000) */
  longContextThreshold?: number;
}
```

  2. Update pricing configuration files to include tiered pricing data:

```json
{
  "gemini-3-pro-preview": {
    "promptPerMTokUsd": 2.00,
    "promptPerMTokUsdLongContext": 4.00,
    "completionPerMTokUsd": 12.00,
    "completionPerMTokUsdLongContext": 18.00,
    "longContextThreshold": 200000
  }
}
```

  3. Modify the calculateProviderCost method in the PricingRegistry class:

```typescript
calculateProviderCost(provider: string, model: string, usage: UsageInfo): PricingResult | null {
  const pricing = this.getProviderPricing(provider, model);
  if (!pricing) {
    return null;
  }

  const promptTokens = usage.promptTokens || 0;
  const completionTokens = usage.completionTokens || 0;

  // Determine whether long context pricing applies
  const threshold = pricing.longContextThreshold || 200000;
  const isLongContext = promptTokens > threshold;

  // Select the appropriate pricing tier, falling back to standard pricing
  const promptPrice = isLongContext && pricing.promptPerMTokUsdLongContext
    ? pricing.promptPerMTokUsdLongContext
    : pricing.promptPerMTokUsd;

  const completionPrice = isLongContext && pricing.completionPerMTokUsdLongContext
    ? pricing.completionPerMTokUsdLongContext
    : pricing.completionPerMTokUsd;

  // Calculate cost
  const promptCost = (promptTokens / TOKENS_PER_MILLION) * promptPrice;
  const completionCost = (completionTokens / TOKENS_PER_MILLION) * completionPrice;
  const totalCost = promptCost + completionCost;

  return {
    costUsd: totalCost,
    source: 'gateway-pricing',
    pricingVersion: `${provider}-${this.getProviderVersion(provider)}`,
    model,
    usage,
  };
}
```

  4. Add validation to ensure backward compatibility with models that don't have tiered pricing.
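The backward-compatibility concern in step 4 can be exercised with a small sketch. The selectPromptPrice helper below is hypothetical (it simply mirrors the tier-selection logic from step 3): a model without long-context fields keeps billing at its single rate, and a prompt of exactly 200k tokens stays on the standard tier because the comparison is strict (>):

```typescript
// Hypothetical helper mirroring the tier selection in calculateProviderCost.
interface TierPricing {
  promptPerMTokUsd: number;
  promptPerMTokUsdLongContext?: number;
  longContextThreshold?: number;
}

function selectPromptPrice(pricing: TierPricing, promptTokens: number): number {
  const threshold = pricing.longContextThreshold ?? 200_000;
  const isLongContext = promptTokens > threshold;
  return isLongContext && pricing.promptPerMTokUsdLongContext !== undefined
    ? pricing.promptPerMTokUsdLongContext
    : pricing.promptPerMTokUsd;
}

const tieredModel: TierPricing = { promptPerMTokUsd: 2.00, promptPerMTokUsdLongContext: 4.00 };
const legacyModel: TierPricing = { promptPerMTokUsd: 3.00 }; // no tiered pricing configured

selectPromptPrice(tieredModel, 200_000); // 2.00 — at the threshold, standard tier
selectPromptPrice(tieredModel, 200_001); // 4.00 — above the threshold, long context tier
selectPromptPrice(legacyModel, 500_000); // 3.00 — legacy models are unaffected
```

Covering these three cases (threshold boundary, long context, and legacy fallback) in unit tests would make the compatibility guarantee explicit.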

Benefits

  1. Accurate cost calculation based on actual usage
  2. More competitive pricing for standard-length prompts
  3. Backward compatible with existing models that don't have tiered pricing
  4. Transparent pricing that matches provider's pricing model

Priority

Medium - Current workaround (using long context pricing) is conservative and functional, but implementing tiered pricing would provide better cost accuracy and customer experience.

Affected Files

  • nuwa-services/llm-gateway/src/billing/pricing.ts
  • nuwa-services/llm-gateway/src/config/pricingConfigLoader.ts
  • nuwa-services/llm-gateway/src/config/gemini-pricing.json
  • nuwa-services/llm-gateway/src/config/claude-pricing.json (if Claude implements tiered pricing in the future)
  • nuwa-services/llm-gateway/src/config/openai-pricing.json (if OpenAI implements tiered pricing in the future)
