
[llm-gateway] Support Tiered Pricing for Gemini Models Based on Prompt Length #464

@jolestar

Description

Background

The Google Gemini API has introduced tiered pricing for its newer models (Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite). Pricing depends on prompt length:

  • Standard pricing: Applied when prompt tokens ≤ 200k tokens
  • Long context pricing: Applied when prompt tokens > 200k tokens

Reference: https://ai.google.dev/gemini-api/docs/pricing

Current Status

Our current implementation (the ModelPricing interface and the PricingRegistry class) supports only a single pricing tier with two fields:

  • promptPerMTokUsd: Price per 1M prompt tokens
  • completionPerMTokUsd: Price per 1M completion tokens

As a conservative approach, we are currently using the long context pricing (higher tier) for all requests to avoid underestimating costs.

Example Pricing (Current Implementation)

  • gemini-3-pro-preview: $4.00 / $18.00 (always using long context price)
  • gemini-2.5-pro: $2.50 / $15.00 (always using long context price)
  • gemini-2.5-flash: $0.30 / $0.90 (always using long context price)

Problem

Using long context pricing for all requests leads to:

  1. Overestimation of costs for requests with ≤ 200k prompt tokens
  2. Less competitive pricing compared to actual API costs
  3. Potential customer dissatisfaction due to higher than necessary charges
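To quantify the overestimation, here is a small sketch (illustrative only, not gateway code) comparing the two tiers for a hypothetical request of 100k prompt tokens and 5k completion tokens against gemini-3-pro-preview, using the rates from the configuration example below ($2.00 / $12.00 standard, $4.00 / $18.00 long context):

```typescript
// Illustrative only: compares always-long-context billing (current behavior)
// with tiered billing for a request well under the 200k-token threshold.
const TOKENS_PER_MILLION = 1_000_000;

function costUsd(
  promptTokens: number,
  completionTokens: number,
  promptPerMTok: number,
  completionPerMTok: number,
): number {
  return (promptTokens / TOKENS_PER_MILLION) * promptPerMTok
    + (completionTokens / TOKENS_PER_MILLION) * completionPerMTok;
}

// gemini-3-pro-preview, 100k prompt + 5k completion tokens
const current = costUsd(100_000, 5_000, 4.00, 18.00); // long context tier
const tiered = costUsd(100_000, 5_000, 2.00, 12.00);  // standard tier
```

Here `current` comes to $0.49 while `tiered` comes to $0.26, so the conservative workaround bills roughly 88% more than the actual API cost for this request.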

Proposed Solution

Implement dynamic tiered pricing based on actual prompt token count.

Implementation Steps

  1. Extend the ModelPricing interface to support tiered pricing:

```typescript
export interface ModelPricing {
  /** Price per 1M prompt tokens in USD */
  promptPerMTokUsd: number;
  /** Price per 1M completion tokens in USD */
  completionPerMTokUsd: number;
  /** Optional: price for long context prompts (>200k tokens) */
  promptPerMTokUsdLongContext?: number;
  /** Optional: completion price for long context prompts */
  completionPerMTokUsdLongContext?: number;
  /** Optional: threshold for long context pricing (default: 200000) */
  longContextThreshold?: number;
}
```

  2. Update pricing configuration files to include tiered pricing data:

```json
{
  "gemini-3-pro-preview": {
    "promptPerMTokUsd": 2.00,
    "promptPerMTokUsdLongContext": 4.00,
    "completionPerMTokUsd": 12.00,
    "completionPerMTokUsdLongContext": 18.00,
    "longContextThreshold": 200000
  }
}
```

  3. Modify the calculateProviderCost method in the PricingRegistry class:

```typescript
calculateProviderCost(provider: string, model: string, usage: UsageInfo): PricingResult | null {
  const pricing = this.getProviderPricing(provider, model);
  if (!pricing) {
    return null;
  }

  const promptTokens = usage.promptTokens || 0;
  const completionTokens = usage.completionTokens || 0;

  // Determine whether long context pricing applies
  const threshold = pricing.longContextThreshold || 200000;
  const isLongContext = promptTokens > threshold;

  // Select the appropriate pricing tier, falling back to standard pricing
  const promptPrice = isLongContext && pricing.promptPerMTokUsdLongContext
    ? pricing.promptPerMTokUsdLongContext
    : pricing.promptPerMTokUsd;

  const completionPrice = isLongContext && pricing.completionPerMTokUsdLongContext
    ? pricing.completionPerMTokUsdLongContext
    : pricing.completionPerMTokUsd;

  // Calculate cost
  const promptCost = (promptTokens / TOKENS_PER_MILLION) * promptPrice;
  const completionCost = (completionTokens / TOKENS_PER_MILLION) * completionPrice;
  const totalCost = promptCost + completionCost;

  return {
    costUsd: totalCost,
    source: 'gateway-pricing',
    pricingVersion: `${provider}-${this.getProviderVersion(provider)}`,
    model,
    usage,
  };
}
```

  4. Add validation to ensure backward compatibility with models that don't have tiered pricing.
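The backward-compatibility concern in step 4 can be exercised with a small sketch. The selectPromptPrice helper below is hypothetical (it simply mirrors the tier-selection logic from step 3): a model without long-context fields keeps billing at its single rate, and a prompt of exactly 200k tokens stays on the standard tier because the comparison is strict (>):

```typescript
// Hypothetical helper mirroring the tier selection in calculateProviderCost.
interface TierPricing {
  promptPerMTokUsd: number;
  promptPerMTokUsdLongContext?: number;
  longContextThreshold?: number;
}

function selectPromptPrice(pricing: TierPricing, promptTokens: number): number {
  const threshold = pricing.longContextThreshold ?? 200_000;
  const isLongContext = promptTokens > threshold;
  return isLongContext && pricing.promptPerMTokUsdLongContext !== undefined
    ? pricing.promptPerMTokUsdLongContext
    : pricing.promptPerMTokUsd;
}

const tieredModel: TierPricing = { promptPerMTokUsd: 2.00, promptPerMTokUsdLongContext: 4.00 };
const legacyModel: TierPricing = { promptPerMTokUsd: 3.00 }; // no tiered pricing configured

selectPromptPrice(tieredModel, 200_000); // 2.00 — at the threshold, standard tier
selectPromptPrice(tieredModel, 200_001); // 4.00 — above the threshold, long context tier
selectPromptPrice(legacyModel, 500_000); // 3.00 — legacy models are unaffected
```

Covering these three cases (threshold boundary, long context, and legacy fallback) in unit tests would make the compatibility guarantee explicit.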

Benefits

  1. Accurate cost calculation based on actual usage
  2. More competitive pricing for standard-length prompts
  3. Backward compatible with existing models that don't have tiered pricing
  4. Transparent pricing that matches provider's pricing model

Priority

Medium - Current workaround (using long context pricing) is conservative and functional, but implementing tiered pricing would provide better cost accuracy and customer experience.

Affected Files

  • nuwa-services/llm-gateway/src/billing/pricing.ts
  • nuwa-services/llm-gateway/src/config/pricingConfigLoader.ts
  • nuwa-services/llm-gateway/src/config/gemini-pricing.json
  • nuwa-services/llm-gateway/src/config/claude-pricing.json (if Claude implements tiered pricing in the future)
  • nuwa-services/llm-gateway/src/config/openai-pricing.json (if OpenAI implements tiered pricing in the future)
