Description
Background
The Google Gemini API has introduced tiered pricing for its newer models (Gemini 3 Pro Preview, Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash Lite). The price charged depends on the prompt length:
- Standard pricing: Applied when prompt tokens ≤ 200k tokens
- Long context pricing: Applied when prompt tokens > 200k tokens
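For example, a request with a 150k-token prompt is billed entirely at the standard rate, while one with a 250k-token prompt is billed entirely at the long-context rate; in the implementation proposed below, the tier is selected once per request from the prompt token count rather than split across the threshold.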
Reference: https://ai.google.dev/gemini-api/docs/pricing
Current Status
Our current implementation (ModelPricing interface and PricingRegistry class) only supports a single pricing tier with two fields:
- promptPerMTokUsd: Price per 1M prompt tokens
- completionPerMTokUsd: Price per 1M completion tokens
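For reference, the current single-tier shape is roughly the following (a sketch reconstructed from the two fields above; the actual interface may carry additional fields or doc comments):
export interface ModelPricing {
  /** Price per 1M prompt tokens in USD */
  promptPerMTokUsd: number;
  /** Price per 1M completion tokens in USD */
  completionPerMTokUsd: number;
}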
As a conservative approach, we are currently using the long context pricing (higher tier) for all requests to avoid underestimating costs.
Example Pricing (Current Implementation)
- gemini-3-pro-preview: $4.00 / $18.00 (always using long context price)
- gemini-2.5-pro: $2.50 / $15.00 (always using long context price)
- gemini-2.5-flash: $0.30 / $0.90 (always using long context price)
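For instance, a gemini-3-pro-preview request with 100k prompt tokens and 10k completion tokens is currently billed at (100,000 / 1,000,000) × $4.00 + (10,000 / 1,000,000) × $18.00 = $0.58, even though the prompt is under the 200k threshold and would cost only (100,000 / 1,000,000) × $2.00 + (10,000 / 1,000,000) × $12.00 = $0.32 at the standard tier shown in the configuration example below.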
Problem
Using long context pricing for all requests leads to:
- Overestimation of costs for requests with ≤ 200k prompt tokens
- Less competitive pricing compared to actual API costs
- Potential customer dissatisfaction due to higher than necessary charges
Proposed Solution
Implement dynamic tiered pricing based on actual prompt token count.
Implementation Steps
1. Extend the ModelPricing interface to support tiered pricing:
export interface ModelPricing {
  /** Price per 1M prompt tokens in USD */
  promptPerMTokUsd: number;
  /** Price per 1M completion tokens in USD */
  completionPerMTokUsd: number;
  /** Optional: Price for long context prompts (>200k tokens) */
  promptPerMTokUsdLongContext?: number;
  /** Optional: Completion price for long context prompts */
  completionPerMTokUsdLongContext?: number;
  /** Optional: Threshold for long context pricing (default: 200000) */
  longContextThreshold?: number;
}
2. Update pricing configuration files to include tiered pricing data:
{
  "gemini-3-pro-preview": {
    "promptPerMTokUsd": 2.00,
    "promptPerMTokUsdLongContext": 4.00,
    "completionPerMTokUsd": 12.00,
    "completionPerMTokUsdLongContext": 18.00,
    "longContextThreshold": 200000
  }
}
3. Modify the calculateProviderCost method in the PricingRegistry class:
calculateProviderCost(provider: string, model: string, usage: UsageInfo): PricingResult | null {
  const pricing = this.getProviderPricing(provider, model);
  if (!pricing) {
    return null;
  }
  const promptTokens = usage.promptTokens || 0;
  const completionTokens = usage.completionTokens || 0;
  // Determine if long context pricing applies
  const threshold = pricing.longContextThreshold || 200000;
  const isLongContext = promptTokens > threshold;
  // Select appropriate pricing tier
  const promptPrice = isLongContext && pricing.promptPerMTokUsdLongContext
    ? pricing.promptPerMTokUsdLongContext
    : pricing.promptPerMTokUsd;
  const completionPrice = isLongContext && pricing.completionPerMTokUsdLongContext
    ? pricing.completionPerMTokUsdLongContext
    : pricing.completionPerMTokUsd;
  // Calculate cost
  const promptCost = (promptTokens / TOKENS_PER_MILLION) * promptPrice;
  const completionCost = (completionTokens / TOKENS_PER_MILLION) * completionPrice;
  const totalCost = promptCost + completionCost;
  return {
    costUsd: totalCost,
    source: 'gateway-pricing',
    pricingVersion: `${provider}-${this.getProviderVersion(provider)}`,
    model,
    usage,
  };
}
4. Add validation to ensure backward compatibility with models that don't have tiered pricing (sketched below)
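A minimal sketch of such a validation check, assuming it runs when pricing configs are loaded; the helper name and error behavior are illustrative, not existing gateway APIs:
// Hypothetical helper: accepts the ModelPricing shape from step 1 and rejects
// configs where the tiered fields are only partially or inconsistently set.
export function validateModelPricing(model: string, pricing: ModelPricing): void {
  const hasLongPrompt = pricing.promptPerMTokUsdLongContext !== undefined;
  const hasLongCompletion = pricing.completionPerMTokUsdLongContext !== undefined;
  // Single-tier model: no long-context fields at all remains valid (backward compatible).
  if (!hasLongPrompt && !hasLongCompletion && pricing.longContextThreshold === undefined) {
    return;
  }
  // Require both long-context prices together so one request never mixes tiers.
  if (hasLongPrompt !== hasLongCompletion) {
    throw new Error(
      `Pricing for ${model}: promptPerMTokUsdLongContext and completionPerMTokUsdLongContext must be set together`
    );
  }
  // A threshold without long-context prices is almost certainly a config mistake.
  if (pricing.longContextThreshold !== undefined && !hasLongPrompt) {
    throw new Error(`Pricing for ${model}: longContextThreshold is set but long-context prices are missing`);
  }
  // Long-context prices should not undercut the standard tier.
  if (hasLongPrompt &&
      (pricing.promptPerMTokUsdLongContext! < pricing.promptPerMTokUsd ||
       pricing.completionPerMTokUsdLongContext! < pricing.completionPerMTokUsd)) {
    throw new Error(`Pricing for ${model}: long-context prices are lower than standard prices`);
  }
}
With the gemini-3-pro-preview configuration from step 2, the updated calculateProviderCost would then bill a request with 150k prompt tokens and 5k completion tokens at 0.15 × $2.00 + 0.005 × $12.00 = $0.36, and the same completion with a 250k-token prompt at 0.25 × $4.00 + 0.005 × $18.00 = $1.09.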
Benefits
- Accurate cost calculation based on actual usage
- More competitive pricing for standard-length prompts
- Backward compatible with existing models that don't have tiered pricing
- Transparent pricing that matches provider's pricing model
Priority
Medium - Current workaround (using long context pricing) is conservative and functional, but implementing tiered pricing would provide better cost accuracy and customer experience.
Affected Files
- nuwa-services/llm-gateway/src/billing/pricing.ts
- nuwa-services/llm-gateway/src/config/pricingConfigLoader.ts
- nuwa-services/llm-gateway/src/config/gemini-pricing.json
- nuwa-services/llm-gateway/src/config/claude-pricing.json (if Claude implements tiered pricing in the future)
- nuwa-services/llm-gateway/src/config/openai-pricing.json (if OpenAI implements tiered pricing in the future)
Related
- Gemini API Pricing Documentation: https://ai.google.dev/gemini-api/docs/pricing
- Updated gemini-pricing.json with conservative long context pricing (2025-11-19)