[feat] add support for custom model pricing in LiteLLMCompletionClient #427
Mounir-charef wants to merge 4 commits into master from
Conversation
|
Important: Review skipped. Draft detected. Please check the settings in the CodeRabbit UI.
Walkthrough: The PR introduces model pricing registration support in the litellm client by extracting
Changes
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@edenai_apis/llmengine/clients/litellm_client/litellm_client.py`:
- Around line 204-206: The code is writing request-scoped pricing into the
process-global dict via litellm.model_cost[model_name] = model_pricing which
causes cross-request leakage; remove these direct global assignments and instead
either (A) load a model registry once at startup (use
LITELLM_MODEL_REGISTRY_PATH to point LiteLLM's registry JSON so workers read
immutable pricing), or (B) pass pricing per request into the LiteLLM cost APIs
using custom_cost_per_token and custom_cost_per_second (update the request-path
that currently sets model_pricing to forward those params into the cost
calculation), or if mutation is unavoidable wrap all writes in a global lock and
perform copy-on-write to replace litellm.model_cost atomically; replace every
occurrence that assigns into litellm.model_cost (e.g., the
litellm.model_cost[model_name] = model_pricing sites) with one of these safe
patterns.
- Around line 445-447: The moderation() and amoderation() flows read
kwargs.get("model") raw which prevents normalization and stops lookup of custom
pricing; update both moderation() and amoderation() to normalize the model name
the same way other methods do (derive provider/model format from kwargs or
provider config) and use kwargs.pop("model_pricing", None) as before, then pass
the normalized model name into completion_cost() (or whichever pricing helper is
called) so the litellm pricing registry can find custom model_pricing; ensure
you reference and update the local variables model_pricing and model_name in
both methods and mirror the normalization logic used by completion(),
embedding(), and image_generation() functions.
# Register custom model pricing in litellm's registry for extended pricing support
if model_pricing:
    litellm.model_cost[model_name] = model_pricing
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Step 1: Verify all direct assignments to litellm.model_cost in the file
echo "=== Direct assignments to litellm.model_cost ==="
rg -n "litellm\.model_cost\[" edenai_apis/llmengine/clients/litellm_client/litellm_client.py
# Step 2: Check for cleanup operations (pop, del, etc.)
echo ""
echo "=== Cleanup operations on litellm.model_cost ==="
rg -n "model_cost\.pop|del.*litellm\.model_cost" edenai_apis/llmengine/clients/litellm_client/litellm_client.py
# Step 3: Get file size and examine structure
echo ""
echo "=== File structure ==="
wc -l edenai_apis/llmengine/clients/litellm_client/litellm_client.py
Repository: edenai/edenai-apis
Length of output: 788
🌐 Web query:
LiteLLM litellm.model_cost thread-safe concurrent requests recommended approach per-request pricing registry
💡 Result:
litellm.model_cost is effectively a process-global pricing map (a Python dict) and register_model() mutates/overrides entries in that dict. So for thread-safe / highly-concurrent apps, the recommended pattern is: don’t mutate litellm.model_cost during live request handling—treat it as read-mostly and initialize it once. LiteLLM documents model_cost + register_model() for registering/overriding pricing. [1]
Recommended approaches (safe with concurrent requests)
- Initialize a pricing registry at startup (best)
- Put your pricing into a model registry JSON and point LiteLLM at it (so every worker loads the same registry, and requests only read from it). This is explicitly recommended for adding/overriding missing model cost data. [2]
- Common mechanism: LITELLM_MODEL_REGISTRY_PATH → custom registry JSON. [2]
- If you’re using LiteLLM Proxy: set pricing in config (not in code)
- Configure per-model pricing via model_info: input_cost_per_token / output_cost_per_token (and optionally base_model for pricing lookup when the provider returns a different model id). This avoids runtime mutation and is the documented customization path. [3]
- Per-request pricing (multi-tenant / negotiated rates)
- Instead of editing litellm.model_cost, pass pricing into the cost calculation call for that request, e.g. custom_cost_per_token (and for some flows, custom_cost_per_second). The cost calculator API supports this override pattern. [4]
- Practical pattern: keep your own pricing registry (e.g., keyed by tenant + model), and for each request pass the correct custom_cost_per_token when computing cost, rather than calling register_model() mid-flight.
If you must change pricing dynamically
- Do it behind a global lock and prefer copy-on-write (build a new dict / update via a single controlled code path), because concurrent readers may observe partial updates if you mutate shared dict state during requests (Python doesn’t guarantee your “business-level atomicity” even if individual bytecode ops are protected by the GIL).
One gotcha to watch
- There have been cases where "custom pricing in router/proxy config shows up but isn't applied in spend tracking" depending on the routing/model-id used; if you see $0 or null spend, check your LiteLLM version and whether pricing lookup is happening on the expected model key (base_model can matter). [5]
Sources: [1] model_cost / register_model docs [2] model registry + LITELLM_MODEL_REGISTRY_PATH guidance [3] proxy custom pricing + base_model [4] cost calculator supports custom_cost_per_token [5] reported custom pricing application bug context. [1][2][3][4][5]
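The per-request pattern described above can be made concrete without touching litellm's globals at all. The sketch below is illustrative only: PRICING_REGISTRY, compute_request_cost, and the rates shown are assumptions, not litellm APIs.

```python
# App-owned pricing registry, read-only after startup: no mutation of
# litellm.model_cost is ever needed. Keys and rates are illustrative.
PRICING_REGISTRY = {
    ("tenant-a", "openai/gpt-4o"): {
        "input_cost_per_token": 2.5e-06,
        "output_cost_per_token": 1.0e-05,
    },
}

def compute_request_cost(tenant, model, prompt_tokens, completion_tokens):
    """Price one request's usage from the per-tenant registry."""
    pricing = PRICING_REGISTRY[(tenant, model)]
    return (
        prompt_tokens * pricing["input_cost_per_token"]
        + completion_tokens * pricing["output_cost_per_token"]
    )
```

With litellm itself, the same per-request dict would instead be forwarded as custom_cost_per_token into the cost calculation call, as the result above describes.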
Global registry mutation without synchronization causes data leaks across concurrent requests.
Lines 206, 310, 402, 459, 496, 571, 668, 867 all directly write request-scoped pricing into litellm.model_cost, a process-global dict. Concurrent requests can overwrite each other's pricing, and entries persist across request boundaries. LiteLLM documentation explicitly identifies this pattern as unsafe for concurrent applications.
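The cross-request leakage is easy to reproduce with a plain dict standing in for the global registry; model_cost and handle_request below are stand-ins for illustration, not the client's real code.

```python
# Stand-in for litellm.model_cost: one dict shared by every request in the process.
model_cost = {}

def handle_request(model_name, model_pricing=None):
    # Mirrors the flagged pattern: request-scoped pricing written into the global.
    if model_pricing:
        model_cost[model_name] = model_pricing
    return model_cost.get(model_name)

# Request 1 registers tenant-specific pricing.
first = handle_request("openai/gpt-4o", {"input_cost_per_token": 9.9e-06})

# Request 2 passes no pricing, yet still sees request 1's entry:
# the registration leaked across the request boundary.
second = handle_request("openai/gpt-4o")
```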
LiteLLM provides three documented safe alternatives:
- Initialize pricing at startup (recommended for concurrent code): use a model registry JSON file pointed to by LITELLM_MODEL_REGISTRY_PATH so all workers load pricing once and requests only read from it; no runtime mutations.
- Use a per-request pricing parameter: pass custom_cost_per_token (and custom_cost_per_second where applicable) into the cost calculation call for each request. This avoids global mutation entirely and keeps pricing scoped to the specific request context.
- If runtime mutation is unavoidable: protect all mutations with a global lock and use copy-on-write patterns, since concurrent readers may observe partial updates from dict mutations even within the GIL.
Remove direct litellm.model_cost[model_name] = model_pricing assignments and migrate to one of the documented patterns above.
# See if there's custom pricing (model_pricing for extended pricing, or legacy per-token pricing)
model_pricing = kwargs.pop("model_pricing", None)
model_name = kwargs.get("model")
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "litellm_client.py" | head -5
Repository: edenai/edenai-apis
Length of output: 126
🏁 Script executed:
wc -l edenai_apis/llmengine/clients/litellm_client/litellm_client.py
Repository: edenai/edenai-apis
Length of output: 129
🏁 Script executed:
sed -n '440,505p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 3734
🏁 Script executed:
sed -n '475,510p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 2082
🏁 Script executed:
sed -n '1,50p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 1702
🏁 Script executed:
sed -n '475,515p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 2260
🏁 Script executed:
grep -n "provider_name" edenai_apis/llmengine/clients/litellm_client/litellm_client.py | head -20
Repository: edenai/edenai-apis
Length of output: 1015
🏁 Script executed:
grep -n "def " edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -E "(moderation|completion|embedding|image)"
Repository: edenai/edenai-apis
Length of output: 346
🏁 Script executed:
sed -n '442,520p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 4330
🏁 Script executed:
sed -n '78,130p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 2862
🏁 Script executed:
sed -n '258,310p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 2841
🏁 Script executed:
sed -n '613,680p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 3493
🏁 Script executed:
rg -n "\.moderation\(" --type=py -A 3 | head -50
Repository: edenai/edenai-apis
Length of output: 688
🏁 Script executed:
rg -n "\.amoderation\(" --type=py -A 3 | head -50
Repository: edenai/edenai-apis
Length of output: 691
🏁 Script executed:
sed -n '795,820p' edenai_apis/llmengine/llm_engine.py | cat -n
Repository: edenai/edenai-apis
Length of output: 1468
🏁 Script executed:
sed -n '830,860p' edenai_apis/llmengine/llm_engine.py | cat -n
Repository: edenai/edenai-apis
Length of output: 1669
🏁 Script executed:
sed -n '55,65p' edenai_apis/apis/openai/openai_text_api.py | cat -n
Repository: edenai/edenai-apis
Length of output: 606
🏁 Script executed:
rg -n "provider_config" edenai_apis/llmengine/llm_engine.py | head -20
Repository: edenai/edenai-apis
Length of output: 476
🏁 Script executed:
sed -n '1,100p' edenai_apis/llmengine/llm_engine.py | cat -n
Repository: edenai/edenai-apis
Length of output: 3873
🏁 Script executed:
sed -n '90,130p' edenai_apis/llmengine/llm_engine.py | cat -n
Repository: edenai/edenai-apis
Length of output: 1900
🏁 Script executed:
rg -n "provider_config.*model" edenai_apis/llmengine/llm_engine.py
Repository: edenai/edenai-apis
Length of output: 44
🏁 Script executed:
grep -r "moderation(" edenai_apis/apis --type=py -B 5 -A 5 | head -80
Repository: edenai/edenai-apis
Length of output: 182
🏁 Script executed:
grep -r "moderation(" edenai_apis/apis -A 2 | head -60
Repository: edenai/edenai-apis
Length of output: 3581
🏁 Script executed:
sed -n '795,810p' edenai_apis/llmengine/llm_engine.py | cat -n
Repository: edenai/edenai-apis
Length of output: 933
🏁 Script executed:
sed -n '515,575p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 3035
🏁 Script executed:
rg "completion_cost" edenai_apis/llmengine/clients/litellm_client/litellm_client.py -B 3 -A 3
Repository: edenai/edenai-apis
Length of output: 3015
🏁 Script executed:
python3 << 'EOF'
# Check if litellm normalizes model names
import json
# Show what we know about the methods
methods = {
"completion": "line 121: model_name = f'{self.provider_name}/{model}'",
"embedding": "line 22 in call_params: f'{self.provider_name}/{model}'",
"image_generation": "line 33: model_name = f'{self.provider_name}/{model}'",
"moderation": "line 447/8: model_name = kwargs.get('model')",
"amoderation": "line 479/45: model_name = kwargs.get('model')"
}
print("Model name normalization pattern:")
for method, pattern in methods.items():
print(f" {method}: {pattern}")
EOF
Repository: edenai/edenai-apis
Length of output: 427
🏁 Script executed:
sed -n '739,800p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 3273
🏁 Script executed:
python3 << 'EOF'
# Analyze the issue: how completion_cost determines the model to use
print("=== PRICING REGISTRATION ANALYSIS ===\n")
print("Method: completion() (line 78-150)")
print(" Line 121: model_name = f\"{self.provider_name}/{model}\"")
print(" Line 47: call_params[\"model\"] = model_name # normalized")
print(" Pricing registration: litellm.model_cost[model_name] = model_pricing")
print(" Cost calculation: completion_cost(**cost_calc_params)")
print(" - cost_calc_params has 'model': model_name (normalized)")
print()
print("Method: embedding() (line 258-315)")
print(" Line 22: call_params[\"model\"] = f\"{self.provider_name}/{model}\"")
print(" Line 40: model_name = call_params[\"model\"] # already normalized")
print(" Pricing registration: litellm.model_cost[model_name] = model_pricing")
print(" Cost calculation: completion_cost(**cost_calc_params)")
print(" - cost_calc_params likely includes normalized model")
print()
print("Method: moderation() (line 442-476)")
print(" Line 448: model_name = kwargs.get(\"model\") # NOT normalized")
print(" Pricing registration: litellm.model_cost[model_name] = model_pricing")
print(" Cost calculation: completion_cost(**cost_calc_params)")
print(" - cost_calc_params DOES NOT include model explicitly!")
print()
print("ISSUE: moderation() doesn't normalize model_name, and...")
print(" doesn't pass model to completion_cost() for lookup")
print(" This could cause pricing lookup failure if litellm expects")
print(" provider/model format in its internal registry.")
EOF
Repository: edenai/edenai-apis
Length of output: 1312
🏁 Script executed:
sed -n '442,477p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 20 "cost_calc_params"
Repository: edenai/edenai-apis
Length of output: 732
🏁 Script executed:
sed -n '613,680p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 10 "cost_calc_params"
Repository: edenai/edenai-apis
Length of output: 485
🏁 Script executed:
sed -n '78,150p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 10 "cost_calc_params"
Repository: edenai/edenai-apis
Length of output: 44
🏁 Script executed:
sed -n '78,160p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 4382
🏁 Script executed:
rg "cost_calc_params\[" edenai_apis/llmengine/clients/litellm_client/litellm_client.py -A 2 -B 2
Repository: edenai/edenai-apis
Length of output: 2670
🏁 Script executed:
sed -n '126,145p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n
Repository: edenai/edenai-apis
Length of output: 1027
🏁 Script executed:
python3 << 'EOF'
# Comprehensive summary of the findings
print("=== VERIFICATION SUMMARY ===\n")
print("NORMALIZATION PATTERN ACROSS METHODS:")
print("-" * 50)
methods = {
"completion": ("line 121", "f\"{self.provider_name}/{model}\"", "YES"),
"acompletion": ("line 821", "f\"{self.provider_name}/{model}\"", "YES"),
"embedding": ("call_params", "f\"{self.provider_name}/{model}\"", "YES"),
"aembedding": ("line 376", "f\"{self.provider_name}/{model}\"", "YES"),
"image_generation": ("line 645", "f\"{self.provider_name}/{model}\"", "YES"),
"aimage_generation": ("line 548", "f\"{self.provider_name}/{model}\"", "YES"),
"moderation": ("line 448", "kwargs.get(\"model\")", "NO"),
"amoderation": ("line 484", "kwargs.get(\"model\")", "NO"),
}
for method, (location, code, normalized) in methods.items():
status = "✓ NORMALIZED" if normalized == "YES" else "✗ NOT NORMALIZED"
print(f"{method:20} {location:20} {status}")
print("\nCOST CALCULATION PARAMS:")
print("-" * 50)
print("completion(): includes 'model': model_name")
print("embedding(): includes 'model': model_name")
print("image_generation(): includes 'model': model_name")
print("moderation(): MISSING 'model' parameter")
print("amoderation(): MISSING 'model' parameter")
print("\nCRITICAL ISSUES FOUND:")
print("-" * 50)
print("1. moderation() uses bare model from kwargs, no normalization")
print("2. amoderation() uses bare model from kwargs, no normalization")
print("3. Both methods don't pass model to completion_cost()")
print("4. Custom pricing is registered with non-normalized key")
print("5. completion_cost() cannot find pricing without model parameter")
EOF
Repository: edenai/edenai-apis
Length of output: 1324
🏁 Script executed:
sed -n '350,380p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 15 "cost_calc_params"
Repository: edenai/edenai-apis
Length of output: 44
Moderation methods lack model name normalization, causing custom pricing to be silently missed.
The moderation() and amoderation() methods use kwargs.get("model") directly without normalizing to provider/model format, inconsistent with all other methods in this class (completion, embedding, image_generation, etc.). Additionally, neither method passes the model name to completion_cost(), preventing litellm's pricing registry lookup even if custom pricing was registered. If callers pass bare model names (as they do via provider_config in llm_engine), custom pricing will fail silently without raising an error.
Normalize model names in both methods:
Fix:

  def moderation(self, input: str, **kwargs):
      call_params = {}
      call_params["input"] = input
      model_pricing = kwargs.pop("model_pricing", None)
      model_name = kwargs.get("model")
+     if model_name and self.provider_name and "/" not in model_name:
+         model_name = f"{self.provider_name}/{model_name}"
      # ... rest of method
      cost_calc_params = {
          "completion_response": response,
          "call_type": "moderation",
+         "model": model_name,
      }

Apply the same fix to amoderation() at lines 479–514.
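The suggested change could equally be factored into one helper shared by moderation() and amoderation(). The sketch below is hypothetical (normalize_model_name is not an existing name in the client, and it is written as a free function so the behavior is easy to check):

```python
def normalize_model_name(model_name, provider_name):
    """Prefix a bare model name with its provider, mirroring the
    f"{provider_name}/{model}" convention used by completion(),
    embedding(), and image_generation()."""
    if model_name and provider_name and "/" not in model_name:
        return f"{provider_name}/{model_name}"
    return model_name
```

Already-qualified names ("openai/gpt-4o") and missing names pass through unchanged, so the helper is safe to call unconditionally before registering pricing or computing cost.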
fd6eee0 to dc03217 (Compare)
…t and refactor calculate_cost function
dc03217 to 96280db (Compare)
…pricing lookup and adjust acompute_output to yield raw chunks