[feat] add support for custom model pricing in LiteLLMCompletionClient #427

Draft
Mounir-charef wants to merge 4 commits into master from feat/model_onboarding
Conversation

@Mounir-charef (Contributor) commented Feb 25, 2026

Summary by CodeRabbit

  • New Features
    • Added support for custom model pricing configuration across AI operations (completions, embeddings, image generation, and moderation). Cost calculations now leverage custom pricing when provided while maintaining backward compatibility with existing pricing methods.

@coderabbitai coderabbitai bot commented Feb 25, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

The PR introduces model pricing registration support in the litellm client by extracting model_pricing from kwargs, registering it in litellm's cost registry when provided, and preferring registry-based pricing in cost calculations while falling back to legacy per-token pricing when unavailable.

Changes

Cohort: Model Pricing Registration
File(s): edenai_apis/llmengine/clients/litellm_client/litellm_client.py
Summary: Extends pricing logic across the completion, embedding, image_generation, and moderation methods (both sync and async variants) to extract model_pricing from kwargs, register it in litellm's registry, and prioritize registry-based pricing in cost calculations.
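The extract-register-fallback flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual client code: `compute_cost` and its legacy parameters are hypothetical names, and the dict `model_cost` stands in for litellm's process-global pricing registry.

```python
from typing import Dict, Optional

# Stand-in for litellm.model_cost, litellm's process-global pricing registry.
model_cost: Dict[str, Dict[str, float]] = {}

def compute_cost(model_name: str, prompt_tokens: int, completion_tokens: int,
                 model_pricing: Optional[Dict[str, float]] = None,
                 legacy_input_price: float = 0.0,
                 legacy_output_price: float = 0.0) -> float:
    """Prefer registry-based pricing; fall back to legacy per-token pricing."""
    if model_pricing:
        # The PR registers any custom pricing into the shared registry.
        model_cost[model_name] = model_pricing
    pricing = model_cost.get(model_name)
    if pricing:
        return (prompt_tokens * pricing["input_cost_per_token"]
                + completion_tokens * pricing["output_cost_per_token"])
    # Legacy path when the registry has no entry for this model.
    return prompt_tokens * legacy_input_price + completion_tokens * legacy_output_price
```

Note that the registered entry outlives the call, which is exactly the cross-request persistence the review below flags.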

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 Hops through the code with pricing precision,
Registry bound, a swift decision!
Sync and async, they dance in the light,
Extended costs registered, crystal bright! 🌙✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped: CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed — The title clearly and concisely summarizes the main change: adding support for custom model pricing in the LiteLLMCompletionClient, which aligns with the detailed diff showing model_pricing mechanism additions across multiple methods.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@edenai_apis/llmengine/clients/litellm_client/litellm_client.py`:
- Around line 204-206: The code is writing request-scoped pricing into the
process-global dict via litellm.model_cost[model_name] = model_pricing which
causes cross-request leakage; remove these direct global assignments and instead
either (A) load a model registry once at startup (use
LITELLM_MODEL_REGISTRY_PATH to point LiteLLM's registry JSON so workers read
immutable pricing), or (B) pass pricing per request into the LiteLLM cost APIs
using custom_cost_per_token and custom_cost_per_second (update the request-path
that currently sets model_pricing to forward those params into the cost
calculation), or if mutation is unavoidable wrap all writes in a global lock and
perform copy-on-write to replace litellm.model_cost atomically; replace every
occurrence that assigns into litellm.model_cost (e.g., the
litellm.model_cost[model_name] = model_pricing sites) with one of these safe
patterns.
- Around line 445-447: The moderation() and amoderation() flows read
kwargs.get("model") raw which prevents normalization and stops lookup of custom
pricing; update both moderation() and amoderation() to normalize the model name
the same way other methods do (derive provider/model format from kwargs or
provider config) and use kwargs.pop("model_pricing", None) as before, then pass
the normalized model name into completion_cost() (or whichever pricing helper is
called) so the litellm pricing registry can find custom model_pricing; ensure
you reference and update the local variables model_pricing and model_name in
both methods and mirror the normalization logic used by completion(),
embedding(), and image_generation() functions.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3365a07 and d33c751.

📒 Files selected for processing (1)
  • edenai_apis/llmengine/clients/litellm_client/litellm_client.py

Comment on lines 204 to 206

    # Register custom model pricing in litellm's registry for extended pricing support
    if model_pricing:
        litellm.model_cost[model_name] = model_pricing

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Step 1: Verify all direct assignments to litellm.model_cost in the file
echo "=== Direct assignments to litellm.model_cost ==="
rg -n "litellm\.model_cost\[" edenai_apis/llmengine/clients/litellm_client/litellm_client.py

# Step 2: Check for cleanup operations (pop, del, etc.)
echo ""
echo "=== Cleanup operations on litellm.model_cost ==="
rg -n "model_cost\.pop|del.*litellm\.model_cost" edenai_apis/llmengine/clients/litellm_client/litellm_client.py

# Step 3: Get file size and examine structure
echo ""
echo "=== File structure ==="
wc -l edenai_apis/llmengine/clients/litellm_client/litellm_client.py

Repository: edenai/edenai-apis

Length of output: 788


🌐 Web query:

LiteLLM litellm.model_cost thread-safe concurrent requests recommended approach per-request pricing registry

💡 Result:

litellm.model_cost is effectively a process-global pricing map (a Python dict) and register_model() mutates/overrides entries in that dict. So for thread-safe / highly-concurrent apps, the recommended pattern is: don’t mutate litellm.model_cost during live request handling—treat it as read-mostly and initialize it once. LiteLLM documents model_cost + register_model() for registering/overriding pricing. [1]

Recommended approaches (safe with concurrent requests)

  1. Initialize a pricing registry at startup (best)
  • Put your pricing into a model registry JSON and point LiteLLM at it (so every worker loads the same registry, and requests only read from it). This is explicitly recommended for adding/overriding missing model cost data. [2]
  • Common mechanism: LITELLM_MODEL_REGISTRY_PATH → custom registry JSON. [2]
  2. If you’re using LiteLLM Proxy: set pricing in config (not in code)
  • Configure per-model pricing via model_info: input_cost_per_token / output_cost_per_token (and optionally base_model for pricing lookup when the provider returns a different model id). This avoids runtime mutation and is the documented customization path. [3]
  3. Per-request pricing (multi-tenant / negotiated rates)
  • Instead of editing litellm.model_cost, pass pricing into the cost calculation call for that request, e.g. custom_cost_per_token (and for some flows, custom_cost_per_second). The cost calculator API supports this override pattern. [4]
    • Practical pattern: keep your own pricing registry (e.g., keyed by tenant + model), and for each request pass the correct custom_cost_per_token when computing cost, rather than calling register_model() mid-flight.

If you must change pricing dynamically

  • Do it behind a global lock and prefer copy-on-write (build a new dict / update via a single controlled code path), because concurrent readers may observe partial updates if you mutate shared dict state during requests (Python doesn’t guarantee your “business-level atomicity” even if individual bytecode ops are protected by the GIL).

One gotcha to watch

  • There have been cases where “custom pricing in router/proxy config shows up but isn’t applied in spend tracking” depending on the routing/model-id used; if you see $0 or null spend, check your LiteLLM version and whether pricing lookup is happening on the expected model key (base_model can matter). [5]

Sources: [1] model_cost / register_model docs [2] model registry + LITELLM_MODEL_REGISTRY_PATH guidance [3] proxy custom pricing + base_model [4] cost calculator supports custom_cost_per_token [5] reported custom pricing application bug context. [1][2][3][4][5]


Global registry mutation without synchronization causes data leaks across concurrent requests.

Lines 206, 310, 402, 459, 496, 571, 668, 867 all directly write request-scoped pricing into litellm.model_cost, a process-global dict. Concurrent requests can overwrite each other's pricing, and entries persist across request boundaries. LiteLLM documentation explicitly identifies this pattern as unsafe for concurrent applications.

LiteLLM provides three documented safe alternatives:

  1. Initialize pricing at startup (recommended for concurrent code): Use a model registry JSON file pointed to by LITELLM_MODEL_REGISTRY_PATH so all workers load pricing once and requests only read from it—no runtime mutations.

  2. Use per-request pricing parameter: Pass custom_cost_per_token (and custom_cost_per_second where applicable) into the cost calculation call for each request. This avoids global mutation entirely and keeps pricing scoped to the specific request context.

  3. If runtime mutation is unavoidable: Protect all mutations with a global lock and use copy-on-write patterns, since concurrent readers may observe partial updates from dict mutations even within the GIL.

Remove direct litellm.model_cost[model_name] = model_pricing assignments and migrate to one of the documented patterns above.


Comment on lines 445 to 447

    # See if there's custom pricing (model_pricing for extended pricing, or legacy per-token pricing)
    model_pricing = kwargs.pop("model_pricing", None)
    model_name = kwargs.get("model")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "litellm_client.py" | head -5

Repository: edenai/edenai-apis

Length of output: 126


🏁 Script executed:

wc -l edenai_apis/llmengine/clients/litellm_client/litellm_client.py

Repository: edenai/edenai-apis

Length of output: 129


🏁 Script executed:

sed -n '440,505p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 3734


🏁 Script executed:

sed -n '475,510p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 2082


🏁 Script executed:

sed -n '1,50p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 1702


🏁 Script executed:

sed -n '475,515p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 2260


🏁 Script executed:

grep -n "provider_name" edenai_apis/llmengine/clients/litellm_client/litellm_client.py | head -20

Repository: edenai/edenai-apis

Length of output: 1015


🏁 Script executed:

grep -n "def " edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -E "(moderation|completion|embedding|image)"

Repository: edenai/edenai-apis

Length of output: 346


🏁 Script executed:

sed -n '442,520p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 4330


🏁 Script executed:

sed -n '78,130p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 2862


🏁 Script executed:

sed -n '258,310p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 2841


🏁 Script executed:

sed -n '613,680p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 3493


🏁 Script executed:

rg -n "\.moderation\(" --type=py -A 3 | head -50

Repository: edenai/edenai-apis

Length of output: 688


🏁 Script executed:

rg -n "\.amoderation\(" --type=py -A 3 | head -50

Repository: edenai/edenai-apis

Length of output: 691


🏁 Script executed:

sed -n '795,820p' edenai_apis/llmengine/llm_engine.py | cat -n

Repository: edenai/edenai-apis

Length of output: 1468


🏁 Script executed:

sed -n '830,860p' edenai_apis/llmengine/llm_engine.py | cat -n

Repository: edenai/edenai-apis

Length of output: 1669


🏁 Script executed:

sed -n '55,65p' edenai_apis/apis/openai/openai_text_api.py | cat -n

Repository: edenai/edenai-apis

Length of output: 606


🏁 Script executed:

rg -n "provider_config" edenai_apis/llmengine/llm_engine.py | head -20

Repository: edenai/edenai-apis

Length of output: 476


🏁 Script executed:

sed -n '1,100p' edenai_apis/llmengine/llm_engine.py | cat -n

Repository: edenai/edenai-apis

Length of output: 3873


🏁 Script executed:

sed -n '90,130p' edenai_apis/llmengine/llm_engine.py | cat -n

Repository: edenai/edenai-apis

Length of output: 1900


🏁 Script executed:

rg -n "provider_config.*model" edenai_apis/llmengine/llm_engine.py

Repository: edenai/edenai-apis

Length of output: 44


🏁 Script executed:

grep -r "moderation(" edenai_apis/apis --type=py -B 5 -A 5 | head -80

Repository: edenai/edenai-apis

Length of output: 182


🏁 Script executed:

grep -r "moderation(" edenai_apis/apis -A 2 | head -60

Repository: edenai/edenai-apis

Length of output: 3581


🏁 Script executed:

sed -n '795,810p' edenai_apis/llmengine/llm_engine.py | cat -n

Repository: edenai/edenai-apis

Length of output: 933


🏁 Script executed:

sed -n '515,575p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 3035


🏁 Script executed:

rg "completion_cost" edenai_apis/llmengine/clients/litellm_client/litellm_client.py -B 3 -A 3

Repository: edenai/edenai-apis

Length of output: 3015


🏁 Script executed:

python3 << 'EOF'
# Check if litellm normalizes model names
import json

# Show what we know about the methods
methods = {
    "completion": "line 121: model_name = f'{self.provider_name}/{model}'",
    "embedding": "line 22 in call_params: f'{self.provider_name}/{model}'",
    "image_generation": "line 33: model_name = f'{self.provider_name}/{model}'",
    "moderation": "line 447/8: model_name = kwargs.get('model')",
    "amoderation": "line 479/45: model_name = kwargs.get('model')"
}

print("Model name normalization pattern:")
for method, pattern in methods.items():
    print(f"  {method}: {pattern}")
EOF

Repository: edenai/edenai-apis

Length of output: 427


🏁 Script executed:

sed -n '739,800p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 3273


🏁 Script executed:

python3 << 'EOF'
# Analyze the issue: how completion_cost determines the model to use

print("=== PRICING REGISTRATION ANALYSIS ===\n")

print("Method: completion() (line 78-150)")
print("  Line 121: model_name = f\"{self.provider_name}/{model}\"")
print("  Line 47: call_params[\"model\"] = model_name  # normalized")
print("  Pricing registration: litellm.model_cost[model_name] = model_pricing")
print("  Cost calculation: completion_cost(**cost_calc_params)")
print("    - cost_calc_params has 'model': model_name (normalized)")
print()

print("Method: embedding() (line 258-315)")
print("  Line 22: call_params[\"model\"] = f\"{self.provider_name}/{model}\"")
print("  Line 40: model_name = call_params[\"model\"]  # already normalized")
print("  Pricing registration: litellm.model_cost[model_name] = model_pricing")
print("  Cost calculation: completion_cost(**cost_calc_params)")
print("    - cost_calc_params likely includes normalized model")
print()

print("Method: moderation() (line 442-476)")
print("  Line 448: model_name = kwargs.get(\"model\")  # NOT normalized")
print("  Pricing registration: litellm.model_cost[model_name] = model_pricing")
print("  Cost calculation: completion_cost(**cost_calc_params)")
print("    - cost_calc_params DOES NOT include model explicitly!")
print()

print("ISSUE: moderation() doesn't normalize model_name, and...")
print("       doesn't pass model to completion_cost() for lookup")
print("       This could cause pricing lookup failure if litellm expects")
print("       provider/model format in its internal registry.")
EOF

Repository: edenai/edenai-apis

Length of output: 1312


🏁 Script executed:

sed -n '442,477p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 20 "cost_calc_params"

Repository: edenai/edenai-apis

Length of output: 732


🏁 Script executed:

sed -n '613,680p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 10 "cost_calc_params"

Repository: edenai/edenai-apis

Length of output: 485


🏁 Script executed:

sed -n '78,150p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 10 "cost_calc_params"

Repository: edenai/edenai-apis

Length of output: 44


🏁 Script executed:

sed -n '78,160p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 4382


🏁 Script executed:

rg "cost_calc_params\[" edenai_apis/llmengine/clients/litellm_client/litellm_client.py -A 2 -B 2

Repository: edenai/edenai-apis

Length of output: 2670


🏁 Script executed:

sed -n '126,145p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | cat -n

Repository: edenai/edenai-apis

Length of output: 1027


🏁 Script executed:

python3 << 'EOF'
# Comprehensive summary of the findings
print("=== VERIFICATION SUMMARY ===\n")

print("NORMALIZATION PATTERN ACROSS METHODS:")
print("-" * 50)
methods = {
    "completion": ("line 121", "f\"{self.provider_name}/{model}\"", "YES"),
    "acompletion": ("line 821", "f\"{self.provider_name}/{model}\"", "YES"),
    "embedding": ("call_params", "f\"{self.provider_name}/{model}\"", "YES"),
    "aembedding": ("line 376", "f\"{self.provider_name}/{model}\"", "YES"),
    "image_generation": ("line 645", "f\"{self.provider_name}/{model}\"", "YES"),
    "aimage_generation": ("line 548", "f\"{self.provider_name}/{model}\"", "YES"),
    "moderation": ("line 448", "kwargs.get(\"model\")", "NO"),
    "amoderation": ("line 484", "kwargs.get(\"model\")", "NO"),
}

for method, (location, code, normalized) in methods.items():
    status = "✓ NORMALIZED" if normalized == "YES" else "✗ NOT NORMALIZED"
    print(f"{method:20} {location:20} {status}")

print("\nCOST CALCULATION PARAMS:")
print("-" * 50)
print("completion():        includes 'model': model_name")
print("embedding():         includes 'model': model_name")
print("image_generation():  includes 'model': model_name")
print("moderation():        MISSING 'model' parameter")
print("amoderation():       MISSING 'model' parameter")

print("\nCRITICAL ISSUES FOUND:")
print("-" * 50)
print("1. moderation() uses bare model from kwargs, no normalization")
print("2. amoderation() uses bare model from kwargs, no normalization")
print("3. Both methods don't pass model to completion_cost()")
print("4. Custom pricing is registered with non-normalized key")
print("5. completion_cost() cannot find pricing without model parameter")
EOF

Repository: edenai/edenai-apis

Length of output: 1324


🏁 Script executed:

sed -n '350,380p' edenai_apis/llmengine/clients/litellm_client/litellm_client.py | grep -A 15 "cost_calc_params"

Repository: edenai/edenai-apis

Length of output: 44


Moderation methods lack model name normalization, causing custom pricing to be silently missed.

The moderation() and amoderation() methods use kwargs.get("model") directly without normalizing to provider/model format, inconsistent with all other methods in this class (completion, embedding, image_generation, etc.). Additionally, neither method passes the model name to completion_cost(), preventing litellm's pricing registry lookup even if custom pricing was registered. If callers pass bare model names (as they do via provider_config in llm_engine), custom pricing will fail silently without raising an error.

Normalize model names in both methods:

Fix
     def moderation(self, input: str, **kwargs):
         call_params = {}
         call_params["input"] = input
         model_pricing = kwargs.pop("model_pricing", None)
         model_name = kwargs.get("model")
+        if model_name and self.provider_name and "/" not in model_name:
+            model_name = f"{self.provider_name}/{model_name}"
         # ... rest of method
         cost_calc_params = {
             "completion_response": response,
             "call_type": "moderation",
+            "model": model_name,
         }

Apply the same fix to amoderation() at lines 479–514.
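The normalization step in the suggested fix can be factored into a small helper, sketched below. The helper name `normalize_model_name` is hypothetical; `provider_name` mirrors the client attribute the review references.

```python
from typing import Optional

def normalize_model_name(model_name: Optional[str],
                         provider_name: Optional[str]) -> Optional[str]:
    """Mirror the provider/model normalization used by completion() et al.

    Bare model names get the provider prefix so the pricing lookup uses the
    same key under which custom pricing was registered; names that already
    carry a provider prefix (or missing names) are returned unchanged.
    """
    if model_name and provider_name and "/" not in model_name:
        return f"{provider_name}/{model_name}"
    return model_name
```

Both moderation() and amoderation() could call this before registering pricing and before building cost_calc_params, keeping the key used for registration and the key used for lookup identical.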


@Mounir-charef Mounir-charef marked this pull request as draft February 26, 2026 10:03