
Litellm staging 03 21 2026 #24340

Open
Chesars wants to merge 35 commits into main from litellm_staging_03_21_2026

Conversation

Collaborator

@Chesars commented Mar 22, 2026

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • My PR passes all unit tests when run with make test-unit.
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Chesars and others added 25 commits November 13, 2025 12:43
- Removed litellm/model_prices_and_context_window_backup.json
- Updated get_model_cost_map() to read from single source
- Refactored _load_local_model_cost_map() to support:
  * Package resources (production/pip install)
  * Project root (development)
- Updated CI/CD workflows:
  * .circleci/config.yml: Copy to litellm/ before publishing
  * .github/workflows/simple_pypi_publish.yml: Same approach
  * ci_cd/check_files_match.py: Updated file paths
- Updated test_get_model_file.py to use project root
- Renamed test_get_backup_model_cost_map → test_get_local_model_cost_map

Benefits:
- Eliminates file duplication
- Single source of truth for model pricing
- No need for PRs like #16460 to sync backup files
- Simpler maintenance

The backup was unnecessary because CI copies the file to the package before publishing.
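
The loading strategy described above can be sketched as follows. This is a hedged illustration of the two-path lookup (packaged resource first, project root second), not LiteLLM's actual implementation; the function signature and parameter names are assumptions added for testability.

```python
import importlib.resources
import json
from pathlib import Path
from typing import Optional

def load_local_model_cost_map(package: str = "litellm",
                              root: Optional[Path] = None) -> dict:
    """Load model_prices_and_context_window.json from the installed
    package, falling back to the project root in development."""
    filename = "model_prices_and_context_window.json"
    try:
        # Production / pip install: CI copies the JSON into the
        # package directory before publishing.
        resource = importlib.resources.files(package).joinpath(filename)
        return json.loads(resource.read_text(encoding="utf-8"))
    except (ModuleNotFoundError, FileNotFoundError):
        # Development checkout: read from the repository root instead.
        base = root if root is not None else Path(__file__).resolve().parents[1]
        return json.loads((base / filename).read_text(encoding="utf-8"))
```

Because CI copies the file into the package before publishing, the single root JSON serves both install modes without a duplicated backup.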
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The hook kept the backup JSON in sync with the root file — no longer
needed now that the backup file has been removed.
- Update load_local_model_cost_map to use project root fallback for dev
- Keep main's validation, aliases, and source info tracking
- Remove backup JSON (purpose of this PR)
Prevent accidental commits of the CI-generated copy.
…or/remove-backup-file-dry-principle

# Conflicts:
#	litellm/model_prices_and_context_window_backup.json
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Remove early return in get_llm_provider_logic.py that prevented
the 'openrouter/' prefix from being stripped. The early return was
intended for 'native OpenRouter models' like 'openrouter/free',
but no such models exist in the model registry — all OpenRouter
models are multi-segment (e.g. 'openrouter/anthropic/claude-3.5-sonnet')
and need the prefix stripped before being sent to the OpenRouter API.

This regression was introduced in v1.82.3 and caused 400 Bad Request
errors for all OpenRouter models.
Native models like openrouter/auto, openrouter/free have no extra '/'
after the prefix. Provider-routed models like openrouter/anthropic/claude
have extra segments and need prefix stripping.

Condition: only early-return when model after 'openrouter/' has no '/'.
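
The condition above can be sketched as a small helper. This is an illustrative, hypothetical function (not LiteLLM's actual get_llm_provider code) showing the single-segment vs. multi-segment rule:

```python
def resolve_openrouter_model(model: str) -> str:
    """Illustrative sketch of the routing rule: strip 'openrouter/'
    only for provider-routed, multi-segment model names."""
    prefix = "openrouter/"
    if not model.startswith(prefix):
        return model
    remainder = model[len(prefix):]
    if "/" not in remainder:
        # Native model (e.g. openrouter/auto): keep the prefix.
        return model
    # Provider-routed model: strip the prefix before calling OpenRouter.
    return remainder
```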
fix: strip 'openrouter/' prefix from model names (#24234)
…24209)

When multiple inputs were passed to the Gemini embedding endpoint and any
contained multimodal data (images, audio, etc.), LiteLLM incorrectly used
the `embedContent` endpoint which combines all inputs into a single
aggregated embedding. Now uses `batchEmbedContents` with each input as a
separate request, returning N embeddings for N inputs as expected.

Also fixes hardcoded index=0 in batch embedding responses.
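
A minimal sketch of both fixes, assuming the REST payload shapes of the Gemini batchEmbedContents endpoint (these helper names are illustrative, not LiteLLM's transformation code): each input becomes its own request, and each returned embedding keeps its actual position rather than a hardcoded index=0.

```python
def build_batch_embed_requests(model: str, inputs: list) -> dict:
    """One request per input, so N inputs yield N embeddings."""
    return {
        "requests": [
            {"model": f"models/{model}",
             "content": {"parts": [{"text": item}]}}
            for item in inputs
        ]
    }

def to_openai_embedding_data(gemini_response: dict) -> list:
    """Map each embedding to its true position (fixes the index=0 bug)."""
    return [
        {"object": "embedding", "index": i, "embedding": emb["values"]}
        for i, emb in enumerate(gemini_response["embeddings"])
    ]
```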
Add zai.glm-5 to model_prices_and_context_window.json with Bedrock
Converse routing. Pricing: $1.00/1M input, $3.20/1M output tokens.
feat(bedrock): add Z.AI GLM-5 model support
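
As a sanity check of the pricing above, expressed in the per-token form used by the cost map:

```python
# zai.glm-5 pricing from the commit above, converted to per-token rates.
INPUT_COST_PER_TOKEN = 1.00 / 1_000_000   # $1.00 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 3.20 / 1_000_000  # $3.20 per 1M output tokens

def glm5_request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost for one request at the stated rates."""
    return (prompt_tokens * INPUT_COST_PER_TOKEN
            + completion_tokens * OUTPUT_COST_PER_TOKEN)
```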
…beddings-24209

fix(gemini): return separate embeddings for multimodal inputs
…e-backup-file-dry-principle

# Conflicts:
#	litellm/model_prices_and_context_window_backup.json
…y-principle

refactor: Remove redundant backup file
@vercel

vercel bot commented Mar 22, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 22, 2026 4:54am


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_staging_03_21_2026 (b038c07) with main (f5194b5)


@greptile-apps
Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This staging PR bundles three independent improvements: (1) a rename of the internal model-prices JSON from model_prices_and_context_window_backup.json to model_prices_and_context_window.json with an updated CI/CD pipeline and development fallback path; (2) a fix for OpenRouter provider-prefixed model names (openrouter/anthropic/claude-3.5-sonnet) where the openrouter/ prefix was incorrectly preserved; and (3) new combined multimodal embedding support for Gemini's batchEmbedContents path, allowing callers to wrap inputs in nested lists to produce a single combined embedding.

Key changes:

  • litellm/model_prices_and_context_window_backup.json is deleted; CI now copies model_prices_and_context_window.json directly into the litellm/ package directory, and that path is gitignored.
  • get_model_cost_map.py adds a dev-mode fallback to the repo root when the package resource is absent.
  • EmbeddingInput type is broadened to Union[str, List[Union[str, List[str]]]] to accept nested lists.
  • batch_embed_content_handler.py removes the multimodal-detection gate (is_multimodal or vertex_ai) in favor of a provider-only gate (vertex_ai). This is a backwards-incompatible behavioral change: existing users passing multiple multimodal inputs with a gemini/ prefix previously received a single combined embedding (via embedContent), but will now receive separate embeddings (via batchEmbedContents). This violates the project policy of avoiding backwards-incompatible changes without user-controlled flags.
  • _is_multimodal_input is left as an unused import in batch_embed_content_handler.py after the refactor.
  • process_response correctly fixes a long-standing bug where all returned embedding objects had index=0 instead of their actual position.

Confidence Score: 3/5

  • Mostly safe, but contains a backwards-incompatible behavior change for Gemini multimodal embedding users that should be addressed before merging.
  • The model-prices rename and OpenRouter fix are well-implemented and tested. The embedding refactor introduces good new functionality and fixes real bugs (index=0 regression, separate-per-input semantics). However, the removal of the is_multimodal gate for the gemini/ provider path silently changes behavior for existing users who passed multimodal inputs and expected a single combined embedding. This is a P1 backwards-incompatible change with no migration path or flag. The unused import of _is_multimodal_input is minor.
  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py — the use_embed_content logic change is a backwards-incompatible behavioral shift for existing Gemini multimodal embedding users.

Important Files Changed

  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py: Adds combined multimodal embedding support (nested list syntax) for Gemini batch path; removes multimodal routing to embedContent for gemini/ provider — a backwards-incompatible behavior change. Contains one unused import (_is_multimodal_input).
  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_transformation.py: Refactors part-building logic into _build_part_for_input, adds nested-list support for combined embeddings in transform_openai_input_gemini_content, fixes index tracking in process_response (was always 0), and improves token counting for multimodal inputs. Minor docstring inaccuracy in _is_multimodal_input.
  • litellm/litellm_core_utils/get_model_cost_map.py: Renames backup file references to use the canonical model_prices_and_context_window.json, adds a fallback to the project root for development environments. Silent FileNotFoundError handling may obscure misconfigured installs.
  • litellm/litellm_core_utils/get_llm_provider_logic.py: Fixes GH#24234: OpenRouter provider-routed models like openrouter/anthropic/claude-3.5-sonnet now correctly strip the openrouter/ prefix, while single-segment native models like openrouter/auto preserve it.
  • litellm/types/llms/openai.py: Extends EmbeddingInput type to support nested lists (List[Union[str, List[str]]]) for combined multimodal embeddings — a backwards-compatible type broadening.
  • pyproject.toml: Adds litellm/model_prices_and_context_window.json to the wheel and sdist package includes so it is bundled during pip install.
  • tests/test_litellm/llms/vertex_ai/gemini_embeddings/test_batch_embed_content_transformation.py: Comprehensive mock-only unit tests covering nested list embeddings, multimodal input transformation, index regression fix, and token counting. All tests are mock-based with no network calls.
  • tests/test_litellm/test_deepseek_model_metadata.py: Updated to use GetModelCostMap.load_local_model_cost_map() instead of loading the now-deleted backup JSON directly. Test coverage and assertions are unchanged.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[embedding call] --> B{provider?}
    B -->|vertex_ai| C[use_embed_content = True\nembedContent path]
    B -->|gemini provider| D[use_embed_content = False\nbatchEmbedContents path]

    D --> E{input shape?}
    E -->|flat string or list| F[One request per element\nN separate embeddings]
    E -->|nested list element| G[One request with multiple parts\n1 combined embedding]
    E -->|mixed nested and flat| H[Combined plus separate requests]

    D --> I{has file refs?}
    I -->|yes and api_key present| J[resolve file references]
    I -->|yes but no api_key| K[raise ValueError]
    J --> F
    J --> G

    C --> L[transform_openai_input_gemini_embed_content]
    F --> M[transform_openai_input_gemini_content]
    G --> M
    H --> M

Comments Outside Diff (4)

  1. litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py, line 26 (link)

    Unused import after refactor

    _is_multimodal_input is imported here but no longer used anywhere in this file after removing the is_multimodal = _is_multimodal_input(input) call (which was replaced by use_embed_content = custom_llm_provider == "vertex_ai"). This is an unused import that should be removed.

  2. litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_transformation.py, line 124-126 (link)

    Inaccurate docstring — nested text-only lists return False

    The docstring states the function returns True if any element is "a nested list", but the implementation returns True only if a nested list contains multimodal elements. A nested list of plain-text strings (e.g., [["text_a", "text_b"]]) returns False, as confirmed by the test_nested_text_is_not_multimodal test.

  3. litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py, line 428 (link)

    Backwards-incompatible behavior change for Gemini multimodal users

    Before this PR, multimodal input (data URIs, GCS URLs, file references) to a gemini/ provider would route through use_embed_content = True (the embedContent path), producing a single combined embedding for all inputs. After this change, use_embed_content is False for all non-vertex_ai providers, so multimodal input via gemini/ now goes through batchEmbedContents, producing separate embeddings per input.

    Any existing user who passed multiple multimodal inputs with a gemini/ provider and expected a single combined embedding will silently get multiple separate embeddings instead, which is a breaking behavioral change with no opt-out flag.

    Per the project's rule on avoiding backwards-incompatible changes without user-controlled flags, this change should either be gated behind a flag or documented with a migration note.

    Rule Used: What: avoid backwards-incompatible changes without... (source)

  4. litellm/litellm_core_utils/get_model_cost_map.py, line 53-63 (link)

    Silent fallback on FileNotFoundError may hide misconfigured installs

    When importlib.resources raises FileNotFoundError (e.g., a pip install where the JSON was not bundled correctly), the code silently falls through to the project-root path. If that path also fails (e.g., in a Docker container or any environment where the repo root is not present), an unhandled FileNotFoundError is raised with no diagnostic message, making it hard to understand why litellm failed to start.

    Consider adding a warning log on the FileNotFoundError path (similar to the ModuleNotFoundError path) and wrapping the project-root open() in a try/except with a clear error message:

    except FileNotFoundError:
        verbose_logger.warning(
            "LiteLLM: model_prices_and_context_window.json not found in package resources. "
            "Falling back to project root."
        )
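
    One possible completion of that suggestion, as a hedged sketch (the helper and its parameters are illustrative, not the real get_model_cost_map.py code): warn when the package resource is missing, and raise a descriptive error if the project-root fallback also fails.

```python
import logging
from pathlib import Path

verbose_logger = logging.getLogger("litellm")

def read_cost_map_text(package_loader, root_path: Path) -> str:
    """Sketch: load via the packaged resource, warn and fall back to the
    project root, and fail with a clear diagnostic if both paths miss."""
    try:
        return package_loader()
    except FileNotFoundError:
        verbose_logger.warning(
            "LiteLLM: model_prices_and_context_window.json not found in "
            "package resources. Falling back to project root."
        )
    try:
        return root_path.read_text(encoding="utf-8")
    except FileNotFoundError as e:
        raise FileNotFoundError(
            "LiteLLM could not locate model_prices_and_context_window.json "
            f"in package resources or at {root_path}."
        ) from e
```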

Last reviewed commit: "refactor: extract _f..."

@Chesars requested a review from yuneng-jiang March 22, 2026 03:08
Chesars and others added 4 commits March 22, 2026 00:53
Allows wrapping multiple inputs in a nested list to produce a single
combined embedding (text + image = 1 vector). Flat lists continue to
produce separate embeddings per input (OpenAI-compatible default).

Examples:
  input=["text", "image"]        → 2 separate embeddings
  input=[["text", "image"]]      → 1 combined embedding
  input=[["text", "image"], "x"] → 2 embeddings (1 combined + 1 separate)
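
The input-shape rule in the examples above can be sketched as a small planner. This is an illustrative helper (not LiteLLM's code): a nested list element is combined into a single embedding request, while a flat element gets its own separate request.

```python
def plan_embedding_requests(inputs):
    """Classify each top-level element: nested lists are combined into
    one embedding; flat elements each produce a separate embedding."""
    plan = []
    for item in inputs:
        if isinstance(item, list):
            plan.append(("combined", item))    # 1 vector for all parts
        else:
            plan.append(("separate", [item]))  # 1 vector per input
    return plan
```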
…l-embeddings

feat(gemini): support combined multimodal embeddings via nested input