
Litellm staging 03 21 2026 #24340

Open
Chesars wants to merge 35 commits into main from litellm_staging_03_21_2026

Conversation

Collaborator

@Chesars commented Mar 22, 2026

Relevant issues

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • My PR passes all unit tests when run with make test-unit.
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test

Changes

Chesars and others added 25 commits November 13, 2025 12:43
- Removed litellm/model_prices_and_context_window_backup.json
- Updated get_model_cost_map() to read from single source
- Refactored _load_local_model_cost_map() to support:
  * Package resources (production/pip install)
  * Project root (development)
- Updated CI/CD workflows:
  * .circleci/config.yml: Copy to litellm/ before publishing
  * .github/workflows/simple_pypi_publish.yml: Same approach
  * ci_cd/check_files_match.py: Updated file paths
- Updated test_get_model_file.py to use project root
- Renamed test_get_backup_model_cost_map → test_get_local_model_cost_map

Benefits:
- Eliminates file duplication
- Single source of truth for model pricing
- No need for PRs like #16460 to sync backup files
- Simpler maintenance

The backup was unnecessary because CI copies the file to the package before publishing.
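
The loading strategy described above can be sketched as follows. This is a hedged illustration of the two-path lookup (packaged resource first, project root second), not LiteLLM's actual implementation; the function signature and parameter names are assumptions added for testability.

```python
import importlib.resources
import json
from pathlib import Path
from typing import Optional

def load_local_model_cost_map(package: str = "litellm",
                              root: Optional[Path] = None) -> dict:
    """Load model_prices_and_context_window.json from the installed
    package, falling back to the project root in development."""
    filename = "model_prices_and_context_window.json"
    try:
        # Production / pip install: CI copies the JSON into the
        # package directory before publishing.
        resource = importlib.resources.files(package).joinpath(filename)
        return json.loads(resource.read_text(encoding="utf-8"))
    except (ModuleNotFoundError, FileNotFoundError):
        # Development checkout: read from the repository root instead.
        base = root if root is not None else Path(__file__).resolve().parents[1]
        return json.loads((base / filename).read_text(encoding="utf-8"))
```

Because CI copies the file into the package before publishing, the single root JSON serves both install modes without a duplicated backup.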
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The hook kept the backup JSON in sync with the root file — no longer
needed now that the backup file has been removed.
- Update load_local_model_cost_map to use project root fallback for dev
- Keep main's validation, aliases, and source info tracking
- Remove backup JSON (purpose of this PR)
Prevent accidental commits of the CI-generated copy.
…or/remove-backup-file-dry-principle

# Conflicts:
#	litellm/model_prices_and_context_window_backup.json
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Remove early return in get_llm_provider_logic.py that prevented
the 'openrouter/' prefix from being stripped. The early return was
intended for 'native OpenRouter models' like 'openrouter/free',
but no such models exist in the model registry — all OpenRouter
models are multi-segment (e.g. 'openrouter/anthropic/claude-3.5-sonnet')
and need the prefix stripped before being sent to the OpenRouter API.

This regression was introduced in v1.82.3 and caused 400 Bad Request
errors for all OpenRouter models.
Native models like openrouter/auto, openrouter/free have no extra '/'
after the prefix. Provider-routed models like openrouter/anthropic/claude
have extra segments and need prefix stripping.

Condition: only early-return when model after 'openrouter/' has no '/'.
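
The condition above can be sketched as a small helper. This is an illustrative, hypothetical function (not LiteLLM's actual get_llm_provider code) showing the single-segment vs. multi-segment rule:

```python
def resolve_openrouter_model(model: str) -> str:
    """Illustrative sketch of the routing rule: strip 'openrouter/'
    only for provider-routed, multi-segment model names."""
    prefix = "openrouter/"
    if not model.startswith(prefix):
        return model
    remainder = model[len(prefix):]
    if "/" not in remainder:
        # Native model (e.g. openrouter/auto): keep the prefix.
        return model
    # Provider-routed model: strip the prefix before calling OpenRouter.
    return remainder
```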
fix: strip 'openrouter/' prefix from model names (#24234)
…24209)

When multiple inputs were passed to the Gemini embedding endpoint and any
contained multimodal data (images, audio, etc.), LiteLLM incorrectly used
the `embedContent` endpoint which combines all inputs into a single
aggregated embedding. Now uses `batchEmbedContents` with each input as a
separate request, returning N embeddings for N inputs as expected.

Also fixes hardcoded index=0 in batch embedding responses.
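
A minimal sketch of both fixes, assuming the REST payload shapes of the Gemini batchEmbedContents endpoint (these helper names are illustrative, not LiteLLM's transformation code): each input becomes its own request, and each returned embedding keeps its actual position rather than a hardcoded index=0.

```python
def build_batch_embed_requests(model: str, inputs: list) -> dict:
    """One request per input, so N inputs yield N embeddings."""
    return {
        "requests": [
            {"model": f"models/{model}",
             "content": {"parts": [{"text": item}]}}
            for item in inputs
        ]
    }

def to_openai_embedding_data(gemini_response: dict) -> list:
    """Map each embedding to its true position (fixes the index=0 bug)."""
    return [
        {"object": "embedding", "index": i, "embedding": emb["values"]}
        for i, emb in enumerate(gemini_response["embeddings"])
    ]
```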
Add zai.glm-5 to model_prices_and_context_window.json with Bedrock
Converse routing. Pricing: $1.00/1M input, $3.20/1M output tokens.
feat(bedrock): add Z.AI GLM-5 model support
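
As a sanity check of the pricing above, expressed in the per-token form used by the cost map:

```python
# zai.glm-5 pricing from the commit above, converted to per-token rates.
INPUT_COST_PER_TOKEN = 1.00 / 1_000_000   # $1.00 per 1M input tokens
OUTPUT_COST_PER_TOKEN = 3.20 / 1_000_000  # $3.20 per 1M output tokens

def glm5_request_cost(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost for one request at the stated rates."""
    return (prompt_tokens * INPUT_COST_PER_TOKEN
            + completion_tokens * OUTPUT_COST_PER_TOKEN)
```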
…beddings-24209

fix(gemini): return separate embeddings for multimodal inputs
…e-backup-file-dry-principle

# Conflicts:
#	litellm/model_prices_and_context_window_backup.json
…y-principle

refactor: Remove redundant backup file
@vercel

vercel bot commented Mar 22, 2026

The latest updates on your projects.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 22, 2026 4:54am


@codspeed-hq
Contributor

codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing litellm_staging_03_21_2026 (b038c07) with main (f5194b5)


@greptile-apps
Contributor

greptile-apps bot commented Mar 22, 2026

Greptile Summary

This staging PR bundles three independent improvements: (1) a rename of the internal model-prices JSON from model_prices_and_context_window_backup.json to model_prices_and_context_window.json with an updated CI/CD pipeline and development fallback path; (2) a fix for OpenRouter provider-prefixed model names (openrouter/anthropic/claude-3.5-sonnet) where the openrouter/ prefix was incorrectly preserved; and (3) new combined multimodal embedding support for Gemini's batchEmbedContents path, allowing callers to wrap inputs in nested lists to produce a single combined embedding.

Key changes:

  • litellm/model_prices_and_context_window_backup.json is deleted; CI now copies model_prices_and_context_window.json directly into the litellm/ package directory, and that path is gitignored.
  • get_model_cost_map.py adds a dev-mode fallback to the repo root when the package resource is absent.
  • EmbeddingInput type is broadened to Union[str, List[Union[str, List[str]]]] to accept nested lists.
  • batch_embed_content_handler.py removes the multimodal-detection gate (is_multimodal or vertex_ai) in favor of a provider-only gate (vertex_ai). This is a backwards-incompatible behavioral change: existing users passing multiple multimodal inputs with a gemini/ prefix previously received a single combined embedding (via embedContent), but will now receive separate embeddings (via batchEmbedContents). This violates the project policy of avoiding backwards-incompatible changes without user-controlled flags.
  • _is_multimodal_input is left as an unused import in batch_embed_content_handler.py after the refactor.
  • process_response correctly fixes a long-standing bug where all returned embedding objects had index=0 instead of their actual position.

Confidence Score: 3/5

  • Mostly safe, but contains a backwards-incompatible behavior change for Gemini multimodal embedding users that should be addressed before merging.
  • The model-prices rename and OpenRouter fix are well-implemented and tested. The embedding refactor introduces good new functionality and fixes real bugs (index=0 regression, separate-per-input semantics). However, the removal of the is_multimodal gate for the gemini/ provider path silently changes behavior for existing users who passed multimodal inputs and expected a single combined embedding. This is a P1 backwards-incompatible change with no migration path or flag. The unused import of _is_multimodal_input is minor.
  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py — the use_embed_content logic change is a backwards-incompatible behavioral shift for existing Gemini multimodal embedding users.

Important Files Changed

  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py: Adds combined multimodal embedding support (nested list syntax) for Gemini batch path; removes multimodal routing to embedContent for gemini/ provider — a backwards-incompatible behavior change. Contains one unused import (_is_multimodal_input).
  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_transformation.py: Refactors part-building logic into _build_part_for_input, adds nested-list support for combined embeddings in transform_openai_input_gemini_content, fixes index tracking in process_response (was always 0), and improves token counting for multimodal inputs. Minor docstring inaccuracy in _is_multimodal_input.
  • litellm/litellm_core_utils/get_model_cost_map.py: Renames backup file references to use the canonical model_prices_and_context_window.json, adds a fallback to the project root for development environments. Silent FileNotFoundError handling may obscure misconfigured installs.
  • litellm/litellm_core_utils/get_llm_provider_logic.py: Fixes GH#24234: OpenRouter provider-routed models like openrouter/anthropic/claude-3.5-sonnet now correctly strip the openrouter/ prefix, while single-segment native models like openrouter/auto preserve it.
  • litellm/types/llms/openai.py: Extends EmbeddingInput type to support nested lists (List[Union[str, List[str]]]) for combined multimodal embeddings — a backwards-compatible type broadening.
  • pyproject.toml: Adds litellm/model_prices_and_context_window.json to the wheel and sdist package includes so it is bundled during pip install.
  • tests/test_litellm/llms/vertex_ai/gemini_embeddings/test_batch_embed_content_transformation.py: Comprehensive mock-only unit tests covering nested list embeddings, multimodal input transformation, index regression fix, and token counting. All tests are mock-based with no network calls.
  • tests/test_litellm/test_deepseek_model_metadata.py: Updated to use GetModelCostMap.load_local_model_cost_map() instead of loading the now-deleted backup JSON directly. Test coverage and assertions are unchanged.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[embedding call] --> B{provider?}
    B -->|vertex_ai| C[use_embed_content = True\nembedContent path]
    B -->|gemini provider| D[use_embed_content = False\nbatchEmbedContents path]

    D --> E{input shape?}
    E -->|flat string or list| F[One request per element\nN separate embeddings]
    E -->|nested list element| G[One request with multiple parts\n1 combined embedding]
    E -->|mixed nested and flat| H[Combined plus separate requests]

    D --> I{has file refs?}
    I -->|yes and api_key present| J[resolve file references]
    I -->|yes but no api_key| K[raise ValueError]
    J --> F
    J --> G

    C --> L[transform_openai_input_gemini_embed_content]
    F --> M[transform_openai_input_gemini_content]
    G --> M
    H --> M

Comments Outside Diff (4)

  1. litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py, line 26 (link)

    Unused import after refactor

    _is_multimodal_input is imported here but no longer used anywhere in this file after removing the is_multimodal = _is_multimodal_input(input) call (which was replaced by use_embed_content = custom_llm_provider == "vertex_ai"). This is an unused import that should be removed.

  2. litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_transformation.py, line 124-126 (link)

    Inaccurate docstring — nested text-only lists return False

    The docstring states the function returns True if any element is "a nested list", but the implementation returns True only if a nested list contains multimodal elements. A nested list of plain-text strings (e.g., [["text_a", "text_b"]]) returns False, as confirmed by the test_nested_text_is_not_multimodal test.

  3. litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_handler.py, line 428 (link)

    Backwards-incompatible behavior change for Gemini multimodal users

    Before this PR, multimodal input (data URIs, GCS URLs, file references) to a gemini/ provider would route through use_embed_content = True (the embedContent path), producing a single combined embedding for all inputs. After this change, use_embed_content is False for all non-vertex_ai providers, so multimodal input via gemini/ now goes through batchEmbedContents, producing separate embeddings per input.

    Any existing user who passed multiple multimodal inputs with a gemini/ provider and expected a single combined embedding will silently get multiple separate embeddings instead, which is a breaking behavioral change with no opt-out flag.

    Per the project's rule on avoiding backwards-incompatible changes without user-controlled flags, this change should either be gated behind a flag or documented with a migration note.

    Rule Used: What: avoid backwards-incompatible changes without... (source)

  4. litellm/litellm_core_utils/get_model_cost_map.py, line 53-63 (link)

    Silent fallback on FileNotFoundError may hide misconfigured installs

    When importlib.resources raises FileNotFoundError (e.g., a pip install where the JSON was not bundled correctly), the code silently falls through to the project-root path. If that path also fails (e.g., in a Docker container or any environment where the repo root is not present), an unhandled FileNotFoundError is raised with no diagnostic message, making it hard to understand why litellm failed to start.

    Consider adding a warning log on the FileNotFoundError path (similar to the ModuleNotFoundError path) and wrapping the project-root open() in a try/except with a clear error message:

    except FileNotFoundError:
        verbose_logger.warning(
            "LiteLLM: model_prices_and_context_window.json not found in package resources. "
            "Falling back to project root."
        )
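
    One possible completion of that suggestion, as a hedged sketch (the helper and its parameters are illustrative, not the real get_model_cost_map.py code): warn when the package resource is missing, and raise a descriptive error if the project-root fallback also fails.

```python
import logging
from pathlib import Path

verbose_logger = logging.getLogger("litellm")

def read_cost_map_text(package_loader, root_path: Path) -> str:
    """Sketch: load via the packaged resource, warn and fall back to the
    project root, and fail with a clear diagnostic if both paths miss."""
    try:
        return package_loader()
    except FileNotFoundError:
        verbose_logger.warning(
            "LiteLLM: model_prices_and_context_window.json not found in "
            "package resources. Falling back to project root."
        )
    try:
        return root_path.read_text(encoding="utf-8")
    except FileNotFoundError as e:
        raise FileNotFoundError(
            "LiteLLM could not locate model_prices_and_context_window.json "
            f"in package resources or at {root_path}."
        ) from e
```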

Last reviewed commit: "refactor: extract _f..."

@Chesars requested a review from yuneng-jiang March 22, 2026 03:08
Chesars and others added 4 commits March 22, 2026 00:53
Allows wrapping multiple inputs in a nested list to produce a single
combined embedding (text + image = 1 vector). Flat lists continue to
produce separate embeddings per input (OpenAI-compatible default).

Examples:
  input=["text", "image"]        → 2 separate embeddings
  input=[["text", "image"]]      → 1 combined embedding
  input=[["text", "image"], "x"] → 2 embeddings (1 combined + 1 separate)
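
The input-shape rule in the examples above can be sketched as a small planner. This is an illustrative helper (not LiteLLM's code): a nested list element is combined into a single embedding request, while a flat element gets its own separate request.

```python
def plan_embedding_requests(inputs):
    """Classify each top-level element: nested lists are combined into
    one embedding; flat elements each produce a separate embedding."""
    plan = []
    for item in inputs:
        if isinstance(item, list):
            plan.append(("combined", item))    # 1 vector for all parts
        else:
            plan.append(("separate", [item]))  # 1 vector per input
    return plan
```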
…l-embeddings

feat(gemini): support combined multimodal embeddings via nested input