
Conversation

@michelligabriele
Contributor

Relevant issues

Fixes #19684

Pre-Submission checklist

  • I have added testing in the tests/litellm/ directory. Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type

🐛 Bug Fix

Changes

OpenAI models return cached token counts in usage.prompt_tokens_details.cached_tokens, but LiteLLM was not mapping this value to the internal _cache_read_input_tokens private attribute on the Usage class.

This caused the admin UI to display "Cache Read Tokens: 0" for OpenAI/Azure requests even when prompt caching was active (visible in response metadata). Cost calculation was unaffected since it reads directly from prompt_tokens_details.cached_tokens.

Root cause: Anthropic and Bedrock explicitly pass cache_read_input_tokens when constructing Usage, which sets the private attr. OpenAI's usage dict only contains prompt_tokens_details.cached_tokens, and no mapping existed from that field to _cache_read_input_tokens.

Fix: In Usage.__init__, after all provider-specific mappings (Anthropic, DeepSeek), populate _cache_read_input_tokens from prompt_tokens_details.cached_tokens if it hasn't already been set. This is safe because the == 0 guard prevents overwriting values set by other providers.
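
For illustration, an OpenAI-style usage payload now maps as follows (values are made up; the added unit tests exercise this same path):

from litellm.types.utils import Usage

# OpenAI/Azure usage dict with prompt caching active (illustrative numbers)
openai_usage = {
    "prompt_tokens": 1000,
    "completion_tokens": 50,
    "total_tokens": 1050,
    "prompt_tokens_details": {"cached_tokens": 900},
}

usage = Usage(**openai_usage)

# Previously this stayed 0 for OpenAI/Azure; with the fallback mapping it
# now mirrors prompt_tokens_details.cached_tokens.
assert usage._cache_read_input_tokens == 900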

Files changed:

  • litellm/types/utils.py - Added fallback mapping in Usage.__init__
  • tests/test_litellm/types/test_types_utils.py - Added 4 unit tests covering OpenAI cached tokens mapping, zero value, Anthropic non-overwrite, and no prompt_tokens_details

@vercel

vercel bot commented Feb 10, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project    Deployment    Actions             Updated (UTC)
litellm    Ready         Preview, Comment    Feb 10, 2026 5:28pm


@greptile-apps
Contributor

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR updates Usage.__init__ to backfill the private _cache_read_input_tokens field from OpenAI-style prompt_tokens_details.cached_tokens when other provider mappings haven’t set it. This aligns OpenAI/Azure usage parsing with existing Anthropic/DeepSeek mappings so the admin UI can display cache read tokens correctly.

Unit tests were added to cover the OpenAI cached tokens mapping behavior and a few edge cases around missing/zero values.

Confidence Score: 3/5

  • This PR is close to mergeable but has a couple correctness/test-coverage gaps that should be addressed first.
  • Core mapping is straightforward and localized, but the OpenAI mapping currently skips the explicit-zero case and one added test won’t detect overwrites because it uses identical values for both sources.
  • litellm/types/utils.py, tests/test_litellm/types/test_types_utils.py

Important Files Changed

Filename Overview
litellm/types/utils.py Adds an OpenAI-specific fallback mapping in Usage.__init__ to populate _cache_read_input_tokens from prompt_tokens_details.cached_tokens when not already set by other provider mappings.
tests/test_litellm/types/test_types_utils.py Adds tests for OpenAI cached token mapping and related edge cases; one non-overwrite test currently can’t detect regressions because it uses identical values.

Sequence Diagram

sequenceDiagram
  participant Provider as OpenAI/Azure response
  participant UsageInit as Usage.__init__
  participant PromptDetails as PromptTokensDetailsWrapper
  participant UI as Admin UI

  Provider->>UsageInit: Construct Usage(**usage_dict)
  UsageInit->>PromptDetails: Wrap prompt_tokens_details dict
  UsageInit->>UsageInit: Apply provider mappings (Anthropic/DeepSeek)
  UsageInit->>UsageInit: If _cache_read_input_tokens == 0 and cached_tokens > 0
  UsageInit->>UsageInit: Set _cache_read_input_tokens = cached_tokens
  UsageInit->>UI: UI reads cache read tokens from Usage._cache_read_input_tokens

Contributor

@greptile-apps greptile-apps bot left a comment


2 files reviewed, 2 comments


Comment on lines 1559 to 1566
## OPENAI MAPPING - populate _cache_read_input_tokens from prompt_tokens_details.cached_tokens ##
if (
    self._cache_read_input_tokens == 0
    and _prompt_tokens_details is not None
    and _prompt_tokens_details.cached_tokens is not None
    and _prompt_tokens_details.cached_tokens > 0
):
    self._cache_read_input_tokens = _prompt_tokens_details.cached_tokens

Cache tokens lost when None

Usage.__init__ only sets _cache_read_input_tokens from prompt_tokens_details.cached_tokens when it’s > 0 (litellm/types/utils.py:1559-1566). If OpenAI returns a valid cached_tokens value of 0 (or the UI expects to reflect that the field was present), this mapping won’t run and you can’t distinguish “absent” vs “present but zero”. Since the PR intent is to map OpenAI’s field, consider setting _cache_read_input_tokens when cached_tokens is not None (and still keep the “don’t overwrite provider-set value” guard).
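
A possible shape of the suggested guard (a sketch of the suggestion, not the code as merged):

## OPENAI MAPPING - populate _cache_read_input_tokens from prompt_tokens_details.cached_tokens ##
if (
    self._cache_read_input_tokens == 0
    and _prompt_tokens_details is not None
    and _prompt_tokens_details.cached_tokens is not None
):
    self._cache_read_input_tokens = _prompt_tokens_details.cached_tokens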

Comment on lines 195 to 206
# Anthropic passes cache_read_input_tokens explicitly in **params
usage = Usage(
    prompt_tokens=1000,
    completion_tokens=50,
    total_tokens=1050,
    prompt_tokens_details={"cached_tokens": 500},
    cache_read_input_tokens=500,
)

# Should use the explicit Anthropic value, not overwrite it
assert usage._cache_read_input_tokens == 500
assert usage.prompt_tokens_details.cached_tokens == 500

Non-overwrite test is weak

In test_usage_anthropic_cache_read_not_overwritten_by_prompt_details the two inputs that could conflict are identical, so the test can’t detect an overwrite regression (tests/test_litellm/types/test_types_utils.py:200-206). Change the test to pass different values for the two fields and assert the explicit cache-read value is preserved.
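
For example (a sketch; the specific values are arbitrary, they only need to differ):

usage = Usage(
    prompt_tokens=1000,
    completion_tokens=50,
    total_tokens=1050,
    prompt_tokens_details={"cached_tokens": 300},
    cache_read_input_tokens=500,
)

# The explicit cache_read_input_tokens value must win over prompt_tokens_details
assert usage._cache_read_input_tokens == 500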

@michelligabriele
Contributor Author

@greptile can you review the last commit?

@greptile-apps
Contributor

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR updates the Usage model construction to populate the private _cache_read_input_tokens field from OpenAI’s prompt_tokens_details.cached_tokens when no provider-specific cache-read value was already set. It also adds unit tests covering OpenAI cached token mapping, zero values, a non-overwrite scenario, and absence of prompt_tokens_details.

The change lives in litellm/types/utils.py within Usage.__init__, alongside existing provider-specific mappings (Anthropic/DeepSeek), and is intended to fix UI reporting of cache-read tokens for OpenAI/Azure responses without affecting cost computation.

Confidence Score: 3/5

  • This PR is close to mergeable, but one added unit test assertion appears incorrect and will likely fail in CI.
  • The runtime mapping change is small and aligned with existing provider-specific token mapping logic, but the new Anthropic non-overwrite test asserts a cached_tokens value that does not match the constructor’s attribute-application order in Usage.__init__. Fixing that test should make the PR safer to merge.
  • tests/test_litellm/types/test_types_utils.py

Important Files Changed

Filename Overview
litellm/types/utils.py Adds a fallback mapping in Usage.__init__ to set _cache_read_input_tokens from prompt_tokens_details.cached_tokens when not already set.
tests/test_litellm/types/test_types_utils.py Adds tests for OpenAI cached_tokens mapping, but includes a failing assertion in the Anthropic non-overwrite test due to how prompt_tokens_details is applied at the end of Usage.__init__.

Sequence Diagram

sequenceDiagram
    participant Provider as ProviderResponse
    participant UsageInit as UsageInit
    participant Ptd as PromptTokensDetails

    Provider->>UsageInit: construct with usage dict
    UsageInit->>UsageInit: parse prompt token details

    alt explicit cache-read provided
        UsageInit->>UsageInit: set private cache-read tokens
        UsageInit->>Ptd: set cached_tokens
    else provider cache-hit provided
        UsageInit->>UsageInit: set private cache-read tokens
    else OpenAI cached_tokens present
        UsageInit->>UsageInit: if private cache-read is zero
        UsageInit->>UsageInit: set private cache-read from cached_tokens
    end

    UsageInit->>UsageInit: apply remaining params to model
    UsageInit-->>Provider: return Usage object

Contributor

@greptile-apps greptile-apps bot left a comment


2 files reviewed, 1 comment


Comment on lines 205 to 206
# Should use the explicit Anthropic value (500), not the prompt_tokens_details value (300)
assert usage._cache_read_input_tokens == 500

Incorrect cached_tokens assertion

With prompt_tokens_details passed in, that value is applied after the internal mappings, so prompt_tokens_details.cached_tokens ends up 300 here (while _cache_read_input_tokens remains 500).

@michelligabriele
Contributor Author

michelligabriele commented Feb 10, 2026

@greptile please re-review the latest commits. Here's what was addressed:

  1. > 0 guard removed (first review comment): Now uses is not None, so cached_tokens: 0 is correctly distinguished from absent.
  2. Weak overwrite test (first review comment): Anthropic test now uses distinct values (cache_read_input_tokens=500 vs prompt_tokens_details.cached_tokens=300) to properly detect overwrites.
  3. "Incorrect cached_tokens assertion" (second review comment): The assertion prompt_tokens_details.cached_tokens == 500 is actually correct — the Anthropic mapping at Usage.init line ~1502 explicitly overwrites _prompt_tokens_details.cached_tokens with the cache_read_input_tokens value before super().init is called. Added a
    docstring explaining this flow.

Additionally, expanded test coverage with 3 new tests:

  • DeepSeek: verifies prompt_cache_hit_tokens is not overwritten by the OpenAI fallback
  • cached_tokens=None: verifies explicit None doesn't set _cache_read_input_tokens
  • Round-trip serialization: verifies _cache_read_input_tokens survives the model_dump() → Usage() cycle (see the sketch below)
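
A sketch of the round-trip test (the committed test may differ in detail; it assumes the litellm.types.utils import path used elsewhere in this PR):

from litellm.types.utils import Usage

original = Usage(
    prompt_tokens=100,
    completion_tokens=10,
    total_tokens=110,
    prompt_tokens_details={"cached_tokens": 40},
)
assert original._cache_read_input_tokens == 40

# The private attr is not serialized directly; it is re-derived from
# prompt_tokens_details.cached_tokens when the dumped dict is parsed back.
restored = Usage(**original.model_dump())
assert restored._cache_read_input_tokens == 40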

@greptile-apps
Contributor

greptile-apps bot commented Feb 10, 2026

Greptile Overview

Greptile Summary

This PR updates Usage.__init__ to backfill LiteLLM’s internal cache-read counter (_cache_read_input_tokens) from OpenAI/Azure’s usage.prompt_tokens_details.cached_tokens when no provider-specific mapping has already set it (Anthropic via cache_read_input_tokens, DeepSeek via prompt_cache_hit_tokens). It also adds tests intended to cover the OpenAI mapping, non-overwrite behavior for other providers, and a model_dump round-trip.

Overall, the mapping change is small and localized to litellm/types/utils.py, and it fits the existing pattern of provider-specific usage normalization used to power downstream reporting/UI.

Confidence Score: 4/5

  • Generally safe to merge; localized change, but test expectations should be aligned with the intended mapping semantics.
  • The core code change only backfills the internal cache-read counter from an already-parsed provider field when other provider mappings have not set it, which keeps the blast radius small. Main remaining risk is that the new tests encode a confusing contract around zero values, which can mislead future maintenance.
  • tests/test_litellm/types/test_types_utils.py

Important Files Changed

Filename Overview
litellm/types/utils.py Adds OpenAI fallback mapping to populate Usage._cache_read_input_tokens from prompt_tokens_details.cached_tokens when not already set by other provider mappings.

Sequence Diagram

sequenceDiagram
    participant Provider as OpenAI/Azure
    participant LiteLLM as LiteLLM Usage.__init__
    participant UI as Admin UI

    Provider->>LiteLLM: usage dict (prompt_tokens_details.cached_tokens)
    LiteLLM->>LiteLLM: Build PromptTokensDetailsWrapper
    LiteLLM->>LiteLLM: Apply provider mappings (Anthropic/DeepSeek)
    alt _cache_read_input_tokens still 0 and cached_tokens != None
        LiteLLM->>LiteLLM: _cache_read_input_tokens = cached_tokens
    end
    LiteLLM->>UI: Usage object (includes _cache_read_input_tokens)
    UI->>UI: Display Cache Read Tokens

Contributor

@greptile-apps greptile-apps bot left a comment


1 file reviewed, 1 comment


Comment on lines +168 to +185
def test_usage_openai_cached_tokens_zero_does_not_set_cache_read():
    """
    When OpenAI returns cached_tokens=0, _cache_read_input_tokens should stay 0.
    """
    from litellm.types.utils import Usage

    openai_usage = {
        "prompt_tokens": 100,
        "completion_tokens": 10,
        "total_tokens": 110,
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0,
        },
    }

    usage = Usage(**openai_usage)
    assert usage._cache_read_input_tokens == 0

Misleading zero-value test

The implementation maps the internal cache-read counter from prompt_tokens_details.cached_tokens whenever that field is present (checked via cached_tokens is not None), even if the value is 0 (litellm/types/utils.py:1559-1565). This test’s name/docstring reads like zero should be treated as “do not map”, which contradicts the actual behavior and the broader goal of treating “present but zero” differently from “missing”. Please update the test name/docstring and/or assertions to reflect the intended contract: mapping occurs when the field is present; lack of mapping should be reserved for missing details or null cached_tokens.
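
For example, something along these lines (a sketch only; the test name and docstring wording are up to you):

def test_usage_openai_cached_tokens_present_but_zero_maps_to_zero():
    """
    cached_tokens=0 is "present but zero": the mapping still runs and the
    internal counter ends up 0, which is distinct from the field being absent.
    """
    from litellm.types.utils import Usage

    usage = Usage(
        prompt_tokens=100,
        completion_tokens=10,
        total_tokens=110,
        prompt_tokens_details={"cached_tokens": 0},
    )
    assert usage._cache_read_input_tokens == 0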



Development

Successfully merging this pull request may close these issues.

[Bug]: “Admin UI isn’t displaying cache read tokens even when metadata.cached_tokens > 0”
