fix(usage): map OpenAI cached_tokens to _cache_read_input_tokens #20878
Conversation
Greptile Overview

Greptile Summary: This PR updates `Usage.__init__` so that OpenAI's `prompt_tokens_details.cached_tokens` is mapped to the internal `_cache_read_input_tokens` attribute. Unit tests were added to cover the OpenAI cached tokens mapping behavior and a few edge cases around missing/zero values.

Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/types/utils.py | Adds an OpenAI-specific fallback mapping in Usage.__init__ to populate _cache_read_input_tokens from prompt_tokens_details.cached_tokens when not already set by other provider mappings. |
| tests/test_litellm/types/test_types_utils.py | Adds tests for OpenAI cached token mapping and related edge cases; one non-overwrite test currently can’t detect regressions because it uses identical values. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Provider as OpenAI/Azure response
    participant UsageInit as Usage.__init__
    participant PromptDetails as PromptTokensDetailsWrapper
    participant UI as Admin UI
    Provider->>UsageInit: Construct Usage(**usage_dict)
    UsageInit->>PromptDetails: Wrap prompt_tokens_details dict
    UsageInit->>UsageInit: Apply provider mappings (Anthropic/DeepSeek)
    UsageInit->>UsageInit: If _cache_read_input_tokens == 0 and cached_tokens > 0
    UsageInit->>UsageInit: Set _cache_read_input_tokens = cached_tokens
    UsageInit->>UI: UI reads cache read tokens from Usage._cache_read_input_tokens
```
2 files reviewed, 2 comments
```python
## OPENAI MAPPING - populate _cache_read_input_tokens from prompt_tokens_details.cached_tokens ##
if (
    self._cache_read_input_tokens == 0
    and _prompt_tokens_details is not None
    and _prompt_tokens_details.cached_tokens is not None
    and _prompt_tokens_details.cached_tokens > 0
):
    self._cache_read_input_tokens = _prompt_tokens_details.cached_tokens
```
Cache tokens lost when None
Usage.__init__ only sets _cache_read_input_tokens from prompt_tokens_details.cached_tokens when it’s > 0 (litellm/types/utils.py:1559-1566). If OpenAI returns a valid cached_tokens value of 0 (or the UI expects to reflect that the field was present), this mapping won’t run and you can’t distinguish “absent” vs “present but zero”. Since the PR intent is to map OpenAI’s field, consider setting _cache_read_input_tokens when cached_tokens is not None (and still keep the “don’t overwrite provider-set value” guard).
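If it helps, here is a minimal standalone sketch of that rule (the helper function and its name are hypothetical and only illustrate the guard; the real change would live inside `Usage.__init__`):

```python
from typing import Optional


def map_openai_cached_tokens(
    current_cache_read: int, cached_tokens: Optional[int]
) -> int:
    """Illustrative rule: map whenever cached_tokens is present (not None),
    but never overwrite a value another provider mapping already set."""
    if current_cache_read == 0 and cached_tokens is not None:
        return cached_tokens
    return current_cache_read


assert map_openai_cached_tokens(0, 750) == 750    # OpenAI cached_tokens mapped
assert map_openai_cached_tokens(0, 0) == 0        # "present but zero" still runs the mapping
assert map_openai_cached_tokens(0, None) == 0     # missing field leaves the default
assert map_openai_cached_tokens(500, 300) == 500  # provider-set value is preserved
```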
```python
# Anthropic passes cache_read_input_tokens explicitly in **params
usage = Usage(
    prompt_tokens=1000,
    completion_tokens=50,
    total_tokens=1050,
    prompt_tokens_details={"cached_tokens": 500},
    cache_read_input_tokens=500,
)

# Should use the explicit Anthropic value, not overwrite it
assert usage._cache_read_input_tokens == 500
assert usage.prompt_tokens_details.cached_tokens == 500
```
Non-overwrite test is weak
In test_usage_anthropic_cache_read_not_overwritten_by_prompt_details the two inputs that could conflict are identical, so the test can’t detect an overwrite regression (tests/test_litellm/types/test_types_utils.py:200-206). Change the test to pass different values for the two fields and assert the explicit cache-read value is preserved.
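For example (the test keeps its existing name; the 300/500 split is there only so an overwrite would actually be detectable):

```python
from litellm.types.utils import Usage


def test_usage_anthropic_cache_read_not_overwritten_by_prompt_details():
    # Deliberately different values: explicit cache_read_input_tokens (500)
    # vs prompt_tokens_details.cached_tokens (300), so a regression where the
    # OpenAI fallback overwrites the explicit value would fail the assertion.
    usage = Usage(
        prompt_tokens=1000,
        completion_tokens=50,
        total_tokens=1050,
        prompt_tokens_details={"cached_tokens": 300},
        cache_read_input_tokens=500,
    )

    assert usage._cache_read_input_tokens == 500
```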
@greptile can you review the last commit?
Greptile Overview

Greptile Summary: This PR updates the `Usage` class so that OpenAI's `prompt_tokens_details.cached_tokens` is mapped to `_cache_read_input_tokens`. The change lives in `Usage.__init__` in `litellm/types/utils.py`.

Confidence Score: 3/5
| Filename | Overview |
|---|---|
| litellm/types/utils.py | Adds a fallback mapping in Usage.__init__ to set _cache_read_input_tokens from prompt_tokens_details.cached_tokens when not already set. |
| tests/test_litellm/types/test_types_utils.py | Adds tests for OpenAI cached_tokens mapping, but includes a failing assertion in the Anthropic non-overwrite test due to how prompt_tokens_details is applied at the end of Usage.__init__. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Provider as ProviderResponse
    participant UsageInit as UsageInit
    participant Ptd as PromptTokensDetails
    Provider->>UsageInit: construct with usage dict
    UsageInit->>UsageInit: parse prompt token details
    alt explicit cache-read provided
        UsageInit->>UsageInit: set private cache-read tokens
        UsageInit->>Ptd: set cached_tokens
    else provider cache-hit provided
        UsageInit->>UsageInit: set private cache-read tokens
    else OpenAI cached_tokens present
        UsageInit->>UsageInit: if private cache-read is zero
        UsageInit->>UsageInit: set private cache-read from cached_tokens
    end
    UsageInit->>UsageInit: apply remaining params to model
    UsageInit-->>Provider: return Usage object
```
2 files reviewed, 1 comment
```python
# Should use the explicit Anthropic value (500), not the prompt_tokens_details value (300)
assert usage._cache_read_input_tokens == 500
```
Incorrect cached_tokens assertion
With prompt_tokens_details passed in, that value is applied after the internal mappings, so prompt_tokens_details.cached_tokens ends up 300 here (while _cache_read_input_tokens remains 500).
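In other words, for the `usage` object built in the test above, the assertions consistent with that ordering would be (a sketch of the expected values, not a full test):

```python
# The explicit Anthropic-style value wins for the private attribute...
assert usage._cache_read_input_tokens == 500
# ...while prompt_tokens_details, applied last in Usage.__init__, keeps the raw 300.
assert usage.prompt_tokens_details.cached_tokens == 300
```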
@greptile please re-review the latest commits. Here's what was addressed:
Additionally, expanded test coverage with 3 new tests:
Greptile Overview

Greptile Summary: This PR updates `Usage.__init__` to populate `_cache_read_input_tokens` from OpenAI's `prompt_tokens_details.cached_tokens`. Overall, the mapping change is small and localized to `litellm/types/utils.py`.

Confidence Score: 4/5
| Filename | Overview |
|---|---|
| litellm/types/utils.py | Adds OpenAI fallback mapping to populate Usage._cache_read_input_tokens from prompt_tokens_details.cached_tokens when not already set by other provider mappings. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Provider as OpenAI/Azure
    participant LiteLLM as LiteLLM Usage.__init__
    participant UI as Admin UI
    Provider->>LiteLLM: usage dict (prompt_tokens_details.cached_tokens)
    LiteLLM->>LiteLLM: Build PromptTokensDetailsWrapper
    LiteLLM->>LiteLLM: Apply provider mappings (Anthropic/DeepSeek)
    alt _cache_read_input_tokens still 0 and cached_tokens != None
        LiteLLM->>LiteLLM: _cache_read_input_tokens = cached_tokens
    end
    LiteLLM->>UI: Usage object (includes _cache_read_input_tokens)
    UI->>UI: Display Cache Read Tokens
```
1 file reviewed, 1 comment
```python
def test_usage_openai_cached_tokens_zero_does_not_set_cache_read():
    """
    When OpenAI returns cached_tokens=0, _cache_read_input_tokens should stay 0.
    """
    from litellm.types.utils import Usage

    openai_usage = {
        "prompt_tokens": 100,
        "completion_tokens": 10,
        "total_tokens": 110,
        "prompt_tokens_details": {
            "audio_tokens": 0,
            "cached_tokens": 0,
        },
    }

    usage = Usage(**openai_usage)
    assert usage._cache_read_input_tokens == 0
```
Misleading zero-value test
The implementation maps the internal cache-read counter from prompt_tokens_details.cached_tokens whenever that field is present (checked via cached_tokens is not None), even if the value is 0 (litellm/types/utils.py:1559-1565). This test’s name/docstring reads like zero should be treated as “do not map”, which contradicts the actual behavior and the broader goal of treating “present but zero” differently from “missing”. Please update the test name/docstring and/or assertions to reflect the intended contract: mapping occurs when the field is present; lack of mapping should be reserved for missing details or null cached_tokens.
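For instance, a version aligned with that contract might look like this (the test name and docstring are illustrative, not the exact wording the author should use):

```python
from litellm.types.utils import Usage


def test_usage_openai_cached_tokens_zero_is_still_mapped():
    """
    cached_tokens=0 is "present but zero": the `is not None` guard lets the
    OpenAI mapping run, and the resulting _cache_read_input_tokens is 0.
    """
    openai_usage = {
        "prompt_tokens": 100,
        "completion_tokens": 10,
        "total_tokens": 110,
        "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 0},
    }

    usage = Usage(**openai_usage)
    assert usage._cache_read_input_tokens == 0
```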
Relevant issues
Fixes #19684
Pre-Submission checklist
- Added testing in the `tests/litellm/` directory (adding at least 1 test is a hard requirement - see details)
- My PR passes all unit tests on `make test-unit`

CI (LiteLLM team)
Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:
Type
🐛 Bug Fix
Changes
OpenAI models return cached token counts in `usage.prompt_tokens_details.cached_tokens`, but LiteLLM was not mapping this value to the internal `_cache_read_input_tokens` private attribute on the `Usage` class. This caused the admin UI to display "Cache Read Tokens: 0" for OpenAI/Azure requests even when prompt caching was active (visible in response metadata). Cost calculation was unaffected since it reads directly from `prompt_tokens_details.cached_tokens`.

Root cause: Anthropic and Bedrock explicitly pass `cache_read_input_tokens` when constructing `Usage`, which sets the private attr. OpenAI's usage dict only contains `prompt_tokens_details.cached_tokens`, and no mapping existed from that field to `_cache_read_input_tokens`.

Fix: In `Usage.__init__`, after all provider-specific mappings (Anthropic, DeepSeek), populate `_cache_read_input_tokens` from `prompt_tokens_details.cached_tokens` if it hasn't already been set. This is safe because the `== 0` guard prevents overwriting values set by other providers.

Files changed:
- `litellm/types/utils.py`: Added fallback mapping in `Usage.__init__`
- `tests/test_litellm/types/test_types_utils.py`: Added 4 unit tests covering OpenAI cached tokens mapping, zero value, Anthropic non-overwrite, and no prompt_tokens_details
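For reviewers who want to sanity-check the mapping locally, a minimal sketch (assumes a checkout of this branch with the new fallback in `Usage.__init__`; the token counts are sample values):

```python
from litellm.types.utils import Usage

# Usage dict shaped like an OpenAI/Azure chat completions response with prompt caching active
openai_usage = {
    "prompt_tokens": 1000,
    "completion_tokens": 50,
    "total_tokens": 1050,
    "prompt_tokens_details": {"audio_tokens": 0, "cached_tokens": 750},
}

usage = Usage(**openai_usage)

# Before this PR the private attr stayed 0; with the fallback it mirrors cached_tokens
assert usage._cache_read_input_tokens == 750
assert usage.prompt_tokens_details.cached_tokens == 750
```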