fix(provider/xai): handle inconsistent cached token reporting #12485
Merged
Conversation
gr2m approved these changes on Feb 12, 2026
dancer added a commit that referenced this pull request on Feb 12, 2026:
fix(provider/xai): handle inconsistent cached token reporting (#12518)

## background

backport of #12485 to `release-v5.0`

xAI's token reporting is inconsistent across models. most models report `prompt_tokens`/`input_tokens` inclusive of cached tokens (like OpenAI), but some models (e.g. `grok-4-1-fast-non-reasoning`) report them exclusive of cached tokens, where `cached_tokens > prompt_tokens`

## summary

- add `convertXaiChatUsage` and `convertXaiResponsesUsage` converter functions
- detect which reporting style xAI is using based on whether `cached_tokens <= prompt_tokens`
- when inclusive (normal): use prompt tokens as-is
- when exclusive (anomalous): add cached tokens to prompt for total input tokens
- applies to both chat completions and responses APIs
- adapted for v5 `LanguageModelV2Usage` flat format (vs v6 structured format); a sketch of this conversion follows this commit message

## verification

<details>
<summary>tests</summary>

```
✓ src/convert-xai-chat-usage.test.ts (6 tests) 6ms
✓ src/responses/convert-xai-responses-usage.test.ts (6 tests) 6ms

Test Files  2 passed (2)
     Tests  12 passed (12)
```

</details>

## checklist

- [x] tests have been added / updated (for bug fixes / features)
- [ ] documentation has been added / updated (for bug fixes / features)
- [x] a _patch_ changeset for relevant packages has been added (run `pnpm changeset` in root)
- [x] i have reviewed this pull request (self-review)

## related issues

backport of #12485

---------

Co-authored-by: josh <josh@afterima.ge>
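For context, here is a minimal sketch of the conversion the backport describes, adapted to the v5 flat usage shape. The raw field names (`prompt_tokens`, `completion_tokens`, `prompt_tokens_details.cached_tokens`) are assumed to follow the OpenAI-compatible wire format, and the flat result fields (`inputTokens`, `outputTokens`, `totalTokens`, `cachedInputTokens`) are assumptions about `LanguageModelV2Usage`; the actual converter in the PR may differ in shape and naming.

```ts
// Sketch only: raw field names assume the OpenAI-compatible wire format;
// the real convertXaiChatUsage may differ.
interface XaiChatUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Assumed v5 LanguageModelV2Usage-style flat result.
interface FlatUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  cachedInputTokens: number;
}

function convertXaiChatUsage(usage: XaiChatUsage): FlatUsage {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;

  // Detection heuristic from the PR: if cached_tokens <= prompt_tokens,
  // prompt_tokens already includes cached tokens (OpenAI style) and is
  // used as-is; otherwise it excludes them, so cached tokens are added
  // back to recover the total input token count.
  const inputTokens =
    cached <= usage.prompt_tokens
      ? usage.prompt_tokens
      : usage.prompt_tokens + cached;

  return {
    inputTokens,
    outputTokens: usage.completion_tokens,
    totalTokens: inputTokens + usage.completion_tokens,
    cachedInputTokens: cached,
  };
}
```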
gr2m pushed a commit that referenced this pull request on Feb 16, 2026:
## background

xAI's token reporting is inconsistent across models. most models report `prompt_tokens`/`input_tokens` inclusive of cached tokens (like OpenAI), but some models (e.g. `grok-4-1-fast-non-reasoning`) report them exclusive of cached tokens, where `cached_tokens > prompt_tokens`

## summary

- detect which reporting style xAI is using based on whether `cached_tokens <= prompt_tokens` (sketched after this description)
- when inclusive (normal): subtract cached from prompt to get noCache (OpenAI pattern)
- when exclusive (anomalous): prompt tokens already represent noCache, add cached for total (Anthropic pattern)
- applies to both chat completions and responses APIs
- add unit tests for the non-inclusive reporting edge case
- add responses usage test file

## verification

<details>
<summary>gateway bug case (cached > prompt)</summary>

```
before: total=4142, noCache=-186, cacheRead=4328
after:  total=8470, noCache=4142, cacheRead=4328
```

</details>

<details>
<summary>normal case (cached <= prompt)</summary>

```
raw: input_tokens: 12, cached_tokens: 3
sdk: noCache: 9, cacheRead: 3, total: 12
```

</details>

## checklist

- [x] tests have been added / updated (for bug fixes / features)
- [ ] documentation has been added / updated (for bug fixes / features)
- [x] a _patch_ changeset for relevant packages has been added (run `pnpm changeset` in root)
- [x] i have reviewed this pull request (self-review)
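To make the branch logic concrete, here is a hedged sketch using the `total`/`noCache`/`cacheRead` names from the verification output above; the actual v6 structured usage type and converter signature may differ.

```ts
// Assumed structured shape, named after the verification output above.
interface InputTokenUsage {
  total: number; // all input tokens, cached + uncached
  noCache: number; // input tokens not served from the cache
  cacheRead: number; // input tokens read from the cache
}

function convertXaiInputUsage(
  promptTokens: number,
  cachedTokens: number,
): InputTokenUsage {
  if (cachedTokens <= promptTokens) {
    // inclusive style (OpenAI pattern): prompt already contains cached,
    // so the uncached count is the difference.
    return {
      total: promptTokens,
      noCache: promptTokens - cachedTokens,
      cacheRead: cachedTokens,
    };
  }
  // exclusive style (Anthropic pattern): prompt excludes cached,
  // so the total is the sum of both.
  return {
    total: promptTokens + cachedTokens,
    noCache: promptTokens,
    cacheRead: cachedTokens,
  };
}

// gateway bug case from the verification above (cached > prompt):
// convertXaiInputUsage(4142, 4328) -> { total: 8470, noCache: 4142, cacheRead: 4328 }
// normal case (cached <= prompt):
// convertXaiInputUsage(12, 3) -> { total: 12, noCache: 9, cacheRead: 3 }
```

Before the fix, the inclusive formula was applied unconditionally, which produced the negative `noCache=-186` shown in the gateway bug case (4142 − 4328).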