Token usage/cost estimates undershoot sometimes #277

@ammario

Description

Observed on an interrupted stream with Anthropic models: usage comes back strangely low (e.g., 26 input tokens) on a very long chat. Perhaps the cache is getting thrashed and we're somehow not seeing those cache-eviction tokens?

Observed on GPT-5-Pro: I believe we're estimating reasoning usage by counting tokens in the reasoning trace. This may be accurate for Anthropic models, where the trace is highly detailed, but it falls flat on GPT-5-Pro, where the resulting cost estimates are an order of magnitude off. We should check whether the API reports the number of reasoning tokens by some other means.

One idea is to compute reasoning_tokens = total_output_tokens - parsed_text_tokens. This might be provider-agnostic.
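A minimal sketch of that fallback, assuming the provider's usage payload reports a total output-token count and that we can tokenize the parsed assistant text ourselves (the function name and example numbers below are hypothetical, not from a real API response):

```python
def estimate_reasoning_tokens(total_output_tokens: int, parsed_text_tokens: int) -> int:
    """Provider-agnostic estimate: output tokens not accounted for by the
    parsed assistant text are attributed to reasoning."""
    # Clamp at zero: a tokenizer mismatch between our counter and the
    # provider's can make the parsed-text count slightly exceed the total.
    return max(total_output_tokens - parsed_text_tokens, 0)


# Made-up usage numbers for illustration:
usage_total_output = 4096   # provider-reported total output tokens
parsed_text_tokens = 7      # tokens we counted in the visible assistant text

print(estimate_reasoning_tokens(usage_total_output, parsed_text_tokens))  # 4089
print(estimate_reasoning_tokens(10, 12))  # 0 (clamped)
```

The clamp matters because our local tokenizer won't always match the provider's exactly, so the subtraction can go slightly negative on short replies.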
