Conversation

@b3nw b3nw commented Jan 28, 2026

Fixes metadata token counting for Anthropic-format API responses (used by /v1/messages endpoint).

The _log_metadata method in TransactionLogger only supported OpenAI format usage keys (prompt_tokens, completion_tokens) but Anthropic responses use different keys (input_tokens, output_tokens). This caused null token counts in metadata.json for providers like dedaluslabs and firmware when using the Anthropic-compatible /v1/messages endpoint.

Changes:

- Add fallback from OpenAI-format to Anthropic-format token keys (prompt_tokens → input_tokens, completion_tokens → output_tokens)
- Use explicit None checks instead of `or` so that legitimate 0 values are handled correctly
- Calculate total_tokens when it is missing from Anthropic responses (sum of input and output tokens)
- Handle stop_reason (Anthropic format) as well as finish_reason (OpenAI format)
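The fallback logic above can be sketched roughly as follows. This is a minimal illustration, not the actual `_log_metadata` code; `extract_token_counts` is a hypothetical helper name:

```python
def extract_token_counts(usage: dict) -> dict:
    """Map OpenAI- or Anthropic-format usage keys to a common shape.

    Explicit `is None` checks (rather than `or`) ensure that a
    legitimate count of 0 is kept instead of triggering the fallback.
    """
    # Prefer OpenAI keys, fall back to Anthropic keys.
    prompt = usage.get("prompt_tokens")
    if prompt is None:
        prompt = usage.get("input_tokens")

    completion = usage.get("completion_tokens")
    if completion is None:
        completion = usage.get("output_tokens")

    total = usage.get("total_tokens")
    if total is None and prompt is not None and completion is not None:
        # Anthropic responses omit a total; derive it from the parts.
        total = prompt + completion

    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }
```

With `or` instead of the None checks, `{"input_tokens": 0}` would have been treated as missing data, which is exactly the zero-token edge case the fix addresses.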

Testing Done:

- Verified that the dedaluslabs and firmware providers now log token counts correctly in metadata.json when using the /v1/messages endpoint
- Confirmed that OpenAI-format responses continue to work unchanged
- Tested the edge case where token counts are 0 (now correctly logged as 0 instead of falling back)


Important

Fixes token counting for Anthropic-format API responses in TransactionLogger by adding support for Anthropic keys and handling zero token counts.

  • Behavior:
    • Fixes token counting for Anthropic-format API responses in _log_metadata() of transaction_logger.py.
    • Adds fallback from OpenAI format keys (prompt_tokens, completion_tokens) to Anthropic format keys (input_tokens, output_tokens).
    • Uses explicit None checks to handle zero token counts correctly.
    • Calculates total_tokens if missing in Anthropic responses by summing input_tokens and output_tokens.
    • Handles stop_reason (Anthropic) alongside finish_reason (OpenAI).
  • Testing:
    • Verified correct logging of token counts for dedaluslabs and firmware providers using /v1/messages endpoint.
    • Confirmed unchanged behavior for OpenAI format responses.
    • Tested edge case for zero token counts, ensuring correct logging as 0.

This description was created by Ellipsis for 6a5f601.

@b3nw b3nw requested a review from Mirrowel as a code owner January 28, 2026 01:12
@b3nw b3nw force-pushed the fix/metadata-token-counting branch from 6a5f601 to 9c2436d on January 28, 2026 03:16
Pre-calculate input tokens before streaming starts so message_start
contains accurate input_tokens. This matches Anthropic's native API
behavior where input_tokens is provided upfront.

Previously, message_start always had input_tokens=0 because OpenAI-format
streams only provide usage data in the final chunk. Claude Code reads
current_usage from message_start, causing used_percentage to always be 0.

Changes:
- Add precalculated_input_tokens parameter to anthropic_streaming_wrapper
- Calculate tokens via token_count() before streaming in AnthropicHandler
- Use precalculated value for message_start usage dict
- Include input_tokens in message_delta for total_input_tokens accumulation
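The message_start change can be sketched as below. The function and parameter names mirror the commit message, but the exact event structure here is an assumption based on Anthropic's streaming event shape, not the project's actual code:

```python
def build_message_start(precalculated_input_tokens: int) -> dict:
    """Build a message_start event whose usage dict carries the
    pre-calculated input tokens, matching Anthropic's native API
    where input_tokens is known before streaming begins."""
    return {
        "type": "message_start",
        "message": {
            "usage": {
                # Previously hard-coded to 0 because OpenAI-format
                # streams only report usage in the final chunk.
                "input_tokens": precalculated_input_tokens,
                # Output tokens accumulate via later message_delta events.
                "output_tokens": 0,
            }
        },
    }
```

Because Claude Code reads current_usage from message_start, emitting the real input count here is what fixes the used_percentage always showing 0.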
@b3nw b3nw force-pushed the fix/metadata-token-counting branch from 9c2436d to 0ba756b on January 28, 2026 04:36
@mirrobot-agent
Contributor

I'm starting my review of the fix for Anthropic metadata token counting. I'll be looking closely at how token counts are handled in both streaming and non-streaming responses. Back soon with the results! 🔍

@mirrobot-agent mirrobot-agent bot left a comment

Overall Assessment

This is a solid fix for the token counting issues observed with Anthropic-format responses. The approach of providing fallback keys and handling zero values explicitly is robust and correctly addresses the underlying problem where token counts would appear as null for certain providers.

The addition of precalculated_input_tokens for streaming responses is a significant UX improvement. By calculating tokens upfront, the message_start event now reports accurate input tokens immediately, matching Anthropic's native API behavior and avoiding the confusion of seeing 0 tokens until the end of the stream.

Architectural Feedback

The integration of pre-calculated tokens into the AnthropicHandler and anthropic_streaming_wrapper is well-placed. It leverages the existing token_count infrastructure effectively without introducing unnecessary complexity.

Key Suggestions

  • Streaming Usage Fallbacks: While the TransactionLogger was updated to handle both OpenAI and Anthropic usage keys, the anthropic_streaming_wrapper (around lines 223-224) still only checks for prompt_tokens and completion_tokens in individual chunks. For full consistency, consider adding fallback checks for input_tokens and output_tokens there as well. This ensures that if a provider returns Anthropic-style usage in an OpenAI-formatted stream, it is correctly captured and updated from the chunks.
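The suggested chunk-level fallback could look roughly like this. The chunk usage structure and the helper name are illustrative, not taken from the codebase:

```python
def update_totals_from_chunk(chunk_usage: dict, totals: dict) -> None:
    """Update running usage totals from a stream chunk, accepting
    either OpenAI-style or Anthropic-style usage keys."""
    prompt = chunk_usage.get("prompt_tokens")
    if prompt is None:
        prompt = chunk_usage.get("input_tokens")
    if prompt is not None:
        totals["prompt_tokens"] = prompt

    completion = chunk_usage.get("completion_tokens")
    if completion is None:
        completion = chunk_usage.get("output_tokens")
    if completion is not None:
        totals["completion_tokens"] = completion
```

This keeps the chunk path consistent with the TransactionLogger fix: a provider emitting Anthropic-style keys inside an OpenAI-formatted stream would still be captured.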

Questions for the Author

None. The implementation is clear and the testing covers the relevant edge cases well.

This review was generated by an AI assistant.
