
fix(langchain): Extract token usage from message.usage_metadata for streaming responses#127

Open
NikitaVoitov wants to merge 2 commits into signalfx:main from NikitaVoitov:fix/token-usage-streaming

Conversation

@NikitaVoitov

Summary

Fixes token usage extraction to support streaming mode by checking message.usage_metadata in addition to llm_output. This enables accurate token tracking for OpenAI, Anthropic, and other providers when streaming is enabled.

Fixes #126

The Bug (Before)

# Streaming mode span
Span: chat gpt-4o-mini
Attributes:
  gen_ai.request.model: gpt-4o-mini
  # NO gen_ai.usage.input_tokens
  # NO gen_ai.usage.output_tokens

The Fix (After)

# Streaming mode span
Span: chat gpt-4o-mini
Attributes:
  gen_ai.request.model: gpt-4o-mini
  gen_ai.usage.input_tokens: 40
  gen_ai.usage.output_tokens: 55

Changes

callback_handler.py

Added two helper functions and updated token extraction logic:

1. New _extract_token_usage_from_generations() helper:

def _extract_token_usage_from_generations(
    generations: list[list[Any]] | None,
) -> tuple[int | None, int | None]:
    """Extract token counts from message.usage_metadata (streaming format)."""
    if not generations:
        return None, None
    
    for generation_list in generations:
        for generation in generation_list:
            if not hasattr(generation, "message"):
                continue
            message = generation.message
            usage_meta = getattr(message, "usage_metadata", None)
            if not isinstance(usage_meta, dict) or not usage_meta:
                continue
            
            # Standard keys first, then OpenAI-style fallback
            input_tokens = usage_meta.get("input_tokens") or usage_meta.get("prompt_tokens")
            output_tokens = usage_meta.get("output_tokens") or usage_meta.get("completion_tokens")
            
            if input_tokens and output_tokens:
                return input_tokens, output_tokens
    
    return None, None
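A quick sanity check of this helper's behavior. This is a sketch, not the PR's test code: `SimpleNamespace` objects stand in for LangChain's `ChatGeneration`/`AIMessage`, and the helper's loop is reproduced inline so the snippet runs without the instrumentation package installed.

```python
from types import SimpleNamespace

# SimpleNamespace stands in for LangChain's ChatGeneration/AIMessage
# (an assumption for illustration; real callbacks receive the actual classes).
message = SimpleNamespace(usage_metadata={"input_tokens": 40, "output_tokens": 55})
generations = [[SimpleNamespace(message=message)]]

def extract(generations):
    # Mirrors _extract_token_usage_from_generations: the first generation
    # carrying a non-empty usage_metadata dict wins.
    for generation_list in generations or []:
        for generation in generation_list:
            meta = getattr(getattr(generation, "message", None), "usage_metadata", None)
            if isinstance(meta, dict) and meta:
                inp = meta.get("input_tokens") or meta.get("prompt_tokens")
                out = meta.get("output_tokens") or meta.get("completion_tokens")
                if inp and out:
                    return inp, out
    return None, None

print(extract(generations))  # (40, 55)
print(extract(None))         # (None, None)
```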

2. New _extract_token_usage_from_llm_output() helper:

def _extract_token_usage_from_llm_output(
    llm_output: dict[str, Any] | None,
    existing_input: int | None = None,
    existing_output: int | None = None,
) -> tuple[int | None, int | None]:
    """Extract token usage from llm_output (non-streaming format)."""
    if not llm_output:
        return existing_input, existing_output
    
    usage = llm_output.get("usage") or llm_output.get("token_usage") or {}
    
    input_tokens = existing_input
    if input_tokens is None:
        input_tokens = usage.get("prompt_tokens") or usage.get("input_tokens")
    
    output_tokens = existing_output
    if output_tokens is None:
        output_tokens = usage.get("completion_tokens") or usage.get("output_tokens")
    
    return input_tokens, output_tokens
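The non-streaming path can be exercised the same way. Again a standalone sketch: the lookup logic is reproduced inline, and the dict shape follows what OpenAI-style providers place in `llm_output`.

```python
# Inline reproduction of _extract_token_usage_from_llm_output's lookup,
# so the snippet runs without the instrumentation package installed.
def extract_from_llm_output(llm_output, existing_input=None, existing_output=None):
    if not llm_output:
        return existing_input, existing_output
    usage = llm_output.get("usage") or llm_output.get("token_usage") or {}
    if existing_input is None:
        existing_input = usage.get("prompt_tokens") or usage.get("input_tokens")
    if existing_output is None:
        existing_output = usage.get("completion_tokens") or usage.get("output_tokens")
    return existing_input, existing_output

llm_output = {"token_usage": {"prompt_tokens": 31, "completion_tokens": 5}}
print(extract_from_llm_output(llm_output))            # (31, 5)
print(extract_from_llm_output(llm_output, 40, None))  # (40, 5) -- existing value kept
```

Note that counts already found (e.g. from `usage_metadata`) are never overwritten; the fallback only fills in values that are still `None`.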

3. Updated on_llm_end() with priority-based extraction:

# Before - only checked llm_output
llm_output = getattr(response, "llm_output", {}) or {}
usage = llm_output.get("usage") or llm_output.get("token_usage") or {}
inv.input_tokens = usage.get("prompt_tokens")
inv.output_tokens = usage.get("completion_tokens")

# After - checks both sources with priority
# Priority: message.usage_metadata (streaming) > llm_output (non-streaming)
input_tokens, output_tokens = _extract_token_usage_from_generations(generations)

# Fallback to llm_output for non-streaming responses
if input_tokens is None or output_tokens is None:
    llm_output = getattr(response, "llm_output", {}) or {}
    input_tokens, output_tokens = _extract_token_usage_from_llm_output(
        llm_output, input_tokens, output_tokens
    )

inv.input_tokens = input_tokens
inv.output_tokens = output_tokens

Token Source Priority

| Priority | Source | Mode | Keys checked |
| --- | --- | --- | --- |
| 1 (highest) | `message.usage_metadata` | Streaming | `input_tokens`, `output_tokens` |
| 2 (fallback) | `llm_output.token_usage` | Non-streaming | `prompt_tokens`, `completion_tokens` |
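To see the priority in action, here is an end-to-end sketch with a mock response carrying both sources. The mock objects and token values are illustrative; the combined logic mirrors the updated `on_llm_end()` (with the extraction loops inlined so the snippet runs standalone).

```python
from types import SimpleNamespace

# Mock LLMResult-like object carrying BOTH sources: streaming usage_metadata
# (40/57) and non-streaming llm_output (31/5). Values are illustrative.
message = SimpleNamespace(usage_metadata={"input_tokens": 40, "output_tokens": 57})
response = SimpleNamespace(
    generations=[[SimpleNamespace(message=message)]],
    llm_output={"token_usage": {"prompt_tokens": 31, "completion_tokens": 5}},
)

def resolve_tokens(response):
    # Priority 1: message.usage_metadata (streaming).
    inp = out = None
    for gen_list in response.generations or []:
        for gen in gen_list:
            meta = getattr(getattr(gen, "message", None), "usage_metadata", None)
            if isinstance(meta, dict) and meta:
                inp = meta.get("input_tokens") or meta.get("prompt_tokens")
                out = meta.get("output_tokens") or meta.get("completion_tokens")
    # Priority 2: llm_output (non-streaming), only for counts still missing.
    # (The "usage" key fallback is omitted here for brevity.)
    if inp is None or out is None:
        usage = (getattr(response, "llm_output", {}) or {}).get("token_usage") or {}
        inp = inp if inp is not None else usage.get("prompt_tokens")
        out = out if out is not None else usage.get("completion_tokens")
    return inp, out

print(resolve_tokens(response))  # (40, 57) -- usage_metadata wins over llm_output

# Without usage_metadata, the llm_output fallback supplies the counts.
no_meta = SimpleNamespace(
    generations=[[SimpleNamespace(message=SimpleNamespace())]],
    llm_output={"token_usage": {"prompt_tokens": 31, "completion_tokens": 5}},
)
print(resolve_tokens(no_meta))  # (31, 5)
```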

Testing

  • test_token_usage_extraction_streaming_mode - verifies message.usage_metadata extraction
  • test_token_usage_extraction_non_streaming_mode - verifies llm_output extraction
  • test_token_usage_streaming_priority - verifies streaming source takes precedence
  • All existing tests pass

Evidence

Live test showing the fix works:

| Test | `response.usage_metadata` | Trace `gen_ai.usage.*` |
| --- | --- | --- |
| Non-streaming | input_tokens: 31, output_tokens: 5 | 31, 5 |
| Streaming (before fix) | input_tokens: 40, output_tokens: 55 | MISSING |
| Streaming (after fix) | input_tokens: 40, output_tokens: 57 | 40, 57 |

Trace Evidence:

Files Changed

| File | Changes |
| --- | --- |
| instrumentation-genai/opentelemetry-instrumentation-langchain/src/opentelemetry/instrumentation/langchain/callback_handler.py | Added `_extract_token_usage_from_generations()` and `_extract_token_usage_from_llm_output()`; updated `on_llm_end()` |
| instrumentation-genai/opentelemetry-instrumentation-langchain/tests/test_callback_handler_agent.py | Added 3 tests for streaming token extraction |

…treaming responses

Token usage attributes (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens) were
missing when LLM streaming was enabled because the code only checked llm_output.token_usage.

In streaming mode, LangChain puts token counts in message.usage_metadata instead.
This fix adds priority-based extraction:
1. First check message.usage_metadata (streaming mode)
2. Fallback to llm_output.token_usage (non-streaming mode)

Adds two helper functions:
- _extract_token_usage_from_generations(): extracts from usage_metadata
- _extract_token_usage_from_llm_output(): extracts from llm_output

Tests added:
- test_token_usage_extraction_streaming_mode
- test_token_usage_extraction_non_streaming_mode
- test_token_usage_streaming_priority

Affects: OpenAI, Anthropic, Google, and other providers using streaming with usage_metadata
Before fix (trace_streaming_before_fix.json):
- Trace ID: 77432872a967d4321701ce1f22032d8c
- gen_ai.usage.input_tokens: MISSING
- gen_ai.usage.output_tokens: MISSING

After fix (trace_streaming_after_fix.json):
- Trace ID: 303595c0d1031acdae9bacd46083d87b
- gen_ai.usage.input_tokens: 40
- gen_ai.usage.output_tokens: 57
@NikitaVoitov NikitaVoitov requested review from a team as code owners January 13, 2026 13:58
@github-actions


Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a Pull Request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request. Posted by the CLA Assistant Lite bot.

Contributor

@zhirafovod left a comment


@NikitaVoitov, thank you for creating the PR!

Can you add the real app which you used to get these traces? I am specifically trying to understand when `usage.get("prompt_tokens") or usage.get("input_tokens")` can be the use case?



Development

Successfully merging this pull request may close these issues.

[Bug] gen_ai.usage.input_tokens and output_tokens missing when LLM streaming is used
