[Bug] gen_ai.usage.input_tokens and output_tokens missing when LLM streaming is used #126

@NikitaVoitov

Description

When using LangChain with streaming enabled (streaming=True or stream_options={"include_usage": True}), the gen_ai.usage.input_tokens and gen_ai.usage.output_tokens span attributes are missing from traces. This prevents accurate token usage tracking and cost attribution for streaming LLM calls.

Environment

  • splunk-otel-instrumentation-langchain: 0.1.x
  • langchain-openai: 0.3.x (or langchain-anthropic)
  • Provider: OpenAI, Anthropic, or any provider using streaming with usage_metadata

Steps to Reproduce

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Create LLM with streaming enabled
llm = ChatOpenAI(
    model="gpt-4o-mini",
    streaming=True,
    model_kwargs={"stream_options": {"include_usage": True}}
)

# Invoke - response.usage_metadata WILL have tokens
response = llm.invoke([HumanMessage(content="Say hello")])
print(response.usage_metadata)  # {'input_tokens': 40, 'output_tokens': 55, ...}

# But the trace span will be missing token attributes

Expected Behavior

Span: chat gpt-4o-mini
Attributes:
  gen_ai.usage.input_tokens: 40
  gen_ai.usage.output_tokens: 55

Actual Behavior

Span: chat gpt-4o-mini
Attributes:
  gen_ai.request.model: gpt-4o-mini
  gen_ai.response.model: gpt-4o-mini-2024-07-18
  # No gen_ai.usage.* attributes

Evidence

Direct comparison from live testing against a Cisco/OpenAI endpoint:

📊 Non-Streaming Trace (tokens present)

Trace ID: 49f46b971825fa178f4ae86812a9f1d2

{
  "traceId": "49f46b971825fa178f4ae86812a9f1d2",
  "operationName": "chat gpt-4o-mini",
  "tags": {
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.response.model": "gpt-4o-mini-2024-07-18",
    "gen_ai.usage.input_tokens": 31,
    "gen_ai.usage.output_tokens": 5,
    "gen_ai.provider.name": "openai"
  }
}

✅ Token usage attributes present!

📊 Streaming Trace (tokens missing)

Trace ID: 77432872a967d4321701ce1f22032d8c

{
  "traceId": "77432872a967d4321701ce1f22032d8c",
  "operationName": "chat gpt-4o-mini",
  "tags": {
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.response.model": "gpt-4o-mini-2024-07-18",
    "gen_ai.provider.name": "openai"
  }
}

NO gen_ai.usage.input_tokens or gen_ai.usage.output_tokens!

Yet Python showed tokens were available:

response.usage_metadata: {'input_tokens': 40, 'output_tokens': 55, 'total_tokens': 95}
📊 Streaming Trace AFTER FIX (tokens present)

Trace ID: 303595c0d1031acdae9bacd46083d87b

{
  "traceId": "303595c0d1031acdae9bacd46083d87b",
  "operationName": "chat gpt-4o-mini",
  "tags": {
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.response.model": "gpt-4o-mini-2024-07-18",
    "gen_ai.usage.input_tokens": 40,
    "gen_ai.usage.output_tokens": 57,
    "gen_ai.provider.name": "openai"
  }
}

✅ Token usage attributes now captured in streaming mode!

Root Cause Analysis

The current code only extracts tokens from llm_output.token_usage:

# Current code - ONLY checks llm_output (non-streaming path)
llm_output = getattr(response, "llm_output", {}) or {}
usage = llm_output.get("usage") or llm_output.get("token_usage") or {}
inv.input_tokens = usage.get("prompt_tokens")
inv.output_tokens = usage.get("completion_tokens")

In streaming mode:

  • llm_output is empty or missing token data
  • Tokens are in response.generations[0][0].message.usage_metadata instead
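A minimal sketch of the fallback this issue asks for (helper name and shapes are illustrative, not the instrumentation's actual code): keep the existing llm_output path for non-streaming runs, and fall back to usage_metadata on the first generation's message for streaming runs.

```python
def extract_token_usage(response):
    """Return (input_tokens, output_tokens) from a LangChain LLMResult-like
    object, falling back to message.usage_metadata for streaming runs."""
    # Non-streaming path: provider usage is reported in llm_output
    llm_output = getattr(response, "llm_output", {}) or {}
    usage = llm_output.get("usage") or llm_output.get("token_usage") or {}
    if usage.get("prompt_tokens") is not None:
        return usage.get("prompt_tokens"), usage.get("completion_tokens")

    # Streaming path: usage_metadata lives on the aggregated message
    generations = getattr(response, "generations", None) or []
    if generations and generations[0]:
        message = getattr(generations[0][0], "message", None)
        meta = getattr(message, "usage_metadata", None) or {}
        if meta.get("input_tokens") is not None:
            return meta.get("input_tokens"), meta.get("output_tokens")
    return None, None
```

With a fallback like this, both traces in the evidence above would carry gen_ai.usage.* attributes regardless of streaming mode.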

Affected Providers

Provider                     Streaming Token Source                         Status
---------------------------  ---------------------------------------------  ----------
OpenAI (ChatOpenAI)          message.usage_metadata (via stream_options)    ❌ Missing
Anthropic (ChatAnthropic)    message.usage_metadata (via message_delta)     ❌ Missing
Snowflake (ChatSnowflake)    Custom (may populate llm_output)               ⚠️ Varies

API References

OpenAI:

When stream_options: {"include_usage": true} is set, token usage is returned in the final streaming chunk via the usage field; earlier chunks report usage as null.
OpenAI API Reference
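In other words, only the last chunk (which also has an empty choices list) carries usage; an instrumentation just needs to keep the last non-null value. A hedged sketch over plain chunk dicts (the real SDK yields objects, but the field layout is the same):

```python
def usage_from_chunks(chunks):
    """Return the usage dict from an OpenAI streaming response.

    With stream_options={"include_usage": True}, each chunk has a `usage`
    field that is None until the final chunk, so the last non-None value
    is the total usage for the request.
    """
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage
```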

Anthropic:

The token counts shown in the usage field of the message_delta event are cumulative.
Anthropic Streaming Messages Docs
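Because the message_delta counts are cumulative, an instrumentation should keep the last reported value rather than sum deltas. An illustrative sketch over event dicts (shapes assumed from the streaming docs: input_tokens arrives on message_start, cumulative output_tokens on each message_delta):

```python
def usage_from_anthropic_events(events):
    """Collect final (input_tokens, output_tokens) from an Anthropic
    streaming event sequence.

    Each message_delta's usage.output_tokens is cumulative and supersedes
    earlier values, so the last one seen is the total.
    """
    input_tokens = None
    output_tokens = None
    for event in events:
        if event.get("type") == "message_start":
            input_tokens = event["message"]["usage"].get("input_tokens")
        elif event.get("type") == "message_delta":
            usage = event.get("usage") or {}
            if usage.get("output_tokens") is not None:
                output_tokens = usage["output_tokens"]
    return input_tokens, output_tokens
```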

Impact

  • Cost tracking broken for streaming calls (often 50%+ of production traffic)
  • Token budgets unenforceable - can't track streaming token consumption
  • Billing reconciliation impossible - streaming costs not attributed
  • Performance analysis incomplete - can't correlate tokens with latency for streaming
  • Dashboards show gaps - token metrics missing for streaming spans
