Description
When using LangChain with streaming enabled (streaming=True or stream_options={"include_usage": True}), the gen_ai.usage.input_tokens and gen_ai.usage.output_tokens span attributes are missing from traces. This prevents accurate token usage tracking and cost attribution for streaming LLM calls.
Environment
- splunk-otel-instrumentation-langchain: 0.1.x
- langchain-openai: 0.3.x (or langchain-anthropic)
- Provider: OpenAI, Anthropic, or any provider using streaming with usage_metadata
Steps to Reproduce
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
# Create LLM with streaming enabled
llm = ChatOpenAI(
model="gpt-4o-mini",
streaming=True,
model_kwargs={"stream_options": {"include_usage": True}}
)
# Invoke - response.usage_metadata WILL have tokens
response = llm.invoke([HumanMessage(content="Say hello")])
print(response.usage_metadata) # {'input_tokens': 40, 'output_tokens': 55, ...}
# But the trace span will be missing token attributes
Expected Behavior
Span: chat gpt-4o-mini
Attributes:
gen_ai.usage.input_tokens: 40
gen_ai.usage.output_tokens: 55
Actual Behavior
Span: chat gpt-4o-mini
Attributes:
gen_ai.request.model: gpt-4o-mini
gen_ai.response.model: gpt-4o-mini-2024-07-18
# No gen_ai.usage.* attributes
Evidence
Direct comparison from live testing with Cisco/OpenAI endpoint:
📊 Non-Streaming Trace (tokens present)
Trace ID: 49f46b971825fa178f4ae86812a9f1d2
{
"traceId": "49f46b971825fa178f4ae86812a9f1d2",
"operationName": "chat gpt-4o-mini",
"tags": {
"gen_ai.request.model": "gpt-4o-mini",
"gen_ai.response.model": "gpt-4o-mini-2024-07-18",
"gen_ai.usage.input_tokens": 31,
"gen_ai.usage.output_tokens": 5,
"gen_ai.provider.name": "openai"
}
}
✅ Token usage attributes present!
📊 Streaming Trace (tokens missing)
Trace ID: 77432872a967d4321701ce1f22032d8c
{
"traceId": "77432872a967d4321701ce1f22032d8c",
"operationName": "chat gpt-4o-mini",
"tags": {
"gen_ai.request.model": "gpt-4o-mini",
"gen_ai.response.model": "gpt-4o-mini-2024-07-18",
"gen_ai.provider.name": "openai"
}
}
❌ NO gen_ai.usage.input_tokens or gen_ai.usage.output_tokens!
Yet Python showed tokens were available:
response.usage_metadata: {'input_tokens': 40, 'output_tokens': 55, 'total_tokens': 95}
📊 Streaming Trace AFTER FIX (tokens present)
Trace ID: 303595c0d1031acdae9bacd46083d87b
{
"traceId": "303595c0d1031acdae9bacd46083d87b",
"operationName": "chat gpt-4o-mini",
"tags": {
"gen_ai.request.model": "gpt-4o-mini",
"gen_ai.response.model": "gpt-4o-mini-2024-07-18",
"gen_ai.usage.input_tokens": 40,
"gen_ai.usage.output_tokens": 57,
"gen_ai.provider.name": "openai"
}
}
✅ Token usage attributes now captured in streaming mode!
Root Cause Analysis
The current code only extracts tokens from llm_output.token_usage:
# Current code - ONLY checks llm_output (non-streaming path)
llm_output = getattr(response, "llm_output", {}) or {}
usage = llm_output.get("usage") or llm_output.get("token_usage") or {}
inv.input_tokens = usage.get("prompt_tokens")
inv.output_tokens = usage.get("completion_tokens")

In streaming mode:
- llm_output is empty or missing token data
- Tokens are in response.generations[0][0].message.usage_metadata instead
Affected Providers
| Provider | Streaming Token Source | Status |
|---|---|---|
| OpenAI (ChatOpenAI) | message.usage_metadata (via stream_options) | ❌ Missing |
| Anthropic (ChatAnthropic) | message.usage_metadata (via message_delta) | ❌ Missing |
| Snowflake (ChatSnowflake) | Custom (may populate llm_output) | |
API References
OpenAI:
When stream_options: {"include_usage": true} is set, token usage is returned in the final streaming chunk via the usage field.
— OpenAI API Reference
Anthropic:
The token counts shown in the usage field of the message_delta event are cumulative.
— Anthropic Streaming Messages Docs
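The "cumulative" detail matters for any aggregation logic: an instrumentation must take the latest message_delta usage value rather than summing across events, or it would overcount. A hedged sketch over plain dicts standing in for Anthropic SSE events (aggregate_anthropic_output_tokens is a hypothetical helper, not part of the instrumentation):

```python
def aggregate_anthropic_output_tokens(events):
    """Return the final output token count from a stream of events.

    Anthropic documents the usage counts in message_delta as cumulative,
    so the last observed value is already the total; summing the deltas
    would overcount.
    """
    output_tokens = None
    for event in events:
        if event.get("type") == "message_delta":
            usage = event.get("usage") or {}
            if "output_tokens" in usage:
                output_tokens = usage["output_tokens"]  # cumulative: keep latest
    return output_tokens
```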
Impact
- ❌ Cost tracking broken for streaming calls (often 50%+ of production traffic)
- ❌ Token budgets unenforceable - can't track streaming token consumption
- ❌ Billing reconciliation impossible - streaming costs not attributed
- ❌ Performance analysis incomplete - can't correlate tokens with latency for streaming
- ❌ Dashboards show gaps - token metrics missing for streaming spans