Description
When using LangChainInstrumentor with async LangChain/LangGraph applications, spans are not linked in a parent-child hierarchy. All spans appear as direct children of the root span instead of forming a nested trace tree, which makes it impossible to visualize the actual execution flow.
Note: Spans share the same Trace ID, but the parent-child relationships are incorrect (flat instead of nested).
Environment
- splunk-otel-instrumentation-langchain: 0.1.x
- langchain-core: 1.x / langgraph: 1.x
- opentelemetry-sdk: 1.39.x
- Async execution via await graph.ainvoke() or await chain.ainvoke()
Steps to Reproduce
- Instrument an async LangChain/LangGraph application with LangChainInstrumentor
- Execute an agent using await graph.ainvoke(...)
- Export spans to Jaeger/Zipkin/Splunk O11y
- Observe that all spans are siblings with the same parent (flat hierarchy)
Expected Behavior
POST /agents/query (trace_id: 90a9d9...)
└── invoke_agent healthcare_agent (parent: HTTP span)
    ├── step model (parent: agent)
    │   └── chat claude-3-5-sonnet (parent: step)
    └── step tools (parent: agent)
        └── tool query_member_data (parent: step)
Actual Behavior
POST /agents/query (trace_id: 90a9d9...)
├── invoke_agent healthcare_agent (parent: HTTP span)
├── step model (parent: HTTP span) ← WRONG! Should be child of agent
├── chat unknown_model (parent: HTTP span) ← WRONG! Should be child of step
├── step tools (parent: HTTP span) ← WRONG! Should be child of agent
└── tool query_member_data (parent: HTTP span) ← WRONG! Should be child of tools
All spans have the same Trace ID but all point to the HTTP root span as parent instead of their logical parent.
Root Cause Analysis
In callback_handler.py, the parent_run_id UUID is captured but never resolved to an actual OpenTelemetry Span object. The span emitter code in span.py checks for parent_span:
parent_span = getattr(invocation, "parent_span", None)
parent_ctx = trace.set_span_in_context(parent_span) if parent_span else None
When parent_span is None, parent_ctx is also None, so the span is created under whatever context is currently active. In synchronous code this works, because Python's context propagation keeps the parent span active. In async code with await, however, the context is lost between coroutines, so every span inherits from the outermost context (the HTTP request span).
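The mechanism described above can be reproduced without OpenTelemetry or LangChain. The following is a minimal, stdlib-only sketch, not the instrumentation's actual code: `FakeSpan`, `SketchHandler`, `spans_by_run_id` and `simulate` are hypothetical names, a `ContextVar` stands in for the active OpenTelemetry context, and it assumes each callback runs in its own asyncio task (one way the active span can fail to reach the next callback). It contrasts relying on the ambient context with resolving parent_run_id through a run-id-to-span registry.

```python
import asyncio
import contextvars
import itertools
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FakeSpan:
    """Hypothetical stand-in for an OpenTelemetry Span."""
    name: str
    span_id: int
    parent: Optional["FakeSpan"] = None


# Stand-in for the "currently active span" slot of the OpenTelemetry context.
_active_span = contextvars.ContextVar("active_span", default=None)
_span_ids = itertools.count(1)


class SketchHandler:
    """Hypothetical callback handler; not the real callback_handler.py."""

    def __init__(self, resolve_parent_run_id: bool) -> None:
        self.resolve_parent_run_id = resolve_parent_run_id
        self.spans_by_run_id: Dict[uuid.UUID, FakeSpan] = {}

    async def on_start(self, name: str, run_id: uuid.UUID,
                       parent_run_id: Optional[uuid.UUID] = None) -> FakeSpan:
        parent = None
        if self.resolve_parent_run_id and parent_run_id is not None:
            # Explicit resolution: look up the span recorded for the parent run.
            parent = self.spans_by_run_id.get(parent_run_id)
        if parent is None:
            # Fallback: whatever span the ambient context says is active.
            parent = _active_span.get()
        span = FakeSpan(name, next(_span_ids), parent)
        self.spans_by_run_id[run_id] = span
        # This write is visible only inside this task's copy of the context.
        _active_span.set(span)
        return span


async def _run(resolve_parent_run_id: bool) -> List[Tuple[str, str]]:
    root = FakeSpan("POST /agents/query", next(_span_ids))
    _active_span.set(root)
    handler = SketchHandler(resolve_parent_run_id)
    agent_id, step_id, chat_id = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
    calls = [
        ("invoke_agent", agent_id, None),
        ("step model", step_id, agent_id),
        ("chat", chat_id, step_id),
    ]
    spans = []
    for name, run_id, parent_run_id in calls:
        # Assumption: each callback runs in a separate task, which receives a
        # copy of the caller's context, so its set() never reaches the caller.
        task = asyncio.create_task(handler.on_start(name, run_id, parent_run_id))
        spans.append(await task)
    return [(s.name, s.parent.name) for s in spans]


def simulate(resolve_parent_run_id: bool) -> List[Tuple[str, str]]:
    """Return (span name, parent span name) pairs for the simulated run."""
    return asyncio.run(_run(resolve_parent_run_id))
```

Calling `simulate(False)` yields a flat result in which every span's parent is the HTTP span, while `simulate(True)` yields the nested chain. In the real handler, the analogous step would presumably be setting parent_span on the invocation from such a registry, so that the existing trace.set_span_in_context(parent_span) branch in span.py is taken.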
# Current code - parent_run_id captured but not resolved to span
inv = LLMInvocation(
    parent_run_id=parent_run_id,  # UUID stored but not used
    # parent_span is never set!
)
Evidence
Tested with a healthcare agent deployed on Snowflake SPCS:
- Trace ID: 90a9d9e471e7ba2a1c88f96e2bc46293
- 14 spans all showing parentId: daaa62fa3f11aae6 (the HTTP root span)
- UI shows flat waterfall with all spans at same indentation level
📊 Trace Span Evidence
Root span (correct):
{
  "spanId": "daaa62fa3f11aae6",
  "parentId": null,
  "operationName": "POST /agents/sf-query"
}
All child spans have the SAME parentId (BUG!):
| Operation | Span ID | Parent ID | Expected Parent |
|---|---|---|---|
| invoke_agent healthcare_agent | 6d7b48054121e9fc | daaa62fa3f11aae6 | HTTP root ✅ |
| step model | b73d110a068e930e | daaa62fa3f11aae6 | ❌ Should be 6d7b... (agent) |
| chat unknown_model | ac8f798541d3812c | daaa62fa3f11aae6 | ❌ Should be b73d... (step) |
| step tools | 7fa31f5192dde5be | daaa62fa3f11aae6 | ❌ Should be 6d7b... (agent) |
| tool query_member_data | e4456446b9a33f7c | daaa62fa3f11aae6 | ❌ Should be 7fa3... (tools) |
| step model (2nd) | 2bc05af97bdb1fe2 | daaa62fa3f11aae6 | ❌ Should be 6d7b... (agent) |
| chat unknown_model (2nd) | 225813e7ed81f752 | daaa62fa3f11aae6 | ❌ Should be 2bc0... (step) |
Key observations:
- All spans share traceId: 90a9d9e471e7ba2a1c88f96e2bc46293 ✅
- All spans have parentId: daaa62fa3f11aae6 (HTTP root) ❌
- Service version confirms unfixed code: service.version: v1.0.106-no-fix
Full trace JSON: trace_flat_hierarchy_bug.json
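To check an exported trace for this symptom without opening a UI, the nesting depth of each span can be computed from its parentId. The sketch below is illustrative only: it assumes the export is a list of objects with the spanId and parentId keys shown in the excerpt above (the actual layout of the attached JSON file may differ), and `span_depths` and `looks_flat` are hypothetical helper names.

```python
from typing import Dict, List, Optional

Span = Dict[str, Optional[str]]


def span_depths(spans: List[Span]) -> Dict[str, int]:
    """Map each spanId to its depth below the root (root spans have depth 0)."""
    parent_of = {s["spanId"]: s.get("parentId") for s in spans}
    depths: Dict[str, int] = {}
    for span_id in parent_of:
        depth, current, seen = 0, span_id, {span_id}
        while True:
            parent = parent_of.get(current)
            # Stop at a root, at a parent missing from the export, or on a cycle.
            if parent is None or parent not in parent_of or parent in seen:
                break
            seen.add(parent)
            depth += 1
            current = parent
        depths[span_id] = depth
    return depths


def looks_flat(spans: List[Span]) -> bool:
    """True when there are several non-root spans and none is nested deeper than 1."""
    depths = span_depths(spans)
    children = [d for d in depths.values() if d > 0]
    return len(children) > 1 and max(children) == 1


# Span IDs taken from the table above: observed (buggy) parents.
observed = [
    {"spanId": "daaa62fa3f11aae6", "parentId": None},
    {"spanId": "6d7b48054121e9fc", "parentId": "daaa62fa3f11aae6"},
    {"spanId": "b73d110a068e930e", "parentId": "daaa62fa3f11aae6"},
    {"spanId": "ac8f798541d3812c", "parentId": "daaa62fa3f11aae6"},
]

# The same spans with the parents listed in the "Expected Parent" column.
expected = [
    {"spanId": "daaa62fa3f11aae6", "parentId": None},
    {"spanId": "6d7b48054121e9fc", "parentId": "daaa62fa3f11aae6"},
    {"spanId": "b73d110a068e930e", "parentId": "6d7b48054121e9fc"},
    {"spanId": "ac8f798541d3812c", "parentId": "b73d110a068e930e"},
]
```

With this data, `looks_flat(observed)` is true and `looks_flat(expected)` is false, which makes the check usable as a regression assertion once a fix is in place.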
Impact
- ❌ Trace visualization broken - flat hierarchy instead of nested tree
- ❌ Cannot trace execution flow - unclear which LLM call belongs to which step
- ❌ Agent Flow view broken - Splunk O11y Agent Flow shows incorrect structure