[Bug] LangChain spans have flat hierarchy instead of proper parent-child nesting in async applications #122

@NikitaVoitov

Description

When LangChainInstrumentor is used with async LangChain/LangGraph applications, spans are not linked into a parent-child hierarchy. Every span appears as a direct child of the root span instead of forming a nested trace tree, so the actual execution flow cannot be visualized.

Note: Spans share the same Trace ID, but the parent-child relationships are incorrect (flat instead of nested).

Environment

  • splunk-otel-instrumentation-langchain: 0.1.x
  • langchain-core: 1.x / langgraph: 1.x
  • opentelemetry-sdk: 1.39.x
  • Async execution via await graph.ainvoke() or await chain.ainvoke()

Steps to Reproduce

  1. Instrument an async LangChain/LangGraph application with LangChainInstrumentor
  2. Execute an agent using await graph.ainvoke(...)
  3. Export spans to Jaeger/Zipkin/Splunk O11y
  4. Observe that all spans are siblings with the same parent (flat hierarchy)

Expected Behavior

POST /agents/query (trace_id: 90a9d9...)
└── invoke_agent healthcare_agent (parent: HTTP span)
    ├── step model (parent: agent)
    │   └── chat claude-3-5-sonnet (parent: step)
    └── step tools (parent: agent)
        └── tool query_member_data (parent: step)

Actual Behavior

POST /agents/query (trace_id: 90a9d9...)
├── invoke_agent healthcare_agent (parent: HTTP span)
├── step model (parent: HTTP span)      ← WRONG! Should be child of agent
├── chat unknown_model (parent: HTTP span)  ← WRONG! Should be child of step
├── step tools (parent: HTTP span)      ← WRONG! Should be child of agent
└── tool query_member_data (parent: HTTP span)  ← WRONG! Should be child of tools

All spans have the same Trace ID but all point to the HTTP root span as parent instead of their logical parent.

Root Cause Analysis

In callback_handler.py, the parent_run_id UUID is captured but never resolved to an actual OpenTelemetry Span object. The span emitter code in span.py checks for parent_span:

parent_span = getattr(invocation, "parent_span", None)
parent_ctx = trace.set_span_in_context(parent_span) if parent_span else None

When parent_span is None, parent_ctx is also None, so the span is created under whatever span is current in the active context. In synchronous code this works, because Python's context propagation keeps the parent span current. In async code with await, the context is lost between coroutines, so every span inherits from the outermost context (the HTTP request span).
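One way the context can fail to carry over is sketched below using only the standard library. It assumes the callbacks end up running in separate asyncio tasks, which this issue has not confirmed for LangChain's callback dispatch. The `current_span` variable is a hypothetical stand-in for the "current span" slot (OpenTelemetry's Python context is backed by contextvars by default), and the function names are illustrative only.

```python
import asyncio
import contextvars

# Hypothetical stand-in for the "current span" slot; the default plays the
# role of the HTTP request span that is already active.
current_span = contextvars.ContextVar("current_span", default="HTTP root")

observed = {}


async def agent_run():
    # This coroutine runs in its own task, which received a *copy* of the
    # caller's context. The set() below only changes that copy.
    current_span.set("invoke_agent")
    observed["inside_agent_task"] = current_span.get()


async def on_llm_start():
    # A later callback running in a different task does not see the value
    # set inside agent_run(); it sees the outer context instead.
    observed["callback_sees"] = current_span.get()


async def handle_request():
    await asyncio.create_task(agent_run())
    await asyncio.create_task(on_llm_start())
    observed["request_sees"] = current_span.get()


asyncio.run(handle_request())
print(observed)
```

In this sketch the callback resolves its parent to the outer value rather than to the agent, which matches the flat shape reported here. An explicit parent passed at span creation does not depend on which task the callback runs in.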

# Current code - parent_run_id captured but not resolved to span
inv = LLMInvocation(
    parent_run_id=parent_run_id,  # UUID stored but not used
    # parent_span is never set!
)

Evidence

Tested with a healthcare agent deployed on Snowflake SPCS:

  • Trace ID: 90a9d9e471e7ba2a1c88f96e2bc46293
  • 14 spans all showing parentId: daaa62fa3f11aae6 (the HTTP root span)
  • UI shows flat waterfall with all spans at same indentation level

📊 Trace Span Evidence

Root span (correct):

{
  "spanId": "daaa62fa3f11aae6",
  "parentId": null,
  "operationName": "POST /agents/sf-query"
}

All child spans have SAME parentId (BUG!):

| Operation | Span ID | Parent ID | Expected Parent |
| --- | --- | --- | --- |
| invoke_agent healthcare_agent | 6d7b48054121e9fc | daaa62fa3f11aae6 | HTTP root ✅ |
| step model | b73d110a068e930e | daaa62fa3f11aae6 | ❌ Should be 6d7b... (agent) |
| chat unknown_model | ac8f798541d3812c | daaa62fa3f11aae6 | ❌ Should be b73d... (step) |
| step tools | 7fa31f5192dde5be | daaa62fa3f11aae6 | ❌ Should be 6d7b... (agent) |
| tool query_member_data | e4456446b9a33f7c | daaa62fa3f11aae6 | ❌ Should be 7fa3... (tools) |
| step model (2nd) | 2bc05af97bdb1fe2 | daaa62fa3f11aae6 | ❌ Should be 6d7b... (agent) |
| chat unknown_model (2nd) | 225813e7ed81f752 | daaa62fa3f11aae6 | ❌ Should be 2bc0... (step) |

Key observations:

  • All spans share traceId: 90a9d9e471e7ba2a1c88f96e2bc46293
  • All spans have parentId: daaa62fa3f11aae6 (HTTP root) ❌
  • Service version confirms unfixed code: service.version: v1.0.106-no-fix

Full trace JSON: trace_flat_hierarchy_bug.json
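The flat structure can also be checked programmatically, which may be useful as a regression test once a fix lands. The sketch below computes the depth of the span tree from parentId links. The span IDs come from the table above; the dict layout (spanId / parentId / operationName) follows the root span JSON and is an assumption about the rest of the export, and `max_depth` is a hypothetical helper.

```python
from typing import Dict, List, Optional

ROOT = "daaa62fa3f11aae6"

# Observed spans (first iteration only), all pointing at the HTTP root.
observed: List[dict] = [
    {"spanId": ROOT, "parentId": None, "operationName": "POST /agents/sf-query"},
    {"spanId": "6d7b48054121e9fc", "parentId": ROOT, "operationName": "invoke_agent healthcare_agent"},
    {"spanId": "b73d110a068e930e", "parentId": ROOT, "operationName": "step model"},
    {"spanId": "ac8f798541d3812c", "parentId": ROOT, "operationName": "chat unknown_model"},
    {"spanId": "7fa31f5192dde5be", "parentId": ROOT, "operationName": "step tools"},
    {"spanId": "e4456446b9a33f7c", "parentId": ROOT, "operationName": "tool query_member_data"},
]

# Parents taken from the "Expected Parent" column, keyed by child span ID.
expected_parent: Dict[str, str] = {
    "b73d110a068e930e": "6d7b48054121e9fc",  # step model -> agent
    "ac8f798541d3812c": "b73d110a068e930e",  # chat -> step model
    "7fa31f5192dde5be": "6d7b48054121e9fc",  # step tools -> agent
    "e4456446b9a33f7c": "7fa31f5192dde5be",  # tool -> step tools
}


def max_depth(spans: List[dict]) -> int:
    """Return the number of levels in the span tree (a lone root is 1)."""
    children: Dict[Optional[str], List[str]] = {}
    for s in spans:
        children.setdefault(s["parentId"], []).append(s["spanId"])

    def depth(span_id: str) -> int:
        return 1 + max((depth(c) for c in children.get(span_id, [])), default=0)

    return max((depth(root) for root in children.get(None, [])), default=0)


# Same spans, re-parented the way the issue expects them to be.
fixed = [{**s, "parentId": expected_parent.get(s["spanId"], s["parentId"])} for s in observed]

flat_depth = max_depth(observed)
nested_depth = max_depth(fixed)
print(flat_depth, nested_depth)
```

A depth of 2 means every span hangs directly off the root, which is the bug described here; the expected hierarchy is deeper because the chat and tool spans sit under their steps.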

Impact

  • Trace visualization broken - flat hierarchy instead of nested tree
  • Cannot trace execution flow - unclear which LLM call belongs to which step
  • Agent Flow view broken - Splunk O11y Agent Flow shows incorrect structure
