Parent Agent spans report aggregated token usage from their child spans #3995

### Initial Checks

- [x] I'm using the [latest version](https://github.com/pydantic/pydantic-ai/releases/latest) of Pydantic AI
- [x] I've searched for my issue in [the issue tracker](https://github.com/pydantic/pydantic-ai/issues) before opening this issue

### Description

**Issue:** Parent "agent run" spans report aggregated token usage from their child spans, causing double-counting in observability platforms that aggregate usage across all spans.

**Expected Behavior:**  
Parent agent spans should do one of the following:
1. Not report usage at all, leaving observability platforms to aggregate from the leaf (model request) spans.
2. Report only their own usage, excluding child spans.
3. Make this behavior configurable.
4. Add an attribute that marks the reported usage as an aggregate, so platforms can skip it when summing.
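For option 4, the consumer side could look roughly like this. It is a minimal sketch that assumes a hypothetical marker attribute `pydantic_ai.usage.is_aggregate` (not an existing Pydantic AI or OpenTelemetry GenAI semantic-convention attribute) and models each span as a plain dict of attributes rather than a real OpenTelemetry span. The token counts are the ones from the example later in this issue.

```python
# Hypothetical marker attribute: NOT part of Pydantic AI or the OpenTelemetry
# GenAI semantic conventions; used here only to illustrate option 4.
AGGREGATE_MARKER = "pydantic_ai.usage.is_aggregate"


def sum_usage(spans: list[dict[str, object]]) -> tuple[int, int]:
    """Sum gen_ai.usage.* attributes across spans, skipping flagged aggregates."""
    input_tokens = output_tokens = 0
    for attributes in spans:
        if attributes.get(AGGREGATE_MARKER):
            continue  # this span's usage is already counted on its children
        input_tokens += int(attributes.get("gen_ai.usage.input_tokens", 0))
        output_tokens += int(attributes.get("gen_ai.usage.output_tokens", 0))
    return input_tokens, output_tokens


# Span attributes from the example, with the parent span flagged as an aggregate.
spans = [
    {"gen_ai.usage.input_tokens": 56, "gen_ai.usage.output_tokens": 5},
    {"gen_ai.usage.input_tokens": 57, "gen_ai.usage.output_tokens": 6},
    {"gen_ai.usage.input_tokens": 113, "gen_ai.usage.output_tokens": 11, AGGREGATE_MARKER: True},
]
print(sum_usage(spans))  # (113, 11)
```

Without the marker on the parent span, the same function would return `(226, 22)`, which is exactly the double-counted total described below.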

**Actual Behavior:**  
Parent "agent run" spans include `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` attributes that are the **sum of all child spans' usage**. When observability platforms aggregate usage by summing all spans, they count the same tokens twice:
- Once from the child spans (actual LLM calls)
- Once from the parent span (aggregated value)

**Example:**
- Child span 1: 56 input + 5 output = 61 tokens
- Child span 2: 57 input + 6 output = 63 tokens
- Parent span: 113 input + 11 output = 124 tokens ← **This is already 56+57 and 5+6!**
- **Total when summing all spans:** 226 input + 22 output = **248 tokens** ❌
- **Correct total:** 113 input + 11 output = **124 tokens** ✓
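The arithmetic above can be reproduced with a short sketch, which also shows one possible consumer-side workaround: count usage only on leaf spans (spans with no children). Spans are modelled as plain tuples with illustrative span IDs. The leaf-only rule is an assumption that holds here because only the model request spans carry their own usage; it would undercount if a span had both its own usage and children.

```python
# (span_id, parent_id, input_tokens, output_tokens); IDs are illustrative only.
SPANS = [
    ("agent-run", None, 113, 11),    # parent: already the sum of its children
    ("chat-1", "agent-run", 56, 5),  # first model request (tool call)
    ("chat-2", "agent-run", 57, 6),  # second model request (final reply)
]


def naive_total(spans):
    """Total tokens when a platform sums usage on every span."""
    return sum(inp + out for _, _, inp, out in spans)


def leaf_only_total(spans):
    """Total tokens when usage is summed only on spans that have no children."""
    parent_ids = {parent for _, parent, _, _ in spans if parent is not None}
    return sum(inp + out for span_id, _, inp, out in spans if span_id not in parent_ids)


print(naive_total(SPANS))      # 248
print(leaf_only_total(SPANS))  # 124
```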

This affects any OpenTelemetry-based observability platform that sums usage across all spans in a trace (e.g. Opik, LangSmith, Datadog, New Relic) and leads to inflated token counts in billing, metrics, and analytics.

### Minimal, Reproducible Example

```python
"""Minimal reproduction of Pydantic AI usage double-counting issue"""
from typing import Literal

from langchain_core.messages import AIMessage, HumanMessage
from langgraph.graph import END, START, MessagesState, StateGraph
from opentelemetry.instrumentation.langchain import LangchainInstrumentor
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor
from opentelemetry.trace import set_tracer_provider
from pydantic_ai import Agent, RunContext
from pydantic_ai.messages import ModelRequest, ModelResponse, TextPart, ToolCallPart
from pydantic_ai.models.function import AgentInfo, FunctionModel
from pydantic_ai.usage import RequestUsage


# Set up OpenTelemetry with a console exporter to inspect spans.
# SimpleSpanProcessor exports each span as soon as it ends, so the spans are
# printed before the summary at the end of the script.
tracer_provider = TracerProvider()
tracer_provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
set_tracer_provider(tracer_provider)

# Instrument Pydantic AI and LangChain
Agent.instrument_all()
LangchainInstrumentor().instrument()


# Create a dummy model that simulates a tool call followed by a reply, with fixed usage
def make_dummy_model():
    phase: Literal["tool_call", "reply"] = "tool_call"

    def dummy(messages: list[ModelRequest | ModelResponse], info: AgentInfo) -> ModelResponse:
        nonlocal phase
        if phase == "tool_call":
            phase = "reply"
            # First call: return a tool call (usage: 56 input, 5 output)
            return ModelResponse(
                parts=[ToolCallPart(tool_name="get_weather", args={"location": "tokyo"})],
                usage=RequestUsage(input_tokens=56, output_tokens=5),
            )
        else:
            # Second call: return the final result (usage: 57 input, 6 output)
            return ModelResponse(
                parts=[TextPart("Sunny and warm")],
                usage=RequestUsage(input_tokens=57, output_tokens=6),
            )

    return FunctionModel(dummy)


# Create agent with a tool
agent = Agent(make_dummy_model(), system_prompt="Be concise")


@agent.tool
def get_weather(ctx: RunContext[None], location: str) -> str:
    """Get weather information"""
    return "sunny"


# Create a LangGraph workflow that delegates to Pydantic AI
def delegate_to_agent(state: MessagesState):
    last_message = state["messages"][-1]
    result = agent.run_sync(last_message.content)
    # result.output is the agent's final text reply
    return {"messages": [AIMessage(content=result.output)]}


graph = StateGraph(state_schema=MessagesState)
graph.add_node("delegate_to_agent", delegate_to_agent)
graph.add_edge(START, "delegate_to_agent")
graph.add_edge("delegate_to_agent", END)
workflow = graph.compile()


# Run the agent - inspect the console output to see the issue
print("\n" + "="*80)
print("Running agent workflow...")
print("="*80)

result = workflow.invoke({"messages": [HumanMessage(content="What's the weather?")]})

print("\n" + "="*80)
print("🐛 ISSUE: Parent span reports aggregated usage")
print("="*80)
print("When you inspect the OpenTelemetry spans above, you'll see:")
print("")
print("Span: 'chat function:dummy:' (call 1)")
print("  gen_ai.usage.input_tokens: 56")
print("  gen_ai.usage.output_tokens: 5")
print("")
print("Span: 'chat function:dummy:' (call 2)")
print("  gen_ai.usage.input_tokens: 57")
print("  gen_ai.usage.output_tokens: 6")
print("")
print("Span: 'agent run' (parent) ← PROBLEM")
print("  gen_ai.usage.input_tokens: 113  (= 56+57 from children!)")
print("  gen_ai.usage.output_tokens: 11  (= 5+6 from children!)")
print("  model_name: 'function:dummy:'")
print("")
print("When observability platforms sum ALL spans:")
print("  Total: (56+57+113) input + (5+6+11) output = 248 tokens ❌")
print("  Expected: (56+57) input + (5+6) output = 124 tokens ✓")
print("="*80 + "\n")
```

**To run:**

```bash
pip install pydantic-ai-slim langgraph opentelemetry-api opentelemetry-sdk \
            opentelemetry-instrumentation-langchain
python reproduce_issue.py
```

### Logfire Trace

Unfortunately, I don't have a public Logfire trace link available, but the issue can be reproduced with the code above using any OpenTelemetry backend (console exporter, OTLP endpoint, etc.).

### Python, Pydantic AI & LLM client version

- **Python:** 3.13.1
- **Pydantic AI:** 1.41.0 (pydantic-ai-slim)
- **LLM provider SDK:** N/A (using FunctionModel for reproduction)