
reasoning_tokens is calculated twice when streaming #2918

@yf-yang

Description

In the code below, the `reasoning_tokens` value printed by "RUN usage outside" is twice the value printed by "RUN usage".

If the `async for event in response_stream` loop is commented out, the usage is correct.

Example Code

```python
import asyncio

from dotenv import load_dotenv
from pydantic_ai import Agent, ModelRetry
from pydantic_ai.models.openai import OpenAIResponsesModel, OpenAIResponsesModelSettings

load_dotenv()

agent = Agent(
  model=OpenAIResponsesModel("gpt-5-nano"),
  model_settings=OpenAIResponsesModelSettings(
    openai_reasoning_effort="low",
    openai_service_tier="flex",
    max_tokens=8192,
    timeout=15,
  ),
)

retried = False


@agent.tool_plain
def foo():
  global retried
  if not retried:
    retried = True
    raise ModelRetry("Please retry this tool again")
  return "Success"


async def main():
  async with agent.iter("Calculate 100 * 200 / 3, then call tool foo") as run:
    async for node in run:
      if Agent.is_model_request_node(node):
        async with node.stream(run.ctx) as response_stream:
          async for event in response_stream:
            # print(event)
            pass
          print("RESPONSE usage", response_stream.get().usage)
          print("STREAM usage", response_stream.usage())
          print("RUN usage", run.usage())
        print("RUN usage outside", run.usage())

    print(run.result.all_messages())


if __name__ == "__main__":
  asyncio.run(main())
```
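For context, a minimal sketch of the kind of double accumulation that would produce this symptom. This is a hypothetical illustration, not pydantic-ai's actual internals: if usage is added to the run total once while the stream's events are consumed, and the finished response's usage is added again afterwards, every field (including `reasoning_tokens`) comes out doubled.

```python
# Hypothetical sketch of a double-accumulation bug (illustration only,
# not pydantic-ai internals).

class Usage:
    def __init__(self, reasoning_tokens: int = 0):
        self.reasoning_tokens = reasoning_tokens

    def __add__(self, other: "Usage") -> "Usage":
        return Usage(self.reasoning_tokens + other.reasoning_tokens)


final_response_usage = Usage(reasoning_tokens=64)  # usage arrives on the last chunk
run_usage = Usage()

# Path 1: while the stream is consumed, the chunk that carries usage
# is added to the run total.
for chunk_usage in [final_response_usage]:
    run_usage = run_usage + chunk_usage

# Path 2: when the request finishes, the complete response's usage is
# added to the run total a second time.
run_usage = run_usage + final_response_usage

print(run_usage.reasoning_tokens)  # twice the real count
```

Skipping `async for event in response_stream` skips Path 1, which would explain why commenting out the loop yields correct totals.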

Python, Pydantic AI & LLM client version

1.0.6

Labels

bug (Something isn't working)