Initial Checks
- I confirm that I'm using the latest version of Pydantic AI
- I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue
Description
Bug Report
Describe the bug
When using agent.run_stream in conjunction with result.stream_output, the output is not streamed incrementally. Instead, the entire response from the LLM is buffered and delivered only after generation is fully complete. This makes stream_output behave like a synchronous call (run_sync) rather than a real-time stream.
To Reproduce
The issue can be consistently reproduced using the official "whales" example from the pydantic-ai documentation.
Code:
# From: https://ai.pydantic.dev/examples/stream-whales/#example-code
# `agent`, `console`, and `Table` are defined as in the linked example.
async with agent.run_stream('Generate me details of 30 species of Whale.') as result:
    console.print('Response:', style='green')
    async for whales in result.stream_output(debounce_by=0.01):
        # This block is only entered once, at the very end.
        table = Table(
            title='Species of Whale',
            caption='Streaming Structured responses from GPT-4',
            width=120,
        )
        # ... (rest of the rendering code)
Expected behavior
The async for loop on result.stream_output should yield data chunks as soon as they are available from the LLM, allowing for incremental processing and rendering of the output.
Current behavior
The program waits until the LLM has generated the complete list of 30 whale species. Only then does the async for loop execute, receiving all the data in a single iteration.
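To make the buffering visible, I wrapped the repro loop in a rough timing probe (a sketch only: agent is the same structured-output agent as in the repro above, and exact timings vary by model):
import time

async with agent.run_stream('Generate me details of 30 species of Whale.') as result:
    start = time.monotonic()
    async for whales in result.stream_output(debounce_by=0.01):
        # Expected: many iterations, each with a progressively longer partial list.
        # Observed: a single late iteration containing all 30 entries.
        print(f'{time.monotonic() - start:.1f}s: {len(whales)} whales received')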
Debugging Investigation
While inspecting the code execution, I noticed that the program flow within pydantic_ai/result.py seems to first collect all messages by awaiting the full LLM response. Only after the entire result is gathered does the execution path enter the async def stream_output method.
This suggests the issue lies in how the underlying response is being awaited before the streaming logic is initiated, which defeats the purpose of a streaming operation.
I only hit this issue when using agent = Agent(model=model, output_type=list[Whale]). If I run without output_type, the data streams incrementally as expected, so the problem appears to be related to Pydantic output validation during streaming.
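A minimal sketch of the contrast (assumptions: Whale here is a cut-down stand-in for the example's model, the prompt is simplified, and model is the GoogleModel configured in the version section below):
import asyncio

from pydantic import BaseModel
from pydantic_ai import Agent

class Whale(BaseModel):  # cut-down stand-in for the example's model
    name: str

async def main():
    # Without output_type: text chunks arrive incrementally as expected.
    text_agent = Agent(model=model)
    async with text_agent.run_stream('List 10 species of whale.') as result:
        async for chunk in result.stream_text(delta=True):
            print(chunk, end='', flush=True)

    # With output_type: the loop body fires only once, with the full list.
    structured_agent = Agent(model=model, output_type=list[Whale])
    async with structured_agent.run_stream('List 10 species of whale.') as result:
        async for whales in result.stream_output(debounce_by=0.01):
            print(f'\npartial list length: {len(whales)}')

asyncio.run(main())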
Example Code
(see the repro under "To Reproduce" above)
Python, Pydantic AI & LLM client version
I have tried both OpenRouter and Google; the issue occurs with both, across all the Google Gemini models I tested.
from pydantic_ai.models.google import GoogleModel
from pydantic_ai.providers.google import GoogleProvider

provider = GoogleProvider(api_key='xxxxx')
model = GoogleModel('gemini-flash-lite-latest', provider=provider)
# from pydantic_ai.models.openai import OpenAIChatModel
# from pydantic_ai.providers.openrouter import OpenRouterProvider
# model = OpenAIChatModel(
#     'openai/gpt-5',
#     provider=OpenRouterProvider(api_key='sk-or-xxxx'),
# )