
stream_output should yield data chunks as they arrive, but instead buffers the entire response #3144

@saikethan27

Description

Bug Report

Describe the bug
When using agent.run_stream together with result.stream_output, the output is not streamed incrementally. Instead, the entire response from the LLM is buffered and delivered only after generation is fully complete. This makes stream_output behave like a synchronous call (run_sync) rather than a real-time stream.

To Reproduce
The issue can be consistently reproduced using the official "whales" example from the pydantic-ai documentation.

Code:

# From: https://ai.pydantic.dev/examples/stream-whales/#example-code
# (assumes the agent, Console, and Table setup from that example)

async with agent.run_stream('Generate me details of 30 species of Whale.') as result:
    console.print('Response:', style='green')

    async for whales in result.stream_output(debounce_by=0.01):
        # Expected: entered once per debounced chunk.
        # Actual: this block is only entered once, at the very end.
        table = Table(
            title='Species of Whale',
            caption='Streaming Structured responses from GPT-4',
            width=120,
        )
        # ... (rest of the rendering code)

Expected behavior
The async for loop on result.stream_output should yield data chunks as soon as they are available from the LLM, allowing for incremental processing and rendering of the output.

Current behavior
The program waits until the LLM has generated the complete list of 30 whale species. Only then does the async for loop execute, receiving all the data in a single iteration.
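The buffering can be made visible by recording the elapsed time at each iteration: if chunks arrive incrementally, the timestamps rise steadily; if everything is buffered, they all cluster at the total generation time. The `timed` wrapper below is a hypothetical helper (not part of pydantic-ai) that works with any async iterator, demonstrated here with a fake stream instead of a live model:

```python
import asyncio
import time


async def timed(aiter):
    """Wrap any async iterator, yielding (seconds_since_start, item) pairs."""
    start = time.monotonic()
    async for item in aiter:
        yield time.monotonic() - start, item


async def fake_stream():
    # Stand-in for result.stream_output(...): one chunk every 0.1 s.
    for i in range(3):
        await asyncio.sleep(0.1)
        yield f'chunk-{i}'


async def main():
    return [t async for t, _ in timed(fake_stream())]
```

Applied to `result.stream_output(...)`, incremental delivery would show increasing timestamps spread across the generation; the behavior reported here shows them all bunched together at the end.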

Debugging Investigation
While inspecting the code execution, I noticed that the program flow within pydantic_ai/result.py seems to first collect all messages by awaiting the full LLM response. Only after the entire result is gathered does the execution path enter the async def stream_output method.

This suggests the issue lies in how the underlying response is awaited before the streaming logic starts, which defeats the purpose of a streaming operation.
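The flow described above amounts to draining the whole response before yielding anything, rather than forwarding chunks as they arrive. The two generators below are purely illustrative (not pydantic-ai code) and contrast the two shapes; the observable difference is the latency of the first yielded item:

```python
import asyncio
import time


async def buffered(source):
    # Anti-pattern: collect the entire source first, then yield.
    chunks = [c async for c in source]
    for c in chunks:
        yield c


async def incremental(source):
    # Desired: forward each chunk the moment it arrives.
    async for c in source:
        yield c


async def slow_source(n=3, delay=0.1):
    # Simulated LLM stream: one chunk per `delay` seconds.
    for i in range(n):
        await asyncio.sleep(delay)
        yield i


async def first_item_latency(wrapper):
    start = time.monotonic()
    async for _ in wrapper(slow_source()):
        return time.monotonic() - start
```

With `incremental`, the first item appears after one delay (~0.1 s); with `buffered`, only after the full generation (~0.3 s), which matches the behavior reported in this issue.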

I only get this issue when using `agent = Agent(model=model, output_type=list[Whale])`. If I run without `output_type`, the data streams correctly, so the problem appears to be related to Pydantic output validation during streaming.

Example Code

Python, Pydantic AI & LLM client version

I have tried both OpenRouter and Google providers; the issue occurs with every model I tested, including all Google Gemini models.

provider = GoogleProvider(api_key='xxxxx')
model = GoogleModel('gemini-flash-lite-latest', provider=provider)


# model = OpenAIChatModel(
#     'openai/gpt-5',
#     provider=OpenRouterProvider(api_key='sk-or-xxxx'),
# )
