Failed to yield ToolReturnPart streaming with structured output type via tool call #2640

@lorr1

Description

When building an agent that uses tool calls to handle structured outputs, the final result's tool return is never streamed back; it only appears when the complete list of messages is returned. Specifically, in this part of the code:

part = _messages.ToolReturnPart(

the tool return is created and added to the outputs, but never yielded.

This may have been on purpose, and I'd like to understand why. In most use cases, structured outputs are one-shot runs with no further chat with the model, so not including the tool return is fine. But I wanted to chat with the model and assumed the streamed responses would give me a history equivalent to calling all_messages(), as they would in a chat application.

Without yielding the tool return, the streamed responses leave a dangling tool call, which causes errors if you try to feed them back as message history.
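To make the "dangling" problem concrete: a tool call is dangling when no message in the history carries a matching tool return for its tool_call_id. A schematic sketch with plain dicts (not the real pydantic_ai message types, whose fields differ):

```python
# Schematic only: messages are plain dicts, not pydantic_ai's ModelRequest/ModelResponse.
def find_dangling_tool_calls(messages):
    """Return the tool_call_ids that have a call but no matching return."""
    calls = {m["tool_call_id"] for m in messages if m["kind"] == "tool_call"}
    returns = {m["tool_call_id"] for m in messages if m["kind"] == "tool_return"}
    return calls - returns

# What the streamed events currently give you: the final_result call, no return.
history = [
    {"kind": "user_prompt", "content": "Hi"},
    {"kind": "tool_call", "tool_name": "final_result", "tool_call_id": "call_1"},
]
print(find_dangling_tool_calls(history))  # {'call_1'}
```

Feeding such a history back to a provider fails because the API expects every tool call to be answered by a tool return with the same id.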

Happy to submit a PR to add the yielding if you folks want. Let me know what you think!

Example Code

# Run in a notebook, hence nest_asyncio
# Modified from https://ai.pydantic.dev/agents/#streaming-all-events-and-output
import asyncio
from pydantic import BaseModel
from pydantic_ai import Agent, StructuredDict
from pydantic_ai.messages import FinalResultEvent, FunctionToolCallEvent, FunctionToolResultEvent
import nest_asyncio

nest_asyncio.apply()

async def run_agent(agent, prompt):
    output_messages = []
    # Begin a node-by-node, streaming iteration
    async with agent.iter(prompt) as run:
        async for node in run:
            if Agent.is_user_prompt_node(node):
                # A user prompt node => The user has provided input
                output_messages.append(f'=== UserPromptNode: {node.user_prompt} ===')
            elif Agent.is_model_request_node(node):
                # A model request node => We can stream tokens from the model's request
                async with node.stream(run.ctx) as request_stream:
                    async for event in request_stream:
                        pass
                    output_messages.append(request_stream.get())
            elif Agent.is_call_tools_node(node):
                # A handle-response node => The model returned some data, potentially calls a tool
                async with node.stream(run.ctx) as handle_stream:
                    async for event in handle_stream:
                        if isinstance(event, FunctionToolCallEvent):
                            output_messages.append(event)
                        elif isinstance(event, FunctionToolResultEvent):
                            output_messages.append(event.result)
            elif Agent.is_end_node(node):
                # Once an End node is reached, the agent run is complete
                assert run.result is not None
                assert run.result.output == node.data.output
                output_messages.append(run.result.output)
    return output_messages


class TestSchema(BaseModel):
    user_name: str
    user_description: str


json_schema = TestSchema.model_json_schema()
agent = Agent(
    "openai:gpt-4o",
    system_prompt="Either ask for clarification or respond with the final result tool.",
    output_type=StructuredDict(json_schema),
)
# Full run sync to get all the messages
result = agent.run_sync("Hi")

for msg in result.all_messages():
    # The first message will be a ModelRequest with the system prompt and user prompt
    # Second is the tool call
    # Third is the tool return
    print(msg)
    print()

print("==============")
msgs = asyncio.run(run_agent(agent, "Hi"))
for msg in msgs:
    # First is user prompt
    # Second is tool call
    # Third is just output - no tool return
    print(msg)
    print()

"""RESULTS
ModelRequest(parts=[SystemPromptPart(content='Either ask for clarification or respond with the final result tool.', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 23, 285240, tzinfo=datetime.timezone.utc)), UserPromptPart(content='Hi', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 23, 285246, tzinfo=datetime.timezone.utc))])

ModelResponse(parts=[ToolCallPart(tool_name='final_result', args='{"user_name":"Hello, how can I assist you today?","user_description":"User greeted."}', tool_call_id='call_xxFXtp52ubL9XPdC0XK21EbD')], usage=RequestUsage(input_tokens=64, output_tokens=30, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}), model_name='gpt-4o-2024-08-06', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 23, tzinfo=TzInfo(UTC)), provider_request_id='chatcmpl-C7Ec3CbnlDC0oZGYKS7xxQQdFtaCB')

ModelRequest(parts=[ToolReturnPart(tool_name='final_result', content='Final result processed.', tool_call_id='call_xxFXtp52ubL9XPdC0XK21EbD', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 24, 19144, tzinfo=datetime.timezone.utc))])

==============
=== UserPromptNode: Hi ===

ModelResponse(parts=[ToolCallPart(tool_name='final_result', args='', tool_call_id='call_Dt32fw0NnEoOKhPLdtpSeFsi')], usage=RequestUsage(details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}), model_name='gpt-4o', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 24, tzinfo=TzInfo(UTC)))

{'user_name': 'Hi', 'user_description': 'An initial greeting message.'}
"""
# I expect a ToolReturnPart in the second run's output, just like in the first
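Until the yield is added, a caller-side workaround is to repair the streamed history before reusing it: synthesize a return for any unanswered tool call, mirroring the 'Final result processed.' content that all_messages() contains. A schematic sketch with plain dicts (the real fix would construct a _messages.ToolReturnPart with the same tool_call_id):

```python
# Schematic workaround, plain dicts only -- not the real pydantic_ai types.
def patch_dangling_calls(messages, content="Final result processed."):
    """Append a synthetic tool return for any tool call without a matching return."""
    returned = {m["tool_call_id"] for m in messages if m["kind"] == "tool_return"}
    patched = list(messages)
    for m in messages:
        if m["kind"] == "tool_call" and m["tool_call_id"] not in returned:
            patched.append({
                "kind": "tool_return",
                "tool_name": m["tool_name"],
                "tool_call_id": m["tool_call_id"],
                "content": content,
            })
    return patched
```

This is idempotent: running it on an already-patched history appends nothing, so it is safe to apply before every follow-up request.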

Python, Pydantic AI & LLM client version

Python version: 3.13.5 (main, Jul 11 2025, 22:26:07) [Clang 20.1.4 ]
Pydantic AI version: 0.7.4
LLM Client OpenAI version: 1.100.2
