Initial Checks
- I confirm that I'm using the latest version of Pydantic AI
- I confirm that I searched for my issue in https://github.com/pydantic/pydantic-ai/issues before opening this issue
Description
When building an agent that uses tool calls to handle structured outputs, the final result tool return is never streamed back; it only shows up when the complete list of messages is returned. Specifically, this part:

part = _messages.ToolReturnPart(
This may have been on purpose, and I'd like to understand why. In most use cases, structured outputs are one-shot runs with no further chat back to the model, so not including the tool return is fine. But I wanted to keep chatting with the model, and I assumed the streamed responses would give me a message history for chat applications equivalent to calling all_messages().
Without yielding the tool return, the streamed responses leave a dangling tool call, which causes issues if you try to feed that history back to the model.
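For now I'm working around it by closing the dangling call myself before reusing the history. A rough sketch of what I mean (the function name is mine, not from the library, and the 'Final result processed.' content just mirrors what all_messages() shows):

from pydantic_ai import messages as _messages

def close_dangling_final_result(history: list[_messages.ModelMessage]) -> list[_messages.ModelMessage]:
    # If the history ends with an unanswered final_result tool call,
    # append a synthetic ModelRequest carrying the matching ToolReturnPart.
    if not history:
        return history
    last = history[-1]
    if isinstance(last, _messages.ModelResponse):
        for part in last.parts:
            if isinstance(part, _messages.ToolCallPart) and part.tool_name == 'final_result':
                history.append(_messages.ModelRequest(parts=[
                    _messages.ToolReturnPart(
                        tool_name=part.tool_name,
                        content='Final result processed.',  # mirrors all_messages() output
                        tool_call_id=part.tool_call_id,
                    )
                ]))
    return history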
Happy to submit a PR to add the yielding if you folks want. Let me know what you think!
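For concreteness, the change I have in mind is roughly this, right where the linked ToolReturnPart is built (pseudo-code: send_event is a placeholder for whatever stream handle is actually in scope there, I haven't checked the exact plumbing):

# existing code builds the return part for the final result tool
part = _messages.ToolReturnPart(
    tool_name=tool_call.tool_name,
    content='Final result processed.',
    tool_call_id=tool_call.tool_call_id,
)
# proposed addition: also surface it to streaming consumers, the same
# way ordinary tool returns surface as a FunctionToolResultEvent
await send_event(_messages.FunctionToolResultEvent(result=part))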
Example Code
# Run in a notebook, hence nest_asyncio
# Modified from https://ai.pydantic.dev/agents/#streaming-all-events-and-output
import asyncio
from pydantic import BaseModel
from pydantic_ai import Agent, StructuredDict
from pydantic_ai.messages import FinalResultEvent, FunctionToolCallEvent, FunctionToolResultEvent
import nest_asyncio
nest_asyncio.apply()
async def run_agent(agent, prompt):
    output_messages = []
    # Begin a node-by-node, streaming iteration
    async with agent.iter(prompt) as run:
        async for node in run:
            if Agent.is_user_prompt_node(node):
                # A user prompt node => the user has provided input
                output_messages.append(f'=== UserPromptNode: {node.user_prompt} ===')
            elif Agent.is_model_request_node(node):
                # A model request node => we can stream tokens from the model's request
                async with node.stream(run.ctx) as request_stream:
                    async for event in request_stream:
                        pass
                    output_messages.append(request_stream.get())
            elif Agent.is_call_tools_node(node):
                # A handle-response node => the model returned some data, potentially calls a tool
                async with node.stream(run.ctx) as handle_stream:
                    async for event in handle_stream:
                        if isinstance(event, FunctionToolCallEvent):
                            output_messages.append(event)
                        elif isinstance(event, FunctionToolResultEvent):
                            output_messages.append(event.result)
            elif Agent.is_end_node(node):
                # Once an End node is reached, the agent run is complete
                assert run.result is not None
                assert run.result.output == node.data.output
                output_messages.append(run.result.output)
    return output_messages
class TestSchema(BaseModel):
    user_name: str
    user_description: str
json_schema = TestSchema.model_json_schema()
agent = Agent(
    "openai:gpt-4o",
    system_prompt="Either ask for clarification or respond with the final result tool.",
    output_type=StructuredDict(json_schema),
)
# Full run sync to get all the messages
result = agent.run_sync("Hi")
for msg in result.all_messages():
    # The first message will be a ModelRequest for the system prompt and user prompt
    # Second is the tool call
    # Third is the tool return
    print(msg)
    print()
print("==============")
msgs = asyncio.run(run_agent(agent, "Hi"))
for msg in msgs:
    # First is the user prompt
    # Second is the tool call
    # Third is just the output - no tool return
    print(msg)
    print()
"""RESULTS
ModelRequest(parts=[SystemPromptPart(content='Either ask for clarification or respond with the final result tool.', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 23, 285240, tzinfo=datetime.timezone.utc)), UserPromptPart(content='Hi', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 23, 285246, tzinfo=datetime.timezone.utc))])
ModelResponse(parts=[ToolCallPart(tool_name='final_result', args='{"user_name":"Hello, how can I assist you today?","user_description":"User greeted."}', tool_call_id='call_xxFXtp52ubL9XPdC0XK21EbD')], usage=RequestUsage(input_tokens=64, output_tokens=30, details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}), model_name='gpt-4o-2024-08-06', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 23, tzinfo=TzInfo(UTC)), provider_request_id='chatcmpl-C7Ec3CbnlDC0oZGYKS7xxQQdFtaCB')
ModelRequest(parts=[ToolReturnPart(tool_name='final_result', content='Final result processed.', tool_call_id='call_xxFXtp52ubL9XPdC0XK21EbD', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 24, 19144, tzinfo=datetime.timezone.utc))])
==============
=== UserPromptNode: Hi ===
ModelResponse(parts=[ToolCallPart(tool_name='final_result', args='', tool_call_id='call_Dt32fw0NnEoOKhPLdtpSeFsi')], usage=RequestUsage(details={'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}), model_name='gpt-4o', timestamp=datetime.datetime(2025, 8, 22, 5, 28, 24, tzinfo=TzInfo(UTC)))
{'user_name': 'Hi', 'user_description': 'An initial greeting message.'}
"""
# I am expecting a ToolReturnPart object in the second run's output, just like in the first

Python, Pydantic AI & LLM client version
Python version: 3.13.5 (main, Jul 11 2025, 22:26:07) [Clang 20.1.4 ]
Pydantic AI version: 0.7.4
LLM Client OpenAI version: 1.100.2