feat: Add Structured Output as part of the agent loop #943
feat: Implement comprehensive structured output system

This feature addition introduces a complete structured output system that allows agents to return validated Pydantic models instead of raw text responses, providing type safety and consistency for AI agent interactions.

## Key Features Added

### Core Structured Output System
- **New output module**: Complete structured output architecture with base classes, modes, and utilities
- **Agent integration**: Native structured_output_type parameter support in Agent class and __call__ method
- **Event loop integration**: Enhanced event loop to handle structured output processing and validation
- **Tool-based fallback**: Automatic fallback mechanism using structured output tools when native support unavailable

### Architecture Components
- **OutputMode base class**: Abstract interface for different structured output implementations
- **ToolMode implementation**: Tool-based structured output mode with caching and retry logic
- **OutputSchema resolution**: Centralized schema resolution utility with BASE_KEY constant
- **Structured output handler**: Comprehensive handler with logging, caching, and error handling

### Developer Experience
- **PydanticAI-style interface**: Familiar API pattern for structured output specification
- **Comprehensive documentation**: 400+ line README with examples, use cases, and best practices
- **Type safety**: Full typing support with proper generic types and validation
- **Streaming compatibility**: Works seamlessly with existing streaming functionality

### Tool Integration
- **Structured output tool**: Dedicated tool for handling structured output requests
- **Registry integration**: Enhanced tool registry to support structured output tools
- **Backward compatibility**: Maintains compatibility with existing tool ecosystem

## Technical Implementation

### Files Added
- `src/strands/output/`: Complete output module with base classes, modes, and utilities
- `src/strands/tools/structured_output/`: Dedicated structured output tool implementation
- `src/strands/types/output.py`: Type definitions for output system
- Comprehensive documentation and examples

### Files Modified
- Enhanced Agent class with structured_output_type parameter and default schema support
- Updated event loop for structured output processing and validation
- Improved AgentResult to include structured_output field
- Model provider updates for structured output compatibility

### Key Improvements
- **Error handling**: Robust error handling with fallback mechanisms
- **Performance**: Caching system for improved performance with repeated schema usage
- **Logging**: Enhanced logging for debugging and monitoring structured output operations
- **Code quality**: Comprehensive formatting, linting, and style improvements

## Usage Examples

```python
# Basic structured output
from strands import Agent
from pydantic import BaseModel

class UserProfile(BaseModel):
    name: str
    age: int
    occupation: str

agent = Agent()
result = agent("Create a profile for John, 25, dentist", structured_output_type=UserProfile)
profile = result.structured_output  # Validated UserProfile instance
```

## Migration Notes
- Existing agents continue to work without changes
- New structured_output_type parameter is optional
- Legacy output modes are deprecated but still functional

Resolves: Multiple structured output related issues
Add explicit user message instructing the agent to format previous response as structured output during forced structured output attempts.
        tools: List[ToolSpec] = [tool_spec for tool_spec in all_tools.values()]
        return tools

    def register_dynamic_tool(self, tool: AgentTool) -> None:
I'm not sure we should be adding/removing the tool dynamically - can we simply append the tool_spec inside of the event_loop?
Hmm. I'm not opposed, but to better understand: what is the downside? Are we concerned that others will use this method to dynamically register tools? Or is it something else? Wouldn't appending the tool_spec basically be dynamically adding, but without a method? (There does seem to be a specific self.dynamic_tools variable. When is that supposed to be used?)
IMHO, it's better to be functional (don't modify state unless you have to) as it's side-effect free. In this case, you always have to remember to unregister even in exceptional cases and while reading the code, you need to remember that something is temporarily added.
Are we even calling unregister_dynamic_tool right now?
I'm open to being wrong about this, but it feels... odd
Hmmm. You're making an interesting point with the register/deregister "mental overhead". I do like how it has more of a 'native' feel when it's part of the toolbox, even though we add it dynamically. Lemme see if there's a better way to add to the tools we provide the model w/o the dynamic register. I can probably just do something like tool_specs = existing tool specs + SO tool spec, or something.
I looked into this a bit more. We need the tool in the registry because it's accessed here:
tool_info = agent.tool_registry.dynamic_tools.get(tool_name)
If we add it in the event loop like so and don't register it:
. . .
tool_specs = agent.tool_registry.get_all_tool_specs() + so_tool_specs
try:
    async for event in stream_messages(
        agent.model, agent.system_prompt, agent.messages, tool_specs, structured_output_context.tool_choice
    ):
. . .
it never makes it into the registry, and we get a "tool_name=<UserProfile>, available_tools=<[]> | tool not found in registry" exception when execution reaches:
async def _stream(
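To make that failure concrete, here is a minimal sketch using a toy registry. ToyRegistry and its lookup method are hypothetical stand-ins and not the strands ToolRegistry; the only point is that a spec appended locally reaches the model while the executor's lookup by name still comes up empty.

```python
from typing import Any


class ToyRegistry:
    """Hypothetical stand-in for a tool registry; not the strands ToolRegistry."""

    def __init__(self) -> None:
        self.dynamic_tools: dict[str, Any] = {}

    def get_all_tool_specs(self) -> list[dict[str, Any]]:
        # One spec per registered tool.
        return [{"name": name} for name in self.dynamic_tools]

    def lookup(self, tool_name: str) -> Any:
        # Mirrors the executor-side access: tools are resolved by name.
        tool = self.dynamic_tools.get(tool_name)
        if tool is None:
            raise KeyError(f"tool_name=<{tool_name}> | tool not found in registry")
        return tool


registry = ToyRegistry()
so_tool_specs = [{"name": "UserProfile"}]

# The model is shown the extra spec...
tool_specs = registry.get_all_tool_specs() + so_tool_specs

# ...but the spec was only appended locally and never registered,
# so resolving the tool by name fails at execution time.
try:
    registry.lookup("UserProfile")
    lookup_failed = False
except KeyError:
    lookup_failed = True
```

So appending the spec alone is not enough: whatever executes the tool call also needs a way to resolve the tool by name.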
Ugh 😢
There's still the issue that I don't think we're calling unregister_dynamic_tool right now, FWIW.
I think we'd block the new structured_output change (#919) on whether or not someone is using kwargs vs invocation_state. That is, if you're using structured_output and you're trying to pass additional features, then you must be using invocation_state:
agent.invoke_async(output_model=Person, additional_arg=some_value)  # does not use structured_output
agent.invoke_async(output_model=Person, invocation_state={"additional_arg": some_value})  # uses structured_output
nit: don't name directories or files utils
Can we have a more concise README here? I don't think this needs to be so verbose.
Yes! Once I add tests I will reduce the README scope.
Replace StructuredOutputHandler with StructuredOutputContext to provide better encapsulation and cleaner separation of concerns. This change:
- Introduces StructuredOutputContext to manage structured output state
- Updates Agent and event loop to use the new context-based approach
- Modifies tool executors to work with the context pattern
- Removes the handler-based implementation in favor of context
- Replace mode.get_tool_specs() calls with cached tool_specs property
- Improve code formatting and add trailing commas
Rename parameter throughout codebase for better clarity. This change improves API consistency and makes the parameter's purpose more explicit.
Simplify output mode options by removing unused NativeMode and PromptMode implementations, keeping only ToolMode for structured output. This reduces complexity while maintaining full functionality through the tool-based approach.
src/strands/agent/agent_result.py
Outdated
    message: Message
    metrics: EventLoopMetrics
    state: Any
    structured_output: Optional[BaseModel] = None
In the future we plan on allowing for more types than BaseModel; however, I think it's best to set the type then. It probably won't be any but more like a very large set of types that we would extract out to StructuredOutputType.
src/strands/event_loop/event_loop.py
Outdated
)
agent.messages.append({
    "role": "user",
    "content": [{"text": "You must format the previous response as structured output."}]
Good point. hmmm
src/strands/event_loop/event_loop.py
Outdated
forced_invocation_state["tool_choice"] = {"any": {}}
forced_invocation_state["_structured_output_only"] = True

events = recurse_event_loop(agent=agent, invocation_state=forced_invocation_state)
(This code has been slightly updated, but I think your question goes beyond the updates.)
Are you asking why we recurse instead of calling model.structured_output? It's because we are using the tool-based approach: if the model didn't call the tool on its own, we pass in only the StructuredOutputTool and then recurse the event loop so that the model calls it.
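A rough sketch of that flow, under simplifying assumptions: run_loop and chatty_model are hypothetical names, the model is a plain function returning (stop_reason, called_tool_name), and a second direct call stands in for recursing the real event loop.

```python
from typing import Any, Optional


def run_loop(model: Any, tool_specs: list[str], so_tool: str) -> str:
    """Run one normal turn; if the model ends its turn without calling the
    structured output tool, make one forced pass offering only that tool."""
    stop_reason, called = model(tool_specs + [so_tool], None)
    if called == so_tool:
        return "structured_output"
    if stop_reason == "end_turn":
        # Forced pass: only the structured output tool, with tool_choice "any".
        stop_reason, called = model([so_tool], {"any": {}})
        if called == so_tool:
            return "structured_output"
        raise RuntimeError("model did not call the structured output tool when forced")
    return stop_reason


def chatty_model(tool_specs: list[str], tool_choice: Optional[dict]) -> tuple:
    # Toy model: answers in plain text unless a tool choice is forced on it.
    if tool_choice is not None:
        return "tool_use", tool_specs[0]
    return "end_turn", None


outcome = run_loop(chatty_model, ["calculator"], "UserProfile")
```

Here the first pass ends with end_turn and no tool call, so the forced pass runs with the structured output tool as the only option.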
- list[ContentBlock]: Multi-modal content blocks
- list[Message]: Complete messages with roles
- None: Use existing conversation history
structured_output_model: Pydantic model type(s) for structured output (overrides agent default).
Is there a way to pass None if you don't want structured output? Should that be an option?
Just ignore it and it'll use the default None. The user can also set structured_output_model=None as well.
What I meant was:
agent = Agent(structured_output_model=Cat)
...
agent.invoke_async(structured_output_model=None) # to turn it off
Not a blocker though
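For what it's worth, telling "argument omitted" apart from "explicitly None" usually takes a sentinel default. A minimal sketch, assuming a hypothetical resolve_model helper and _UNSET sentinel (neither is part of this PR), with Cat as a placeholder class rather than a real Pydantic model:

```python
from typing import Any

# Sentinel meaning "the caller did not pass the argument at all".
_UNSET: Any = object()


def resolve_model(agent_default: Any, override: Any = _UNSET) -> Any:
    """Pick the structured output model for one invocation."""
    if override is _UNSET:
        # Nothing passed on the invocation: fall back to the agent-level default.
        return agent_default
    # An explicit value wins, including None, which turns structured output off.
    return override


class Cat:
    """Placeholder standing in for a Pydantic model."""
```

With this shape, resolve_model(Cat) keeps the agent default, while resolve_model(Cat, None) disables structured output for that call.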
Nit: Left a comment above, this goes against the convention of the Agent class. We used to have every class-level attribute overridable in the invoke method, but this was tripping customers up. We ended up deciding on having one way to set things so it's more obvious what is going on.
@deprecated(
    "Agent.structured_output method is deprecated."
    " You should pass in `structured_output_model` directly into the agent invocation."
    " see the <LINK> for more details"
TODO - update LINK
The @deprecated annotation is only available in Python >= 3.13. You should use warnings.warn instead, as that's compatible with all versions of Python we currently support:
sdk-python/src/strands/tools/loader.py
Line 160 in 7fbc9dc
warnings.warn(
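A minimal sketch of the warnings.warn variant. ToyAgent is a hypothetical class for illustration, not the strands Agent, and the method body is left empty; the message text is taken from the decorator above, minus the unresolved link.

```python
import warnings


class ToyAgent:
    """Toy class for illustration only; not the strands Agent."""

    def structured_output(self, output_model: type, prompt: str) -> None:
        warnings.warn(
            "Agent.structured_output method is deprecated."
            " You should pass in `structured_output_model` directly into the agent invocation.",
            category=DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not to this method
        )
        return None


# Record warnings so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    ToyAgent().structured_output(dict, "hello")

categories = [w.category for w in caught]
```

Using stacklevel=2 makes the reported location the call site, which is usually what someone migrating off the deprecated method wants to see.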
- Tracking expected tool names from output schemas
- Managing validated result storage
- Extracting structured output results from tool executions
- Managing retry attempts for structured output forcing
In what case does retrying take effect, and when is it useful? If it's a tool call, what can cause the LLM to fail the tool call?
So for models like Writer that don't accept tool choice, the model can fail to call the tool.
The docs say they support it FWIW: https://dev.writer.com/api-reference/completion-api/chat-completion#body-tool-choice - so is this more of a provider issue?
I think I prefer avoiding retries in favor of putting stuff like this into the provider. Thoughts?
…xt explicitly
- Delete output module (OutputSchema, ToolMode, utils)
- Replace OutputSchema with direct tool based usage throughout
- Update StructuredOutputContext to work without OutputSchema wrapper
- Simplify structured output handling in agent and tool executors
- Plumb the StructuredOutputContext explicitly instead of kwargs
This simplifies the codebase by removing an unnecessary abstraction layer and using Pydantic models directly for structured output configuration.
Replace Optional[Type[BaseModel]] with Type[BaseModel] | None across agent, event_loop, and structured_output modules for consistency with Python 3.10+ type hint syntax.
- Replace structured_output_model checks with is_enabled property
- Raise StructuredOutputException when retry limit exceeded instead of silent failure
- cleanup
- Add public API exports for convert_pydantic_to_tool_spec
- Replace Any with BaseModel type hints for structured output
- Simplify condition checks using is_enabled property
- Clean up module docstrings and comments
invocation_state["agent"] = self

if structured_output_context and structured_output_context.structured_output_tool:
    self.tool_registry.register_dynamic_tool(structured_output_context.structured_output_tool)
Where do we remove this? I believe this should be done in the same method as where we register it.
Also consider if there's a Context Manager we could use to ensure that we're calling unregister. There should also be a test
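One possible shape for that context manager, sketched against a toy registry. ToyRegistry and temporarily_registered are hypothetical names, and the real register/unregister methods take different arguments; the point is only that the finally block runs the unregister even when the body raises.

```python
from contextlib import contextmanager
from typing import Any, Iterator


class ToyRegistry:
    """Hypothetical stand-in for a tool registry; not the strands ToolRegistry."""

    def __init__(self) -> None:
        self.dynamic_tools: dict[str, Any] = {}

    def register_dynamic_tool(self, name: str, tool: Any) -> None:
        self.dynamic_tools[name] = tool

    def unregister_dynamic_tool(self, name: str) -> None:
        self.dynamic_tools.pop(name, None)


@contextmanager
def temporarily_registered(registry: ToyRegistry, name: str, tool: Any) -> Iterator[None]:
    registry.register_dynamic_tool(name, tool)
    try:
        yield
    finally:
        # Runs on normal exit and when the body raises.
        registry.unregister_dynamic_tool(name)


registry = ToyRegistry()
try:
    with temporarily_registered(registry, "UserProfile", object()):
        registered_inside = "UserProfile" in registry.dynamic_tools
        raise RuntimeError("simulated event loop failure")
except RuntimeError:
    pass
registered_after = "UserProfile" in registry.dynamic_tools
```

This keeps registration and cleanup in one place, and the exceptional path is straightforward to cover in a test.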
Rename this to have a _ in the beginning of the file name - let's keep this an internal implementation detail
            return False
        return self.attempts < self.MAX_STRUCTURED_OUTPUT_ATTEMPTS

    def set_forced_mode(self, tool_choice: dict | None = None) -> None:
I think this can be private or inlined now?
        self.forced_mode = True
        self.tool_choice = tool_choice or {"any": {}}

    def has_structured_output_tool(self, tool_uses: list[ToolUse]) -> bool:
Make private
In fact, this method has more docs than actual implementation - I would inline the definition
        """
        if not self.expected_tool_name:
            return False
        return any(tool_use.get("name") == self.expected_tool_name for tool_use in tool_uses)
I think this line is effectively what the loop on 137 does, and thus you don't even need this line - it's redundant
        Args:
            tool: The tool to register dynamically
        """
        self.dynamic_tools[tool.tool_name] = tool
We should be throwing if a tool already exists with this name - that would indicate a bug I think or us overriding a customer based name.
Throw and if we receive feedback we can adjust
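A minimal sketch of the fail-fast variant, again on a hypothetical ToyRegistry. The choice of ValueError and the message wording are assumptions, not something settled in this thread.

```python
from typing import Any


class ToyRegistry:
    """Hypothetical stand-in for a tool registry; not the strands ToolRegistry."""

    def __init__(self) -> None:
        self.dynamic_tools: dict[str, Any] = {}

    def register_dynamic_tool(self, tool_name: str, tool: Any) -> None:
        # Refuse to silently overwrite: a clash means either a bug on our side
        # or a collision with a name the customer already uses.
        if tool_name in self.dynamic_tools:
            raise ValueError(f"tool_name=<{tool_name}> | tool already registered")
        self.dynamic_tools[tool_name] = tool
```

The first registration for a name succeeds; a second one under the same name raises instead of replacing the existing tool.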
Tests can be written, I think. I'm generally inclined with the approach, though the high coupling of the event loop with structured output is a bit concerning.
                If None, the model will behave according to its default settings.
            structured_output_model: Pydantic model type(s) for structured output.
                When specified, all agent calls will attempt to return structured output of this type.
                This can be overridden on the agent invocation.
Nit: I think this goes against convention for the agent class. This should either be a class attribute or a keyword argument on the invoke method. I'm inclined to lean toward just the invoke keyword argument.
        return list(all_tools.keys())

    def __call__(self, prompt: AgentInput = None, **kwargs: Any) -> AgentResult:
    def __call__(
Nit: pretty sure this has updated, so you will need to rebase
if TYPE_CHECKING:
    from ..agent import Agent
    from ..agent.agent import Agent
Do we need this?
- from ..agent.agent import Agent
+ from ..agent import Agent
            return False
        return any(tool_use.get("name") == self.expected_tool_name for tool_use in tool_uses)

    def get_tool_spec(self) -> Optional[ToolSpec]:
Why is this optional? When would this not be set?
"""Context management for structured output in the event loop."""

import logging
from typing import Dict, Optional, Type
- from typing import Dict, Optional, Type
+ from typing import Type
yield EventLoopStopEvent(stop_reason, message, agent.event_loop_metrics, invocation_state["request_state"])
# Force structured output tool call if LLM didn't use it automatically
if structured_output_context.is_enabled and stop_reason == "end_turn":
    if not structured_output_context.can_retry():
I may have missed this already, but why do we ever want to retry? If the model didn't return structured output when forced, we should always raise, right?
from ..event_loop import streaming
from ..tools import convert_pydantic_to_tool_spec
from ..tools.structured_output.structured_output_utils import convert_pydantic_to_tool_spec
Do we need this change?
from .decorator import tool
from .structured_output import convert_pydantic_to_tool_spec
from .structured_output.structured_output_utils import convert_pydantic_to_tool_spec
Do we need this change?
Description
This PR implements a comprehensive structured output system that allows agents to return validated Pydantic models. Strands developers can pass in the structured_output_model field, set to a Pydantic model, when initializing an agent or when invoking the agent. The agent will attempt to populate the Pydantic object and set it to a field, structured_output, that can be accessed from the AgentResult object. Callers can use a different Pydantic model per invocation, reuse the same one, or use structured_output_model for some invocations and omit it for others.
Examples
Structured Output on the agent invocation
Structured Output when initializing the agent
See the README.md for more examples.
Key Features:
• structured_output_model parameter support in Agent class and call method
• Complete output module with base classes, modes, and utilities (src/strands/output/)
• Tool-based system with automatic retry logic
• Enhanced event loop integration for structured output processing and validation
• Comprehensive documentation with examples, use cases, and best practices
• Type safety with full typing support and Pydantic validation
• Backward compatibility with existing tool ecosystem
ℹ️ NOTE: ℹ️
API-Bar raising
• structured_output_model is the parameter name we agreed to
• StructuredOutputEvent: we added a new Typed Event called StructuredOutputEvent
Open questions:
gettext but that would be hard to scale
Related Issues
Documentation PR
Type of Change
New feature
Testing
How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli
• [ ] I ran hatch run prepare
Checklist
• [ ] I have read the CONTRIBUTING document
• [ ] I have added any necessary tests that prove my fix is effective or my feature works
• [ ] I have updated the documentation accordingly
• [ ] I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
• [ ] My changes generate no new warnings
• [ ] Any dependent changes have been merged and published
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.