
feat(agent): make structured output part of the agent loop #670

Open

kazmer97 wants to merge 2 commits into main from feat/fix-structured-output

Conversation


@kazmer97 commented on Aug 14, 2025

Description

Title: feat: Enable agent loop retry mechanism for structured output extraction

Summary

Modifies the structured_output_async method to leverage the agent's event loop retry mechanism, allowing the model to
attempt structured data extraction multiple times until successful. Previously, the method used direct model inference
without retry capabilities.

Key Change: Agent Loop Integration

Before: Direct model call → single attempt → return or fail

# Old approach (direct model inference)
events = self.model.structured_output(output_model, temp_messages, system_prompt=self.system_prompt)

After: Agent event loop → retry capability → structured tool usage → extraction

# New approach (agent loop with retry)
async for event in self._run_loop(message=message, invocation_state=invocation_state):
    # Agent can retry, use tools, and self-correct until successful

Why This Matters

  1. Retry Logic: Agent can attempt structured extraction multiple times if initial attempts fail
  2. Tool Integration: Model can use available tools to gather information before structuring output
  3. Self-Correction: Agent can iterate and refine responses until they meet the schema requirements
  4. Robustness: Handles edge cases where first extraction attempt produces invalid data

Implementation Approach

  • Temporary Tool Registration: Adds _structured_output tool that validates against the target Pydantic model
  • Hook-Based Capture: Uses AfterToolInvocationEvent to intercept when model uses the structured output tool
  • Resource Cleanup: Ensures temporary tools and hooks are properly removed after extraction
  • State Preservation: Deep copy approach maintains clean agent state
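
A minimal sketch of the temporary-tool idea, assuming hypothetical names (make_structured_output_tool is not a strands API, and the result is captured with a closure here rather than the AfterToolInvocationEvent hook the PR describes, purely to keep the example self-contained):

from typing import Any, Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

def make_structured_output_tool(output_model: Type[T]) -> tuple[Any, dict[str, Optional[T]]]:
    """Build a callable an agent could expose as a temporary _structured_output tool."""
    captured: dict[str, Optional[T]] = {"result": None}

    def _structured_output(**kwargs: Any) -> str:
        try:
            # Validate the model-supplied arguments against the target schema.
            captured["result"] = output_model.model_validate(kwargs)
            return "structured output accepted"
        except ValidationError as exc:
            # Returning the errors as the tool result lets the agent loop retry and self-correct.
            return f"validation failed, please fix and retry: {exc}"

    return _structured_output, captured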

Behavioral Improvements

  • Higher Success Rate: Multiple attempts increase likelihood of valid structured output
  • Better Error Handling: Agent can recover from validation failures and retry
  • Enhanced Flexibility: Model can use conversation context and tools before structuring data
  • Consistent API: Same function signature, improved internal behavior

Example Scenario

Agent may now iterate like this internally:

  • 1st attempt: Model provides incomplete data → validation fails → retry
  • 2nd attempt: Model uses tools to gather missing info → validation fails → retry
  • 3rd attempt: Model provides complete valid data → success!

  profile = await agent.structured_output_async(UserProfile, prompt)  # Returns a validated result after however many attempts are needed

Files Modified

  • src/strands/agent/agent.py - Replace direct model call with agent loop integration

Backward Compatibility

  • Hybrid Approach: Structured output now attempts tool-based execution first, then falls back to the original
    model.structured_output() method for backward compatibility
  • Message History Control: Added optional preserve_conversation parameter (default: False) to control whether structured
    output execution modifies conversation history
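
A usage illustration of the new parameter (assumes an already-configured agent and a UserProfile Pydantic model; not code from the PR):

# Keep the extraction turns in agent.messages instead of restoring the original history.
profile = await agent.structured_output_async(
    UserProfile,
    "Extract the user's profile from our conversation.",
    preserve_conversation=True,
)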

Implementation Details:

  1. Registers temporary _structured_output tool and hook to capture tool-based results
  2. Attempts event loop execution with tool-based structured output
  3. Falls back to original model.structured_output() if no tool result captured
  4. Automatically cleans up temporary tools and hooks
  5. Restores original message history when preserve_conversation=False (default)

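The overall flow could be sketched in generic Python as follows (the strands-specific pieces are injected as callables because the real registration, hook, and event-loop APIs are not shown here; tool/hook registration and cleanup, steps 1 and 4, are elided):

import copy
from typing import Awaitable, Callable, Optional, TypeVar

from pydantic import BaseModel

T = TypeVar("T", bound=BaseModel)

async def hybrid_structured_output(
    run_tool_based_attempt: Callable[[], Awaitable[Optional[T]]],  # step 2: agent loop with the temporary tool
    direct_model_fallback: Callable[[], Awaitable[T]],             # step 3: original model.structured_output() path
    messages: list,
    preserve_conversation: bool = False,
) -> T:
    snapshot = copy.deepcopy(messages)  # remember the pre-extraction history
    try:
        captured = await run_tool_based_attempt()
        if captured is not None:
            return captured
        return await direct_model_fallback()
    finally:
        if not preserve_conversation:  # step 5: restore the original conversation
            messages[:] = snapshot
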
Test Updates:

  • Hook Tests: Updated to use preserve_conversation=True and expect more hook events due to event loop execution
  • Compatibility Tests: Maintained existing behavior expectations (no message pollution) with the default
    preserve_conversation=False
  • Event Assertions: Changed from exact event counts to range checks, verifying correct first/last hook events

Backward Compatibility: All existing structured output functionality is preserved while adding new tool-based capabilities
for future extensibility.

  • Function signature unchanged - existing code continues to work
  • Return type unchanged - still returns validated Pydantic models
  • Error handling improved - more robust failure modes

Related Issues

#348

Documentation PR

Type of Change

New feature

Testing

How have you tested the change? Verify that the changes do not break functionality or introduce warnings in consuming repositories: agents-docs, agents-tools, agents-cli

  • I ran hatch run prepare

Checklist

  • I have read the CONTRIBUTING document
  • I have added any necessary tests that prove my fix is effective or my feature works
  • I have updated the documentation accordingly
  • I have added an appropriate example to the documentation to outline the feature, or no new docs are needed
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@kazmer97 force-pushed the feat/fix-structured-output branch from 713d681 to 8bf015f on August 14, 2025 at 17:31
@kazmer97 force-pushed the feat/fix-structured-output branch from 8bf015f to 33bc6e8 on August 14, 2025 at 17:49
@kazmer97 force-pushed the feat/fix-structured-output branch from 33bc6e8 to 23c8cb2 on August 14, 2025 at 17:51
@kazmer97 temporarily deployed to manual-approval on August 14, 2025 at 18:15 with GitHub Actions
@jotelfor

I think this retry mechanism is a good idea. I've seen something similar used in a different context, and it was very effective.

@kazmer97 force-pushed the feat/fix-structured-output branch from 01f7dab to f2e46b9 on August 16, 2025 at 11:50
"gen_ai.choice",
attributes={"message": json.dumps(user.model_dump())},
)
# Verify agent-level tracing was called

Contributor:

The assertions have been weakened here; it is preferred to keep a high bar on assertions.

Author:

What assertion would you like to call here?

events = self.model.structured_output(output_model, temp_messages, system_prompt=self.system_prompt)
async for event in events:
    if "callback" in event:
        self.callback_handler(**cast(dict, event["callback"]))
structured_output_span.add_event(
    "gen_ai.choice", attributes={"message": serialize(event["output"].model_dump())}
)

Contributor:

I am concerned that the tracing behavior has changed and hasn't been manually or unit tested.

Author:

OK, we can assert it further.


with ThreadPoolExecutor() as executor:
    future = executor.submit(execute)
    return future.result()

def _register_structured_output_tool(self, output_model: type[BaseModel]) -> Any:

Contributor:

It's unclear to me why this tool is needed. The current structured output implementation within strands does utilize a tool to generate structured output (which is subject to change). However, I am hesitant to retry the entire agent loop in the pattern suggested in this PR. Another way to implement this is returning the schema validation failures as tool failures to the model.

Structured output is currently an open roadmap item that we are planning to redesign. I am not comfortable merging this PR in its current state, given that the underlying implementation is subject to change and due to the points raised above. Currently, the approach of using a tool to generate structured output is an anti-pattern, since most model providers natively support structured output responses. These native approaches improve structured output performance substantially.

reference:

Given these natively supported features, retries can simply be implemented on the actual model API request; a full retry of the agent loop is not required.
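
A rough sketch of that alternative outside of strands (call_model_json is a stand-in for any provider call that returns JSON constrained to a schema; it is not an existing API):

from typing import Callable, Type, TypeVar

from pydantic import BaseModel, ValidationError

T = TypeVar("T", bound=BaseModel)

def request_with_retries(
    call_model_json: Callable[[str, dict], str],
    output_model: Type[T],
    prompt: str,
    max_attempts: int = 3,
) -> T:
    """Retry at the model API level, feeding schema failures back into the next request."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model_json(prompt + feedback, output_model.model_json_schema())
        try:
            return output_model.model_validate_json(raw)
        except ValidationError as exc:
            feedback = f"\nYour previous output failed validation, fix these errors: {exc}"
    raise ValueError(f"no valid {output_model.__name__} after {max_attempts} attempts")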

Author:

JSON output and tool use at the model level are likely identical implementations that force the tokens into the JSON schema space. The performance of the models greatly improves in the agent loop context, as the model has the context to evaluate its own work. Please see the Anthropic docs.

This is an additional feature on top of native support. Think of it this way: tool/JSON output at the model level forces the model to write in cursive, but doesn't actually check the contents of the writing. With a Pydantic model you can add logical validations that check the content.
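
For illustration only (not code from this PR), a Pydantic validator can reject output that is structurally valid but fails a content-level check that schema-constrained decoding alone would not catch:

from pydantic import BaseModel, field_validator

class UserProfile(BaseModel):
    name: str
    email: str

    @field_validator("email")
    @classmethod
    def email_must_contain_at(cls, value: str) -> str:
        # A semantic check on the content, beyond what the JSON schema enforces.
        if "@" not in value:
            raise ValueError("email must contain '@'")
        return value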

Please explain why it is an anti-pattern to use a tool for structured output. It reuses the execution guardrails and testing of tool execution, where failures are fed back to the model. You would be reimplementing the same logic at the model structured_output function level if you were to try to feed back any validation errors.

Under this setup the agent has the chance to perform all the actions it would under the run command, and to respond with a structured output, which is very useful for interacting with the output programmatically.

> However, I am hesitant to retry the entire agent loop in the pattern suggested in this PR. Another way to implement this is returning the schema validation failures as tool failures to the model.

How is what you describe here different from rerunning the agent loop? If you feed back the schema failure as a tool result and the agent then responds, that is effectively an agent loop with extra steps.

@@ -417,20 +426,33 @@ def structured_output(self, output_model: Type[T], prompt: Optional[Union[str, l
output_model: The output model (a JSON schema written as a Pydantic BaseModel)
    that the agent will use when responding.
prompt: The prompt to use for the agent (will not be added to conversation history).
preserve_conversation: If False (default), restores original conversation state after execution.

Contributor:

The existence (or lack thereof) of the prompt parameter determines this behavior, so I'm not sure we want to modify this interface and add another parameter here.
