[BUG] Structured output model triggering Prompt Attack Guardrail #1288

@juliangrueber


Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.17.0

Python Version

3.12

Operating System

CloudDesktop

Installation Method

pip

Steps to Reproduce

  1. Create a Bedrock Guardrail with a prompt attack filter (strength: medium).
  2. Add the guardrail to the Bedrock Model.
  3. Define a structured output model.
  4. Add the structured output model and the Bedrock Model to the agent.
  5. Run agent("Hello").

The stop reason will be "guardrail_intervened" because Strands appends the following user message, which triggers the prompt attack filter:

await agent._append_message(
{"role": "user", "content": [{"text": "You must format the previous response as structured output."}]}
)
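
For anyone trying to reproduce this, the steps above can be sketched roughly as follows. This is a minimal sketch, not a verified script: the environment variable names `GUARDRAIL_ID` and `GUARDRAIL_VERSION`, the `Greeting` model, and the helper names are made up for illustration, and it assumes `Agent` accepts `structured_output_model` and `BedrockModel` accepts `guardrail_id` / `guardrail_version` as in Strands 1.17.0. Running `run_repro()` needs AWS credentials and an existing guardrail with the prompt attack filter enabled.

```python
import os


def guardrail_settings(env=os.environ):
    """Read guardrail settings from the environment.

    GUARDRAIL_ID and GUARDRAIL_VERSION are hypothetical variable names
    chosen for this sketch; they are not defined by Strands or Bedrock.
    """
    return {
        "guardrail_id": env.get("GUARDRAIL_ID", ""),
        "guardrail_version": env.get("GUARDRAIL_VERSION", "DRAFT"),
    }


def run_repro():
    """Run the reproduction steps; requires Strands, Pydantic, and AWS access."""
    # Imported here so the helper above can be used without Strands installed.
    from pydantic import BaseModel
    from strands import Agent
    from strands.models import BedrockModel

    class Greeting(BaseModel):
        """Hypothetical structured output model; any Pydantic model should do."""

        reply: str

    # Step 2: attach the guardrail to the Bedrock model.
    model = BedrockModel(**guardrail_settings())

    # Steps 3 and 4: pass the structured output model and the Bedrock model to the agent.
    agent = Agent(model=model, structured_output_model=Greeting)

    # Step 5: a harmless prompt that should not trip a prompt attack filter.
    result = agent("Hello")
    print(result.stop_reason)
    return result
```

Call `run_repro()` with the two environment variables set. If the behavior described in this issue occurs, the printed stop reason should be "guardrail_intervened" rather than a normal end of turn.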

Expected Behavior

Using a structured output model should not trigger the guardrail.

Actual Behavior

The agent run ends with the stop reason "guardrail_intervened".

Additional Context

No response

Possible Solution

No response

Related Issues

No response


Labels: bug
