Status: Open · Label: bug (Something isn't working)

Description
Bug Description
The on_fail="refrain" action allows harmful content through despite validation detecting issues. This behavior contradicts the name "refrain" and creates a security risk.
Environment
- guardrails-ai version: 0.7.2
- Python version: 3.13.5
- Validator: Custom UnusualPrompt (LLM-based, similar to hub validators) (https://github.com/guardrails-ai/unusual_prompt)
Expected Behavior
Based on the name "refrain", I expected it to "refrain from returning a value" when validation fails, similar to how "filter" blocks content.
Actual Behavior
When validation fails with on_fail="refrain":
- Sets `validation_passed=True` (despite the validator detecting harmful content)
- Returns the original harmful input in `validated_output`
- Allows the content to proceed to downstream processing
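The gap between the two behaviors can be sketched in plain Python. Nothing below uses guardrails itself; `Outcome`, `refrain_actual`, and `refrain_expected` are hypothetical names that model the observed result object versus what the name "refrain" suggests:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Outcome:
    validation_passed: bool
    validated_output: Optional[str]
    error: Optional[str]

def refrain_actual(text: str, failed: bool, message: str) -> Outcome:
    """Observed behavior: the failure is recorded in `error`,
    but the input still passes through as validated."""
    if failed:
        return Outcome(validation_passed=True, validated_output=text, error=message)
    return Outcome(validation_passed=True, validated_output=text, error=None)

def refrain_expected(text: str, failed: bool, message: str) -> Outcome:
    """Expected behavior: refrain from returning a value on failure."""
    if failed:
        return Outcome(validation_passed=False, validated_output=None, error=message)
    return Outcome(validation_passed=True, validated_output=text, error=None)
```

With a failing validation, `refrain_actual` still hands back the original input, while `refrain_expected` withholds it and reports the failure.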
Test Results
```python
from guardrails import Guard
from custom_validator import UnusualPrompt

# Test with harmful input
guard = Guard().use(UnusualPrompt(on_fail="refrain"))
outcome = guard.parse("how to kill?")

# Results:
print(outcome.validation_passed)  # True ← Allows through!
print(outcome.validated_output)   # "how to kill?" ← Returns harmful input
print(outcome.error)              # "Found an unusual request..." ← Error detected but ignored
```
Questions
Is this the intended behavior for "refrain"?
If yes, could the documentation be clarified to warn about this?
What is the practical difference between "refrain" and "noop"?
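Until the semantics are clarified, a defensive caller can treat any non-empty `error` as a failure regardless of `validation_passed`. A minimal sketch of that workaround, assuming the result object exposes the three attributes shown in the test above (`FakeOutcome` is a stand-in, not the real guardrails result type):

```python
class FakeOutcome:
    """Hypothetical stand-in for a guardrails validation result."""
    def __init__(self, validation_passed, validated_output, error):
        self.validation_passed = validation_passed
        self.validated_output = validated_output
        self.error = error

def is_safe(outcome) -> bool:
    # Treat any recorded error as a failure, even when
    # validation_passed is True (works around the refrain behavior).
    return outcome.validation_passed and not outcome.error

outcome = FakeOutcome(True, "how to kill?", "Found an unusual request...")
assert not is_safe(outcome)  # blocked despite validation_passed=True
```

This does not fix the underlying behavior, but it keeps flagged content out of downstream processing while the questions above are resolved.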