
on_fail="refrain" behavior unclear/inconsistent with documentation #1395

@sareetamugde-arch


Bug Description

With on_fail="refrain", content that the validator flags as harmful is still returned to the caller as if validation had passed. This behavior contradicts the name "refrain" and creates a security risk.

Environment

Expected Behavior

Based on the name "refrain", I expected the guard to refrain from returning a value when validation fails, similar to how "filter" blocks content.

Actual Behavior

When validation fails with on_fail="refrain":

  • validation_passed is set to True, even though the validator detected harmful content
  • validated_output contains the original harmful input
  • The content is allowed to proceed to downstream processing
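
Until the intended semantics are confirmed, a caller can guard against the behavior listed above by not trusting validation_passed alone. The following is a minimal sketch, not part of guardrails: OutcomeLike is a hypothetical stand-in that exposes only the three fields printed in this report, and is_safe_to_forward is an illustrative helper name.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OutcomeLike:
    """Hypothetical stand-in with only the three fields shown in this report."""
    validation_passed: bool
    validated_output: Optional[str]
    error: Optional[str] = None


def is_safe_to_forward(outcome) -> bool:
    """Fail closed: forward only when nothing on the outcome signals a failure."""
    if outcome.error:
        # A validator reported a problem, whatever validation_passed says.
        return False
    if outcome.validated_output is None:
        # Nothing was returned, so there is nothing to forward.
        return False
    return bool(outcome.validation_passed)


# The outcome observed in this report: passed=True, input echoed back, error set.
observed = OutcomeLike(True, "how to kill?", "Found an unusual request...")
# An assumed clean outcome for comparison.
clean = OutcomeLike(True, "how to bake bread?")

print(is_safe_to_forward(observed))
print(is_safe_to_forward(clean))
```

The helper reads only attributes, so it could in principle be passed the object returned by guard.parse, assuming that object exposes the same three fields as in the test below.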

Test Results

from guardrails import Guard
from custom_validator import UnusualPrompt

# Test with harmful input
guard = Guard().use(UnusualPrompt(on_fail="refrain"))
outcome = guard.parse("how to kill?")

# Results:
print(outcome.validation_passed)  # True ← Allows through!
print(outcome.validated_output)   # "how to kill?" ← Returns harmful input
print(outcome.error)               # "Found an unusual request..." ← Error detected but ignored

Questions

  1. Is this the intended behavior for "refrain"?
  2. If yes, could the documentation be clarified to warn about this?
  3. What is the practical difference between "refrain" and "noop"?
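
To help answer the last question empirically, the same input could be run under several on_fail values and the resulting fields compared side by side. This is a rough sketch under stated assumptions: compare_actions is a hypothetical helper, and run is any callable that takes an on_fail string and returns an object with the fields used in the test above. It makes no claim about what guardrails returns for each action.

```python
from typing import Any, Callable, Dict, Iterable


def compare_actions(
    run: Callable[[str], Any], actions: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """Call run(action) for each on_fail value and collect the visible fields."""
    results: Dict[str, Dict[str, Any]] = {}
    for action in actions:
        try:
            outcome = run(action)
        except Exception as exc:
            # Some actions may raise instead of returning; record the type only.
            results[action] = {"raised": type(exc).__name__}
            continue
        results[action] = {
            "validation_passed": getattr(outcome, "validation_passed", None),
            "validated_output": getattr(outcome, "validated_output", None),
            "error": getattr(outcome, "error", None),
        }
    return results


# Assumed usage with the validator from this report (not executed here):
#   def run(action):
#       return Guard().use(UnusualPrompt(on_fail=action)).parse("how to kill?")
#   compare_actions(run, ["noop", "refrain", "filter"])
```

If "refrain" and "noop" produce identical rows for a failing input, that would support the concern raised in this issue.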

Labels: bug (Something isn't working)
