Description
There is no proper documentation explaining how combined Guardrails validators work. When I combine multiple validators, even simple questions are flagged incorrectly. For example, the question “can you compare two products?” is detected as a jailbreak with a score of 0.80, which is exactly my base threshold of 0.80. Similarly, queries that contain no personal details are still flagged as PII. These misclassifications appear specifically when multiple validators are combined.
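As a hypothetical illustration of why the boundary case above matters (it is an assumption here that the validator's comparison is inclusive; the names `flagged` and `THRESHOLD` are illustrative, not the Guardrails API):

```python
# Hypothetical sketch: how an inclusive threshold comparison flags a
# score that is exactly at the threshold. Not actual Guardrails code.

THRESHOLD = 0.80

def flagged(score: float, threshold: float = THRESHOLD) -> bool:
    # With an inclusive comparison (>=), a score exactly at the
    # threshold is flagged; with a strict comparison (>) it passes.
    return score >= threshold

print(flagged(0.80))  # True  -> "can you compare two products?" is blocked
print(flagged(0.79))  # False -> passes
```

Documenting whether each validator compares with `>=` or `>` would make boundary cases like a 0.80 score against a 0.80 threshold predictable.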
Current documentation
The full documentation needs significant improvement. It is not clear how each validator works internally, how parameters should be tuned, or what the best practices are for different use cases. When multiple validators are combined, each one has its own behavior, and their parameters may conflict. The documentation should clearly explain which validators can be combined, how they interact, and provide example configurations or templates demonstrating correct usage.
Suggested changes
- Provide detailed explanations of how each validator works in the background, including ML/DL-based validators, pattern-based validators, and LLM-based validators.
- Include guidance on parameter tuning for each validator, with recommended ranges and use-case-specific examples.
- Add clear documentation on combining validators: which combinations are safe, which may conflict, and how Guardrails resolves these conflicts.
- Clarify how custom validators should be implemented, especially when blending multiple techniques such as Python classes, ML/DL models, or LLM logic.
- Explicitly mention whether developers can use any LLM of their choice, and explain how behavior may change when integrating different LLMs together with ML/DL validators.
- Provide examples showing how Guardrails handles scenarios where multiple types of validators (ML/DL, pattern detection, LLM-based, and custom validators) are applied at the same time.
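To make the custom-validator request concrete, here is a minimal sketch of what such documentation could demonstrate: one pattern-based check and one score-based check behind a shared result shape. All class and method names here (`Result`, `EmailPIIValidator`, `ScoreValidator`, `validate`) are hypothetical and do not reflect the actual Guardrails API.

```python
# Hypothetical sketch of two custom validator styles: pattern-based
# (regex) and score-based (e.g. wrapping an ML model). Illustrative
# names only; not the real Guardrails validator interface.
import re
from dataclasses import dataclass

@dataclass
class Result:
    passed: bool
    reason: str = ""

class EmailPIIValidator:
    """Pattern-based check: fails if the text contains an email address."""
    EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

    def validate(self, text: str) -> Result:
        if self.EMAIL.search(text):
            return Result(False, "email address found")
        return Result(True)

class ScoreValidator:
    """Score-based check: wraps any scoring function with a threshold."""
    def __init__(self, score_fn, threshold: float = 0.8):
        self.score_fn = score_fn  # e.g. an ML/DL model's predict method
        self.threshold = threshold

    def validate(self, text: str) -> Result:
        score = self.score_fn(text)
        if score >= self.threshold:
            return Result(False, f"score {score:.2f} >= {self.threshold}")
        return Result(True)
```

Documentation built around a shared result shape like this would let developers see exactly where a pattern check and a model check disagree, instead of only seeing a combined pass/fail.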
Additional context
When combining multiple validators, simple queries are often misclassified: harmless questions may be flagged as jailbreaks or PII despite containing no sensitive content. This makes it difficult to understand how validator interactions work, especially when the validators mix different technologies such as ML/DL models, regex patterns, and LLM-based checks. Clear documentation and examples would help prevent these conflicts and improve reliability.
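The interaction question above comes down to the aggregation policy: how are individual validator verdicts combined into one decision? A brief sketch of two possible policies (the validators here are stand-in booleans, and the policy names are illustrative, not something Guardrails documents):

```python
# Hypothetical sketch of two aggregation policies for combining
# validator verdicts (True = passed). Stand-in values, not real
# Guardrails behavior.

def any_fail(verdicts):
    # Strict policy: a single failing validator blocks the input.
    return all(verdicts)

def majority(verdicts):
    # Looser policy: the input passes if most validators pass.
    return sum(verdicts) * 2 > len(verdicts)

verdicts = [True, True, False]  # e.g. jailbreak ok, toxicity ok, PII flagged
print(any_fail(verdicts))   # False -> blocked under the strict policy
print(majority(verdicts))   # True  -> passes under majority vote
```

If the documentation stated which policy applies (and whether it is configurable), users could predict why a harmless query that narrowly trips one validator gets blocked overall.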
Checklist
- I have checked that this issue hasn't already been reported
- I have checked the latest version of the documentation to ensure this issue still exists
- For simple typos or fixes, I have considered submitting a pull request instead