Description
There is no proper documentation explaining how combined Guardrails validators work. When I combine multiple validators, even simple questions are flagged incorrectly. For example, the question “can you compare two products?” is detected as a jailbreak with a score of 0.80, which is exactly my base threshold of 0.80. Similarly, queries that contain no personal details are still flagged as PII. These misclassifications appear specifically when multiple validators are combined.
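As a hypothetical illustration of why the boundary case above matters (it is an assumption here that the validator's comparison is inclusive; the names `flagged` and `THRESHOLD` are illustrative, not the Guardrails API):

```python
# Hypothetical sketch: how an inclusive threshold comparison flags a
# score that is exactly at the threshold. Not actual Guardrails code.

THRESHOLD = 0.80

def flagged(score: float, threshold: float = THRESHOLD) -> bool:
    # With an inclusive comparison (>=), a score exactly at the
    # threshold is flagged; with a strict comparison (>) it passes.
    return score >= threshold

print(flagged(0.80))  # True  -> "can you compare two products?" is blocked
print(flagged(0.79))  # False -> passes
```

Documenting whether each validator compares with `>=` or `>` would make boundary cases like a 0.80 score against a 0.80 threshold predictable.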
Current documentation
The full documentation needs significant improvement. It is not clear how each validator works internally, how parameters should be tuned, or what the best practices are for different use cases. When multiple validators are combined, each one has its own behavior, and their parameters may conflict. The documentation should clearly explain which validators can be combined, how they interact, and provide example configurations or templates demonstrating correct usage.
Suggested changes
- Provide detailed explanations of how each validator works in the background, including ML/DL-based validators, pattern-based validators, and LLM-based validators.
- Include guidance on parameter tuning for each validator, with recommended ranges and use-case-specific examples.
- Add clear documentation on combining validators: which combinations are safe, which may conflict, and how Guardrails resolves these conflicts.
- Clarify how custom validators should be implemented, especially when blending multiple techniques such as Python classes, ML/DL models, or LLM logic.
- Explicitly mention whether developers can use any LLM of their choice, and explain how behavior may change when integrating different LLMs together with ML/DL validators.
- Provide examples showing how Guardrails handles scenarios where multiple types of validators (ML/DL, pattern detection, LLM-based, and custom validators) are applied at the same time.
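To make the custom-validator request concrete, here is a minimal sketch of what such documentation could demonstrate: one pattern-based check and one score-based check behind a shared result shape. All class and method names here (`Result`, `EmailPIIValidator`, `ScoreValidator`, `validate`) are hypothetical and do not reflect the actual Guardrails API.

```python
# Hypothetical sketch of two custom validator styles: pattern-based
# (regex) and score-based (e.g. wrapping an ML model). Illustrative
# names only; not the real Guardrails validator interface.
import re
from dataclasses import dataclass

@dataclass
class Result:
    passed: bool
    reason: str = ""

class EmailPIIValidator:
    """Pattern-based check: fails if the text contains an email address."""
    EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b")

    def validate(self, text: str) -> Result:
        if self.EMAIL.search(text):
            return Result(False, "email address found")
        return Result(True)

class ScoreValidator:
    """Score-based check: wraps any scoring function with a threshold."""
    def __init__(self, score_fn, threshold: float = 0.8):
        self.score_fn = score_fn  # e.g. an ML/DL model's predict method
        self.threshold = threshold

    def validate(self, text: str) -> Result:
        score = self.score_fn(text)
        if score >= self.threshold:
            return Result(False, f"score {score:.2f} >= {self.threshold}")
        return Result(True)
```

Documentation built around a shared result shape like this would let developers see exactly where a pattern check and a model check disagree, instead of only seeing a combined pass/fail.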
Additional context
When combining multiple validators, simple queries are often misclassified: harmless questions may be flagged as jailbreaks or PII despite containing no sensitive content. This makes it difficult to understand how validator interactions work, especially when the validators mix different technologies such as ML/DL models, regex patterns, and LLM-based checks. Clear documentation and examples would help prevent these conflicts and improve reliability.
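The interaction question above comes down to the aggregation policy: how are individual validator verdicts combined into one decision? A brief sketch of two possible policies (the validators here are stand-in booleans, and the policy names are illustrative, not something Guardrails documents):

```python
# Hypothetical sketch of two aggregation policies for combining
# validator verdicts (True = passed). Stand-in values, not real
# Guardrails behavior.

def any_fail(verdicts):
    # Strict policy: a single failing validator blocks the input.
    return all(verdicts)

def majority(verdicts):
    # Looser policy: the input passes if most validators pass.
    return sum(verdicts) * 2 > len(verdicts)

verdicts = [True, True, False]  # e.g. jailbreak ok, toxicity ok, PII flagged
print(any_fail(verdicts))   # False -> blocked under the strict policy
print(majority(verdicts))   # True  -> passes under majority vote
```

If the documentation stated which policy applies (and whether it is configurable), users could predict why a harmless query that narrowly trips one validator gets blocked overall.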
Checklist
- I have checked that this issue hasn't already been reported
- I have checked the latest version of the documentation to ensure this issue still exists
- For simple typos or fixes, I have considered submitting a pull request instead