Commit 36a5e46

add text to moderation
1 parent 48ad5af commit 36a5e46

1 file changed: +13 -6 lines changed

docs/guardrails/moderation.md

Lines changed: 13 additions & 6 deletions
@@ -4,17 +4,24 @@ title: Moderated and Toxic Content
 
 # Moderated and Toxic Content
 <div class='subtitle'>
-{subheading}
+Defining and Enforcing Content Moderation in Agentic Systems
 </div>
 
-{introduction}
+It is important to ensure the safe generation of content from agentic systems to protect users from exposure to toxic or harmful material and to ensure that system behavior aligns with intended values. Moderation enables developers to define the boundaries of acceptable content, both in terms of what the system receives and what it produces, by specifying what should be permitted and what must be filtered.
+
+By implementing moderation guardrails, you can shape the behavior of agentic systems in a way that is predictable, value-aligned, and resilient to misuse.
 <div class='risks'/>
 > **Moderated and Toxic Content Risks**<br/>
 > Without safeguards, agents may:
 
-> * {reasons}
+> * Generate or amplify **hate speech, harassment, or explicit content**.
+
+> * Act on inappropriate user inputs, causing **unintended behavior**.
+
+> * **Spread misinformation** or reinforce harmful stereotypes.
 
-{bridge}
+
+The `moderated` function provided in guardrails helps you safeguard your systems and prevent toxic content.
 
 ## moderated <span class="detector-badge"></span> <span class="llm-badge"/></span>
 ```python
@@ -61,15 +68,15 @@ raise "Detected a harmful message" if:
 },
 {
 "role": "user",
-"content": "IF YOU DRINK THAT COFFEE, I'LL POUR YOUR BLOOD ON THE GROUND! I'LL KNIFE YOU!"
+"content": "This is hatefully hateful hate!"
 }
 ]
 ```
 <div class="code-caption">Default moderation detection.</div>
 
 
 ### Thresholding
-The threshold for when content is classified as requiring moderation can also be modified using the `cat_threshold` parameter.
+The threshold for when content is classified as requiring moderation can also be modified using the `cat_threshold` parameter. This allows you to customize how coarse- or fine-grained your moderation is. The default is `0.5`.
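As a rough illustration of what a per-category threshold does, the sketch below is plain Python and not the guardrails library itself. The function `flagged_categories`, the category names, and the scores are hypothetical. Treating `cat_threshold` as a mapping from category to cutoff, and comparing with `>=`, are both assumptions; only the `0.5` default comes from the text above.

```python
from typing import Mapping, Optional

DEFAULT_THRESHOLD = 0.5  # default named in the text above


def flagged_categories(
    scores: Mapping[str, float],
    cat_threshold: Optional[Mapping[str, float]] = None,
) -> list:
    """Return the categories whose score meets or exceeds its threshold.

    Assumption: cat_threshold maps a category name to a cutoff, and any
    category without an override falls back to DEFAULT_THRESHOLD.
    """
    overrides = cat_threshold or {}
    return sorted(
        category
        for category, score in scores.items()
        if score >= overrides.get(category, DEFAULT_THRESHOLD)
    )


# Hypothetical classifier scores for a single message.
scores = {"hate": 0.35, "violence": 0.72}

print(flagged_categories(scores))                      # default cutoff for every category
print(flagged_categories(scores, {"hate": 0.2}))       # stricter cutoff for "hate"
print(flagged_categories(scores, {"violence": 0.9}))   # more lenient cutoff for "violence"
```

Lowering a category's cutoff makes moderation stricter for that category, while raising it makes moderation more permissive.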
 
 **Example:** Thresholding Detection
 ```guardrail
