docs/guardrails/moderation.md
Lines changed: 13 additions & 6 deletions
@@ -4,17 +4,24 @@ title: Moderated and Toxic Content
# Moderated and Toxic Content
<div class='subtitle'>
-
{subheading}
+
Defining and Enforcing Content Moderation in Agentic Systems
</div>
-
{introduction}
+
It is important to ensure the safe generation of content from agentic systems to protect users from exposure to toxic or harmful material and to ensure that system behavior aligns with intended values. Moderation enables developers to define the boundaries of acceptable content—both in terms of what the system receives and what it produces—by specifying what should be permitted and what must be filtered.
+
+
By implementing moderation guardrails, you can shape the behavior of agentic systems in a way that is predictable, value-aligned, and resilient to misuse.
<div class='risks'/>
> **Moderated and Toxic Content Risks**<br/>
> Without safeguards, agents may:
-
> * {reasons}
+
> * Generate or amplify **hate speech, harassment, or explicit content**.
+
+
> * Act on inappropriate user inputs, causing **unintended behaviour**.
+
+
> * **Spread misinformation** or reinforce harmful stereotypes.
-
{bridge}
+
+
The `moderated` function provided in guardrails helps you safeguard your systems and prevent toxic content.
The threshold for when content is classified as requiring moderation can also be modified using the `cat_threshold` parameter.
+
The threshold at which content is classified as requiring moderation can also be changed with the `cat_threshold` parameter. This lets you control how coarse- or fine-grained your moderation is. The default is `0.5`.
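
To make the effect of a threshold concrete, here is a minimal Python sketch of threshold-based flagging. It is not the `moderated` implementation: the helper names, the per-category score dictionary, the dictionary shape of `cat_threshold`, and the `>=` comparison are all assumptions made for illustration. Only the parameter name `cat_threshold` and the `0.5` default come from the text above.

```python
from typing import Mapping, Optional

# Default taken from the documentation text; everything else is hypothetical.
DEFAULT_THRESHOLD = 0.5


def flagged_categories(
    scores: Mapping[str, float],
    cat_threshold: Optional[Mapping[str, float]] = None,
    default: float = DEFAULT_THRESHOLD,
) -> list[str]:
    """Return the categories whose score meets or exceeds its threshold.

    `scores` maps a category name to a classifier score in [0, 1].
    `cat_threshold` (assumed here to be a per-category mapping) overrides
    the default threshold for the categories it names.
    """
    overrides = cat_threshold or {}
    return sorted(
        category
        for category, score in scores.items()
        if score >= overrides.get(category, default)
    )


def needs_moderation(
    scores: Mapping[str, float],
    cat_threshold: Optional[Mapping[str, float]] = None,
    default: float = DEFAULT_THRESHOLD,
) -> bool:
    """True if at least one category is flagged."""
    return bool(flagged_categories(scores, cat_threshold, default))


# Hypothetical scores for a single message.
scores = {"hate": 0.3, "harassment": 0.1}

# With the 0.5 default nothing is flagged; lowering the "hate"
# threshold to 0.2 makes the same message require moderation.
lenient = needs_moderation(scores)
strict = needs_moderation(scores, cat_threshold={"hate": 0.2})
```

Lowering a threshold makes moderation stricter for that category (more content is flagged), while raising it makes moderation more permissive.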