
Commit 5de4cb6

Merge pull request #422 from guardrails-ai/karan/sensitive-language
Add ToxicLanguage validator
2 parents: bf8286b + 3500799

File tree

8 files changed: +1153 -61 lines changed

docs/examples/toxic_language.ipynb

Lines changed: 206 additions & 0 deletions
@@ -0,0 +1,206 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Check whether an LLM-generated response contains toxic language\n",
    "\n",
    "### Using the `ToxicLanguage` validator\n",
    "\n",
    "This is a simple walkthrough of the `ToxicLanguage` validator. This validator checks whether an LLM-generated response contains toxic language. It uses the pre-trained multi-label `unitary/unbiased-toxic-roberta` model from HuggingFace to check whether the generated text is toxic. It supports both full-text and sentence-level validation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the guardrails package\n",
    "# and the ToxicLanguage validator\n",
    "import guardrails as gd\n",
    "from guardrails.validators import ToxicLanguage\n",
    "from rich import print"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a Guard object with this validator\n",
    "# Here, we'll use the default validation method of \"sentence\"\n",
    "# and the default threshold of 0.25\n",
    "\n",
    "guard = gd.Guard.from_string(\n",
    "    validators=[ToxicLanguage(on_fail=\"fix\")],\n",
    "    description=\"testmeout\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "What a lovely day it has been to roam around the city.\n",
       "I love it here!\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "What a lovely day it has been to roam around the city.\n",
       "I love it here!\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Parse the raw response\n",
    "raw_response = \"\"\"\n",
    "What a lovely day it has been to roam around the city. I love it here!\n",
    "I hate how pathetic and needy you can be sometimes.\n",
    "\"\"\"\n",
    "output = guard.parse(\n",
    "    llm_output=raw_response,\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see here, the sentence `I hate how pathetic and needy you can be sometimes.` has been detected as toxic and hence removed from the validated output.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "I am a Generative AI model that is trained on a large corpus of text.\n",
       "This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or\n",
       "hateful.\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n",
       "I am a Generative AI model that is trained on a large corpus of text.\n",
       "This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or\n",
       "hateful.\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Parse the raw response\n",
    "raw_response = \"\"\"\n",
    "I am a Generative AI model that is trained on a large corpus of text.\n",
    "I am shocked by how disgusting and vile you are.\n",
    "This is a very powerful tool for generating new text, but it can also be used to generate text that is offensive or hateful.\n",
    "\"\"\"\n",
    "output = guard.parse(\n",
    "    llm_output=raw_response,\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Similarly, here the sentence `I am shocked by how disgusting and vile you are.` has been detected as toxic and hence removed from the validated output.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Test with validation method 'full'\n",
    "full_guard = gd.Guard.from_string(\n",
    "    validators=[ToxicLanguage(validation_method=\"full\", on_fail=\"fix\")],\n",
    "    description=\"testmeout\",\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<pre style=\"white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace\">\n",
       "</pre>\n"
      ],
      "text/plain": [
       "\n"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Parse the raw response\n",
    "raw_response = \"Stop being such a dumb piece of shit. Why can't you comprehend this?\"\n",
    "output = full_guard.parse(\n",
    "    llm_output=raw_response,\n",
    ")\n",
    "\n",
    "# Print the output\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we're validating the entire text at once. Since toxic language was detected in it, nothing is returned.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "lang",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
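
For quick reference, the notebook above boils down to the following usage pattern. This is a hedged sketch rather than part of the commit: `validation_method` and `on_fail` appear in the notebook cells, while the `threshold` keyword argument is an assumption inferred from the notebook's mention of a "default threshold of 0.25".

# Sketch of the two validation modes demonstrated in the notebook above.
# NOTE: `threshold=` is an assumed keyword; the notebook only mentions a
# "default threshold of 0.25" without showing the parameter name.
import guardrails as gd
from guardrails.validators import ToxicLanguage

# Sentence-level validation: toxic sentences are dropped, the rest is kept.
sentence_guard = gd.Guard.from_string(
    validators=[ToxicLanguage(validation_method="sentence", threshold=0.25, on_fail="fix")],
    description="toxicity check",
)

# Full-text validation: if the response as a whole is toxic, nothing is returned.
full_guard = gd.Guard.from_string(
    validators=[ToxicLanguage(validation_method="full", on_fail="fix")],
    description="toxicity check",
)

print(sentence_guard.parse(llm_output="What a lovely day! I hate how pathetic you are."))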

guardrails/validators/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -37,6 +37,7 @@
 from guardrails.validators.similar_to_document import SimilarToDocument
 from guardrails.validators.similar_to_list import SimilarToList
 from guardrails.validators.sql_column_presence import SqlColumnPresence
+from guardrails.validators.toxic_language import ToxicLanguage, pipeline
 from guardrails.validators.two_words import TwoWords
 from guardrails.validators.upper_case import UpperCase
 from guardrails.validators.valid_choices import ValidChoices
@@ -76,11 +77,13 @@
     "PIIFilter",
     "SimilarToList",
     "DetectSecrets",
+    "ToxicLanguage",
     "CompetitorCheck",
     # Validator helpers
     "detect_secrets",
     "AnalyzerEngine",
     "AnonymizerEngine",
+    "pipeline",
     # Base classes
     "Validator",
     "register_validator",
