You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+159Lines changed: 159 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -131,5 +131,164 @@ Response:
131
131
132
132
```
133
133
134
+
### Detecting toxic content using Hugging Face Detectors
135
+
136
+
1. Set model variables and download the model locally, for example to store the [HAP Detector](https://huggingface.co/ibm-granite/granite-guardian-hap-38m) in a `hf-detectors` directory:
the instructions above assume you have [huggingface-cli](https://huggingface.co/docs/huggingface_hub/en/guides/cli) installed, which you can do inside your Python virtual environment:
4. Invoke the detector with a POST request; in a separate terminal, run:
170
+
171
+
```bash
172
+
curl -X POST \
173
+
http://localhost:8000/api/v1/text/contents \
174
+
-H 'accept: application/json' \
175
+
-H 'detector-id: hap' \
176
+
-H 'Content-Type: application/json' \
177
+
-d '{
178
+
"contents": ["You dotard, I really hate this stuff", "I simply love this stuff"],
179
+
"detector_params": {}
180
+
}'| jq
181
+
```
182
+
183
+
5. You should see a response like this:
184
+
185
+
```json
186
+
[
187
+
[
188
+
{
189
+
"start": 0,
190
+
"end": 36,
191
+
"detection": "sequence_classifier",
192
+
"detection_type": "sequence_classification",
193
+
"score": 0.9634233713150024,
194
+
"sequence_classification": "LABEL_1",
195
+
"sequence_probability": 0.9634233713150024,
196
+
"token_classifications": null,
197
+
"token_probabilities": null,
198
+
"text": "You dotard, I really hate this stuff",
199
+
"evidences": []
200
+
}
201
+
],
202
+
[
203
+
{
204
+
"start": 0,
205
+
"end": 24,
206
+
"detection": "sequence_classifier",
207
+
"detection_type": "sequence_classification",
208
+
"score": 0.00016677979147061706,
209
+
"sequence_classification": "LABEL_0",
210
+
"sequence_probability": 0.00016677979147061706,
211
+
"token_classifications": null,
212
+
"token_probabilities": null,
213
+
"text": "I simply love this stuff",
214
+
"evidences": []
215
+
}
216
+
]
217
+
]
218
+
```
219
+
220
+
### Detecting prompt injection content using Hugging Face Detectors
221
+
222
+
- Following the steps above, you can readily use the Hugging Face Detector with a different model, such as the [prompt injection classifier](https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2)
0 commit comments