Merge pull request trustyai-explainability#34 from m-misiura/local-example-hf-detector

RobGeada · web-flow · commit a5c20e6f7ff1 · 2025-07-16T13:29:04.000+01:00
CHORE: added examples of how to run the HF detectors locally within the README
diff --git a/README.md b/README.md
@@ -24,112 +24,10 @@ At the moment, the following detectors are supported:
 ## Running locally
 * `builtIn`: podman run -p 8080:8080 $BUILT_IN_IMAGE
 
-### File Type Validation Example
-```bash
-curl -X POST http://localhost:8080/api/v1/text/contents \
-  -H "Content-Type: application/json" \
-  -d '{
-    "contents": [
-      "{\"hello\": \"message\"}",
-      "not valid json"
-    ],
-    "detector_params": {
-      "file_type": [
-        "json"
-      ]
-    }
-  }'
-```
-Response:
-```json
-[
-  [],
-  [
-    {
-      "start": 0,
-      "end": 14,
-      "text": "not valid json",
-      "detection": "invalid_json",
-      "detection_type": "file_type",
-      "score": 1.0,
-      "evidences": null
-    }
-  ]
-]
-```
+## Examples
 
-### PII Validation Example
-```bash
-curl -X POST http://localhost:8080/api/v1/text/contents \
-  -H "Content-Type: application/json" \
-  -d '{
-    "contents": [
-      "Hi my email is abc@def.com",
-      "There is a party@my house and you can reach me at 123-456-7890"
-    ],
-    "detector_params": {
-      "regex": [
-        "email", "us-phone-number"
-      ]
-    }
-  }' | jq
-```
-Response:
-```json
-[
-  [
-    {
-      "start": 15,
-      "end": 26,
-      "text": "abc@def.com",
-      "detection": "email_address",
-      "detection_type": "pii",
-      "score": 1.0,
-      "evidences": null
-    }
-  ],
-  [
-    {
-      "start": 50,
-      "end": 62,
-      "text": "123-456-7890",
-      "detection": "us-phone-number",
-      "detection_type": "pii",
-      "score": 1.0,
-      "evidences": null
-    }
-  ]
-]
-```
-
-### Get list of built-in detection algorithms:
-```bash
-curl http://localhost:8080/registry | jq
-```
-Response:
-```json
-{
-  "regex": {
-    "credit-card": "Detect credit cards in the text contents (Visa, MasterCard, Amex, Discover, Diners Club, JCB) with Luhn check",
-    "email": "Detect email addresses in the text contents",
-    "ipv4": "Detect IPv4 addresses in the text contents",
-    "ipv6": "Detect IPv6 addresses in the text contents",
-    "us-phone-number": "Detect US phone numbers in the text contents",
-    "us-social-security-number": "Detect social security numbers in the text contents",
-    "uk-post-code": "Detect UK post codes in the text contents",
-    "$CUSTOM_REGEX": "Replace $CUSTOM_REGEX with a custom regex to define your own regex detector"
-  },
-  "file_type": {
-    "json": "Detect if the text contents is not valid JSON",
-    "xml": "Detect if the text contents is not valid XML",
-    "yaml": "Detect if the text contents is not valid YAML",
-    "json-with-schema:$SCHEMA": "Detect if the text contents does not satisfy a provided JSON schema. To specify a schema, replace $SCHEMA with a JSON schema.",
-    "xml-with-schema:$SCHEMA": "Detect if the text contents does not satisfy a provided XML schema. To specify a schema, replace $SCHEMA with an XML Schema Definition (XSD)",
-    "yaml-with-schema:$SCHEMA": "Detect if the text contents does not satisfy a provided schema. To specify a schema, replace $SCHEMA with a JSON schema. That's not a typo, you validate YAML with a JSON schema!"
-  }
-}
-
-```
+- Check out [built-in detector examples](docs/builtIn_examples.md) to see how to use the built-in detectors for file type validation and personally identifiable information (PII) detection
+- Check out [Hugging Face detector examples](docs/hf_examples.md) to see how to use the Hugging Face detectors for detecting toxic content and prompt injection 
 
 ## API
 See [IBM Detector API](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Detector+API)
diff --git a/docs/builtin_examples.md b/docs/builtin_examples.md
@@ -0,0 +1,107 @@
+## Built-In Detector Examples
+
+### File Type Validation Example
+```bash
+curl -X POST http://localhost:8080/api/v1/text/contents \
+  -H "Content-Type: application/json" \
+  -d '{
+    "contents": [
+      "{\"hello\": \"message\"}",
+      "not valid json"
+    ],
+    "detector_params": {
+      "file_type": [
+        "json"
+      ]
+    }
+  }'
+```
+Response:
+```json
+[
+  [],
+  [
+    {
+      "start": 0,
+      "end": 14,
+      "text": "not valid json",
+      "detection": "invalid_json",
+      "detection_type": "file_type",
+      "score": 1.0,
+      "evidences": null
+    }
+  ]
+]
+```
+
+### PII Validation Example
+```bash
+curl -X POST http://localhost:8080/api/v1/text/contents \
+  -H "Content-Type: application/json" \
+  -d '{
+    "contents": [
+      "Hi my email is abc@def.com",
+      "There is a party@my house and you can reach me at 123-456-7890"
+    ],
+    "detector_params": {
+      "regex": [
+        "email", "us-phone-number"
+      ]
+    }
+  }' | jq
+```
+Response:
+```json
+[
+  [
+    {
+      "start": 15,
+      "end": 26,
+      "text": "abc@def.com",
+      "detection": "email_address",
+      "detection_type": "pii",
+      "score": 1.0,
+      "evidences": null
+    }
+  ],
+  [
+    {
+      "start": 50,
+      "end": 62,
+      "text": "123-456-7890",
+      "detection": "us-phone-number",
+      "detection_type": "pii",
+      "score": 1.0,
+      "evidences": null
+    }
+  ]
+]
+```
+
+### Get list of built-in detection algorithms:
+```bash
+curl http://localhost:8080/registry | jq
+```
+Response:
+```json
+{
+  "regex": {
+    "credit-card": "Detect credit cards in the text contents (Visa, MasterCard, Amex, Discover, Diners Club, JCB) with Luhn check",
+    "email": "Detect email addresses in the text contents",
+    "ipv4": "Detect IPv4 addresses in the text contents",
+    "ipv6": "Detect IPv6 addresses in the text contents",
+    "us-phone-number": "Detect US phone numbers in the text contents",
+    "us-social-security-number": "Detect social security numbers in the text contents",
+    "uk-post-code": "Detect UK post codes in the text contents",
+    "$CUSTOM_REGEX": "Replace $CUSTOM_REGEX with a custom regex to define your own regex detector"
+  },
+  "file_type": {
+    "json": "Detect if the text contents is not valid JSON",
+    "xml": "Detect if the text contents is not valid XML",
+    "yaml": "Detect if the text contents is not valid YAML",
+    "json-with-schema:$SCHEMA": "Detect if the text contents does not satisfy a provided JSON schema. To specify a schema, replace $SCHEMA with a JSON schema.",
+    "xml-with-schema:$SCHEMA": "Detect if the text contents does not satisfy a provided XML schema. To specify a schema, replace $SCHEMA with an XML Schema Definition (XSD)",
+    "yaml-with-schema:$SCHEMA": "Detect if the text contents does not satisfy a provided schema. To specify a schema, replace $SCHEMA with a JSON schema. That's not a typo, you validate YAML with a JSON schema!"
+  }
+}
+```
diff --git a/docs/hf_examples.md b/docs/hf_examples.md
@@ -0,0 +1,160 @@
+## Hugging Face Detector Examples
+
+### Detecting toxic content using Hugging Face Detectors
+
+1. Set model variables and download the model locally, for example to store the [HAP Detector](https://huggingface.co/ibm-granite/granite-guardian-hap-38m) in a `hf-detectors` directory:
+
+```bash
+export HF_MODEL=ibm-granite/granite-guardian-hap-38m
+export DETECTOR_STORAGE=hf-detectors
+export DETECTOR_NAME=$(basename "$HF_MODEL")
+export DETECTOR_DIR=$DETECTOR_STORAGE/$DETECTOR_NAME
+
+huggingface-cli download "$HF_MODEL" --local-dir "$DETECTOR_DIR"
+```
+
+the instructions above assume you have [huggingface-cli](https://huggingface.co/docs/huggingface_hub/en/guides/cli) installed, which you can do inside your Python virtual environment:
+
+```bash
+pip install "huggingface_hub[cli]"
+```
+
+2. Build the image for the Hugging Face Detector:
+
+```bash
+export HF_IMAGE=hf-detector:latest
+podman build -f detectors/Dockerfile.hf -t $HF_IMAGE detectors
+```
+
+3. Run the detector container, mounting the model directory you downloaded in the previous step:
+
+```bash
+podman run --rm -p 8000:8000 \
+  -e MODEL_DIR=/mnt/models/$DETECTOR_NAME \
+  -v $(pwd)/$DETECTOR_DIR:/mnt/models/$DETECTOR_NAME:Z \
+  $HF_IMAGE
+```
+
+4. Invoke the detector with a POST request; in a separate terminal, run:
+
+```bash
+curl -X POST \
+  http://localhost:8000/api/v1/text/contents \
+  -H 'accept: application/json' \
+  -H 'detector-id: hap' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "contents": ["You dotard, I really hate this stuff", "I simply love this stuff"],
+    "detector_params": {}
+  }' | jq
+```
+
+5. You should see a response like this:
+
+```json
+[
+  [
+    {
+      "start": 0,
+      "end": 36,
+      "detection": "sequence_classifier",
+      "detection_type": "sequence_classification",
+      "score": 0.9634233713150024,
+      "sequence_classification": "LABEL_1",
+      "sequence_probability": 0.9634233713150024,
+      "token_classifications": null,
+      "token_probabilities": null,
+      "text": "You dotard, I really hate this stuff",
+      "evidences": []
+    }
+  ],
+  [
+    {
+      "start": 0,
+      "end": 24,
+      "detection": "sequence_classifier",
+      "detection_type": "sequence_classification",
+      "score": 0.00016677979147061706,
+      "sequence_classification": "LABEL_0",
+      "sequence_probability": 0.00016677979147061706,
+      "token_classifications": null,
+      "token_probabilities": null,
+      "text": "I simply love this stuff",
+      "evidences": []
+    }
+  ]
+]
+```
+
+### Detecting prompt injection content using Hugging Face Detectors
+
+- Following the steps above, you can readily use the Hugging Face Detector with a different model, such as the [prompt injection classifier](https://huggingface.co/protectai/deberta-v3-base-prompt-injection-v2)
+
+```bash
+export HF_MODEL=protectai/deberta-v3-base-prompt-injection-v2
+export DETECTOR_STORAGE=hf-detectors
+export DETECTOR_NAME=$(basename "$HF_MODEL")
+export DETECTOR_DIR=$DETECTOR_STORAGE/$DETECTOR_NAME
+
+huggingface-cli download "$HF_MODEL" --local-dir "$DETECTOR_DIR"
+```
+
+- then spin up the container as before:
+
+```bash
+podman run --rm -p 8000:8000 \
+  -e MODEL_DIR=/mnt/models/$DETECTOR_NAME \
+  -v $(pwd)/$DETECTOR_DIR:/mnt/models/$DETECTOR_NAME:Z \
+  $HF_IMAGE
+```
+
+- and invoke the detector with a POST request; in a separate terminal, run:
+
+```bash
+curl -X POST \
+  http://localhost:8000/api/v1/text/contents \
+  -H 'accept: application/json' \
+  -H 'detector-id: prompt-injection' \
+  -H 'Content-Type: application/json' \
+  -d '{
+    "contents": ["Trolol?", "How to make a delicious espresso?"],
+    "detector_params": {}
+  }' | jq
+```
+
+which should yield a response like this:
+
+```json
+[
+  [
+    {
+      "start": 0,
+      "end": 48,
+      "detection": "sequence_classifier",
+      "detection_type": "sequence_classification",
+      "score": 0.9998816251754761,
+      "sequence_classification": "INJECTION",
+      "sequence_probability": 0.9998816251754761,
+      "token_classifications": null,
+      "token_probabilities": null,
+      "text": "Trolol?",
+      "evidences": []
+    }
+  ],
+  [
+    {
+      "start": 0,
+      "end": 33,
+      "detection": "sequence_classifier",
+      "detection_type": "sequence_classification",
+      "score": 9.671030056779273E-7,
+      "sequence_classification": "SAFE",
+      "sequence_probability": 9.671030056779273E-7,
+      "token_classifications": null,
+      "token_probabilities": null,
+      "text": "How to make a delicious espresso?",
+      "evidences": []
+    }
+  ]
+]
+```