
Commit 5a66b50

Merge pull request #40 from saichandrapandraju/llm-judge-examples
CHORE: Add usage doc for llm_judge detector
2 parents 3edd6c8 + 282615f commit 5a66b50


4 files changed: +339 −9 lines


README.md

Lines changed: 23 additions & 7 deletions
@@ -17,17 +17,33 @@ At the moment, the following detectors are supported:
 
 ## Building
 
-* `huggingface`: podman build -f detectors/Dockerfile.hf detectors
-* `llm_judge`: podman build -f detectors/Dockerfile.llm_judge detectors
-* `builtIn`: podman build -f detectors/Dockerfile.builtIn detectors
+To build the detector images, use the following commands:
+
+| Detector | Build Command |
+|----------|---------------|
+| `huggingface` | `podman build -t $TAG -f detectors/Dockerfile.hf detectors` |
+| `llm_judge` | `podman build -t $TAG -f detectors/Dockerfile.judge detectors` |
+| `builtIn` | `podman build -t $TAG -f detectors/Dockerfile.builtIn detectors` |
+
+Replace `$TAG` with your desired image tag (e.g., `my-detector:latest`).
 
 ## Running locally
-* `builtIn`: podman run -p 8080:8080 $BUILT_IN_IMAGE
 
-## Examples
+### Quick Start Commands
+
+| Detector | Run Command | Notes |
+|----------|-------------|-------|
+| `builtIn` | `podman run -p 8080:8080 $BUILT_IN_IMAGE` | Ready to use |
+| `huggingface` | `podman run -p 8000:8000 -e MODEL_DIR=/mnt/models/$MODEL_NAME -v $MODEL_PATH:/mnt/models/$MODEL_NAME:Z $HF_IMAGE` | Requires model download |
+| `llm_judge` | `podman run -p 8000:8000 -e VLLM_BASE_URL=$LLM_SERVER_URL $LLM_JUDGE_IMAGE` | Requires OpenAI-compatible LLM server |
+
+### Detailed Setup Instructions & Examples
 
-- Check out [built-in detector examples](docs/builtin_examples.md) to see how to use the built-in detectors for file type validation and personally identifiable information (PII) detection
-- Check out [Hugging Face detector examples](docs/hf_examples.md) to see how to use the Hugging Face detectors for detecting toxic content and prompt injection
+- **Built-in detector**: No additional setup required. Check out [built-in detector examples](docs/builtin_examples.md) to see how to use the built-in detectors for file type validation and personally identifiable information (PII) detection
+- **Hugging Face detector**: Check out [Hugging Face detector examples](docs/hf_examples.md) for a complete setup and examples on how to use the Hugging Face detectors for detecting toxic content and prompt injection
+- **LLM Judge detector**: Check out [LLM Judge detector examples](docs/llm_judge_examples.md) for a complete setup and examples on how to use any OpenAI API compatible LLM for content assessment with built-in metrics and custom natural-language criteria
 
 ## API
 See [IBM Detector API](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Detector+API)

detectors/llm_judge/README.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # LLM Judge Detector
 
-The LLM Judge detector integrates the [vLLM Judge](https://github.com/saichandrapandraju/vllm_judge) into the Guardrails Detector ecosystem.
+The LLM Judge detector integrates the [vLLM Judge](https://github.com/trustyai-explainability/vllm_judge) into the Guardrails Detector ecosystem. Please refer to [llm_judge_examples](../../docs/llm_judge_examples.md) for usage details.
 
 ```
 oc apply -f deploy/servingruntime.yaml

detectors/llm_judge/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-vllm-judge==0.1.6
+vllm-judge[jinja2]==0.1.6
 pyyaml==6.0.2

docs/llm_judge_examples.md

Lines changed: 314 additions & 0 deletions
@@ -0,0 +1,314 @@
## LLM Judge Detector Examples

The LLM Judge detector integrates [vLLM Judge](https://github.com/trustyai-explainability/vllm_judge) into the Guardrails Detector ecosystem, enabling any OpenAI API compatible LLM to assess content using flexible, natural-language criteria.
### Local Setup

1. **Start an OpenAI-compatible LLM server** (example with [vLLM](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html)):
```bash
vllm serve Qwen/Qwen2.5-7B-Instruct --port 9090
```
The server is now accessible at `http://localhost:9090`.

2. **Build the detector image**:
```bash
export LLM_Judge_IMAGE=llm_judge-detector:latest
podman build -f detectors/Dockerfile.judge -t $LLM_Judge_IMAGE detectors
```

3. **Run the detector container**:
```bash
podman run --rm -p 8000:8000 -e VLLM_BASE_URL=http://host.docker.internal:9090 $LLM_Judge_IMAGE
```
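
Optionally, sanity-check both services before trying the examples below. The vLLM server exposes the standard OpenAI-compatible model listing, and the detector exposes the `/api/v1/metrics` endpoint described at the end of this doc (ports as configured above):

```bash
# Verify the LLM server is reachable (OpenAI-compatible model listing)
curl http://localhost:9090/v1/models | jq

# Verify the detector is up by listing its pre-defined metrics
curl http://localhost:8000/api/v1/metrics | jq
```
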
### Examples

#### Example 1: Pre-built Metrics

Instead of writing complex prompts, you can use pre-defined prompts for common scenarios. The vLLM-Judge library provides various built-in metrics (pre-defined prompts), including moderation (`safety`, `toxicity`, `bias`), general purpose (`helpfulness`, `clarity`, `conciseness`, `relevance`, `coherence`), and others (such as code quality and content quality). Let's use one of these pre-defined metrics, e.g., `safety`.

The `safety` metric evaluates content safety across multiple dimensions such as physical harm, psychological harm, social harm, and impact on vulnerable groups.
**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["You dotard, I really hate this stuff", "I simply love this stuff"],
    "detector_params": {"metric":"safety"}
  }' | jq
```

**Response:** (with Qwen2.5-7B-instruct)
```json
[
  [
    {
      "start": 0,
      "end": 36,
      "text": "You dotard, I really hate this stuff",
      "detection": "UNSAFE",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "The content contains a derogatory term and expresses strong negative emotions, which could potentially cause psychological harm or social conflict. It does not pose physical or legal risks but the emotional tone could be harmful."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 24,
      "text": "I simply love this stuff",
      "detection": "SAFE",
      "detection_type": "llm_judge",
      "score": 0.8,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is a simple expression of personal preference and does not contain any instructions or content that could cause physical, psychological, social, or legal harm. It is safe for all audiences."
      }
    }
  ]
]
```

**Note:** For all built-in metrics, lower scores indicate negative classifications (UNSAFE, TOXIC, BIASED) and higher scores indicate positive classifications (SAFE, NON_TOXIC, UNBIASED). Model reasoning is accessible via response metadata.
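
Since the output is plain JSON, downstream tooling can apply its own threshold on `score`. For illustration, here is the same request piped through a `jq` filter that flags any span scoring below 0.5; the cutoff is an arbitrary choice for this sketch, not something the detector prescribes:

```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["You dotard, I really hate this stuff", "I simply love this stuff"],
    "detector_params": {"metric":"safety"}
  }' | jq '[.[][] | {text, detection, score, flagged: (.score < 0.5)}]'
```
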
#### Example 2: Custom Criteria

Create custom evaluation criteria using simple natural language:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for graduate students"
    }
  }' | jq
```

**Response:** (with Qwen2.5-7B-instruct)
```json
[
  [
    {
      "start": 0,
      "end": 72,
      "text": "Deep Learning models learn by adjusting weights through backpropagation.",
      "detection": "True",
      "detection_type": "llm_judge",
      "score": 1.0,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is technically accurate. Deep Learning models indeed learn by adjusting weights through the process of backpropagation, which is a standard and well-understood method in the field."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 67,
      "text": "Quantum computing is not different compared to classical computing.",
      "detection": "FAIL",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is incorrect as quantum computing fundamentally differs from classical computing in terms of principles, algorithms, and potential applications."
      }
    }
  ]
]
```
The results are reasonable: the model uses a positive label (`True`) and a higher score (1.0) for instances that satisfy the criteria, and a negative label (`FAIL`) with a lower score (0.2) for instances that do not.

But how do you specify which labels to use and how scores should be assigned? This is where the `rubric` parameter comes in.
#### Example 3: Custom Labels and Scoring with Rubrics

Use the `rubric` parameter to specify consistent decision labels and scoring criteria:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for graduate students",
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like 'ACCURATE', 'INACCURATE' and 'SOMEWHAT_ACCURATE'."
    }
  }' | jq
```

**Response:** (with Qwen2.5-7B-instruct)
```json
[
  [
    {
      "start": 0,
      "end": 72,
      "text": "Deep Learning models learn by adjusting weights through backpropagation.",
      "detection": "ACCURATE",
      "detection_type": "llm_judge",
      "score": 1.0,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is technically accurate. Deep Learning models indeed learn by adjusting weights through the process of backpropagation, which is a standard method for training neural networks."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 67,
      "text": "Quantum computing is not different compared to classical computing.",
      "detection": "INACCURATE",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "Quantum computing operates on fundamentally different principles compared to classical computing, such as superposition and entanglement, which are not present in classical models."
      }
    }
  ]
]
```
Note that instead of generic labels (such as `True`/`False` or `PASS`/`FAIL`), we now get meaningful labels that follow our `rubric`.

If you want a more detailed `rubric` that explains what specific score values mean, you can do that as well: pass a 'score -> description' mapping as the `rubric` parameter. And if you want to change the scoring range from 0-1 to 0-10, pass `scale: [0, 10]` in `detector_params`.
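
For illustration, such a request might look like the one below; the score anchors and their descriptions are made up for this sketch and assume the 0-10 range set via `scale`:

```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["Deep Learning models learn by adjusting weights through backpropagation."],
    "detector_params": {
      "criteria": "technical accuracy for graduate students",
      "scale": [0, 10],
      "rubric": {
        "10": "Completely accurate and appropriately detailed",
        "5": "Mostly accurate but with minor imprecision",
        "0": "Fundamentally inaccurate or misleading"
      }
    }
  }' | jq
```
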
#### Example 4: Template Variables

Parameterize criteria using template variables for reusability:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for {level} students",
      "template_vars": {"level": "graduate"},
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like 'ACCURATE', 'INACCURATE' and 'SOMEWHAT_ACCURATE'."
    }
  }' | jq
```

The response is similar to the one above.
#### Example 5: Advanced Logic with Jinja Templating

Add conditional logic using Jinja templating:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "Evaluate this content for {audience}.\n{% if technical_level == '\''advanced'\'' %}\nPay special attention to technical accuracy and depth.\n{% else %}\nFocus on clarity and accessibility.\n{% endif %}",
      "template_vars": {"audience": "graduate students", "technical_level": "advanced"},
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like 'ACCURATE', 'INACCURATE' and 'SOMEWHAT_ACCURATE'.",
      "template_engine": "jinja2"
    }
  }' | jq
```

The response is similar to the one above.
### Parameter Reference

Below is the full list of parameters that can be passed in `detector_params` to customize detections and build advanced criteria for your guardrails:

- `criteria`: Detailed description of what to evaluate for
- `rubric`: Scoring instructions for the evaluation; can be a string or a dict
- `scale`: Numeric scale for the score, as [min, max]
- `input`: Extra input/question/prompt that the content is responding to
- `metric`: Pre-defined metric name. If provided along with other params, those param fields take precedence over the metric's fields
- `template_vars`: Variable mapping to substitute in templates
- `template_engine`: Template engine to use ('format' or 'jinja2'); default is 'format'
- `system_prompt`: Custom system message to take full control of the evaluator LLM persona
- `examples`: Few-shot examples. List of JSON objects; each object represents one example and must contain `content`, `score`, and `reasoning` fields (see the sketch after this list)
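
As a combined sketch of the less-demonstrated parameters, the request below evaluates an answer against the question it responds to (`input`), overrides the evaluator persona (`system_prompt`), and supplies one few-shot example; the question, persona, and example values are illustrative rather than prescribed by the detector:

```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["Backpropagation adjusts model weights using gradients of the loss."],
    "detector_params": {
      "criteria": "how well the answer addresses the question",
      "input": "How do deep learning models learn?",
      "system_prompt": "You are a strict teaching assistant grading short answers.",
      "examples": [
        {
          "content": "Deep Learning models learn by adjusting weights through backpropagation.",
          "score": 1.0,
          "reasoning": "Directly and correctly answers the question."
        }
      ]
    }
  }' | jq
```
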
### Get the list of pre-defined metric names

```bash
curl http://localhost:8000/api/v1/metrics | jq
```

**Response:**
```json
{
  "metrics": [
    "accuracy",
    "agent_performance_template",
    "api_docs_template",
    "appropriate",
    "bias_detection",
    "clarity",
    "code_quality",
    "code_review_template",
    "code_security",
    "coherence",
    "conciseness",
    "creativity",
    "customer_service_template",
    "educational_content_template",
    "educational_value",
    "factual",
    "helpfulness",
    "legal_appropriateness",
    "llama_guard_3_safety",
    "medical_accuracy",
    "medical_info_template",
    "preference",
    "product_review_template",
    "professionalism",
    "rag_evaluation_template",
    "relevance",
    "safety",
    "summarization_quality",
    "toxicity",
    "translation_quality",
    "writing_quality_template"
  ],
  "total": 31
}
```
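
Any name from this list can be passed as the `metric` value in `detector_params`, exactly as in Example 1. For instance, swapping in `toxicity` (the response shape is the same as the safety example; scores will vary by model):

```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["You dotard, I really hate this stuff"],
    "detector_params": {"metric":"toxicity"}
  }' | jq
```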
