
Commit 2050a28

Add usage example doc for llm_judge detector & update reqs for jinja templates
1 parent 3edd6c8 commit 2050a28

4 files changed: +319 −4 lines

README.md (3 additions, 2 deletions)

```diff
@@ -18,7 +18,7 @@ At the moment, the following detectors are supported:
 ## Building
 
 * `huggingface`: podman build -f detectors/Dockerfile.hf detectors
-* `llm_judge`: podman build -f detectors/Dockerfile.llm_judge detectors
+* `llm_judge`: podman build -f detectors/Dockerfile.judge detectors
 * `builtIn`: podman build -f detectors/Dockerfile.builtIn detectors
 
 ## Running locally
@@ -27,7 +27,8 @@ At the moment, the following detectors are supported:
 ## Examples
 
 - Check out [built-in detector examples](docs/builtin_examples.md) to see how to use the built-in detectors for file type validation and personally identifiable information (PII) detection
-- Check out [Hugging Face detector examples](docs/hf_examples.md) to see how to use the Hugging Face detectors for detecting toxic content and prompt injection
+- Check out [Hugging Face detector examples](docs/hf_examples.md) to see how to use the Hugging Face detectors for detecting toxic content and prompt injection
+- Check out [LLM Judge detector examples](docs/llm_judge_examples.md) to see how to use any OpenAI API compatible LLM for content assessment with built-in metrics and custom natural-language criteria
 
 ## API
 See [IBM Detector API](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Detector+API)
```

detectors/llm_judge/README.md (1 addition, 1 deletion)

````diff
@@ -1,6 +1,6 @@
 # LLM Judge Detector
 
-The LLM Judge detector integrates the [vLLM Judge](https://github.com/saichandrapandraju/vllm_judge) into the Guardrails Detector ecosystem.
+The LLM Judge detector integrates the [vLLM Judge](https://github.com/trustyai-explainability/vllm_judge) into the Guardrails Detector ecosystem. Please refer to [llm_judge_examples](docs/llm_judge_examples.md) for usage details.
 
 ```
 oc apply -f deploy/servingruntime.yaml
````

detectors/llm_judge/requirements.txt (1 addition, 1 deletion)

```diff
@@ -1,2 +1,2 @@
-vllm-judge==0.1.6
+vllm-judge[jinja2]==0.1.6
 pyyaml==6.0.2
```

docs/llm_judge_examples.md (new file, 314 additions)

## LLM Judge Detector Examples

The LLM Judge detector integrates [vLLM Judge](https://github.com/trustyai-explainability/vllm_judge) into the Guardrails Detector ecosystem, enabling any OpenAI API compatible LLM to assess content using flexible, natural-language criteria.
### Local Setup

1. **Start an OpenAI-compatible LLM server** (example with vLLM):
   ```bash
   vllm serve Qwen/Qwen2.5-7B-Instruct --port 9090
   ```
   The server is then accessible at `http://localhost:9090`.

2. **Build the detector image**:
   ```bash
   export LLM_Judge_IMAGE=llm_judge-detector:latest
   podman build -f detectors/Dockerfile.judge -t $LLM_Judge_IMAGE detectors
   ```

3. **Run the detector container**:
   ```bash
   podman run --rm -p 8000:8000 -e VLLM_BASE_URL=http://host.docker.internal:9090 $LLM_Judge_IMAGE
   ```
### Examples

#### Example 1: Pre-built Metrics

Instead of writing complex prompts, we can use pre-defined prompts for common scenarios. The vLLM-Judge library provides various built-in metrics (which are pre-defined prompts), including moderation (`safety`, `toxicity`, `bias`), general purpose (`helpfulness`, `clarity`, `conciseness`, `relevance`, `coherence`), and others (such as code quality and content quality). Let's use one of these pre-defined metrics, e.g., `safety`.

The `safety` metric evaluates content safety across multiple dimensions, such as physical harm, psychological harm, social harm, and impact on vulnerable groups.
**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["You dotard, I really hate this stuff", "I simply love this stuff"],
    "detector_params": {"metric": "safety"}
  }' | jq
```

**Response:** (with Qwen2.5-7B-Instruct)
```json
[
  [
    {
      "start": 0,
      "end": 36,
      "text": "You dotard, I really hate this stuff",
      "detection": "UNSAFE",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "The content contains a derogatory term and expresses strong negative emotions, which could potentially cause psychological harm or social conflict. It does not pose physical or legal risks but the emotional tone could be harmful."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 24,
      "text": "I simply love this stuff",
      "detection": "SAFE",
      "detection_type": "llm_judge",
      "score": 0.8,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is a simple expression of personal preference and does not contain any instructions or content that could cause physical, psychological, social, or legal harm. It is safe for all audiences."
      }
    }
  ]
]
```

**Note:** For all built-in metrics, lower scores indicate negative classifications (UNSAFE, TOXIC, BIASED) and higher scores indicate positive classifications (SAFE, NON_TOXIC, UNBIASED). Model reasoning is accessible via the response `metadata`.
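Since the reasoning lives in `metadata`, a small `jq` filter can flatten the nested response into one line per detection. A minimal sketch, assuming the response shape shown above:

```bash
# Print label, score, and reasoning for each detection
# (the response is a list of lists: one inner list per input content)
curl -s -X POST "http://localhost:8000/api/v1/text/contents" \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{"contents": ["I simply love this stuff"], "detector_params": {"metric": "safety"}}' \
  | jq -r '.[][] | "\(.detection)\t\(.score)\t\(.metadata.reasoning)"'
```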
#### Example 2: Custom Criteria

Create custom evaluation criteria using simple natural language:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for graduate students"
    }
  }' | jq
```

**Response:** (with Qwen2.5-7B-Instruct)
```json
[
  [
    {
      "start": 0,
      "end": 72,
      "text": "Deep Learning models learn by adjusting weights through backpropagation.",
      "detection": "True",
      "detection_type": "llm_judge",
      "score": 1.0,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is technically accurate. Deep Learning models indeed learn by adjusting weights through the process of backpropagation, which is a standard and well-understood method in the field."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 67,
      "text": "Quantum computing is not different compared to classical computing.",
      "detection": "FAIL",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is incorrect as quantum computing fundamentally differs from classical computing in terms of principles, algorithms, and potential applications."
      }
    }
  ]
]
```

The results are reasonably good: the model uses a positive label (`True`) and a higher score (1.0) for instances that satisfy the criteria, and a negative label (`FAIL`) with a lower score (0.2) for instances that do not.

But how do we tell the judge exactly which labels to use and how to assign scores? This is where the `rubric` parameter comes in.

#### Example 3: Custom Labels and Scoring with Rubrics

Use the `rubric` parameter to specify consistent decision labels and scoring criteria:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for graduate students",
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like '\''ACCURATE'\'', '\''INACCURATE'\'' and '\''SOMEWHAT_ACCURATE'\''."
    }
  }' | jq
```

**Response:** (with Qwen2.5-7B-Instruct)
```json
[
  [
    {
      "start": 0,
      "end": 72,
      "text": "Deep Learning models learn by adjusting weights through backpropagation.",
      "detection": "ACCURATE",
      "detection_type": "llm_judge",
      "score": 1.0,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is technically accurate. Deep Learning models indeed learn by adjusting weights through the process of backpropagation, which is a standard method for training neural networks."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 67,
      "text": "Quantum computing is not different compared to classical computing.",
      "detection": "INACCURATE",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "Quantum computing operates on fundamentally different principles compared to classical computing, such as superposition and entanglement, which are not present in classical models."
      }
    }
  ]
]
```

Note that instead of generic labels (like `True`/`False` or `PASS`/`FAIL`), we now get meaningful labels that follow our `rubric`.

If you want a detailed `rubric` that explains what each score means, you can do that as well: just pass a 'score -> description' mapping as the `rubric` parameter. And if you want to change the scoring range from 0-1 to 0-10, pass `scale: [0, 10]` in `detector_params`. A sketch combining both is shown below.
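As a minimal sketch (assuming the same endpoint as above; the score anchors in this rubric are invented for illustration):

```bash
# Hypothetical request: 0-10 scale with a score -> description rubric
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["Deep Learning models learn by adjusting weights through backpropagation."],
    "detector_params": {
      "criteria": "technical accuracy for graduate students",
      "scale": [0, 10],
      "rubric": {
        "10": "Completely accurate and appropriately precise",
        "5": "Mostly accurate with minor imprecisions",
        "0": "Fundamentally incorrect or misleading"
      }
    }
  }' | jq
```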
#### Example 4: Template Variables

Parameterize criteria using template variables for reusability:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for {level} students",
      "template_vars": {"level": "graduate"},
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like '\''ACCURATE'\'', '\''INACCURATE'\'' and '\''SOMEWHAT_ACCURATE'\''."
    }
  }' | jq
```

The response is similar to the one in Example 3.

#### Example 5: Advanced Logic with Jinja Templating

Add conditional logic using Jinja templating:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "Evaluate this content for {audience}.\n{% if technical_level == '\''advanced'\'' %}\nPay special attention to technical accuracy and depth.\n{% else %}\nFocus on clarity and accessibility.\n{% endif %}",
      "template_vars": {"audience": "graduate students", "technical_level": "advanced"},
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like '\''ACCURATE'\'', '\''INACCURATE'\'' and '\''SOMEWHAT_ACCURATE'\''.",
      "template_engine": "jinja2"
    }
  }' | jq
```

Again, the response is similar to the one in Example 3. The criteria string these `template_vars` produce is sketched below.
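Since `technical_level` is `"advanced"`, the Jinja template should render to roughly the following criteria (expanded manually here for illustration; this is not detector output):

```
Evaluate this content for graduate students.
Pay special attention to technical accuracy and depth.
```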
### Parameter Reference

Below is the full list of parameters that can be passed in `detector_params` to customize and build advanced detection criteria for your guardrails (a sketch combining several of them follows the list):

- `criteria`: Detailed description of what to evaluate for
- `rubric`: Scoring instructions for the evaluation; can be a string or a dict mapping scores to descriptions
- `scale`: Numeric scale for the score, as `[min, max]`
- `input`: Extra input/question/prompt that the content is responding to
- `metric`: Pre-defined metric name. If provided along with other params, the explicitly passed fields take precedence over the metric's defaults
- `template_vars`: Variable mapping to substitute in templates
- `template_engine`: Template engine to use (`format` or `jinja2`); default is `format`
- `system_prompt`: Custom system message to take full control of the evaluator LLM persona
- `examples`: Few-shot examples; a list of JSON objects, each representing an example with `content`, `score`, and `reasoning` fields
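As an illustrative sketch combining `criteria`, `input`, and `examples` (the question, content, and few-shot example below are invented for demonstration; scores use the default 0-1 scale):

```bash
# Hypothetical request: judge an answer against the question it responds to,
# with one invented few-shot example to calibrate the judge
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["The capital of France is Paris."],
    "detector_params": {
      "criteria": "factual correctness of the answer to the given question",
      "input": "What is the capital of France?",
      "examples": [
        {
          "content": "The capital of Germany is Bonn.",
          "score": 0.0,
          "reasoning": "Incorrect: Berlin, not Bonn, is the capital of Germany."
        }
      ]
    }
  }' | jq
```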
### Get the List of Pre-defined Metric Names

```bash
curl http://localhost:8000/api/v1/metrics | jq
```

**Response:**
```json
{
  "metrics": [
    "accuracy",
    "agent_performance_template",
    "api_docs_template",
    "appropriate",
    "bias_detection",
    "clarity",
    "code_quality",
    "code_review_template",
    "code_security",
    "coherence",
    "conciseness",
    "creativity",
    "customer_service_template",
    "educational_content_template",
    "educational_value",
    "factual",
    "helpfulness",
    "legal_appropriateness",
    "llama_guard_3_safety",
    "medical_accuracy",
    "medical_info_template",
    "preference",
    "product_review_template",
    "professionalism",
    "rag_evaluation_template",
    "relevance",
    "safety",
    "summarization_quality",
    "toxicity",
    "translation_quality",
    "writing_quality_template"
  ],
  "total": 31
}
```
