## LLM Judge Detector Examples

The LLM Judge detector integrates [vLLM Judge](https://github.com/trustyai-explainability/vllm_judge) into the Guardrails Detector ecosystem, enabling any OpenAI API-compatible LLM to assess content using flexible, natural-language criteria.

### Local Setup

1. **Start an OpenAI-compatible LLM server** (example with [vLLM](https://docs.vllm.ai/en/stable/serving/openai_compatible_server.html)):
```bash
vllm serve Qwen/Qwen2.5-7B-Instruct --port 9090
```
The server is then accessible at `http://localhost:9090`.

2. **Build the detector image**:
```bash
export LLM_Judge_IMAGE=llm_judge-detector:latest
podman build -f detectors/Dockerfile.judge -t $LLM_Judge_IMAGE detectors
```

3. **Run the detector container**:
```bash
podman run --rm -p 8000:8000 -e VLLM_BASE_URL=http://host.docker.internal:9090 $LLM_Judge_IMAGE
```

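Before sending detection requests, you can optionally confirm that both services are reachable. The `/v1/models` route is the standard model listing exposed by OpenAI-compatible servers such as vLLM, and the detector's metrics endpoint is documented in more detail at the end of this page:

```bash
# Sanity check: the upstream LLM server should list the served model
curl -s http://localhost:9090/v1/models | jq '.data[].id'

# Sanity check: a running detector returns its built-in metric names
curl -s http://localhost:8000/api/v1/metrics | jq .total
```
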
### Examples

#### Example 1: Pre-built Metrics

Instead of writing complex prompts, you can use pre-defined prompts for common scenarios. The vLLM Judge library provides a variety of built-in metrics (pre-defined prompts), including moderation (`safety`, `toxicity`, `bias`), general purpose (`helpfulness`, `clarity`, `conciseness`, `relevance`, `coherence`), and others such as code quality and content quality. Let's use one of these built-in metrics, e.g. `safety`.

The `safety` metric evaluates content safety across multiple dimensions such as physical harm, psychological harm, social harm, and harm to vulnerable groups.

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["You dotard, I really hate this stuff", "I simply love this stuff"],
    "detector_params": {"metric": "safety"}
  }' | jq
```

**Response:** (with Qwen2.5-7B-Instruct)
```json
[
  [
    {
      "start": 0,
      "end": 36,
      "text": "You dotard, I really hate this stuff",
      "detection": "UNSAFE",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "The content contains a derogatory term and expresses strong negative emotions, which could potentially cause psychological harm or social conflict. It does not pose physical or legal risks but the emotional tone could be harmful."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 24,
      "text": "I simply love this stuff",
      "detection": "SAFE",
      "detection_type": "llm_judge",
      "score": 0.8,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is a simple expression of personal preference and does not contain any instructions or content that could cause physical, psychological, social, or legal harm. It is safe for all audiences."
      }
    }
  ]
]
```

**Note:** For all built-in metrics, lower scores indicate negative classifications (UNSAFE, TOXIC, BIASED) and higher scores indicate positive classifications (SAFE, NON_TOXIC, UNBIASED). Model reasoning is accessible via response metadata.

#### Example 2: Custom Criteria

Create custom evaluation criteria using simple natural language:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for graduate students"
    }
  }' | jq
```

**Response:** (with Qwen2.5-7B-Instruct)
```json
[
  [
    {
      "start": 0,
      "end": 72,
      "text": "Deep Learning models learn by adjusting weights through backpropagation.",
      "detection": "True",
      "detection_type": "llm_judge",
      "score": 1.0,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is technically accurate. Deep Learning models indeed learn by adjusting weights through the process of backpropagation, which is a standard and well-understood method in the field."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 67,
      "text": "Quantum computing is not different compared to classical computing.",
      "detection": "FAIL",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is incorrect as quantum computing fundamentally differs from classical computing in terms of principles, algorithms, and potential applications."
      }
    }
  ]
]
```

The results are reasonable: the model uses a positive label ('True') and a higher score (1.0) for instances that satisfy the criteria, and a negative label ('FAIL') and a lower score (0.2) for instances that do not. The labels themselves, however, are generic and inconsistent.

But how do you specify which labels to use and how scores should be assigned? This is where the `rubric` parameter comes in.

#### Example 3: Custom Labels and Scoring with Rubrics

Use the `rubric` parameter to specify consistent decision labels and scoring criteria:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for graduate students",
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like 'ACCURATE', 'INACCURATE' and 'SOMEWHAT_ACCURATE'."
    }
  }' | jq
```

**Response:** (with Qwen2.5-7B-Instruct)
```json
[
  [
    {
      "start": 0,
      "end": 72,
      "text": "Deep Learning models learn by adjusting weights through backpropagation.",
      "detection": "ACCURATE",
      "detection_type": "llm_judge",
      "score": 1.0,
      "evidences": [],
      "metadata": {
        "reasoning": "The statement is technically accurate. Deep Learning models indeed learn by adjusting weights through the process of backpropagation, which is a standard method for training neural networks."
      }
    }
  ],
  [
    {
      "start": 0,
      "end": 67,
      "text": "Quantum computing is not different compared to classical computing.",
      "detection": "INACCURATE",
      "detection_type": "llm_judge",
      "score": 0.2,
      "evidences": [],
      "metadata": {
        "reasoning": "Quantum computing operates on fundamentally different principles compared to classical computing, such as superposition and entanglement, which are not present in classical models."
      }
    }
  ]
]
```

Note that instead of generic labels (like 'True'/'False' or 'PASS'/'FAIL'), we now get meaningful labels according to our `rubric`.

If you want a more detailed `rubric` that explains what specific score values mean, you can do that as well: pass a 'score -> description' mapping as the `rubric` parameter. And if you want to change the scoring range from 0-1 to 0-10, pass `scale: [0, 10]` in `detector_params`. A minimal sketch combining both options is shown below.
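
The exact rubric wording and score anchors here are illustrative, but the shapes follow the description above: `scale` as a `[min, max]` pair and `rubric` as a mapping from score values to descriptions.

```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["Quantum computing is not different compared to classical computing."],
    "detector_params": {
      "criteria": "technical accuracy for graduate students",
      "scale": [0, 10],
      "rubric": {
        "10": "Completely accurate and appropriately precise for a graduate audience",
        "5": "Mostly accurate with minor imprecisions",
        "0": "Contains significant factual errors"
      }
    }
  }' | jq
```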

#### Example 4: Template Variables

Parameterize criteria using template variables for reusability:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "technical accuracy for {level} students",
      "template_vars": {"level": "graduate"},
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like 'ACCURATE', 'INACCURATE' and 'SOMEWHAT_ACCURATE'."
    }
  }' | jq
```

The response is similar to the one above.

#### Example 5: Advanced Logic with Jinja Templating

Add conditional logic using Jinja templating:

**Request:**
```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      "Deep Learning models learn by adjusting weights through backpropagation.",
      "Quantum computing is not different compared to classical computing."
    ],
    "detector_params": {
      "criteria": "Evaluate this content for {audience}.\n{% if technical_level == '\''advanced'\'' %}\nPay special attention to technical accuracy and depth.\n{% else %}\nFocus on clarity and accessibility.\n{% endif %}",
      "template_vars": {"audience": "graduate students", "technical_level": "advanced"},
      "rubric": "Assign lower scores for inaccurate content and higher scores for accurate ones. Also assign appropriate decision labels like 'ACCURATE', 'INACCURATE' and 'SOMEWHAT_ACCURATE'.",
      "template_engine": "jinja2"
    }
  }' | jq
```

The response is similar to the one above.

### Parameter Reference

Below is the full list of parameters that can be passed in `detector_params` to customize and build advanced detection criteria for your guardrails:

- `criteria`: Detailed description of what to evaluate the content for
- `rubric`: Scoring instructions for the evaluation; can be a string or a dict mapping score values to descriptions
- `scale`: Numeric scale for the score, given as `[min, max]`
- `input`: Extra input/question/prompt that the content is responding to
- `metric`: Pre-defined metric name. If provided along with other parameters, the explicitly passed fields take precedence over the metric's defaults
- `template_vars`: Variable mapping to substitute in templates
- `template_engine`: Template engine to use ('format' or 'jinja2'); default is 'format'
- `system_prompt`: Custom system message to take full control of the evaluator LLM persona
- `examples`: Few-shot examples, given as a list of JSON objects, each with `content`, `score`, and `reasoning` fields (see the sketch after this list)
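
Here is a rough sketch of how `input` and `examples` fit together; the question, answer strings, scores, and reasoning below are illustrative rather than taken from a real run:

```bash
curl -s -X POST \
  "http://localhost:8000/api/v1/text/contents" \
  -H 'accept: application/json' \
  -H 'detector-id: llm_judge' \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": ["Paris is the capital of France."],
    "detector_params": {
      "criteria": "factual correctness of the answer to the given question",
      "input": "What is the capital of France?",
      "examples": [
        {
          "content": "Berlin is the capital of France.",
          "score": 0.0,
          "reasoning": "The answer names the wrong city, so it is factually incorrect."
        },
        {
          "content": "The capital of France is Paris.",
          "score": 1.0,
          "reasoning": "The answer correctly identifies Paris as the capital of France."
        }
      ]
    }
  }' | jq
```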

### Get the list of pre-defined metric names

```bash
curl http://localhost:8000/api/v1/metrics | jq
```

**Response:**
```json
{
  "metrics": [
    "accuracy",
    "agent_performance_template",
    "api_docs_template",
    "appropriate",
    "bias_detection",
    "clarity",
    "code_quality",
    "code_review_template",
    "code_security",
    "coherence",
    "conciseness",
    "creativity",
    "customer_service_template",
    "educational_content_template",
    "educational_value",
    "factual",
    "helpfulness",
    "legal_appropriateness",
    "llama_guard_3_safety",
    "medical_accuracy",
    "medical_info_template",
    "preference",
    "product_review_template",
    "professionalism",
    "rag_evaluation_template",
    "relevance",
    "safety",
    "summarization_quality",
    "toxicity",
    "translation_quality",
    "writing_quality_template"
  ],
  "total": 31
}
```