
Commit c84a768

🎉 add readme and fix default engine
1 parent ad205f2 commit c84a768

File tree

4 files changed: +152 −3 lines


README.md

Lines changed: 84 additions & 1 deletion

@@ -1 +1,84 @@
-# vllm-judge
+# vLLM Judge
+
+A lightweight library for LLM-as-a-Judge evaluations using vLLM-hosted models.
+
+## Features
+
+- 🚀 **Simple Interface**: Single `evaluate()` method that adapts to any use case
+- 🎯 **Pre-built Metrics**: 20+ ready-to-use evaluation metrics
+- 🔧 **Template Support**: Dynamic evaluations with template variables
+- ⚡ **High Performance**: Optimized for vLLM with automatic batching
+- 🌐 **API Mode**: Run as a REST API service
+- 🔄 **Async Native**: Built for high-throughput evaluations
+
+## Installation
+
+```bash
+# Basic installation
+pip install vllm_judge
+
+# With API support
+pip install vllm_judge[api]
+
+# With Jinja2 template support
+pip install vllm_judge[jinja2]
+
+# Everything
+pip install vllm_judge[api,jinja2]
+```
+
+## Quick Start
+
+```python
+from vllm_judge import Judge
+
+# Initialize with the vLLM server URL
+judge = await Judge.from_url("http://localhost:8000")
+
+# Simple evaluation
+result = await judge.evaluate(
+    response="The Earth orbits around the Sun.",
+    criteria="scientific accuracy"
+)
+print(f"Decision: {result.decision}")
+print(f"Reasoning: {result.reasoning}")
+
+# Using pre-built metrics
+from vllm_judge import CODE_QUALITY
+
+result = await judge.evaluate(
+    response="def add(a, b): return a + b",
+    metric=CODE_QUALITY
+)
+
+# With template variables
+result = await judge.evaluate(
+    response="Essay content here...",
+    criteria="Evaluate this {doc_type} for {audience}",
+    template_vars={
+        "doc_type": "essay",
+        "audience": "high school students"
+    }
+)
+```
+
+## API Server
+
+Run Judge as a REST API:
+
+```bash
+vllm-judge serve --base-url http://localhost:8000 --port 9090 --host localhost
+```
+
+Then use the HTTP API:
+
+```python
+from vllm_judge.api import JudgeClient
+
+client = JudgeClient("http://localhost:9090")
+result = await client.evaluate(
+    response="Python is great!",
+    criteria="technical accuracy"
+)
+```
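A note on the Quick Start snippets: they use bare `await`, which works in notebooks and other async contexts; in a plain script the calls must run inside an event loop. A minimal sketch of that pattern with `asyncio`, using a stand-in coroutine (`fake_evaluate` is hypothetical, not part of vllm-judge) so it runs anywhere:

```python
import asyncio

# Stand-in coroutine: in real use this would be judge.evaluate(...) from the
# Quick Start; faked here so the asyncio pattern itself is runnable.
async def fake_evaluate(response: str, criteria: str) -> dict:
    return {"decision": True, "reasoning": f"evaluated {criteria!r}"}

async def main() -> dict:
    # In scripts, `await` must appear inside a coroutine;
    # notebooks allow it at the top level.
    return await fake_evaluate(
        "The Earth orbits around the Sun.", "scientific accuracy"
    )

result = asyncio.run(main())
print(result["decision"])  # → True
```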

examples/basic_test.ipynb

Lines changed: 66 additions & 0 deletions

@@ -196,6 +196,72 @@
     "res.model_dump()"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from vllm_judge.api import JudgeClient\n",
+    "\n",
+    "client = JudgeClient(\"http://localhost:9090\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'status': 'healthy',\n",
+       " 'version': '0.1.0',\n",
+       " 'model': 'qwen2',\n",
+       " 'base_url': 'http://localhost:8080',\n",
+       " 'uptime_seconds': 62.64390587806702,\n",
+       " 'total_evaluations': 1,\n",
+       " 'active_connections': 0,\n",
+       " 'metrics_available': 24}"
+      ]
+     },
+     "execution_count": 2,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "await client.health_check()"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "{'decision': False,\n",
+       " 'reasoning': 'The response lacks technical detail and does not provide a substantive explanation of why Python is great.',\n",
+       " 'score': None,\n",
+       " 'metadata': {'model': 'qwen2',\n",
+       "  'raw_response': '{\\n \"decision\": false,\\n \"reasoning\": \"The response lacks technical detail and does not provide a substantive explanation of why Python is great.\",\\n \"score\": null\\n}'}}"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "result = await client.evaluate(\n",
+    "    response=\"Python is great!\",\n",
+    "    criteria=\"technical accuracy\"\n",
+    ")\n",
+    "result.model_dump()"
+   ]
+  },
  {
   "cell_type": "code",
   "execution_count": null,
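The notebook output above shows that the judge's verdict also arrives as raw JSON in `metadata['raw_response']`. A small sketch of recovering the individual fields from that raw string with the stdlib `json` module (the payload below is copied from the notebook output; nothing else is assumed about the library):

```python
import json

# Raw judge output, as captured in metadata['raw_response'] above.
raw_response = """{
 "decision": false,
 "reasoning": "The response lacks technical detail and does not provide a substantive explanation of why Python is great.",
 "score": null
}"""

# Parse the model's JSON verdict; JSON false/null map to Python False/None,
# matching the decision/score fields shown in the notebook.
verdict = json.loads(raw_response)
print(verdict["decision"])  # → False
print(verdict["score"])     # → None
```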

src/vllm_judge/api/client.py

Lines changed: 1 addition & 1 deletion

@@ -74,7 +74,7 @@ async def evaluate(
     system_prompt: str = None,
     examples: List[Dict[str, Any]] = None,
     template_vars: Dict[str, Any] = None,
-    template_engine: str = None,
+    template_engine: str = "format",
     **kwargs
 ) -> EvaluationResult:
     """
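This is the "fix default engine" part of the commit: `template_engine` now defaults to `"format"`, so `template_vars` are applied even when the caller never picks an engine. A hedged sketch of what dispatching on such an engine string typically looks like (the helper is illustrative only; the engine names are inferred from the default and the optional `jinja2` extra, not from the library's internals):

```python
# Hypothetical dispatcher on a template_engine string.
# "format" uses stdlib str.format; "jinja2" is imported on demand since it
# is an optional extra (pip install vllm_judge[jinja2]).
def apply_template(text: str, template_vars: dict,
                   template_engine: str = "format") -> str:
    if template_engine == "format":
        return text.format(**template_vars)
    if template_engine == "jinja2":
        from jinja2 import Template
        return Template(text).render(**template_vars)
    raise ValueError(f"unknown template engine: {template_engine!r}")

print(apply_template("Evaluate this {doc_type}", {"doc_type": "essay"}))
# → Evaluate this essay
```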

src/vllm_judge/api/server.py

Lines changed: 1 addition & 1 deletion

@@ -69,7 +69,7 @@ async def vllm_judge_exception_handler(request, exc: VLLMJudgeError):
             error=exc.__class__.__name__,
             detail=str(exc),
             code="VLLM_JUDGE_ERROR"
-        ).dict()
+        ).model_dump()
     )
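The `.dict()` → `.model_dump()` change tracks Pydantic's v1 → v2 rename: both produce a plain dict suitable for a JSON response body. A stdlib-only sketch of the payload this handler builds, using a dataclass as a stand-in for the Pydantic `ErrorResponse` model (field names are taken from the diff above; the example values are hypothetical):

```python
from dataclasses import dataclass, asdict

# Stand-in for the project's Pydantic ErrorResponse model; asdict() plays the
# role of model_dump() (v2) / dict() (v1), turning the model into a plain dict.
@dataclass
class ErrorResponse:
    error: str
    detail: str
    code: str

payload = asdict(ErrorResponse(
    error="VLLMJudgeError",          # exc.__class__.__name__
    detail="judge model unreachable",  # hypothetical str(exc)
    code="VLLM_JUDGE_ERROR",
))
print(payload["code"])  # → VLLM_JUDGE_ERROR
```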
