Commit f95ed69

Expand documentation for adding evaluation metrics and LLM models, including testing instructions
1 parent ac4a09e commit f95ed69

File tree

1 file changed: evaluation/README.md

Lines changed: 29 additions & 1 deletion
@@ -175,4 +175,32 @@ for metric in efficiency_metrics:

### Contributing

-You're welcome to add LLM models to test in `server/api/services/llm_services`

#### Adding Evaluation Metrics

To add new evaluation metrics, modify the `evaluate_response()` function in `evaluation/evals.py`:

**Update dependencies** in the script header and make sure the exception-handling path still includes the new metrics, set to `None`.
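
For illustration, here is a minimal sketch of where such a change might go. The signature of `evaluate_response()`, the shape of its result dictionary, and the `response_word_count` metric are assumptions for this example, not the actual code in `evals.py`.

```python
# Sketch only -- the real evaluate_response() in evaluation/evals.py has its
# own signature and metrics; this only shows where a new metric slots in and
# how the None fallback is kept consistent. If the new metric needs a
# third-party package, also declare it in the script-header dependencies.


async def evaluate_response(question: str, response: str) -> dict:
    """Return a dict mapping metric names to values for one model response."""
    try:
        return {
            # ...existing metrics are computed here...
            # New (hypothetical) metric: answer length in words.
            "response_word_count": len(response.split()),
        }
    except Exception:
        # Keep the failure path in sync: every metric key, including the new
        # one, must still be present with a None value.
        return {
            # ...existing metric keys set to None...
            "response_word_count": None,
        }
```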
#### Adding New LLM Models

To add a new LLM model for evaluation, implement a handler in `server/api/services/llm_services.py`:

1. **Create a handler class** inheriting from `BaseModelHandler` (see the sketch after this list).
2. **Register it in `ModelFactory`** by adding it to the `HANDLERS` dictionary.
3. **Use it in experiments** by referencing the handler key in your experiments CSV.
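
The sketch below walks through those three steps in one place. The method to override on `BaseModelHandler`, the exact shape of `ModelFactory.HANDLERS`, the import path, and the `my-new-model` key / `MyNewModelHandler` class are all assumptions for illustration, not the actual API of `llm_services.py`.

```python
# Sketch only -- adapt the names to the real BaseModelHandler/ModelFactory API
# in server/api/services/llm_services.py.

from server.api.services.llm_services import BaseModelHandler, ModelFactory  # assumed import path


# Step 1: a handler class inheriting from BaseModelHandler.
class MyNewModelHandler(BaseModelHandler):  # hypothetical name
    """Handler for a hypothetical new provider/model."""

    async def generate(self, prompt: str, **kwargs) -> str:  # assumed method to override
        # Call the provider's SDK/API here and return the raw text response.
        raise NotImplementedError("wire up the provider client here")


# Step 2: register the handler. In the real module you would add this entry
# directly inside the HANDLERS dictionary definition rather than mutating it
# at import time as shown here.
ModelFactory.HANDLERS["my-new-model"] = MyNewModelHandler

# Step 3: reference the same key ("my-new-model") in the model column of your
# experiments CSV so the evaluation run picks up the new handler.
```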
The evaluation system will automatically use your handler through the Factory Method pattern.

#### Running Tests

The evaluation module includes comprehensive tests for all core functions. Run the test suite using:

```sh
uv run test_evals.py
```

The tests cover:

- **Cost calculation** with various token usage and pricing scenarios
- **CSV loading** with validation and error handling
- **Response evaluation** including async operations and exception handling
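
When adding new tests, a cost-calculation case might look roughly like the sketch below. The `calculate_cost()` function and its parameter names are assumptions about `evals.py` for illustration only; adapt them to the real interface.

```python
# Sketch only -- calculate_cost() and its parameters are assumed, not the
# actual evals.py interface.
from evals import calculate_cost  # assumed import


def test_cost_calculation_basic():
    # 1,000 input tokens at $1.00 per million and 500 output tokens at
    # $2.00 per million should cost $0.001 + $0.001 = $0.002.
    cost = calculate_cost(
        input_tokens=1_000,
        output_tokens=500,
        input_price_per_million=1.00,
        output_price_per_million=2.00,
    )
    assert abs(cost - 0.002) < 1e-9


if __name__ == "__main__":
    test_cost_calculation_basic()
    print("ok")
```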

0 commit comments
