Based on firstlinesoftware/eval-ai-library. This is an independently maintained version with additional features and PyPI distribution.
Comprehensive AI model evaluation framework for RAG systems and AI agents. Supports 35+ evaluation metrics, 12 LLM providers, built-in test data generation from documents, and an interactive web dashboard for visualization and analysis. Implements advanced techniques including G-Eval probability-weighted scoring and Temperature-Controlled Verdict Aggregation via Generalized Power Mean.
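The two techniques named above can be sketched in a few lines of plain Python. This is an illustrative sketch of the underlying math only, not the library's API: the function names are invented here, and how the library maps its temperature setting onto the power-mean exponent is an assumption.

```python
import math

def geval_score(score_probs):
    """G-Eval probability-weighted score (Liu et al., 2023): the
    expected value sum_i p(s_i) * s_i over the LLM's token
    probabilities for each candidate score s_i, instead of a hard
    argmax over scores."""
    return sum(score * prob for score, prob in score_probs.items())

def power_mean_aggregate(verdicts, p):
    """Generalized power mean M_p over per-verdict scores in (0, 1].
    p = 1 gives the arithmetic mean; p -> -inf approaches min()
    (strict aggregation); p -> +inf approaches max() (lenient).
    NOTE: the temperature-to-exponent mapping used by the library
    is not documented here, so this is a generic formulation."""
    n = len(verdicts)
    if p == 0:  # limit case: geometric mean
        return math.exp(sum(math.log(v) for v in verdicts) / n)
    return (sum(v ** p for v in verdicts) / n) ** (1.0 / p)
```

For example, token-probability mass of 0.6 on score 5 and 0.4 on score 3 yields a weighted score of 4.2 rather than a hard 5, and lowering `p` pulls the aggregate verdict toward the worst individual verdict.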
```bash
pip install eval-ai-library
```

Full version with document parsing and OCR support:

```bash
pip install eval-ai-library[full]
```

Lite version (core evaluation only):

```bash
pip install eval-ai-library[lite]
```

```python
from eval_lib import EvalAI

evaluator = EvalAI(model="gpt-4o")

result = evaluator.evaluate(
    input="What is Python?",
    actual_output="Python is a programming language.",
    expected_output="Python is a high-level programming language.",
    metrics=["answer_relevancy", "faithfulness"],
)
print(result.score)
```

Full documentation is available at library.eval-ai.com.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
If you use this library in your research, please cite:

```bibtex
@software{eval_ai_library,
  author = {Meshkov, Aleksandr},
  title  = {Eval AI Library: Comprehensive AI Model Evaluation Framework},
  year   = {2025},
  url    = {https://github.com/meshkovQA/Eval-ai-library.git}
}
```

This library implements techniques from:
```bibtex
@inproceedings{liu2023geval,
  title     = {G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment},
  author    = {Liu, Yang and Iter, Dan and Xu, Yichong and Wang, Shuohang and Xu, Ruochen and Zhu, Chenguang},
  booktitle = {Proceedings of EMNLP},
  year      = {2023}
}
```

- Issues: GitHub Issues
- Documentation: library.eval-ai.com