v0.9.2
What's Changed
New Features 🎉
- enable together models and reasoning models as judges. by @JoelNiklaus in #537
 - Propagate vLLM batch size controls by @alvin319 in #588
 - Integrate huggingface_hub inference support for LLM as Judge by @alozowski in #651
 - add cot_prompt in vllm by @HERIUN in #654
 - Unify modelargs and use Pydantic for model configs by @NathanHB in #609
 - Improve test by @qubvel in #674
 - adds wandb logging of metrics by @NathanHB in #676
 - Adds wandb logging by @NathanHB in #685
 - Added custom model inference. by @JoelNiklaus in #437
 - Update split iteration for DynamicBatchingDataset by @qubvel in #684
 
Documentation 📚
- Add --use-chat-template to the broken litellm example by @eldarkurtic in #614
 - Lighteval math by @HERIUN in #630
 - Update quicktour command by @qubvel in #679
 - fix wrong 'custom_task_directory' in python api doc by @xgwang in #671
 - docs: improve consistency in punctuation of metric list by @mariagrandury in #605
 
New Tasks 📈
- add arc agi 2 by @NathanHB in #642
 - Add G-Pass@k Metric by @jnanliu in #589
 - adds simpleqa by @NathanHB in #680
 
Task and Metrics changes 🛠️
- Pass At K Math by @clefourrier in #647
 - Use n=16 samples to estimate pass@1 for AIME benchmarks by @lewtun in #661
 - adding uzbek literals by @shopulatov in #664
 - Align AIME pass@1 with literature by @lewtun in #666
 - Update LCB prompt & fix newlines by @rawsh in #645
 - fix gsm8k metric by @NathanHB in #688
 - Add pass@1 for GPQA-D and MATH-500 by @lewtun in #698
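
For readers unfamiliar with sampled pass@k, here is a minimal sketch of the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k), where n is the number of samples drawn and c the number that are correct. The function name `pass_at_k` and its signature are illustrative assumptions, not lighteval's API, and the library's own implementation may differ.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples, of which c are correct.

    Hypothetical helper for illustration; not taken from lighteval.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With n=16 samples and 4 correct, pass@1 reduces to the fraction correct.
print(pass_at_k(n=16, c=4, k=1))  # 0.25
```

With k=1 the estimate is simply c / n, so drawing more samples (for example n=16) lowers the variance of the reported pass@1 without changing what it measures.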
 
Bug Fixes 🐛
- Use bfloat16 as default for vllm models. by @NathanHB in #638
 - Fix passing of generation config to main_accelerate by @LoserCheems in #659
 - Parse seed for vLLM by @eldarkurtic in #602
 - Parse string values for add_special_tokens in vLLM by @eldarkurtic in #598
 - hardcode configs to not make lighteval crash if lcb repo unavailable by @NathanHB in #677
 - tokenizer 'padding' param is not correct. by @xgwang in #669
 - Fix TransformersModel.from_model() method by @Vectorrent in #691
 - Inference providers by @clefourrier in #701
 
New Contributors
- @DerekLiu35 made their first contribution in #620
 - @AnikiFan made their first contribution in #610
 - @alvin319 made their first contribution in #588
 - @alozowski made their first contribution in #643
 - @Laz4rz made their first contribution in #613
 - @shopulatov made their first contribution in #664
 - @HERIUN made their first contribution in #654
 - @rawsh made their first contribution in #645
 - @qubvel made their first contribution in #674
 - @xgwang made their first contribution in #669
 - @jnanliu made their first contribution in #589
 - @Vectorrent made their first contribution in #683
 - @omahs made their first contribution in #702
 
Full Changelog: v0.8.0...v0.9.0