v0.12
An exciting release in which we pivot to using inspect-ai as the backend and make tasks much easier to find and add, thanks to a finder space: here
New Features 🎉
- Registry refactorisation by @clefourrier in #937
- Multilingual extractiveness by @rolshoven in #956
- Added backend_options parameter to llm judges. by @rolshoven in #963
- Add automatic tests for metrics by @NathanHB in #939
- Support local GGUF in VLLM and use HF tokenizer #943 by @JIElite in #972
- [RFC] Rework the dependencies to be more versatile by @LysandreJik in #951
- Sample to sample compare for integration tests by @NathanHB in #977
- Move tasks to individual files by @NathanHB in #1016
- Adds inspectai by @NathanHB in #1022
New Tasks
- GSM-PLUS by @NathanHB in #780
- TUMLU-mini by @ceferisbarov in #811
- Filipino Benchmark by @ljvmiranda921 in #852
- MMLU Redux by @clefourrier in #883
- IFBench by @clefourrier in #944
- SLR-Bench by @Ahmad21Omar in #983
- MMLU pro by @NathanHB in #1031
Enhancement ⚙️
- adds enable_prefix_caching option to VLLMModelConfig by @GAD-cell in #945
- Added litellm model config options and improved _prepare_max_new_tokens by @rolshoven in #967
- always provide parameters in the metric name to allow using several combinations by @clefourrier in #1017
Documentation 📚
- Add org_to_bill parameter to documentation by @tfrere in #781
- Update docs and enforces google's docstring style by @NathanHB in #941
- Fix broken link by @JoelNiklaus in #1014
- Update huggingface-cli login to use newer hf auth login by @Xceron in #1034
Task and Metrics changes 🛠️
- Add Bulgarian and Macedonian literals by @dianaonutu in #769
- Add TranslationLiterals for Language.DANISH by @spyysalo in #770
- Update translation_literals.py with icelandic by @joenaess in #775
- Complete TranslationLiterals for Language.ESTONIAN by @spyysalo in #779
- Update translation_literals.py by @dianaonutu in #923
- Fixing naming for sample evals + adding reqs in aime24 by @clefourrier in #989
- add translation literals for various Indic languages (Bengali, Gujarati, Punjabi, Tamil) by @rpm000 in #1015
Bug Fixes 🐛
- [#794] Fix: Assign SummaCZS instance to self.summac in Faithfulness metric by @sahilds1 in #795
- Catch ROCM/HIP/AMD oom in should_reduce_batch_size by @mcleish7 in #812
- Fix GPQA and index extractive metric by @clefourrier in #829
- Update extractive_match_utils.py for words where : is preceded by a space by @clefourrier in #831
- fixes from_model function and adds tests by @NathanHB in #921
- fix tasks list by @alielfilali01 in #906
- set upper bound on vllm version by @NathanHB in #964
- Fixed bug that prevented the metrics from being mixed (batched/not batched) by @rolshoven in #958
- Fix inference providers calls by @clefourrier in #1012
- Fixing mixeval by @clefourrier in #1006
- Fix typo in attribute name: CONCURENT_CALLS -> CONCURRENT_CALLS by @muupan in #884
- Added ability to configure concurrent_requests in litellm_model.py by @dameikle in #911
- Added fallback for incomplete configs for vlm models launched as llms by @clefourrier in #828
New Contributors
- @pratyushmaini made their first contribution in #697
- @DeVikingMark made their first contribution in #782
- @sahilds1 made their first contribution in #795
- @dianaonutu made their first contribution in #769
- @tfrere made their first contribution in #781
- @mcleish7 made their first contribution in #812
- @leopardracer made their first contribution in #810
- @spyysalo made their first contribution in #770
- @ceferisbarov made their first contribution in #811
- @joenaess made their first contribution in #775
- @ryantzr1 made their first contribution in #784
- @dtung8068 made their first contribution in #862
- @muupan made their first contribution in #884
- @NouamaneTazi made their first contribution in #841
- @uralik made their first contribution in #887
- @dameikle made their first contribution in #911
- @ljvmiranda921 made their first contribution in #852
- @cpcdoy made their first contribution in #502
- @rolshoven made their first contribution in #958
- @JIElite made their first contribution in #972
- @LysandreJik made their first contribution in #951
- @GAD-cell made their first contribution in #945
- @amstu2 made their first contribution in #986
- @Ahmad21Omar made their first contribution in #983
- @cmpatino made their first contribution in #998
- @rpm000 made their first contribution in #1015
- @Xceron made their first contribution in #1034
Full Changelog: v0.10.0...v0.12.0