v0.12.0

Latest

NathanHB released this 04 Nov 13:33

b77c6b2

v0.12

This release moves to inspect-ai as the backend and makes tasks much easier to find and add, thanks to the Benchmark Finder Space on Hugging Face.

[Screenshot: Benchmark Finder, a Hugging Face Space by OpenEvals]

New Features 🎉

Registry refactorisation by @clefourrier in #937
Multilingual extractiveness by @rolshoven in #956
Added backend_options parameter to LLM judges by @rolshoven in #963
Add automatic tests for metrics by @NathanHB in #939
Support local GGUF in VLLM and use HF tokenizer #943 by @JIElite in #972
[RFC] Rework the dependencies to be more versatile by @LysandreJik in #951
Sample to sample compare for integration tests by @NathanHB in #977
Move tasks to individual files by @NathanHB in #1016
Adds inspectai by @NathanHB in #1022

New Tasks

GSM-PLUS by @NathanHB in #780
TUMLU-mini by @ceferisbarov in #811
Filipino Benchmark by @ljvmiranda921 in #852
MMLU Redux by @clefourrier in #883
IFBench by @clefourrier in #944
SLR-Bench by @Ahmad21Omar in #983
MMLU pro by @NathanHB in #1031

Enhancement ⚙️

adds enable_prefix_caching option to VLLMModelConfig by @GAD-cell in #945
Added litellm model config options and improved _prepare_max_new_tokens by @rolshoven in #967
always provide parameters in the metric name to allow using several combinations by @clefourrier in #1017
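
To illustrate why putting parameters in the metric name helps (#1017), here is a minimal sketch. The helper name and the `base:key=value&key=value` format are assumptions made for this example, not the library's actual naming scheme. The point is only that two configurations of the same metric get distinct names and can be reported side by side.

```python
# Hypothetical helper: the name format below is illustrative only.
def parametrized_metric_name(base: str, **params) -> str:
    """Build a metric name that encodes its parameters in a stable order."""
    if not params:
        return base
    # Sort keys so the same parameters always produce the same name.
    suffix = "&".join(f"{key}={params[key]}" for key in sorted(params))
    return f"{base}:{suffix}"


# Two configurations of one metric no longer collide in a results dict.
results = {
    parametrized_metric_name("pass@k", k=1, n=16): 0.41,
    parametrized_metric_name("pass@k", k=8, n=16): 0.67,
}
```

The scores above are placeholder values, not measured results.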

Documentation 📚

Add org_to_bill parameter to documentation by @tfrere in #781
Update docs and enforce Google's docstring style by @NathanHB in #941
Fix broken link by @JoelNiklaus in #1014
Update huggingface-cli login to use newer hf auth login by @Xceron in #1034

Task and Metrics changes 🛠️

Add Bulgarian and Macedonian literals by @dianaonutu in #769
Add TranslationLiterals for Language.DANISH by @spyysalo in #770
Update translation_literals.py with icelandic by @joenaess in #775
Complete TranslationLiterals for Language.ESTONIAN by @spyysalo in #779
Update translation_literals.py by @dianaonutu in #923
Fixing naming for sample evals + adding reqs in aime24 by @clefourrier in #989
add translation literals for various Indic languages (Bengali, Gujarati, Punjabi, Tamil) by @rpm000 in #1015

Bug Fixes 🐛

[#794] Fix: Assign SummaCZS instance to self.summac in Faithfulness metric by @sahilds1 in #795
Catch ROCM/HIP/AMD oom in should_reduce_batch_size by @mcleish7 in #812
Fix GPQA and index extractive metric by @clefourrier in #829
Update extractive_match_utils.py for words where : is preceded by a space by @clefourrier in #831
fixes from_model function and adds tests by @NathanHB in #921
fix tasks list by @alielfilali01 in #906
set upper bound on vllm version by @NathanHB in #964
Fixed bug that prevented the metrics from being mixed (batched/not batched) by @rolshoven in #958
Fix inference providers calls by @clefourrier in #1012
Fixing mixeval by @clefourrier in #1006
Fix typo in attribute name: CONCURENT_CALLS -> CONCURRENT_CALLS by @muupan in #884
Added ability to configure concurrent_requests in litellm_model.py by @dameikle in #911
Added fallback for incomplete configs for vlm models launched as llms by @clefourrier in #828
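
As a rough illustration of the out-of-memory fix for ROCm/HIP/AMD (#812), the sketch below shows one common way to decide whether an exception should trigger a smaller batch size. The function names and marker strings are assumptions for this example and are not copied from the library. The idea is that AMD builds of PyTorch report "HIP out of memory" rather than "CUDA out of memory", so a check that only matches the CUDA wording would miss them.

```python
# Hypothetical sketch: names and marker strings are illustrative assumptions.
OOM_MARKERS = (
    "CUDA out of memory",     # NVIDIA GPUs
    "HIP out of memory",      # AMD GPUs (ROCm/HIP)
    "can't allocate memory",  # CPU allocator
)


def looks_like_oom(exc: Exception) -> bool:
    """Return True when the exception message matches a known OOM marker."""
    if not isinstance(exc, (RuntimeError, MemoryError)):
        return False
    message = str(exc)
    return any(marker in message for marker in OOM_MARKERS)


def run_with_backoff(step, batch_size: int) -> int:
    """Halve batch_size until step(batch_size) succeeds; return the size that worked."""
    while batch_size >= 1:
        try:
            step(batch_size)
            return batch_size
        except Exception as exc:
            # Re-raise anything that is not an OOM, or an OOM at batch size 1.
            if not looks_like_oom(exc) or batch_size == 1:
                raise
            batch_size //= 2
    raise ValueError("batch_size must be at least 1")
```

Here `step` stands for any callable that runs one batch at the given size.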

New Contributors

@pratyushmaini made their first contribution in #697
@DeVikingMark made their first contribution in #782
@sahilds1 made their first contribution in #795
@dianaonutu made their first contribution in #769
@tfrere made their first contribution in #781
@mcleish7 made their first contribution in #812
@leopardracer made their first contribution in #810
@spyysalo made their first contribution in #770
@ceferisbarov made their first contribution in #811
@joenaess made their first contribution in #775
@ryantzr1 made their first contribution in #784
@dtung8068 made their first contribution in #862
@muupan made their first contribution in #884
@NouamaneTazi made their first contribution in #841
@uralik made their first contribution in #887
@dameikle made their first contribution in #911
@ljvmiranda921 made their first contribution in #852
@cpcdoy made their first contribution in #502
@rolshoven made their first contribution in #958
@JIElite made their first contribution in #972
@LysandreJik made their first contribution in #951
@GAD-cell made their first contribution in #945
@amstu2 made their first contribution in #986
@Ahmad21Omar made their first contribution in #983
@cmpatino made their first contribution in #998
@rpm000 made their first contribution in #1015
@Xceron made their first contribution in #1034

Full Changelog: v0.10.0...v0.12.0

Contributors

spyysalo, muupan, and 29 other contributors
