v0.12.0

@NathanHB NathanHB released this 04 Nov 13:33
· 3 commits to main since this release

v0.12

An exciting release in which we pivot to inspect-ai as the backend and make tasks much easier to find and add, thanks to a finder space: here

[Screenshot: Benchmark Finder, a Hugging Face Space by OpenEvals]

New Features 🎉

New Tasks

Enhancement ⚙️

  • adds enable_prefix_caching option to VLLMModelConfig by @GAD-cell in #945
  • Added litellm model config options and improved _prepare_max_new_tokens by @rolshoven in #967
  • always provide parameters in the metric name to allow using several combinations by @clefourrier in #1017
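To illustrate the idea behind the last item above (embedding parameters in the metric name so several configurations of one metric can be reported side by side), here is a minimal sketch. The helper `parameterized_metric_name` and the `base:key=value&key=value` naming scheme are hypothetical and chosen for illustration only; they are not taken from the library's code.

```python
def parameterized_metric_name(base: str, **params: object) -> str:
    """Return a metric name that embeds its parameters (hypothetical scheme).

    Parameters are sorted by key so the same configuration always
    produces the same name, regardless of argument order.
    """
    if not params:
        return base
    suffix = "&".join(f"{key}={value}" for key, value in sorted(params.items()))
    return f"{base}:{suffix}"


# Two configurations of the same base metric get distinct names,
# so their results can live in one results dictionary without colliding.
names = [
    parameterized_metric_name("pass_at_k", k=1, n=1),
    parameterized_metric_name("pass_at_k", k=1, n=16),
]
print(names)
```

Without the parameters in the name, both configurations would map to the same key and one result would overwrite the other.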

Documentation 📚

Task and Metrics changes 🛠️

  • Add Bulgarian and Macedonian literals by @dianaonutu in #769
  • Add TranslationLiterals for Language.DANISH by @spyysalo in #770
  • Update translation_literals.py with icelandic by @joenaess in #775
  • Complete TranslationLiterals for Language.ESTONIAN by @spyysalo in #779
  • Update translation_literals.py by @dianaonutu in #923
  • Fixing naming for sample evals + adding reqs in aime24 by @clefourrier in #989
  • add translation literals for various Indic languages (Bengali, Gujarati, Punjabi, Tamil) by @rpm000 in #1015

Bug Fixes 🐛

New Contributors

Full Changelog: v0.10.0...v0.12.0