v0.3.0
Release Note
This introduced the new extended tasks feature, documentation and many other patches for improved stability.
New tasks are also introduced:
- Big Bench Hard: https://huggingface.co/papers/2210.09261
 - AGIEval: https://huggingface.co/papers/2304.06364
 - TinyBench:
 - MT Bench: https://huggingface.co/papers/2306.05685
 - AlGhafa Benchmarking Suite: https://aclanthology.org/2023.arabicnlp-1.21/
 
MT-Bench marks the introduction of multi-turn prompting as well as llm-as-a-judge metric.
New tasks
- Add BBH by @clefourrier in #7, @bilgehanertan in #126
 - Add AGIEval by @clefourrier in #121
 - Adding TinyBench by @clefourrier in #104
 - Adding support for Arabic benchmarks : AlGhafa benchmarking suite by @alielfilali01 in #95
 - Add mt-bench by @NathanHB in #75
 
Features
- Extended Tasks ! by @clefourrier in #101, @lewtun in #108, @NathanHB in #122, #123
 - Added support for launching inference endpoint with different model dtypes by @shaltielshmid in #124
 
Documentation
- Adding LICENSE by @clefourrier in #86, @NathanHB in #89
 - Make it clearer in the README that the leaderboard uses the harness by @clefourrier in #94
 
Small patches
- Update huggingface-hub for compatibility with datasets 2.18 by @clefourrier in #84
 - Tidy up dependency groups by @lewtun in #81
 - bump git python by @NathanHB in #90
 - Sets a max length for the MATH task by @clefourrier in #83
 - Fix parallel data processing bug by @clefourrier in #92
 - Change the eos condition for GSM8K by @clefourrier in #85
 - Fixing rolling loglikelihood management by @clefourrier in #78
 - Fixes input length management for generative evals by @clefourrier in #103
 - Reorder addition of instruction in chat template by @clefourrier in #111
 - Ensure chat models terminate generation with EOS token by @lewtun in #115
 - Fix push details to hub by @NathanHB in #98
 - Small fixes to InferenceEndpointModel by @shaltielshmid in #112
 - Fix import typo autogptq by @clefourrier in #116
 - Fixed the loglikelihood method in inference endpoints models by @clefourrier in #119
 - Fix TextGenerationResponse import from hfh by @Wauplin in #129
 - Do not use deprecated list_files_info by @Wauplin in #133
 - Update test workflow name to 'Tests' by @Wauplin in #134
 
New Contributors
- @shaltielshmid made their first contribution in #112
 - @bilgehanertan made their first contribution in #126
 - @Wauplin made their first contribution in #129
 
Full Changelog: v0.2.0...v0.3.0