v0.3.0

NathanHB released this 29 Mar 16:42

· 406 commits to main since this release

e6fdcc7

Release Note

This introduced the new extended tasks feature, documentation and many other patches for improved stability.
New tasks are also introduced:

Big Bench Hard: https://huggingface.co/papers/2210.09261
AGIEval: https://huggingface.co/papers/2304.06364
TinyBench:
MT Bench: https://huggingface.co/papers/2306.05685
AlGhafa Benchmarking Suite: https://aclanthology.org/2023.arabicnlp-1.21/

MT-Bench marks the introduction of multi-turn prompting as well as llm-as-a-judge metric.

New tasks

Add BBH by @clefourrier in #7, @bilgehanertan in #126
Add AGIEval by @clefourrier in #121
Adding TinyBench by @clefourrier in #104
Adding support for Arabic benchmarks : AlGhafa benchmarking suite by @alielfilali01 in #95
Add mt-bench by @NathanHB in #75

Features

Extended Tasks ! by @clefourrier in #101, @lewtun in #108, @NathanHB in #122, #123
Added support for launching inference endpoint with different model dtypes by @shaltielshmid in #124

Documentation

Adding LICENSE by @clefourrier in #86, @NathanHB in #89
Make it clearer in the README that the leaderboard uses the harness by @clefourrier in #94

Small patches

Update huggingface-hub for compatibility with datasets 2.18 by @clefourrier in #84
Tidy up dependency groups by @lewtun in #81
bump git python by @NathanHB in #90
Sets a max length for the MATH task by @clefourrier in #83
Fix parallel data processing bug by @clefourrier in #92
Change the eos condition for GSM8K by @clefourrier in #85
Fixing rolling loglikelihood management by @clefourrier in #78
Fixes input length management for generative evals by @clefourrier in #103
Reorder addition of instruction in chat template by @clefourrier in #111
Ensure chat models terminate generation with EOS token by @lewtun in #115
Fix push details to hub by @NathanHB in #98
Small fixes to InferenceEndpointModel by @shaltielshmid in #112
Fix import typo autogptq by @clefourrier in #116
Fixed the loglikelihood method in inference endpoints models by @clefourrier in #119
Fix TextGenerationResponse import from hfh by @Wauplin in #129
Do not use deprecated list_files_info by @Wauplin in #133
Update test workflow name to 'Tests' by @Wauplin in #134

New Contributors

@shaltielshmid made their first contribution in #112
@bilgehanertan made their first contribution in #126
@Wauplin made their first contribution in #129

Full Changelog: v0.2.0...v0.3.0

Contributors

Wauplin, shaltielshmid, and 5 other contributors

Assets 2