Commit 3cd31fd

Move tasks to individual files (#1016)
* use inspect-ai to evaluate aime25 and gsm8k
* revert file
* working for 3 tasks
* parallel evals of tasks
* adds gpqa diamond to inspect
* move tasks to individual files
* enable extended tasks as well
* run pre-commit hook
* fix mkqa
* change extended suite to lighteval
* add metadata to tasks
* remove license notice and put docstring on top of file
* homogenize tags
* add docstring for all multilingual tasks
* add name and dataset to metadata
* use TASKS_TABLE for multilingual tasks
* use TASKS_TABLE for default tasks
* loads all tasks correctly
* move community tasks to default tasks and update doc
* revert unneeded changes
* fix doc build
* remove custom tasks and let user decide if loading multilingual tasks
* load-tasks multilingual fix
* update doc
* remove unneeded file
* update readme
* fix test
* add back the custom tasks
* fix tasks
* fix tests
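The bullets above describe the commit's central change: each task moves to its own file, and every task file exports a `TASKS_TABLE` that the loader aggregates. The diff below doesn't show a full task file, so here is a minimal sketch of that pattern; the `LightevalTaskConfig` class here is a simplified stand-in for lighteval's real config class (which takes many more fields: prompt function, metrics, splits, and so on), and the `gsm8k` field values are illustrative.

```python
from dataclasses import dataclass, field

# Simplified stand-in for lighteval's LightevalTaskConfig.
# The real class accepts far more parameters (prompt function,
# metrics, evaluation splits, generation settings, ...).
@dataclass
class LightevalTaskConfig:
    name: str
    suite: list = field(default_factory=lambda: ["lighteval"])

# Under the new layout, a file like src/lighteval/tasks/tasks/gsm8k.py
# defines its task config(s) and exports them via TASKS_TABLE,
# which the task loader collects across all task files.
gsm8k = LightevalTaskConfig(name="gsm8k")

TASKS_TABLE = [gsm8k]
```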
1 parent: bf8b547

File tree

331 files changed (+24603 additions, -28548 deletions)


README.md

Lines changed: 8 additions & 2 deletions
```diff
@@ -25,6 +25,9 @@
   <a href="https://huggingface.co/docs/lighteval/main/en/index" target="_blank">
     <img alt="Documentation" src="https://img.shields.io/badge/Documentation-4F4F4F?style=for-the-badge&logo=readthedocs&logoColor=white" />
   </a>
+  <a href="https://huggingface.co/spaces/SaylorTwift/benchmark_finder" target="_blank">
+    <img alt="Open Benchmark Index" src="https://img.shields.io/badge/Open%20Benchmark%20Index-4F4F4F?style=for-the-badge&logo=huggingface&logoColor=white" />
+  </a>
 </p>

 ---
@@ -39,7 +42,10 @@ sample-by-sample results* to debug and see how your models stack-up.

 ## Available Tasks

-Lighteval supports **7,000+ evaluation tasks** across multiple domains and languages. Here's an overview of some *popular benchmarks*:
+Lighteval supports **1000+ evaluation tasks** across multiple domains and
+languages. Use [this
+space](https://huggingface.co/spaces/SaylorTwift/benchmark_finder) to find what
+you need, or, here's an overview of some *popular benchmarks*:


 ### 📚 **Knowledge**
@@ -62,7 +68,7 @@ Lighteval supports **7,000+ evaluation tasks** across multiple domains and langu

 ### 🌍 **Multilingual Evaluation**
 - **Cross-lingual**: XTREME, Flores200 (200 languages), XCOPA, XQuAD
-- **Language-specific**:
+- **Language-specific**:
   - **Arabic**: ArabicMMLU
   - **Filipino**: FilBench
   - **French**: IFEval-fr, GPQA-fr, BAC-fr
```

Deleted files:

- community_tasks/_template.py (0 additions & 114 deletions)
- community_tasks/aimo_evals.py (0 additions & 61 deletions)
- community_tasks/oz_evals.py (0 additions & 87 deletions)
- community_tasks/slr_bench_requirements.txt (0 additions & 2 deletions)

docs/source/adding-a-custom-task.mdx

Lines changed: 9 additions & 29 deletions
````diff
@@ -2,37 +2,17 @@

 Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.

-## Task Categories
-
-Before creating a custom task, consider which category it belongs to:
-
-### Core Evaluations
-Core evaluations are evaluations that only require standard logic in their
-metrics and processing, and that we will add to our test suite to ensure non-regression through time. They already see high usage in the community.
-
-### Extended Evaluations
-Extended evaluations are evaluations that require custom logic in their
-metrics (complex normalization, an LLM as a judge, etc.), that we added to
-facilitate the life of users. They already see high usage in the community.
-
-### Community Evaluations
-Community evaluations are submissions by the community of new tasks.
-
-A popular community evaluation can move to become an extended or core evaluation over time.
-
-> [!TIP]
-> You can find examples of custom tasks in the [community_tasks](https://github.com/huggingface/lighteval/tree/main/community_tasks) directory.
-
-## Step-by-Step Creation of a Custom Task
+## Step-by-Step Creation of a Task

 > [!WARNING]
-> To contribute your custom task to the Lighteval repository, you would first need
+> To contribute your task to the Lighteval repository, you would first need
 > to install the required dev dependencies by running `pip install -e .[dev]`
 > and then run `pre-commit install` to install the pre-commit hooks.

 ### Step 1: Create the Task File

-First, create a Python file under the `community_tasks` directory.
+First, create a Python file or directory under the `src/lighteval/tasks/tasks` directory.
+A directory is helpfull if you need to split your file into multiple ones, just make sure to have one of the file named `main.py`.

 ### Step 2: Define the Prompt Function

@@ -135,12 +115,12 @@ class CustomSubsetTask(LightevalTaskConfig):
         evaluation_splits=["test"],
         few_shots_split="train",
         few_shots_select="random_sampling_from_train",
-        suite=["community"],
+        suite=["lighteval"],
         generation_size=256,
         stop_sequence=["\n", "Question:"],
     )

-SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
+SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
 ```

 ### Step 5: Add Tasks to the Table
@@ -169,7 +149,7 @@ Once your file is created, you can run the evaluation with the following command
 ```bash
 lighteval accelerate \
     "model_name=HuggingFaceH4/zephyr-7b-beta" \
-    "community|{custom_task}|{fewshots}" \
+    "lighteval|{task}|{fewshots}" \
     --custom-tasks {path_to_your_custom_task_file}
 ```
@@ -179,12 +159,12 @@ lighteval accelerate \
 # Run a custom task with zero-shot evaluation
 lighteval accelerate \
     "model_name=openai-community/gpt2" \
-    "community|myothertask|0" \
+    "lighteval|myothertask|0" \
     --custom-tasks community_tasks/my_custom_task.py

 # Run a custom task with few-shot evaluation
 lighteval accelerate \
     "model_name=openai-community/gpt2" \
-    "community|myothertask|3" \
+    "lighteval|myothertask|3" \
     --custom-tasks community_tasks/my_custom_task.py
 ```
````
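The CLI examples in this diff select tasks with a pipe-separated spec, `suite|task|fewshots`, where the commit renames the `community` suite to `lighteval`. As a rough illustration of how such a spec string decomposes, here is a sketch; this parser is illustrative only, not lighteval's actual CLI code, which may accept additional fields and comma-separated task lists.

```python
def parse_task_spec(spec: str) -> tuple[str, str, int]:
    """Split a 'suite|task|fewshots' spec into its three parts.

    Illustrative sketch only -- not lighteval's real spec parser.
    """
    suite, task, fewshots = spec.split("|")
    return suite, task, int(fewshots)

# The spec from the few-shot example above:
print(parse_task_spec("lighteval|myothertask|3"))
# -> ('lighteval', 'myothertask', 3)
```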
