* use inspect-ai to evaluate aime25 and gsm8k
* revert file
* working for 3 tasks
* parallel evals of tasks
* adds gpqa diamond to inspect
* move tasks to individual files
* enable extended tasks as well
* run pre-commit hook
* fix mkqa
* change extended suite to lighteval
* add metadata to tasks
* remove license notice and put docstring on top of file
* homogenize tags
* add docstring for all multilingual tasks
* add name and dataset to metadata
* use TASKS_TABLE for multilingual tasks
* use TASKS_TABLE for default tasks
* loads all tasks correctly
* move community tasks to default tasks and update doc
* revert unneeded changes
* fix doc build
* remove custom tasks and let user decide if loading multilingual tasks
* load-tasks multilingual fix
* update doc
* remove unneeded file
* update readme
* fix test
* add back the custom tasks
* fix tasks
* fix tests
`docs/source/adding-a-custom-task.mdx` (9 additions, 29 deletions)
````diff
@@ -2,37 +2,17 @@

 Lighteval provides a flexible framework for creating custom evaluation tasks. This guide explains how to create and integrate new tasks into the evaluation system.

-## Task Categories
-
-Before creating a custom task, consider which category it belongs to:
-
-### Core Evaluations
-Core evaluations are evaluations that only require standard logic in their
-metrics and processing, and that we will add to our test suite to ensure non-regression through time. They already see high usage in the community.
-
-### Extended Evaluations
-Extended evaluations are evaluations that require custom logic in their
-metrics (complex normalization, an LLM as a judge, etc.), that we added to
-facilitate the life of users. They already see high usage in the community.
-
-### Community Evaluations
-Community evaluations are submissions by the community of new tasks.
-
-A popular community evaluation can move to become an extended or core evaluation over time.
-
-> [!TIP]
-> You can find examples of custom tasks in the [community_tasks](https://github.com/huggingface/lighteval/tree/main/community_tasks) directory.
-
-## Step-by-Step Creation of a Custom Task
+## Step-by-Step Creation of a Task

 > [!WARNING]
-> To contribute your custom task to the Lighteval repository, you would first need
+> To contribute your task to the Lighteval repository, you would first need
 > to install the required dev dependencies by running `pip install -e .[dev]`
 > and then run `pre-commit install` to install the pre-commit hooks.

 ### Step 1: Create the Task File

-First, create a Python file under the `community_tasks` directory.
+First, create a Python file or directory under the `src/lighteval/tasks/tasks` directory.
+A directory is helpful if you need to split your code into multiple files; just make sure one of the files is named `main.py`.

 ### Step 2: Define the Prompt Function
````
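For orientation only (this sketch is not part of the diff), a task file under `src/lighteval/tasks/tasks` typically boils down to a prompt function, a `LightevalTaskConfig`, and a module-level `TASKS_TABLE` that the loader picks up, matching the "use TASKS_TABLE for default tasks" commits above. The dataset id, field names, and metric below are placeholder assumptions:

```python
# Hypothetical sketch of src/lighteval/tasks/tasks/mytask.py (or mytask/main.py);
# the dataset, row keys, and metric are placeholders, not taken from this diff.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc


def prompt_fn(line, task_name: str = None):
    # Map one dataset row to a Doc; adjust the keys to your dataset schema.
    return Doc(
        task_name=task_name,
        query=line["question"],
        choices=[line["answer"]],
        gold_index=0,
    )


mytask = LightevalTaskConfig(
    name="mytask",
    prompt_function=prompt_fn,
    suite=["lighteval"],  # the suite this diff moves tasks into
    hf_repo="org/dataset",  # placeholder dataset id
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split="train",
    few_shots_select="random_sampling_from_train",
    metric=[Metrics.exact_match],  # assumption: pick whichever metric fits your task
    generation_size=256,
    stop_sequence=["\n", "Question:"],
)

# Module-level table the task loader discovers ("use TASKS_TABLE for default tasks").
TASKS_TABLE = [mytask]
```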
````diff
@@ -135,12 +115,12 @@ class CustomSubsetTask(LightevalTaskConfig):
             evaluation_splits=["test"],
             few_shots_split="train",
             few_shots_select="random_sampling_from_train",
-            suite=["community"],
+            suite=["lighteval"],
             generation_size=256,
             stop_sequence=["\n", "Question:"],
         )

-SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
+SUBSET_TASKS = [CustomSubsetTask(name=f"task:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
 ```

 ### Step 5: Add Tasks to the Table
````
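Step 5 itself falls outside the hunk above, but as a hedged illustration of what "Add Tasks to the Table" amounts to after this change, the per-subset tasks are exposed through the same module-level table (variable names reuse the snippets above and are assumptions beyond that):

```python
# Hypothetical Step 5 sketch: register every per-subset task with the loader.
TASKS_TABLE = SUBSET_TASKS
```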
````diff
@@ -169,7 +149,7 @@ Once your file is created, you can run the evaluation with the following command
````
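The command referenced in this last hunk header is truncated out of the diff, so it is left as-is here. Purely as an illustration, a run against the renamed suite would look something like `lighteval accelerate "model_name=openai-community/gpt2" "lighteval|mytask|0|0"`, though the exact CLI syntax is version-dependent and not confirmed by this diff.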