Commit e941a55

Simplified system for extended tasks (#123)

Co-authored-by: Nathan Habib <[email protected]>

1 parent ef631cf

File tree: 12 files changed, +80 −43 lines

.github/workflows/tests.yaml
Lines changed: 1 addition & 1 deletion

```diff
@@ -26,7 +26,7 @@ jobs:
           cache: 'pip'
       - name: Install lighteval in editable mode
         run: |
-          pip install -e .[dev]
+          pip install -e .[dev,extended_tasks]
       - name: Get cached files
         uses: actions/cache@v2
         id: get-cache
```

README.md
Lines changed: 17 additions & 10 deletions

````diff
@@ -167,28 +167,38 @@ python run_evals_accelerate.py \
 
 Independently of the default tasks provided in `lighteval` that you will find in the `tasks_table.jsonl` file, you can use `lighteval` to evaluate models on tasks that require special processing (or have been added by the community). These tasks have their own evaluation suites and are defined as follows:
 
-* `extended`: tasks which have complex pre- or post-processing and are added by the `lighteval` maintainers. See the [`extended_tasks`](./extended_tasks) folder for examples.
+* `extended`: tasks which have complex pre- or post-processing and are added by the `lighteval` maintainers. See the [`extended_tasks`](./src/lighteval/tasks/extended_tasks) folder for examples.
 * `community`: tasks which have been added by the community. See the [`community_tasks`](./community_tasks) folder for examples.
 * `custom`: tasks which are defined locally and not present in the core library. Use this suite if you want to experiment with designing a special metric or task.
 
-For example, to run an extended task you can run:
+
+For example, to run an extended task like ifeval, you can run:
+```shell
+python run_evals_accelerate.py \
+    --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
+    --use_chat_template \ # optional, if you want to run the evaluation with the chat template
+    --tasks "extended|ifeval|0|0" \
+    --output_dir "./evals"
+```
+
+To run a community or custom task, you can use (note the custom_tasks flag):
 
 ```shell
 python run_evals_accelerate.py \
     --model_args="pretrained=<path to model on the hub>"\
     --tasks <task parameters> \
-    --extended_tasks "extended_tasks" \
+    --custom_tasks <path to your custom or community task> \
     --output_dir output_dir
 ```
 
-For example, to launch `lighteval` on `ifeval` for `HuggingFaceH4/zephyr-7b-beta`, run:
+For example, to launch `lighteval` on `arabic_mmlu:abstract_algebra` for `HuggingFaceH4/zephyr-7b-beta`, run:
 
 ```shell
 python run_evals_accelerate.py \
     --model_args "pretrained=HuggingFaceH4/zephyr-7b-beta" \
     --use_chat_template \ # optional, if you want to run the evaluation with the chat template
-    --tasks "extended|ifeval|0|0" \
-    --extended_tasks "extended_tasks" \
+    --tasks "community|arabic_mmlu:abstract_algebra|5|1" \
+    --custom_tasks "community_tasks/arabic_evals" \
     --output_dir "./evals"
 ```
 
@@ -209,7 +219,7 @@ However, we are very grateful to the Harness and HELM teams for their continued
 - [logging](https://github.com/huggingface/lighteval/tree/main/src/lighteval/logging): Our loggers, to display experiment information and push it to the hub after a run
 - [metrics](https://github.com/huggingface/lighteval/tree/main/src/lighteval/metrics): All the available metrics you can use. They are described in metrics, and divided between sample metrics (applied at the sample level, such as a prediction accuracy) and corpus metrics (applied over the whole corpus). You'll also find available normalisation functions.
 - [models](https://github.com/huggingface/lighteval/tree/main/src/lighteval/models): Possible models to use. We cover transformers (base_model), with adapter or delta weights, as well as TGI models locally deployed (it's likely the code here is out of date though), and brrr/nanotron models.
-- [tasks](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks): Available tasks. The complete list is in `tasks_table.jsonl`, and you'll find all the prompts in `tasks_prompt_formatting.py`.
+- [tasks](https://github.com/huggingface/lighteval/tree/main/src/lighteval/tasks): Available tasks. The complete list is in `tasks_table.jsonl`, and you'll find all the prompts in `tasks_prompt_formatting.py`. Popular tasks requiring custom logic are exceptionally added in the [extended tasks](https://github.com/huggingface/lighteval/blob/main/src/lighteval/tasks/extended).
 - [tasks_examples](https://github.com/huggingface/lighteval/tree/main/tasks_examples) contains a list of available tasks you can launch. We advise using tasks in the `recommended_set`, as it's possible that some of the other tasks need double checking.
 - [tests](https://github.com/huggingface/lighteval/tree/main/tests) contains our test suite, that we run at each PR to prevent regressions in metrics/prompts/tasks, for a subset of important tasks.
 
@@ -252,9 +262,6 @@ Summary: create a **line summary** of your evaluation, in `src/lighteval/tasks/t
 
 Make sure you can launch your model with your new task using `--tasks lighteval|yournewtask|2|0`.
 
-### Extended evaluations
-Proceed as for community evaluations, but in the `extended_tasks` folder.
-
 #### Community evaluations
 Copy the `community_tasks/_template.yml` to `community_tasks/yourevalname.py` and edit it to add your custom tasks (the parameters you can use are explained above). It contains an interesting mechanism if the dataset you are adding contains a lot of subsets.
````
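The `--tasks` argument in the README examples above packs four fields into one pipe-separated string: suite, task name, few-shot count, and a truncation flag. A minimal sketch of how such a spec string can be parsed; the `TaskSpec` class and field names are illustrative assumptions, not lighteval's actual internals:

```python
# Hypothetical parser for a "suite|task|fewshot|truncate" spec string,
# mirroring the format used in the README examples above.
from dataclasses import dataclass


@dataclass
class TaskSpec:
    suite: str      # e.g. "extended", "community", "custom", "lighteval"
    task: str       # e.g. "ifeval" or "arabic_mmlu:abstract_algebra"
    few_shot: int   # number of in-context examples
    truncate: int   # whether few-shot examples may be truncated to fit


def parse_task_spec(spec: str) -> TaskSpec:
    suite, task, few_shot, truncate = spec.split("|")
    return TaskSpec(suite, task, int(few_shot), int(truncate))


# Several tasks can be passed at once, comma-separated.
specs = [
    parse_task_spec(s)
    for s in "extended|ifeval|0|0,community|arabic_mmlu:abstract_algebra|5|1".split(",")
]
```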

run_evals_accelerate.py
Lines changed: 0 additions & 6 deletions

```diff
@@ -104,12 +104,6 @@ def get_parser():
         default=None,
         help="Path to a file with custom tasks (a TASK list of dict and potentially prompt formating functions)",
     )
-    parser.add_argument(
-        "--extended_tasks",
-        type=str,
-        default=None,
-        help="Path to the folder which contains all extended tasks",
-    )
     group.add_argument(
         "--tasks",
         type=str,
```
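After this deletion, extended tasks are discovered automatically and only `--custom_tasks` remains for user-supplied task files. A simplified argparse sketch of the surviving interface (a stand-in for the real parser, for illustration only):

```python
import argparse

# Simplified stand-in for lighteval's CLI after this commit: a single
# --custom_tasks flag covers community and custom tasks, while extended
# tasks no longer need a path flag at all.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--custom_tasks",
    type=str,
    default=None,
    help="Path to a file with custom task definitions",
)
parser.add_argument("--tasks", type=str, default=None)

args = parser.parse_args(
    ["--tasks", "community|arabic_mmlu:abstract_algebra|5|1",
     "--custom_tasks", "community_tasks/arabic_evals"]
)
```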

src/lighteval/main_accelerate.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -81,7 +81,7 @@ def main(args):
     with accelerator.main_process_first() if accelerator is not None else nullcontext():
         task_names_list, few_shots_dict = taskinfo_selector(args.tasks)
         task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
-            task_names_list, custom_tasks=args.custom_tasks, extended_tasks=args.extended_tasks
+            task_names_list, custom_tasks=args.custom_tasks
         )
         LightevalTask.load_datasets(task_dict.values(), args.dataset_loading_processes)
```
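With the `extended_tasks` argument gone, the registry can discover extended tasks by walking a fixed module list instead of a user-supplied folder. A hedged sketch of that pattern, using stand-in modules; the `TASKS_TABLE` attribute name matches lighteval's task-module convention, but the rest is illustrative:

```python
from types import SimpleNamespace

# Stand-ins for extended task modules, each exposing a TASKS_TABLE the
# way lighteval's extended task modules do.
ifeval = SimpleNamespace(TASKS_TABLE=[{"name": "ifeval", "suite": ["extended"]}])
tiny = SimpleNamespace(TASKS_TABLE=[{"name": "tiny:gsm8k", "suite": ["extended"]}])

AVAILABLE_EXTENDED_TASKS_MODULES = [ifeval, tiny]


def collect_extended_tasks(modules):
    """Merge every module's task table into one name -> config mapping."""
    registry = {}
    for module in modules:
        for config in module.TASKS_TABLE:
            registry[config["name"]] = config
    return registry


tasks = collect_extended_tasks(AVAILABLE_EXTENDED_TASKS_MODULES)
```

Because the module list is baked into the library, every caller sees the same extended tasks without passing a path around, which is what lets `main_accelerate.py` drop the extra keyword argument.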

New file
Lines changed: 33 additions & 0 deletions

```diff
@@ -0,0 +1,33 @@
+# MIT License
+
+# Copyright (c) 2024 The HuggingFace Team
+
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+
+from lighteval.utils import can_load_extended_tasks
+
+
+if can_load_extended_tasks():
+    import lighteval.tasks.extended.ifeval.main as ifeval
+    import lighteval.tasks.extended.tiny_benchmarks.main as tiny_benchmarks
+
+    AVAILABLE_EXTENDED_TASKS_MODULES = [ifeval, tiny_benchmarks]
+
+else:
+    AVAILABLE_EXTENDED_TASKS_MODULES = []
```
extended_tasks/ifeval/instructions.py renamed to src/lighteval/tasks/extended/ifeval/instructions.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -23,7 +23,7 @@
 
 import langdetect
 
-import extended_tasks.ifeval.instructions_utils as instructions_util
+import lighteval.tasks.extended.ifeval.instructions_utils as instructions_util
 
 
 logger = logging.getLogger(__name__)
```

extended_tasks/ifeval/instructions_registry.py renamed to src/lighteval/tasks/extended/ifeval/instructions_registry.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -13,7 +13,7 @@
 # limitations under the License.
 
 """Registry of all instructions."""
-import extended_tasks.ifeval.instructions as instructions
+import lighteval.tasks.extended.ifeval.instructions as instructions
 
 
 _KEYWORD = "keywords:"
```

extended_tasks/ifeval/main.py renamed to src/lighteval/tasks/extended/ifeval/main.py
Lines changed: 1 addition & 1 deletion

```diff
@@ -23,7 +23,7 @@
 import numpy as np
 from aenum import extend_enum
 
-import extended_tasks.ifeval.instructions_registry as instructions_registry
+import lighteval.tasks.extended.ifeval.instructions_registry as instructions_registry
 from lighteval.metrics import Metrics
 from lighteval.metrics.utils import (
     MetricCategory,
```

extended_tasks/tiny_benchmarks/main.py renamed to src/lighteval/tasks/extended/tiny_benchmarks/main.py
Lines changed: 5 additions & 3 deletions

```diff
@@ -27,6 +27,7 @@
 Test with `python run_evals_accelerate.py --model_args "pretrained=EleutherAI/pythia-70m" --tasks "extended|tiny:winogrande|0|0,extended|tiny:gsm8k|0|0,extended|tiny:hellaswag|0|0,extended|tiny:arc|0|0,extended|tiny:truthfulqa|0|0" --extended_tasks extended_tasks --output_dir "./evals"`
 """
 import os
+import pathlib
 import pickle
 
 import numpy as np
@@ -40,7 +41,6 @@
 from lighteval.metrics.normalizations import gsm8k_normalizer
 from lighteval.metrics.utils import MetricCategory, MetricUseCase
 from lighteval.tasks.lighteval_task import LightevalTaskConfig
-from lighteval.tasks.requests import Doc
 
 
 # Utility functions
@@ -89,13 +89,15 @@ def __init__(self, task: str):
         self.num_samples = 0
 
     def download(self):
+        # Likely to crash in // processes if we don't include the pkl
+        path_dld = os.path.join(pathlib.Path(__file__).parent.resolve(), "tinyBenchmarks.pkl")
         # Downloading files
-        if not os.path.isfile("extended_tasks/tiny_benchmarks/tinyBenchmarks.pkl"):
+        if not os.path.isfile(path_dld):
             url = "https://raw.githubusercontent.com/felipemaiapolo/tinyBenchmarks/main/tinyBenchmarks/tinyBenchmarks.pkl"
             response = requests.get(url)
             if response.status_code == 200:
                 # Write the content to a file
-                with open("extended_tasks/tiny_benchmarks/tinyBenchmarks.pkl", "wb") as file:
+                with open(path_dld, "wb") as file:
                     file.write(response.content)
 
     def compute(self, **args):
```
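The fix above replaces a cwd-relative cache path with one anchored at the module's own directory, so every worker process resolves the same pickle regardless of where it was launched. The pattern in isolation, with a stubbed fetch function and a temp directory standing in for the real download and module directory:

```python
import os
import pathlib
import tempfile


def cache_path(base_dir: str, filename: str = "tinyBenchmarks.pkl") -> str:
    # Anchor the cache file at a fixed directory (in the diff above,
    # pathlib.Path(__file__).parent.resolve()) rather than relying on
    # the current working directory.
    return os.path.join(pathlib.Path(base_dir).resolve(), filename)


def download_once(base_dir: str, fetch) -> str:
    """Fetch the payload only if it is not already cached on disk."""
    path = cache_path(base_dir)
    if not os.path.isfile(path):
        with open(path, "wb") as file:
            file.write(fetch())
    return path


# Demo: the second call finds the cached file, so its fetch callable
# (which would raise if invoked) is never called.
with tempfile.TemporaryDirectory() as tmp:
    first = download_once(tmp, lambda: b"weights")
    second = download_once(tmp, lambda: (_ for _ in ()).throw(RuntimeError))
    assert first == second
```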
