
Commit 3785d85

Just adding the custom metrics system (#65)
1 parent 4907499 commit 3785d85

File tree: README.md · community_tasks/_template.py · pyproject.toml · src/lighteval/metrics/metrics.py

4 files changed: +45 −4 lines

README.md

Lines changed: 22 additions & 2 deletions
@@ -185,8 +185,28 @@ However, we are very grateful to the Harness and HELM teams for their continued
 
 ## Customisation
 ### Adding a new metric
-If you want to add a new metric, first check if you can use one of the parametrized functions in `src.lighteval.metrics.metrics_corpus` or `src.lighteval.metrics.metrics_sample`. If not, add it to either of these files depending on the level at which it is applied.
-Then, follow the example in `src.lighteval.metrics.metrics` to register your metric.
+First check if you can use one of the parametrized functions in `src.lighteval.metrics.metrics_corpus` or `src.lighteval.metrics.metrics_sample`.
+
+If not, you can use the custom_task system to register your new metric:
+- create a new python file which should contain the full logic of your metric.
+- the file also needs to start with these imports
+```python
+from aenum import extend_enum
+from lighteval.metrics import Metrics
+
+# And any other class you might need to redefine your specific metric, depending on whether it's a sample or corpus metric.
+```
+
+- and to end with the following, so that it adds your metric to our metrics list when loaded as a module.
+
+```python
+# Adds the metric to the metric list!
+extend_enum(Metrics, "ifeval_metric", ifeval_metrics)
+if __name__ == "__main__":
+    print("Imported metric")
+```
+
+You can then give your custom metric to lighteval by using `--custom-tasks path_to_your_file` when launching it.
 
 ### Adding a new task
 To add a new task, first either open an issue, to determine whether it will be integrated in the core evaluations of lighteval, or in the community tasks, and **add its dataset** on the hub.
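Read end to end, the new README instructions describe one self-contained Python file. A minimal sketch of such a file, assuming a hypothetical `word_count` scorer (the imports and `SampleLevelMetric` fields mirror `community_tasks/_template.py` below):

```python
# my_custom_metric.py -- hypothetical file assembled from the README steps above
import numpy as np
from aenum import extend_enum

from lighteval.metrics import Metrics
from lighteval.metrics.metrics import SampleLevelMetric
from lighteval.metrics.utils import MetricCategory, MetricUseCase


def word_count(x):
    """Illustrative sample-level scorer: length of one model output in words."""
    return len(str(x).split())


word_count_metric = SampleLevelMetric(
    metric="word_count_metric",
    higher_is_better=True,
    category=MetricCategory.IGNORED,
    use_case=MetricUseCase.NONE,
    sample_level_fn=word_count,  # score for one sample
    corpus_level_fn=np.mean,  # aggregation over samples
)

# Register the metric when this file is loaded as a module.
extend_enum(Metrics, "word_count_metric", word_count_metric)

if __name__ == "__main__":
    print("Imported metric")
```

Such a file would then be passed to lighteval via `--custom-tasks my_custom_metric.py`, as the README text above describes.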

community_tasks/_template.py

Lines changed: 19 additions & 0 deletions
@@ -6,6 +6,12 @@
 
 Author:
 """
+import numpy as np
+from aenum import extend_enum
+
+from lighteval.metrics import Metrics
+from lighteval.metrics.metrics import SampleLevelMetric
+from lighteval.metrics.utils import MetricCategory, MetricUseCase
 from lighteval.tasks.lighteval_task import LightevalTaskConfig
 from lighteval.tasks.requests import Doc
 from lighteval.tasks.tasks_prompt_formatting import LETTER_INDICES
@@ -80,6 +86,19 @@ def prompt_fn(line, task_name: str = None):
 SUBSET_TASKS = [CustomSubsetTask(name=f"mytask:{subset}", hf_subset=subset) for subset in SAMPLE_SUBSETS]
 _TASKS = SUBSET_TASKS + [task]
 
+
+## CUSTOM METRIC IF NEEDED
+custom_metric = SampleLevelMetric(
+    metric="my_custom_metric_name",
+    higher_is_better=True,
+    category=MetricCategory.IGNORED,
+    use_case=MetricUseCase.NONE,
+    sample_level_fn=lambda x: x,  # how to compute score for one sample
+    corpus_level_fn=np.mean,  # aggregation
+)
+
+extend_enum(Metrics, "my_custom_metric_name", custom_metric)
+
 ## MODULE LOGIC
 # You should not need to touch this
 # Convert to dict for lighteval
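A note on the two callables in the template: `sample_level_fn` scores each sample individually and `corpus_level_fn` aggregates those scores. A rough sketch of how they compose (hypothetical driver code run against the template's `custom_metric`; lighteval performs this loop internally):

```python
import numpy as np

# Assume sample_level_fn returns a float per sample; the template's
# `lambda x: x` is only a placeholder, so feed it floats directly here.
per_sample_scores = [custom_metric.sample_level_fn(x) for x in (1.0, 0.0, 1.0)]
final_score = custom_metric.corpus_level_fn(per_sample_scores)  # np.mean -> ~0.667
```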

pyproject.toml

Lines changed: 3 additions & 0 deletions
@@ -60,6 +60,9 @@ dependencies = [
     "termcolor==2.3.0",
     "pytablewriter",
     "colorama",
+
+    # Extension of metrics
+    "aenum==3.1.15",
     # Base metrics
     "nltk==3.8.1",
     "numpy",

src/lighteval/metrics/metrics.py

Lines changed: 1 addition & 2 deletions
@@ -1,6 +1,5 @@
-from enum import Enum
-
 import numpy as np
+from aenum import Enum
 
 from lighteval.metrics.harness_compatibility.drop import drop_metrics
 from lighteval.metrics.harness_compatibility.truthful_qa import truthfulqa_mc_metrics
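This import swap is what the rest of the commit relies on: the stdlib `enum.Enum` offers no supported way to add members after the class is created, whereas `aenum` ships a drop-in `Enum` together with `extend_enum` for exactly that. A minimal sketch of the mechanism with a toy enum (not lighteval's real `Metrics`):

```python
from aenum import Enum, extend_enum


class Toy(Enum):  # toy stand-in for lighteval's Metrics enum
    exact_match = "exact_match"


# Add a member at runtime, as the custom-metric files do.
extend_enum(Toy, "my_custom_metric_name", "my_custom_metric_name")
print(Toy.my_custom_metric_name)  # Toy.my_custom_metric_name
print(list(Toy))  # now includes the new member
```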
