Merged

36 commits (all by albertvillanova)
- df27a43 Add Reference docs with Pipeline docs (Nov 29, 2024)
- afb0ce2 Pin numpy<2 (Nov 29, 2024)
- 696416f Add Tasks docs (Nov 29, 2024)
- 89b2581 Add more Tasks docs (Nov 29, 2024)
- 77f779a Add Models docs (Nov 29, 2024)
- aaeee7b Fix Models docs (Nov 29, 2024)
- e46b1d3 Remove AdapterModel that requires peft (Nov 29, 2024)
- 17c8088 Remove NanotronLightevalModel and VLLMModel that require nanotron and… (Nov 29, 2024)
- ee937bb Fix markdown comment syntax (Nov 29, 2024)
- 6ad6a2f Add Metrics docs (Nov 29, 2024)
- 7874f1b Fix typo (Nov 29, 2024)
- bb1a20a Remove Main classes section (Nov 29, 2024)
- d281f10 Add Datasets docs (Nov 29, 2024)
- 812ef35 Create Main classes section with Pipeline (Nov 29, 2024)
- 632e89b Add EvaluationTracker docs (Nov 29, 2024)
- 3e53ddb Add ModelConfig docs (Nov 29, 2024)
- ea6af22 Add ParallelismManager to Pipeline docs (Nov 29, 2024)
- 7a413ac Add inter-links from using-the-python-api (Nov 29, 2024)
- 4e9c80b Fix inter-links (Nov 29, 2024)
- 9955186 Add more Metrics docs (Nov 30, 2024)
- 82a8fcb Comment Metrics enum (Nov 30, 2024)
- 6a08f01 Fix typo (Nov 30, 2024)
- 95ac6d5 Add explanation and GH issue to comment in Metrics enum (Nov 30, 2024)
- 5962be6 Add inter-link to Metrics (Nov 30, 2024)
- 6eb2348 Add subsection titles to LightevalTask (Nov 30, 2024)
- bb4c95c Add inter-link to LightevalTaskConfig (Nov 30, 2024)
- 7153bfe Add inter-link to section heading anchor (Nov 30, 2024)
- ae8ce62 Add more Metrics docs (Nov 30, 2024)
- 9849a96 Add inter-link to SampleLevelMetric and Grouping (Nov 30, 2024)
- c5250e7 Add inter-link to LightevalTaskConfig (Nov 30, 2024)
- f2ead25 Fix section title with trailing colon (Nov 30, 2024)
- b2d82e3 Add sections to Models docs (Dec 3, 2024)
- c4ea699 Move Models docs to Main classes section (Dec 3, 2024)
- 39e145b Document you can pass either model or model config to Pipeline (Dec 3, 2024)
- 83043b3 Move Datasets docs to Tasks docs (Dec 3, 2024)
- 0893e55 Add logging docs (Dec 3, 2024)
18 changes: 18 additions & 0 deletions docs/source/_toctree.yml
@@ -28,3 +28,21 @@
  - local: available-tasks
    title: Available Tasks
  title: API
- sections:
  - sections:
    - local: package_reference/evaluation_tracker
      title: EvaluationTracker
    - local: package_reference/models
      title: Models
    - local: package_reference/model_config
      title: ModelConfig
    - local: package_reference/pipeline
      title: Pipeline
    title: Main classes
  - local: package_reference/metrics
    title: Metrics
  - local: package_reference/tasks
    title: Tasks
  - local: package_reference/logging
    title: Logging
  title: Reference
8 changes: 5 additions & 3 deletions docs/source/adding-a-custom-task.mdx
@@ -45,8 +45,9 @@ def prompt_fn(line, task_name: str = None):
)
```

-Then, you need to choose a metric, you can either use an existing one (defined
-in `lighteval/metrics/metrics.py`) or [create a custom one](adding-a-new-metric)).
+Then, you need to choose a metric: you can either use an existing one (defined
+in [`lighteval.metrics.metrics.Metrics`]) or [create a custom one](adding-a-new-metric).
+[//]: # (TODO: Replace lighteval.metrics.metrics.Metrics with ~metrics.metrics.Metrics once its autodoc is added)

```python
custom_metric = SampleLevelMetric(
@@ -59,7 +60,8 @@
)
```

-Then, you need to define your task. You can define a task with or without subsets.
+Then, you need to define your task using [`~tasks.lighteval_task.LightevalTaskConfig`].
+You can define a task with or without subsets.
To define a task with no subsets:

```python
…
```
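The task definition itself is collapsed in this diff view. As orientation only, here is a minimal sketch of the shape such a definition takes, based on the documented fields of [`~tasks.lighteval_task.LightevalTaskConfig`]; every value below is illustrative, and `prompt_fn` and `custom_metric` refer to the objects defined earlier on the page:

```python
from lighteval.tasks.lighteval_task import LightevalTaskConfig

# Illustrative sketch: a task with no subsets. All field values are placeholders.
task = LightevalTaskConfig(
    name="myothertask",
    prompt_function=prompt_fn,          # maps a dataset line to a Doc (defined above)
    suite=["community"],                # suite(s) the task is registered under
    hf_repo="your_org/your_dataset",    # hypothetical Hub dataset id
    hf_subset="default",
    hf_avail_splits=["train", "test"],
    evaluation_splits=["test"],
    few_shots_split=None,
    few_shots_select=None,
    metric=[custom_metric],             # the metric chosen above
)
```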
10 changes: 6 additions & 4 deletions docs/source/adding-a-new-metric.mdx
@@ -1,8 +1,8 @@
# Adding a New Metric

First, check if you can use one of the parametrized functions in
-[src.lighteval.metrics.metrics_corpus]() or
-[src.lighteval.metrics.metrics_sample]().
+[Corpus Metrics](package_reference/metrics#corpus-metrics) or
+[Sample Metrics](package_reference/metrics#sample-metrics).

If not, you can use the `custom_task` system to register your new metric:

@@ -49,7 +49,8 @@ def agg_function(items):
return score
```

-Finally, you can define your metric. If it's a sample level metric, you can use the following code:
+Finally, you can define your metric. If it's a sample level metric, you can use the following code
+with [`~metrics.utils.metric_utils.SampleLevelMetric`]:

```python
my_custom_metric = SampleLevelMetric(
@@ -62,7 +63,8 @@
)
```

-If your metric defines multiple metrics per sample, you can use the following code:
+If your metric defines multiple metrics per sample, you can use the following code
+with [`~metrics.utils.metric_utils.SampleLevelMetricGrouping`]:

```python
custom_metric = SampleLevelMetricGrouping(
    …
)
```
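Both definition blocks are collapsed in the diff. For orientation, here is a self-contained sketch of the sample-level case; the import path of the category/use-case enums and the `extend_enum` registration step are assumptions based on lighteval's custom-task system, not part of this PR, and the two helper functions are illustrative stand-ins for the ones defined above:

```python
from aenum import extend_enum

from lighteval.metrics.metrics import Metrics
from lighteval.metrics.utils.metric_utils import (  # import path assumed
    MetricCategory,
    MetricUseCase,
    SampleLevelMetric,
)


def custom_metric_fn(predictions, formatted_doc, **kwargs) -> float:
    # Per-sample score: 1.0 when the first prediction matches the gold choice.
    return float(predictions[0] == formatted_doc.choices[formatted_doc.gold_index])


def agg_function(items) -> float:
    # Corpus-level aggregation: plain mean of the per-sample scores.
    return sum(items) / len(items) if items else 0.0


my_custom_metric = SampleLevelMetric(
    metric_name="my_custom_metric",
    higher_is_better=True,
    category=MetricCategory.ACCURACY,   # pick the category matching your task type
    use_case=MetricUseCase.ACCURACY,
    sample_level_fn=custom_metric_fn,
    corpus_level_fn=agg_function,
)

# Register the metric so tasks can reference it by name.
extend_enum(Metrics, "my_custom_metric", my_custom_metric)
```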
6 changes: 3 additions & 3 deletions docs/source/contributing-to-multilingual-evaluations.mdx
@@ -51,7 +51,7 @@ Browse the list of all templates [here](https://github.com/huggingface/lighteval
Then, when ready, to define your own task, you should:
1. create a Python file as indicated in the above guide
2. import the relevant templates for your task type (XNLI, Copa, Multiple choice, Question Answering, etc)
-3. define one or a list of tasks for each relevant language and evaluation formulation (for multichoice) using our parametrizable `LightevalTaskConfig` class
+3. define one or a list of tasks for each relevant language and evaluation formulation (for multichoice) using our parametrizable [`~tasks.lighteval_task.LightevalTaskConfig`] class

```python
your_tasks = [
    …
]
```

@@ -101,7 +101,7 @@
4. then, you can go back to the guide to test if your task is correctly implemented!

> [!TIP]
-> All `LightevalTaskConfig` parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's functionality to make it easier to correctly fill these parameters.
+> All [`~tasks.lighteval_task.LightevalTaskConfig`] parameters are strongly typed, including the inputs to the template function. Make sure to take advantage of your IDE's functionality to make it easier to correctly fill these parameters.


-Once everything is good, open a PR, and we'll be happy to review it!
\ No newline at end of file
+Once everything is good, open a PR, and we'll be happy to review it!
2 changes: 1 addition & 1 deletion docs/source/metric-list.mdx
@@ -69,7 +69,7 @@ These metrics need the model to generate an output. They are therefore slower.
- `quasi_exact_match_gsm8k`: Fraction of instances where the normalized prediction matches the normalized gold (normalization done for gsm8k, where latex symbols, units, etc are removed)
- `maj_at_8_gsm8k`: Majority choice evaluation, using the gsm8k normalisation for the predictions and gold

-## LLM-as-Judge:
+## LLM-as-Judge
- `llm_judge_gpt3p5`: Can be used for any generative task, the model will be scored by a GPT3.5 model using the OpenAI API
- `llm_judge_llama_3_405b`: Can be used for any generative task, the model will be scored by a Llama 3 405B model using the HuggingFace API
- `llm_judge_multi_turn_gpt3p5`: Can be used for any generative task, the model will be scored by a GPT3.5 model using the OpenAI API. It is used for multiturn tasks like mt-bench.
3 changes: 3 additions & 0 deletions docs/source/package_reference/evaluation_tracker.mdx
@@ -0,0 +1,3 @@
# EvaluationTracker

[[autodoc]] logging.evaluation_tracker.EvaluationTracker
12 changes: 12 additions & 0 deletions docs/source/package_reference/logging.mdx
@@ -0,0 +1,12 @@
# Loggers

## GeneralConfigLogger
[[autodoc]] logging.info_loggers.GeneralConfigLogger
## DetailsLogger
[[autodoc]] logging.info_loggers.DetailsLogger
## MetricsLogger
[[autodoc]] logging.info_loggers.MetricsLogger
## VersionsLogger
[[autodoc]] logging.info_loggers.VersionsLogger
## TaskConfigLogger
[[autodoc]] logging.info_loggers.TaskConfigLogger
70 changes: 70 additions & 0 deletions docs/source/package_reference/metrics.mdx
@@ -0,0 +1,70 @@
# Metrics

## Metrics
[//]: # (TODO: aenum.Enum raises error when generating docs: not supported by inspect.signature. See: https://github.com/ethanfurman/aenum/issues/44)
[//]: # (### Metrics)
[//]: # ([[autodoc]] metrics.metrics.Metrics)
### Metric
[[autodoc]] metrics.utils.metric_utils.Metric
### CorpusLevelMetric
[[autodoc]] metrics.utils.metric_utils.CorpusLevelMetric
### SampleLevelMetric
[[autodoc]] metrics.utils.metric_utils.SampleLevelMetric
### MetricGrouping
[[autodoc]] metrics.utils.metric_utils.MetricGrouping
### CorpusLevelMetricGrouping
[[autodoc]] metrics.utils.metric_utils.CorpusLevelMetricGrouping
### SampleLevelMetricGrouping
[[autodoc]] metrics.utils.metric_utils.SampleLevelMetricGrouping

## Corpus Metrics
### CorpusLevelF1Score
[[autodoc]] metrics.metrics_corpus.CorpusLevelF1Score
### CorpusLevelPerplexityMetric
[[autodoc]] metrics.metrics_corpus.CorpusLevelPerplexityMetric
### CorpusLevelTranslationMetric
[[autodoc]] metrics.metrics_corpus.CorpusLevelTranslationMetric
### matthews_corrcoef
[[autodoc]] metrics.metrics_corpus.matthews_corrcoef

## Sample Metrics
### ExactMatches
[[autodoc]] metrics.metrics_sample.ExactMatches
### F1_score
[[autodoc]] metrics.metrics_sample.F1_score
### LoglikelihoodAcc
[[autodoc]] metrics.metrics_sample.LoglikelihoodAcc
### NormalizedMultiChoiceProbability
[[autodoc]] metrics.metrics_sample.NormalizedMultiChoiceProbability
### Probability
[[autodoc]] metrics.metrics_sample.Probability
### Recall
[[autodoc]] metrics.metrics_sample.Recall
### MRR
[[autodoc]] metrics.metrics_sample.MRR
### ROUGE
[[autodoc]] metrics.metrics_sample.ROUGE
### BertScore
[[autodoc]] metrics.metrics_sample.BertScore
### Extractiveness
[[autodoc]] metrics.metrics_sample.Extractiveness
### Faithfulness
[[autodoc]] metrics.metrics_sample.Faithfulness
### BLEURT
[[autodoc]] metrics.metrics_sample.BLEURT
### BLEU
[[autodoc]] metrics.metrics_sample.BLEU
### StringDistance
[[autodoc]] metrics.metrics_sample.StringDistance
### JudgeLLM
[[autodoc]] metrics.metrics_sample.JudgeLLM
### JudgeLLMMTBench
[[autodoc]] metrics.metrics_sample.JudgeLLMMTBench
### JudgeLLMMixEval
[[autodoc]] metrics.metrics_sample.JudgeLLMMixEval
### MajAtK
[[autodoc]] metrics.metrics_sample.MajAtK

## LLM-as-a-Judge
### JudgeLM
[[autodoc]] metrics.llm_as_judge.JudgeLM
12 changes: 12 additions & 0 deletions docs/source/package_reference/model_config.mdx
@@ -0,0 +1,12 @@
# ModelConfig

[[autodoc]] models.model_config.BaseModelConfig

[[autodoc]] models.model_config.AdapterModelConfig
[[autodoc]] models.model_config.DeltaModelConfig
[[autodoc]] models.model_config.InferenceEndpointModelConfig
[[autodoc]] models.model_config.InferenceModelConfig
[[autodoc]] models.model_config.TGIModelConfig
[[autodoc]] models.model_config.VLLMModelConfig

[[autodoc]] models.model_config.create_model_config
30 changes: 30 additions & 0 deletions docs/source/package_reference/models.mdx
@@ -0,0 +1,30 @@
# Models

## Model
### LightevalModel
[[autodoc]] models.abstract_model.LightevalModel

## Accelerate and Transformers Models
### BaseModel
[[autodoc]] models.base_model.BaseModel
[//]: # (TODO: Fix import error)
[//]: # (### AdapterModel)
[//]: # ([[autodoc]] models.adapter_model.AdapterModel)
### DeltaModel
[[autodoc]] models.delta_model.DeltaModel

## Inference Endpoints and TGI Models
### InferenceEndpointModel
[[autodoc]] models.endpoint_model.InferenceEndpointModel
### ModelClient
[[autodoc]] models.tgi_model.ModelClient

[//]: # (TODO: Fix import error)
[//]: # (## Nanotron Model)
[//]: # (### NanotronLightevalModel)
[//]: # ([[autodoc]] models.nanotron_model.NanotronLightevalModel)

[//]: # (TODO: Fix import error)
[//]: # (## VLLM Model)
[//]: # (### VLLMModel)
[//]: # ([[autodoc]] models.vllm_model.VLLMModel)
13 changes: 13 additions & 0 deletions docs/source/package_reference/pipeline.mdx
@@ -0,0 +1,13 @@
# Pipeline

## Pipeline

[[autodoc]] pipeline.Pipeline

## PipelineParameters

[[autodoc]] pipeline.PipelineParameters

## ParallelismManager

[[autodoc]] pipeline.ParallelismManager
38 changes: 38 additions & 0 deletions docs/source/package_reference/tasks.mdx
@@ -0,0 +1,38 @@
# Tasks

## LightevalTask
### LightevalTaskConfig
[[autodoc]] tasks.lighteval_task.LightevalTaskConfig
### LightevalTask
[[autodoc]] tasks.lighteval_task.LightevalTask

## PromptManager

[[autodoc]] tasks.prompt_manager.PromptManager

## Registry

[[autodoc]] tasks.registry.Registry

## Requests

[[autodoc]] tasks.requests.Request

[[autodoc]] tasks.requests.LoglikelihoodRequest

[[autodoc]] tasks.requests.LoglikelihoodSingleTokenRequest

[[autodoc]] tasks.requests.LoglikelihoodRollingRequest

[[autodoc]] tasks.requests.GreedyUntilRequest

[[autodoc]] tasks.requests.GreedyUntilMultiTurnRequest

## Datasets

[[autodoc]] data.DynamicBatchDataset
[[autodoc]] data.LoglikelihoodDataset
[[autodoc]] data.LoglikelihoodSingleTokenDataset
[[autodoc]] data.GenerativeTaskDataset
[[autodoc]] data.GenerativeTaskDatasetNanotron
[[autodoc]] data.GenDistributedSampler
7 changes: 4 additions & 3 deletions docs/source/using-the-python-api.mdx
@@ -1,8 +1,9 @@
# Using the Python API

-Lighteval can be used from a custom python script. To evaluate a model you will
-need to setup an `evaluation_tracker`, `pipeline_parameters`, `model_config`
-and a `pipeline`.
+Lighteval can be used from a custom python script. To evaluate a model you will need to set up an
+[`~logging.evaluation_tracker.EvaluationTracker`], [`~pipeline.PipelineParameters`],
+a [`model`](package_reference/models) or a [`model_config`](package_reference/model_config),
+and a [`~pipeline.Pipeline`].

After that, simply run the pipeline and save the results.

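The page's own end-to-end example is collapsed in this diff view. As a condensed sketch of the flow it describes, under the assumption that the class and method names match the Pipeline, ModelConfig, and EvaluationTracker docs added in this PR (model name, output directory, and task string are illustrative):

```python
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.model_config import BaseModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Where results and details are written.
evaluation_tracker = EvaluationTracker(output_dir="./results")

# Launcher backend; ACCELERATE runs the model with 🤗 Accelerate.
pipeline_params = PipelineParameters(launcher_type=ParallelismManager.ACCELERATE)

# Either a model config (as here) or an already-instantiated model can be passed.
model_config = BaseModelConfig(pretrained="gpt2")

pipeline = Pipeline(
    tasks="leaderboard|truthfulqa:mc|0|0",  # suite|task|few-shot|truncation flag
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)
pipeline.evaluate()
pipeline.save_and_push_results()
pipeline.show_results()
```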
1 change: 1 addition & 0 deletions pyproject.toml
@@ -59,6 +59,7 @@ dependencies = [
"torch>=2.0,<2.5",
"GitPython>=3.1.41", # for logging
"datasets>=2.14.0",
"numpy<2", # pinned to avoid incompatibilities
# Prettiness
"termcolor==2.3.0",
"pytablewriter",