
Commit b9a2e56

NathanHB, hanouticelina, and clefourrier authored
adds inference providers support (#616)
---------
Co-authored-by: Célina <[email protected]>
Co-authored-by: Clémentine Fourrier <[email protected]>
1 parent d81bafd commit b9a2e56

File tree: 11 files changed (+480, -47 lines)

docs/source/_toctree.yml

Lines changed: 5 additions & 3 deletions
@@ -9,16 +9,18 @@
   - sections:
     - local: saving-and-reading-results
       title: Save and read results
-    - local: use-litellm-as-backend
-      title: Use LITELLM as backend
     - local: using-the-python-api
       title: Use the Python API
     - local: adding-a-custom-task
       title: Add a custom task
     - local: adding-a-new-metric
       title: Add a custom metric
+    - local: use-inference-providers-as-backend
+      title: Use HF's inference providers as backend
+    - local: use-litellm-as-backend
+      title: Use litellm as backend
     - local: use-vllm-as-backend
-      title: Use VLLM as backend
+      title: Use vllm as backend
     - local: use-sglang-as-backend
       title: Use SGLang as backend
     - local: evaluate-the-model-on-a-server-or-container

docs/source/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 backends—whether it's
 [transformers](https://github.com/huggingface/transformers),
 [tgi](https://github.com/huggingface/text-generation-inference),
+[inference providers](https://huggingface.co/docs/huggingface_hub/en/guides/inference),
 [vllm](https://github.com/vllm-project/vllm), or
 [nanotron](https://github.com/huggingface/nanotron)-with
 ease. Dive deep into your model’s performance by saving and exploring detailed,
docs/source/use-inference-providers-as-backend.mdx

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Inference Providers as backend
+
+Lighteval allows you to use Hugging Face's Inference Providers to evaluate LLMs on supported providers such as Black Forest Labs, Cerebras, Fireworks AI, Nebius, Together AI, and many more.
+
+## Quick use
+
+> [!WARNING]
+> Do not forget to set your Hugging Face API key.
+> You can set it using the `HF_TOKEN` environment variable or by using the `huggingface-cli` command.
+
+
+```bash
+lighteval endpoint inference-providers \
+    "model=deepseek-ai/DeepSeek-R1,provider=hf-inference" \
+    "lighteval|gsm8k|0|0"
+```
+
+## Using a config file
+
+You can use config files to define the model and the provider to use.
+
+```bash
+lighteval endpoint inference-providers \
+    examples/model_configs/inference_providers.yaml \
+    "lighteval|gsm8k|0|0"
+```
+
+with the following config file:
+
+```yaml
+model:
+  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
+  provider: "novita"
+  timeout: null
+  proxies: null
+  parallel_calls_count: 10
+generation:
+  temperature: 0.8
+  top_k: 10
+  max_new_tokens: 10000
+```
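The comma-separated model string in the quick-use command above is split into keyword arguments by the new `inference-providers` CLI entry point added later in this commit (see the dict comprehension in src/lighteval/main_endpoint.py below). A minimal standalone sketch of that parsing logic; the helper name `parse_model_args` is chosen here for illustration and is not part of the commit:

```python
# Minimal sketch of how the CLI parses a comma-separated model-args string
# (mirrors the dict comprehension in src/lighteval/main_endpoint.py below).
# The function name `parse_model_args` is illustrative, not from the commit.
def parse_model_args(model_args: str) -> dict:
    # "key=value" tokens become entries; bare tokens become boolean flags.
    return {
        token.split("=")[0]: token.split("=")[1] if "=" in token else True
        for token in model_args.split(",")
    }


print(parse_model_args("model=deepseek-ai/DeepSeek-R1,provider=hf-inference"))
# {'model': 'deepseek-ai/DeepSeek-R1', 'provider': 'hf-inference'}
```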

docs/source/use-litellm-as-backend.mdx

Lines changed: 0 additions & 43 deletions
@@ -36,46 +36,3 @@ model:
     repetition_penalty: 1.0
     frequency_penalty: 0.0
 ```
-
-## Use Hugging Face Inference Providers
-
-With this you can also access HuggingFace Inference servers, let's look at how to evaluate DeepSeek-R1-Distill-Qwen-32B.
-
-First, let's look at how to acess the model, we can find this from [the model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).
-
-Step 1:
-
-![Step 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lighteval/litellm-guide-2.png)
-
-Step 2:
-
-![Step 2](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lighteval/litellm-guide-1.png)
-
-Great ! Now we can simply copy paste the base_url and our api key to eval our model.
-
-> [!WARNING]
-> Do not forget to prepend the provider in the `model_name`. Here we use an
-> openai compatible endpoint to the provider is `openai`.
-
-```yaml
-model:
-  base_params:
-    model_name: "openai/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
-    base_url: "https://router.huggingface.co/hf-inference/v1"
-    api_key: "YOUR KEY" # remove or keep empty as needed
-  generation:
-    temperature: 0.5
-    max_new_tokens: 256 # This will overide the default from the tasks config
-    top_p: 0.9
-    seed: 0
-    repetition_penalty: 1.0
-    frequency_penalty: 0.0
-```
-
-And then, we are able to eval our model on any eval available in Lighteval.
-
-```bash
-lighteval endpoint litellm \
-    "examples/model_configs/litellm_model.yaml" \
-    "lighteval|gsm8k|0|0"
-```
examples/model_configs/inference_providers.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+model:
+  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
+  provider: "novita"
+  timeout: null
+  proxies: null
+  parallel_calls_count: 20
+generation:
+  temperature: 0.8
+  top_k: 10
+  max_new_tokens: 10000
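The CLI code added below loads this file through `InferenceProvidersModelConfig.from_path` whenever its model argument ends in `.yaml`. A minimal sketch of that loading step; the relative path assumes you run from the repository root:

```python
# Minimal sketch: load the example config the same way the new CLI command does
# when its model argument ends in ".yaml" (path assumes the repository root).
from lighteval.models.endpoints.inference_providers_model import InferenceProvidersModelConfig

model_config = InferenceProvidersModelConfig.from_path("examples/model_configs/inference_providers.yaml")
```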

src/lighteval/main_endpoint.py

Lines changed: 110 additions & 0 deletions
@@ -500,3 +500,113 @@ def litellm(
     pipeline.save_and_push_results()
 
     return results
+
+
+@app.command(rich_help_panel="Evaluation Backends")
+def inference_providers(
+    # === general ===
+    model_args: Annotated[
+        str,
+        Argument(
+            help="config file path for the inference provider model, or a comma separated string of model args (model_name={},provider={},generation={temperature: 0.6})"
+        ),
+    ],
+    tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
+    # === Common parameters ===
+    system_prompt: Annotated[
+        Optional[str], Option(help="Use system prompt for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
+    ] = None,
+    dataset_loading_processes: Annotated[
+        int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = 1,
+    custom_tasks: Annotated[
+        Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = None,
+    num_fewshot_seeds: Annotated[
+        int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = 1,
+    # === saving ===
+    output_dir: Annotated[
+        str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = "results",
+    push_to_hub: Annotated[
+        bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    push_to_tensorboard: Annotated[
+        bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    public_run: Annotated[
+        bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    results_org: Annotated[
+        Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = None,
+    save_details: Annotated[
+        bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    # === debug ===
+    max_samples: Annotated[
+        Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = None,
+    job_id: Annotated[
+        int, Option(help="Optional job id for future reference.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = 0,
+):
+    """
+    Evaluate models using HF's inference providers as backend.
+    """
+
+    from lighteval.logging.evaluation_tracker import EvaluationTracker
+    from lighteval.models.endpoints.inference_providers_model import (
+        InferenceProvidersModelConfig,
+    )
+    from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
+
+    env_config = EnvConfig(token=TOKEN, cache_dir=CACHE_DIR)
+    evaluation_tracker = EvaluationTracker(
+        output_dir=output_dir,
+        save_details=save_details,
+        push_to_hub=push_to_hub,
+        push_to_tensorboard=push_to_tensorboard,
+        public=public_run,
+        hub_results_org=results_org,
+    )
+
+    # TODO (nathan): better handling of model_args
+    parallelism_manager = ParallelismManager.NONE
+
+    if model_args.endswith(".yaml"):
+        model_config = InferenceProvidersModelConfig.from_path(model_args)
+    else:
+        model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
+        model_config = InferenceProvidersModelConfig(**model_args_dict)
+
+    pipeline_params = PipelineParameters(
+        launcher_type=parallelism_manager,
+        env_config=env_config,
+        job_id=job_id,
+        dataset_loading_processes=dataset_loading_processes,
+        custom_tasks_directory=custom_tasks,
+        override_batch_size=None,
+        num_fewshot_seeds=num_fewshot_seeds,
+        max_samples=max_samples,
+        use_chat_template=True,
+        system_prompt=system_prompt,
+        load_responses_from_details_date_id=None,
+    )
+    pipeline = Pipeline(
+        tasks=tasks,
+        pipeline_parameters=pipeline_params,
+        evaluation_tracker=evaluation_tracker,
+        model_config=model_config,
+    )
+
+    pipeline.evaluate()
+
+    pipeline.show_results()
+
+    results = pipeline.get_results()
+
+    pipeline.save_and_push_results()
+
+    return results
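For reference, the same evaluation can also be driven directly from Python. The sketch below simply mirrors the body of the new `inference_providers` command above, with argument values set to the command's CLI defaults; the `EnvConfig` token and cache directory and the config file path are illustrative assumptions rather than values taken from this diff.

```python
# Sketch: the new `inference_providers` command above, driven directly from Python.
# Argument values mirror the command's CLI defaults; the token and cache_dir passed
# to EnvConfig are assumptions (the command uses module-level TOKEN and CACHE_DIR
# constants that are not shown in this diff).
import os

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.endpoints.inference_providers_model import InferenceProvidersModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

env_config = EnvConfig(token=os.getenv("HF_TOKEN"), cache_dir="~/.cache/huggingface")  # assumption

evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=False,
    push_to_hub=False,
    push_to_tensorboard=False,
    public=False,
    hub_results_org=None,
)

# Config file path taken from the docs page added in this commit (assumes repo root).
model_config = InferenceProvidersModelConfig.from_path("examples/model_configs/inference_providers.yaml")

pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.NONE,
    env_config=env_config,
    job_id=0,
    dataset_loading_processes=1,
    custom_tasks_directory=None,
    override_batch_size=None,
    num_fewshot_seeds=1,
    max_samples=None,
    use_chat_template=True,
    system_prompt=None,
    load_responses_from_details_date_id=None,
)
pipeline = Pipeline(
    tasks="lighteval|gsm8k|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.show_results()
results = pipeline.get_results()
pipeline.save_and_push_results()
```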
