
Commit b9a2e56

NathanHB, hanouticelina, and clefourrier authored
adds inference providers support (#616)
---------
Co-authored-by: Célina <[email protected]>
Co-authored-by: Clémentine Fourrier <[email protected]>
1 parent d81bafd commit b9a2e56

File tree: 11 files changed (+480, -47 lines)

docs/source/_toctree.yml

Lines changed: 5 additions & 3 deletions
@@ -9,16 +9,18 @@
   - sections:
     - local: saving-and-reading-results
       title: Save and read results
-    - local: use-litellm-as-backend
-      title: Use LITELLM as backend
     - local: using-the-python-api
       title: Use the Python API
     - local: adding-a-custom-task
       title: Add a custom task
     - local: adding-a-new-metric
       title: Add a custom metric
+    - local: use-inference-providers-as-backend
+      title: Use HF's inference providers as backend
+    - local: use-litellm-as-backend
+      title: Use litellm as backend
     - local: use-vllm-as-backend
-      title: Use VLLM as backend
+      title: Use vllm as backend
     - local: use-sglang-as-backend
       title: Use SGLang as backend
     - local: evaluate-the-model-on-a-server-or-container

docs/source/index.mdx

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 backends—whether it's
 [transformers](https://github.com/huggingface/transformers),
 [tgi](https://github.com/huggingface/text-generation-inference),
+[inference providers](https://huggingface.co/docs/huggingface_hub/en/guides/inference),
 [vllm](https://github.com/vllm-project/vllm), or
 [nanotron](https://github.com/huggingface/nanotron)-with
 ease. Dive deep into your model’s performance by saving and exploring detailed,
docs/source/use-inference-providers-as-backend.mdx

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+# Inference Providers as backend
+
+Lighteval allows you to use Hugging Face's Inference Providers to evaluate LLMs on supported providers such as Black Forest Labs, Cerebras, Fireworks AI, Nebius, Together AI, and many more.
+
+## Quick use
+
+> [!WARNING]
+> Do not forget to set your Hugging Face API key.
+> You can set it using the `HF_TOKEN` environment variable or by using the `huggingface-cli` command.
+
+
+```bash
+lighteval endpoint inference-providers \
+    "model=deepseek-ai/DeepSeek-R1,provider=hf-inference" \
+    "lighteval|gsm8k|0|0"
+```
+
+## Using a config file
+
+You can use config files to define the model and the provider to use.
+
+```bash
+lighteval endpoint inference-providers \
+    examples/model_configs/inference_providers.yaml \
+    "lighteval|gsm8k|0|0"
+```
+
+with the following config file:
+
+```yaml
+model:
+  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
+  provider: "novita"
+  timeout: null
+  proxies: null
+  parallel_calls_count: 10
+generation:
+  temperature: 0.8
+  top_k: 10
+  max_new_tokens: 10000
+```
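The comma-separated model string in the quick-use command above is split into keyword arguments by the new `inference-providers` CLI entry point added later in this commit (see the dict comprehension in src/lighteval/main_endpoint.py below). A minimal standalone sketch of that parsing logic; the helper name `parse_model_args` is chosen here for illustration and is not part of the commit:

```python
# Minimal sketch of how the CLI parses a comma-separated model-args string
# (mirrors the dict comprehension in src/lighteval/main_endpoint.py below).
# The function name `parse_model_args` is illustrative, not from the commit.
def parse_model_args(model_args: str) -> dict:
    # "key=value" tokens become entries; bare tokens become boolean flags.
    return {
        token.split("=")[0]: token.split("=")[1] if "=" in token else True
        for token in model_args.split(",")
    }


print(parse_model_args("model=deepseek-ai/DeepSeek-R1,provider=hf-inference"))
# {'model': 'deepseek-ai/DeepSeek-R1', 'provider': 'hf-inference'}
```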

docs/source/use-litellm-as-backend.mdx

Lines changed: 0 additions & 43 deletions
@@ -36,46 +36,3 @@ model:
     repetition_penalty: 1.0
     frequency_penalty: 0.0
 ```
-
-## Use Hugging Face Inference Providers
-
-With this you can also access HuggingFace Inference servers, let's look at how to evaluate DeepSeek-R1-Distill-Qwen-32B.
-
-First, let's look at how to acess the model, we can find this from [the model card](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B).
-
-Step 1:
-
-![Step 1](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lighteval/litellm-guide-2.png)
-
-Step 2:
-
-![Step 2](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/lighteval/litellm-guide-1.png)
-
-Great ! Now we can simply copy paste the base_url and our api key to eval our model.
-
-> [!WARNING]
-> Do not forget to prepend the provider in the `model_name`. Here we use an
-> openai compatible endpoint to the provider is `openai`.
-
-```yaml
-model:
-  base_params:
-    model_name: "openai/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
-    base_url: "https://router.huggingface.co/hf-inference/v1"
-    api_key: "YOUR KEY" # remove or keep empty as needed
-  generation:
-    temperature: 0.5
-    max_new_tokens: 256 # This will overide the default from the tasks config
-    top_p: 0.9
-    seed: 0
-    repetition_penalty: 1.0
-    frequency_penalty: 0.0
-```
-
-And then, we are able to eval our model on any eval available in Lighteval.
-
-```bash
-lighteval endpoint litellm \
-    "examples/model_configs/litellm_model.yaml" \
-    "lighteval|gsm8k|0|0"
-```
examples/model_configs/inference_providers.yaml

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+model:
+  model_name: "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
+  provider: "novita"
+  timeout: null
+  proxies: null
+  parallel_calls_count: 20
+generation:
+  temperature: 0.8
+  top_k: 10
+  max_new_tokens: 10000
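The CLI code added below loads this file through `InferenceProvidersModelConfig.from_path` whenever its model argument ends in `.yaml`. A minimal sketch of that loading step; the relative path assumes you run from the repository root:

```python
# Minimal sketch: load the example config the same way the new CLI command does
# when its model argument ends in ".yaml" (path assumes the repository root).
from lighteval.models.endpoints.inference_providers_model import InferenceProvidersModelConfig

model_config = InferenceProvidersModelConfig.from_path("examples/model_configs/inference_providers.yaml")
```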

src/lighteval/main_endpoint.py

Lines changed: 110 additions & 0 deletions
@@ -500,3 +500,113 @@ def litellm(
     pipeline.save_and_push_results()
 
     return results
+
+
+@app.command(rich_help_panel="Evaluation Backends")
+def inference_providers(
+    # === general ===
+    model_args: Annotated[
+        str,
+        Argument(
+            help="config file path for the inference provider model, or a comma separated string of model args (model_name={},provider={},generation={temperature: 0.6})"
+        ),
+    ],
+    tasks: Annotated[str, Argument(help="Comma-separated list of tasks to evaluate on.")],
+    # === Common parameters ===
+    system_prompt: Annotated[
+        Optional[str], Option(help="Use system prompt for evaluation.", rich_help_panel=HELP_PANEL_NAME_4)
+    ] = None,
+    dataset_loading_processes: Annotated[
+        int, Option(help="Number of processes to use for dataset loading.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = 1,
+    custom_tasks: Annotated[
+        Optional[str], Option(help="Path to custom tasks directory.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = None,
+    num_fewshot_seeds: Annotated[
+        int, Option(help="Number of seeds to use for few-shot evaluation.", rich_help_panel=HELP_PANEL_NAME_1)
+    ] = 1,
+    # === saving ===
+    output_dir: Annotated[
+        str, Option(help="Output directory for evaluation results.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = "results",
+    push_to_hub: Annotated[
+        bool, Option(help="Push results to the huggingface hub.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    push_to_tensorboard: Annotated[
+        bool, Option(help="Push results to tensorboard.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    public_run: Annotated[
+        bool, Option(help="Push results and details to a public repo.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    results_org: Annotated[
+        Optional[str], Option(help="Organization to push results to.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = None,
+    save_details: Annotated[
+        bool, Option(help="Save detailed, sample per sample, results.", rich_help_panel=HELP_PANEL_NAME_2)
+    ] = False,
+    # === debug ===
+    max_samples: Annotated[
+        Optional[int], Option(help="Maximum number of samples to evaluate on.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = None,
+    job_id: Annotated[
+        int, Option(help="Optional job id for future reference.", rich_help_panel=HELP_PANEL_NAME_3)
+    ] = 0,
+):
+    """
+    Evaluate models using HF's inference providers as backend.
+    """
+
+    from lighteval.logging.evaluation_tracker import EvaluationTracker
+    from lighteval.models.endpoints.inference_providers_model import (
+        InferenceProvidersModelConfig,
+    )
+    from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters
+
+    env_config = EnvConfig(token=TOKEN, cache_dir=CACHE_DIR)
+    evaluation_tracker = EvaluationTracker(
+        output_dir=output_dir,
+        save_details=save_details,
+        push_to_hub=push_to_hub,
+        push_to_tensorboard=push_to_tensorboard,
+        public=public_run,
+        hub_results_org=results_org,
+    )
+
+    # TODO (nathan): better handling of model_args
+    parallelism_manager = ParallelismManager.NONE
+
+    if model_args.endswith(".yaml"):
+        model_config = InferenceProvidersModelConfig.from_path(model_args)
+    else:
+        model_args_dict: dict = {k.split("=")[0]: k.split("=")[1] if "=" in k else True for k in model_args.split(",")}
+        model_config = InferenceProvidersModelConfig(**model_args_dict)
+
+    pipeline_params = PipelineParameters(
+        launcher_type=parallelism_manager,
+        env_config=env_config,
+        job_id=job_id,
+        dataset_loading_processes=dataset_loading_processes,
+        custom_tasks_directory=custom_tasks,
+        override_batch_size=None,
+        num_fewshot_seeds=num_fewshot_seeds,
+        max_samples=max_samples,
+        use_chat_template=True,
+        system_prompt=system_prompt,
+        load_responses_from_details_date_id=None,
+    )
+    pipeline = Pipeline(
+        tasks=tasks,
+        pipeline_parameters=pipeline_params,
+        evaluation_tracker=evaluation_tracker,
+        model_config=model_config,
+    )
+
+    pipeline.evaluate()
+
+    pipeline.show_results()
+
+    results = pipeline.get_results()
+
+    pipeline.save_and_push_results()
+
+    return results
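For reference, the same evaluation can also be driven directly from Python. The sketch below simply mirrors the body of the new `inference_providers` command above, with argument values set to the command's CLI defaults; the `EnvConfig` token and cache directory and the config file path are illustrative assumptions rather than values taken from this diff.

```python
# Sketch: the new `inference_providers` command above, driven directly from Python.
# Argument values mirror the command's CLI defaults; the token and cache_dir passed
# to EnvConfig are assumptions (the command uses module-level TOKEN and CACHE_DIR
# constants that are not shown in this diff).
import os

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.endpoints.inference_providers_model import InferenceProvidersModelConfig
from lighteval.pipeline import EnvConfig, ParallelismManager, Pipeline, PipelineParameters

env_config = EnvConfig(token=os.getenv("HF_TOKEN"), cache_dir="~/.cache/huggingface")  # assumption

evaluation_tracker = EvaluationTracker(
    output_dir="results",
    save_details=False,
    push_to_hub=False,
    push_to_tensorboard=False,
    public=False,
    hub_results_org=None,
)

# Config file path taken from the docs page added in this commit (assumes repo root).
model_config = InferenceProvidersModelConfig.from_path("examples/model_configs/inference_providers.yaml")

pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.NONE,
    env_config=env_config,
    job_id=0,
    dataset_loading_processes=1,
    custom_tasks_directory=None,
    override_batch_size=None,
    num_fewshot_seeds=1,
    max_samples=None,
    use_chat_template=True,
    system_prompt=None,
    load_responses_from_details_date_id=None,
)
pipeline = Pipeline(
    tasks="lighteval|gsm8k|0|0",
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    model_config=model_config,
)

pipeline.evaluate()
pipeline.show_results()
results = pipeline.get_results()
pipeline.save_and_push_results()
```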
