@@ -21,13 +21,13 @@ Once you've chosen a benchmark, run it with `lighteval eval`. Below are examples
1. Evaluate a model via Hugging Face Inference Providers.

```bash
- lighteval eval "hf-inference-providers/openai/gpt-oss-20b" "lighteval|gpqa:diamond|0"
+ lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond
```

2. Run multiple evals at the same time.

```bash
- lighteval eval "hf-inference-providers/openai/gpt-oss-20b" "lighteval|gpqa:diamond|0,lighteval|aime25|0"
+ lighteval eval "hf-inference-providers/openai/gpt-oss-20b" gpqa:diamond,aime25
```

3. Compare providers for the same model.
@@ -37,25 +37,32 @@ lighteval eval \
hf-inference-providers/openai/gpt-oss-20b:fireworks-ai \
hf-inference-providers/openai/gpt-oss-20b:together \
hf-inference-providers/openai/gpt-oss-20b:nebius \
+ gpqa:diamond
+ ```
+
+ You can also compare all providers serving one model in one line:
+
+ ```bash
+ lighteval eval hf-inference-providers/openai/gpt-oss-20b:all \
4047 " lighteval|gpqa:diamond|0"
4148```
4249
43504 . Evaluate a vLLM or SGLang model.
4451
4552``` bash
46- lighteval eval vllm/HuggingFaceTB/SmolLM-135M-Instruct " lighteval| gpqa:diamond|0 "
53+ lighteval eval vllm/HuggingFaceTB/SmolLM-135M-Instruct gpqa:diamond
4754```
4855
49565 . See the impact of few-shot on your model.
5057
5158``` bash
52- lighteval eval hf-inference-providers/openai/gpt-oss-20b " lighteval| gsm8k|0,lighteval| gsm8k|5"
59+ lighteval eval hf-inference-providers/openai/gpt-oss-20b " gsm8k|0,gsm8k|5"
5360```
5461
55626 . Optimize custom server connections.
5663
5764``` bash
58- lighteval eval hf-inference-providers/openai/gpt-oss-20b " lighteval| gsm8k|0 " \
65+ lighteval eval hf-inference-providers/openai/gpt-oss-20b gsm8k \
5966 --max-connections 50 \
6067 --timeout 30 \
6168 --retry-on-error 1 \
@@ -66,13 +73,13 @@ lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|gsm8k|0" \
7. Use multiple epochs for more reliable results.

```bash
- lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|aime25|0" --epochs 16 --epochs-reducer "pass_at_4"
+ lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --epochs 16 --epochs-reducer "pass_at_4"
```

8. Push to the Hub to share results.

```bash
- lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|hle|0" \
+ lighteval eval hf-inference-providers/openai/gpt-oss-20b hle \
--bundle-dir gpt-oss-bundle \
--repo-id OpenEvals/evals \
--max-samples 100
@@ -92,17 +99,17 @@ Resulting Space:
You can use any argument defined in inspect-ai's API.

```bash
- lighteval eval hf-inference-providers/openai/gpt-oss-20b "lighteval|aime25|0" --temperature 0.1
+ lighteval eval hf-inference-providers/openai/gpt-oss-20b aime25 --temperature 0.1
```

10. Use model-args to use any inference provider specific argument.

```bash
- lighteval eval google/gemini-2.5-pro "lighteval|aime25|0" --model-args location=us-east5
+ lighteval eval google/gemini-2.5-pro aime25 --model-args location=us-east5
```

```bash
- lighteval eval openai/gpt-4o "lighteval|gpqa:diamond|0" --model-args service_tier=flex,client_timeout=1200
+ lighteval eval openai/gpt-4o gpqa:diamond --model-args service_tier=flex,client_timeout=1200
```
