```bash
cd lm-evaluation-harness
pip install -e .
```

### Installing Model Backends

The base installation provides the core evaluation framework. **Model backends must be installed separately** using optional extras:

For HuggingFace transformers models:

```bash
pip install "lm_eval[hf]"
```

For vLLM inference:

```bash
pip install "lm_eval[vllm]"
```

For API-based models (OpenAI, Anthropic, etc.):

```bash
pip install "lm_eval[api]"
```

Multiple backends can be installed together:

```bash
pip install "lm_eval[hf,vllm,api]"
```

A detailed table of all optional extras is available at the end of this document.
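
If you are working from a local clone (as with the editable install above), the extras can be combined with `pip install -e` in one step; a minimal sketch, assuming you are in the repository root:

```bash
# editable install of the harness plus the HuggingFace and vLLM backends
pip install -e ".[hf,vllm]"
```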
## Basic Usage

A list of supported tasks (or groupings of tasks) can be viewed with `lm-eval --tasks list`.

### Hugging Face `transformers`

> [!Important]
> To use the HuggingFace backend, first install: `pip install "lm_eval[hf]"`

To evaluate a model hosted on the [HuggingFace Hub](https://huggingface.co/models) (e.g. GPT-J-6B) on `hellaswag` you can use the following command (this assumes you are using a CUDA-compatible GPU):
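
The following is a sketch of such an invocation; the exact example in the README may differ slightly, and the model, task, and batch size shown here are illustrative:

```bash
# evaluate EleutherAI/gpt-j-6B on HellaSwag using the first CUDA GPU
lm_eval --model hf \
    --model_args pretrained=EleutherAI/gpt-j-6B \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```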

```bash
lm_eval --model vllm \
    ...
    --batch_size auto
```
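
In the vLLM fragment above, the elided lines select the model and tasks; a minimal sketch of a complete invocation (model, parallelism settings, and task are placeholders) might look like:

```bash
# single-GPU vLLM run; raise tensor_parallel_size to shard across more GPUs
lm_eval --model vllm \
    --model_args pretrained=EleutherAI/gpt-j-6B,tensor_parallel_size=1,dtype=auto,gpu_memory_utilization=0.8 \
    --tasks hellaswag \
    --batch_size auto
```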

To use vllm, do `pip install "lm_eval[vllm]"`. For a full list of supported vLLM configurations, please reference our [vLLM integration](https://github.com/EleutherAI/lm-evaluation-harness/blob/e74ec966556253fbe3d8ecba9de675c77c075bce/lm_eval/models/vllm_causallms.py) and the vLLM documentation.

vLLM occasionally differs in output from Huggingface. We treat Huggingface as the reference implementation and provide a [script](./scripts/model_comparator.py) for checking the validity of vllm results against HF.

> [!Tip]
> For fastest performance, we recommend using `--batch_size auto` for vLLM whenever possible, to leverage its continuous batching functionality!

```bash
lm_eval --model sglang \
    ...
```

> [!Tip]
> When encountering out-of-memory (OOM) errors (especially for multiple-choice tasks), try these solutions:
>
> 1. Use a manual `batch_size`, rather than `auto`.
> 2. Lower KV cache pool memory usage by adjusting `mem_fraction_static`; for example, add it to your model arguments: `--model_args pretrained=...,mem_fraction_static=0.7`. A sketch combining both adjustments follows below.
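
For instance, a minimal sketch combining a manual batch size with the `mem_fraction_static` setting (the model name here is a placeholder):

```bash
# manual batch size plus a smaller KV cache pool to avoid OOM
lm_eval --model sglang \
    --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,mem_fraction_static=0.7 \
    --tasks hellaswag \
    --batch_size 8
```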

### Model APIs and Inference Servers

> [!Important]
> To use API-based models, first install: `pip install "lm_eval[api]"`

Our library also supports the evaluation of models served via several commercial APIs, and we hope to implement support for the most commonly used performant local/self-hosted inference servers.
To call a hosted model, use:
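
A minimal sketch using the OpenAI completions backend follows (the API key and model name are placeholders, and the exact model argument may vary by version):

```bash
# evaluate an OpenAI completions model on two tasks
export OPENAI_API_KEY=YOUR_KEY_HERE
lm_eval --model openai-completions \
    --model_args model=davinci-002 \
    --tasks lambada_openai,hellaswag
```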