
Commit ff41a85

Bugfix/hf tokenizer gguf override (#3098)
* fix(hf-gguf): skip gguf_file if external tokenizer is provided
* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
1 parent 8c1016c commit ff41a85

File tree

2 files changed: +23 −1 lines changed


README.md

Lines changed: 22 additions & 0 deletions

@@ -110,6 +110,28 @@ lm_eval --model hf \

> [!Note]
> Just like you can provide a local path to `transformers.AutoModel`, you can also provide a local path to `lm_eval` via `--model_args pretrained=/path/to/model`

#### Evaluating GGUF Models

`lm-eval` supports evaluating models in GGUF format using the Hugging Face (`hf`) backend. This allows you to use quantized models compatible with `transformers`, `AutoModel`, and llama.cpp conversions.

To evaluate a GGUF model, pass the path to the directory containing the model weights, the `gguf_file`, and optionally a separate `tokenizer` path using the `--model_args` flag.

**🚨 Important Note:**
If no separate tokenizer is provided, Hugging Face will attempt to reconstruct the tokenizer from the GGUF file — this can take **hours** or even hang indefinitely. Passing a separate tokenizer avoids this issue and can reduce tokenizer loading time from hours to seconds.

**✅ Recommended usage:**

```bash
lm_eval --model hf \
    --model_args pretrained=/path/to/gguf_folder,gguf_file=model-name.gguf,tokenizer=/path/to/tokenizer \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```

> [!Tip]
> Ensure the tokenizer path points to a valid Hugging Face tokenizer directory (e.g., one containing `tokenizer_config.json`, `vocab.json`, etc.).

#### Multi-GPU Evaluation with Hugging Face `accelerate`

We support three main ways of using Hugging Face's [accelerate 🚀](https://github.com/huggingface/accelerate) library for multi-GPU evaluation.
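The tokenizer-directory requirement in the README's Tip can be sanity-checked before launching a long evaluation. A minimal sketch — the helper name and the candidate vocab file list are assumptions, since the exact files present vary by tokenizer type:

```python
# Hypothetical helper: quick check that a directory plausibly contains a
# Hugging Face tokenizer before passing it via --model_args tokenizer=...
from pathlib import Path

def looks_like_hf_tokenizer(path):
    p = Path(path)
    # save_pretrained() always writes tokenizer_config.json; the vocab
    # file name differs by tokenizer type, so any one candidate suffices.
    config_present = (p / "tokenizer_config.json").is_file()
    vocab_candidates = ["tokenizer.json", "vocab.json", "spiece.model", "tokenizer.model"]
    vocab_present = any((p / name).is_file() for name in vocab_candidates)
    return config_present and vocab_present
```

This only checks for file presence, not validity; loading the directory with `transformers.AutoTokenizer.from_pretrained` remains the definitive test.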

lm_eval/models/huggingface.py

Lines changed: 1 addition & 1 deletion

@@ -727,7 +727,7 @@ def _create_tokenizer(
          }

          # gguf format embeds tokenizer and is not compatible with hf tokenizer `use_fast` param
-         if gguf_file is not None:
+         if not tokenizer and gguf_file is not None:
              kwargs["gguf_file"] = gguf_file
          else:
              kwargs["use_fast"] = use_fast_tokenizer
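The one-line change above can be isolated as a small, testable function. A minimal sketch — the function name and standalone form are assumptions, as this logic actually lives inside `_create_tokenizer` in `lm_eval/models/huggingface.py`:

```python
# Sketch of the patched kwarg selection: only fall back to the (slow)
# GGUF-embedded tokenizer when no external tokenizer was supplied.
def build_tokenizer_kwargs(tokenizer, gguf_file, use_fast_tokenizer=True):
    kwargs = {}
    if not tokenizer and gguf_file is not None:
        # No external tokenizer: let HF reconstruct one from the GGUF file.
        kwargs["gguf_file"] = gguf_file
    else:
        # External (or default) tokenizer path: GGUF reconstruction is
        # skipped entirely, avoiding the hours-long load described above.
        kwargs["use_fast"] = use_fast_tokenizer
    return kwargs
```

Before the fix, the `gguf_file` branch was taken whenever a GGUF file was present, even when the user had passed a `tokenizer=` path — which is exactly the slow path this commit avoids.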
