
Commit ff41a85

Bugfix/hf tokenizer gguf override (#3098)
* fix(hf-gguf): skip gguf_file if external tokenizer is provided
* docs(readme): add instructions for evaluating GGUF models with Hugging Face backend
1 parent 8c1016c commit ff41a85

File tree

2 files changed: +23 −1 lines changed


README.md

Lines changed: 22 additions & 0 deletions

@@ -110,6 +110,28 @@ lm_eval --model hf \

> [!Note]
> Just like you can provide a local path to `transformers.AutoModel`, you can also provide a local path to `lm_eval` via `--model_args pretrained=/path/to/model`

#### Evaluating GGUF Models

`lm-eval` supports evaluating models in GGUF format using the Hugging Face (`hf`) backend. This allows you to use quantized models compatible with `transformers`, `AutoModel`, and llama.cpp conversions.

To evaluate a GGUF model, pass the path to the directory containing the model weights, the `gguf_file`, and optionally a separate `tokenizer` path using the `--model_args` flag.

**🚨 Important Note:**
If no separate tokenizer is provided, Hugging Face will attempt to reconstruct the tokenizer from the GGUF file — this can take **hours** or even hang indefinitely. Passing a separate tokenizer avoids this issue and can reduce tokenizer loading time from hours to seconds.

**✅ Recommended usage:**

```bash
lm_eval --model hf \
    --model_args pretrained=/path/to/gguf_folder,gguf_file=model-name.gguf,tokenizer=/path/to/tokenizer \
    --tasks hellaswag \
    --device cuda:0 \
    --batch_size 8
```

> [!Tip]
> Ensure the tokenizer path points to a valid Hugging Face tokenizer directory (e.g., one containing `tokenizer_config.json`, `vocab.json`, etc.).

#### Multi-GPU Evaluation with Hugging Face `accelerate`

We support three main ways of using Hugging Face's [accelerate 🚀](https://github.com/huggingface/accelerate) library for multi-GPU evaluation.
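The tokenizer-directory requirement in the README's Tip can be sanity-checked before launching a long evaluation. A minimal sketch — the helper name and the candidate vocab file list are assumptions, since the exact files present vary by tokenizer type:

```python
# Hypothetical helper: quick check that a directory plausibly contains a
# Hugging Face tokenizer before passing it via --model_args tokenizer=...
from pathlib import Path

def looks_like_hf_tokenizer(path):
    p = Path(path)
    # save_pretrained() always writes tokenizer_config.json; the vocab
    # file name differs by tokenizer type, so any one candidate suffices.
    config_present = (p / "tokenizer_config.json").is_file()
    vocab_candidates = ["tokenizer.json", "vocab.json", "spiece.model", "tokenizer.model"]
    vocab_present = any((p / name).is_file() for name in vocab_candidates)
    return config_present and vocab_present
```

This only checks for file presence, not validity; loading the directory with `transformers.AutoTokenizer.from_pretrained` remains the definitive test.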

lm_eval/models/huggingface.py

Lines changed: 1 addition & 1 deletion

@@ -727,7 +727,7 @@ def _create_tokenizer(
          }

          # gguf format embeds tokenizer and is not compatible with hf tokenizer `use_fast` param
-         if gguf_file is not None:
+         if not tokenizer and gguf_file is not None:
              kwargs["gguf_file"] = gguf_file
          else:
              kwargs["use_fast"] = use_fast_tokenizer
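The one-line change above can be isolated as a small, testable function. A minimal sketch — the function name and standalone form are assumptions, as this logic actually lives inside `_create_tokenizer` in `lm_eval/models/huggingface.py`:

```python
# Sketch of the patched kwarg selection: only fall back to the (slow)
# GGUF-embedded tokenizer when no external tokenizer was supplied.
def build_tokenizer_kwargs(tokenizer, gguf_file, use_fast_tokenizer=True):
    kwargs = {}
    if not tokenizer and gguf_file is not None:
        # No external tokenizer: let HF reconstruct one from the GGUF file.
        kwargs["gguf_file"] = gguf_file
    else:
        # External (or default) tokenizer path: GGUF reconstruction is
        # skipped entirely, avoiding the hours-long load described above.
        kwargs["use_fast"] = use_fast_tokenizer
    return kwargs
```

Before the fix, the `gguf_file` branch was taken whenever a GGUF file was present, even when the user had passed a `tokenizer=` path — which is exactly the slow path this commit avoids.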
