
lm_eval | RuntimeError: expected mat1 and mat2 to have the same dtype, #170

@uygarkurt

Description


Hi. I'm trying to evaluate an HQQ-quantized model with lm_eval via the following command:

lm_eval --model hf --model_args pretrained={dir_path},dtype=half,device_map=auto --tasks {tasks} --batch_size auto:8 --apply_chat_template True --num_fewshot 5 --output_path {dir_path}/lm_eval --gen_kwargs do_sample=True,temperature=0.6,max_gen_toks=16000

I get the error below. Any ideas? I tried playing with the dtype passed to lm_eval, but nothing worked.

Running command: lm_eval --model hf --model_args pretrained=/home/uygar/q_experiments/models/quantized_models/Phi-4-reasoning-hqq-b1-g64,dtype=half,device_map=auto --tasks mmlu --batch_size auto:8 --apply_chat_template True --num_fewshot 5 --output_path /home/uygar/q_experiments/models/quantized_models/Phi-4-reasoning-hqq-b1-g64/lm_eval --gen_kwargs do_sample=True,temperature=0.6,max_gen_toks=16000
2025-11-17:21:43:36 INFO     [__main__:446] Selected Tasks: ['mmlu']
2025-11-17:21:43:36 INFO     [evaluator:202] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-11-17:21:43:36 WARNING  [evaluator:214] generation_kwargs: {'do_sample': True, 'temperature': 0.6, 'max_gen_toks': 16000} specified through cli, these settings will update set parameters in yaml tasks. Ensure 'do_sample=True' for non-greedy decoding!
2025-11-17:21:43:36 INFO     [evaluator:240] Initializing hf model, with arguments: {'pretrained': '/home/uygar/q_experiments/models/quantized_models/Phi-4-reasoning-hqq-b1-g64',
        'dtype': 'half', 'device_map': 'auto'}
2025-11-17:21:43:37 INFO     [models.huggingface:147] Using device 'cuda'
2025-11-17:21:43:37 INFO     [models.huggingface:420] Model parallel was set to False.
`torch_dtype` is deprecated! Use `dtype` instead!
/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/quantizers/auto.py:239: UserWarning: You passed `quantization_config` or equivalent parameters to `from_pretrained` but the model you're loading already has a `quantization_config` attribute. The `quantization_config` from the model will be used.
Running loglikelihood requests:   0%|               | 0/56168 [00:00<?, ?it/s]Passed argument batch_size = auto:8.0. Detecting largest batch size
Traceback (most recent call last):
  File "/home/miniconda3/envs/hqq/bin/lm_eval", line 7, in <module>
    sys.exit(cli_evaluate())
             ^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/__main__.py", line 455, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/utils.py", line 456, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/evaluator.py", line 357, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/utils.py", line 456, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/evaluator.py", line 585, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/api/model.py", line 391, in loglikelihood
    return self._loglikelihood_tokens(new_reqs, disable_tqdm=disable_tqdm)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/huggingface.py", line 1177, in _loglikelihood_tokens
    for chunk in chunks:
                 ^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/utils.py", line 440, in get_batched
    yield from batch
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/utils.py", line 623, in get_chunks
    if len(arr) == (fn(i, _iter) if fn else n):
                    ^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/huggingface.py", line 1111, in _batch_scheduler
    self.batch_sizes[sched] = self._detect_batch_size(n_reordered_requests, pos)
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/huggingface.py", line 820, in _detect_batch_size
    batch_size = forward_batch()
                 ^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/accelerate/utils/memory.py", line 177, in decorator
    return function(batch_size, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/huggingface.py", line 812, in forward_batch
    self._model_call(test_batch, **call_kwargs),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/lm_eval/models/huggingface.py", line 949, in _model_call
    return self.model(inps).logits
           ^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/utils/generic.py", line 918, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 465, in forward
    outputs: BaseModelOutputWithPast = self.model(
                                       ^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/utils/generic.py", line 1064, in wrapper
    outputs = func(self, *args, **kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 401, in forward
    hidden_states = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 263, in forward
    hidden_states, self_attn_weights = self.self_attn(
                                       ^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/utils/deprecation.py", line 172, in wrapped_func
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 176, in forward
    qkv = self.qkv_proj(hidden_states)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/hqq/core/quantize.py", line 889, in forward_pytorch_backprop
    return HQQMatmulNoCacheMul.apply(x, self.matmul, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/torch/autograd/function.py", line 581, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/hqq/core/quantize.py", line 325, in forward
    out = matmul(x, transpose=True)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/hqq/lib/python3.12/site-packages/hqq/core/quantize.py", line 882, in matmul
    return torch.matmul(x, weight.t() if (transpose) else weight)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != float
Running loglikelihood requests:   0%|               | 0/56168 [00:02<?, ?it/s]
