
Bug: imatrix quantization failing for nvidia Nemotron 49B v1.5 #659

@erazortt

Description

What happened?

I am trying to quantize the bf16 GGUF using the imatrix file from here: https://huggingface.co/bartowski/nvidia_Llama-3_3-Nemotron-Super-49B-v1_5-GGUF/tree/main

Quantizing with the following command:
./llama-quantize --imatrix ../models/nvidia_Llama-3_3-Nemotron-Super-49B-v1_5-imatrix.gguf --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 ../models/nvidia_Llama-3_3-Nemotron-Super-49B-v1_5-bf16-00001-of-00003.gguf model-q4_k_l.gguf q4_k_m

The command fails with the error message:
load_imatrix: failed reading data for entry 1

Quantization works when the --imatrix flag is not used.
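
As a quick check of whether the imatrix file itself is readable, the sketch below lists the metadata keys and tensor entries stored in it. This is only a diagnostic sketch, assuming the gguf Python package from llama.cpp's gguf-py directory is installed (e.g. via pip install gguf); the file path is the same one used in the command above.

# Diagnostic sketch: dump the contents of the imatrix GGUF file.
# Assumes the gguf package from llama.cpp's gguf-py is installed
# (for example via `pip install gguf`); the path matches the command above.
from gguf import GGUFReader

reader = GGUFReader("../models/nvidia_Llama-3_3-Nemotron-Super-49B-v1_5-imatrix.gguf")

# Metadata keys stored in the file.
for key in reader.fields:
    print("field:", key)

# Tensor entries (name and shape) that the file contains.
for tensor in reader.tensors:
    print("tensor:", tensor.name, list(tensor.shape))

If this fails to open the file or lists no entries, the imatrix file itself is likely at fault rather than the quantization step.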

Name and Version

version: 1 (bb4c917)
built with MSVC 19.44.35213.0 for

What operating system are you seeing the problem on?

Windows

Relevant log output

load_imatrix: failed reading data for entry 1
