convert: add dequant function for compressed_tensor (kimi-k2-thinking) #17064
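For context, Kimi-K2-Thinking ships its weights in the compressed-tensors "pack-quantized" INT4 layout, so the converter has to dequantize them before it can write a GGUF. Below is a minimal sketch of what that dequant step involves, not the PR's actual code: it assumes symmetric INT4 values packed eight per int32 (least-significant nibble first) with one scale per 32-element group, and the tensor names weight_packed / weight_scale follow the compressed-tensors convention.

import torch

def dequant_packed_int4(weight_packed: torch.Tensor,   # int32, shape (rows, cols // 8)
                        weight_scale: torch.Tensor,    # scales, shape (rows, cols // group_size)
                        group_size: int = 32) -> torch.Tensor:
    # Unpack eight 4-bit values from each int32 (assumed LSB-nibble-first order).
    shifts = torch.arange(0, 32, 4, dtype=torch.int32)
    nibbles = (weight_packed.unsqueeze(-1) >> shifts) & 0xF   # (rows, cols // 8, 8)
    ints = nibbles.flatten(-2)                                # (rows, cols)
    # Sign-extend the 4-bit values to the symmetric range [-8, 7].
    ints = torch.where(ints >= 8, ints - 16, ints)
    # Apply one scale per group of `group_size` columns, then cast to bf16.
    rows, cols = ints.shape
    grouped = ints.reshape(rows, cols // group_size, group_size).float()
    scales = weight_scale.reshape(rows, -1, 1).float()
    return (grouped * scales).reshape(rows, cols).to(torch.bfloat16)

In the PR this expansion happens during export, which is why the BF16/Q8_0 outputs discussed below are so much larger than the INT4 checkpoint being downloaded.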
Conversation
@ngxson still downloading the model but will test and report back!
The output GGUF quantized to Q8_0 will be over 1 terabyte. Now I doubt I even have enough memory to test it.
Over how much? :) I have ~1.1TB RAM + 64GB VRAM
Output GGUF will be 1.09T
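As a back-of-the-envelope check on those figures (assuming the 1.09T number refers to the Q8_0 output, and the reported ~1.03T total parameter count for Kimi-K2-Thinking): Q8_0 stores each block of 32 weights as 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 weights.

params = 1.03e12                     # assumed total parameter count
bytes_per_weight = 34 / 32           # Q8_0: 32 int8 + one fp16 scale per 32-weight block
print(f"{params * bytes_per_weight / 1e12:.2f} TB")   # -> 1.09 TB

So "over 1 terabyte" and "1.09T" are consistent with a straight Q8_0 quantization of the full model.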
Exciting, thanks for looking into this one y'all! Well, started off strong, but then died. I'm on a CPU-only rig with 1.5TB RAM and plenty of disk space, but no GPUs. I have installed … Also had to manually push …

Details: Command and full logs

$ numactl -N 1 -m 1 \
python \
convert_hf_to_gguf.py \
--outtype bf16 \
--split-max-size 50G \
--outfile /mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF \
/mnt/data/models/moonshotai/Kimi-K2-Thinking/
INFO:hf-to-gguf:Loading model: Kimi-K2-Thinking
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/moonshotai/Kimi-K2-Thinking: The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:Model architecture: DeepseekV3ForCausalLM
WARNING:hf-to-gguf:Failed to load model config from /mnt/data/models/moonshotai/Kimi-K2-Thinking: The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.
WARNING:hf-to-gguf:Trying to load config.json instead
INFO:hf-to-gguf:gguf: loading model weight map from 'model.safetensors.index.json'
INFO:hf-to-gguf:gguf: indexing model part 'model-00001-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00002-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00003-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00004-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00005-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00006-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00007-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00008-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00009-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00010-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00011-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00012-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00013-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00014-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00015-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00016-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00017-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00018-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00019-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00020-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00021-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00022-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00023-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00024-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00025-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00026-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00027-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00028-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00029-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00030-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00031-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00032-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00033-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00034-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00035-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00036-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00037-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00038-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00039-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00040-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00041-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00042-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00043-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00044-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00045-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00046-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00047-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00048-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00049-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00050-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00051-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00052-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00053-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00054-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00055-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00056-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00057-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00058-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00059-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00060-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00061-of-000062.safetensors'
INFO:hf-to-gguf:gguf: indexing model part 'model-00062-of-000062.safetensors'
INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only
INFO:hf-to-gguf:Exporting model...
INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> BF16, shape = {18432, 7168}
INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> BF16, shape = {7168, 18432}
INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.0.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.0.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.0.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.0.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.0.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.0.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.0.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.0.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.1.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.1.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.1.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.1.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.1.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.1.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.1.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.1.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.1.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.1.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.2.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.2.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.2.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.2.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.2.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.2.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.2.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.2.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.2.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.2.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.3.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.3.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.3.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.3.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.3.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.3.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.3.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.3.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.3.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.3.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.3.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.4.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.4.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.4.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.4.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.4.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.4.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.4.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.4.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.4.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.4.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.4.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.5.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.5.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.5.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.5.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.5.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.5.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.5.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.5.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.5.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.5.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.6.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.6.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.6.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.6.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.6.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.6.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.6.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.6.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.6.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.6.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.6.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.7.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.7.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.7.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.7.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.7.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.7.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.7.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.7.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.7.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.7.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.7.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.8.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.8.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.8.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.8.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.8.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.8.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.8.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.8.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.8.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.8.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.9.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.9.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.9.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.9.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.9.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.9.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.9.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.9.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.9.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.9.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.9.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.10.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.10.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.10.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.10.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.10.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.10.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.10.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.10.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.10.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.10.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.11.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.11.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.11.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.11.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.11.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.11.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.11.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.11.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.11.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.11.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.11.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.12.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.12.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.12.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.12.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.12.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.12.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.12.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.12.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.12.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.12.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.13.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.13.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.13.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.13.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.13.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.13.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.13.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.13.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.13.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.13.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.13.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.14.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.14.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.14.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.14.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.14.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.14.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.14.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.14.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.14.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.14.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.15.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.15.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.15.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.15.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.15.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.15.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.15.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.15.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.15.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.15.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.15.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.16.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.16.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.16.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.16.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.16.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.16.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.16.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.16.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.16.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.16.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.16.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.16.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.16.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.17.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.17.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.17.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.17.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.17.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.17.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.17.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.17.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.17.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.17.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.17.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.17.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.17.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.18.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.18.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.18.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.18.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.18.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.18.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.18.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.18.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.18.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.18.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.18.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.18.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.18.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.19.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.19.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.19.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.19.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.19.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.19.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.19.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.19.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.19.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.19.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.19.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.19.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.19.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.20.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.20.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.20.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.20.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.20.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.20.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.20.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.20.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.20.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.20.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.20.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.20.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.20.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.21.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.21.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.21.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.21.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.21.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.21.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.21.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.21.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.21.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.21.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.21.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.21.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.21.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.22.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.22.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.22.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.22.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.22.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.22.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.22.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.22.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.22.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.22.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.22.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.22.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.22.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.23.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.23.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.23.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.23.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.23.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.23.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.23.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.23.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.23.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.23.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.23.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.23.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.23.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.24.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.24.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.24.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.24.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.24.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.24.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.24.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.24.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.24.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.24.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.24.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.24.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.24.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.25.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.25.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.25.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.25.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.25.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.25.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.25.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.25.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.25.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.25.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.25.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.25.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.25.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.26.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.26.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.26.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.26.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.26.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.26.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.26.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.26.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.26.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.26.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.26.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.26.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.26.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.27.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.27.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.27.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.27.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.27.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.27.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.27.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.27.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.27.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.27.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.27.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.27.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.27.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.28.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.28.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.28.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.28.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.28.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.28.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.28.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.28.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.28.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.28.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.28.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.28.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.28.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.29.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.29.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.29.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.29.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.29.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.29.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.29.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.29.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.29.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.29.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.29.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.29.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.29.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.30.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.30.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.30.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.30.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.30.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.30.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.30.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.30.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.30.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.30.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.30.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.30.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.30.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.31.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.31.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.31.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.31.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.31.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.31.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.31.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.31.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.31.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.31.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.31.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.31.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.31.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.32.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.32.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.32.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.32.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.32.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.32.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.32.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.32.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.32.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.32.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.32.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.32.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.32.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.33.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.33.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.33.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.33.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.33.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.33.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.33.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.33.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.33.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.33.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.33.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.33.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.33.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.34.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.34.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.34.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.34.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.34.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.34.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.34.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.34.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.34.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.34.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.34.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.34.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.34.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.35.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.35.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.35.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.35.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.35.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.35.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.35.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.35.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.35.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.35.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.35.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.35.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.35.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.36.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.36.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.36.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.36.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.36.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.36.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.36.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.36.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.36.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.36.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.36.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.36.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.36.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.37.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.37.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.37.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.37.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.37.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.37.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.37.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.37.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.37.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.37.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.37.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.37.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.37.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.38.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.38.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.38.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.38.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.38.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.38.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.38.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.38.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.38.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.38.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.38.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.38.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.38.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.39.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.39.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.39.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.39.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.39.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.39.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.39.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.39.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.39.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.39.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.39.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.39.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.39.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.40.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.40.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.40.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.40.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.40.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.40.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.40.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.40.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.40.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.40.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.40.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.40.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.40.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.41.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.41.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.41.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.41.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.41.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.41.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.41.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.41.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.41.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.41.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.41.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.41.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.41.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.42.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.42.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.42.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.42.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.42.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.42.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.42.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.42.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.42.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.42.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.42.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.42.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.42.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.43.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.43.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.43.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.43.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.43.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.43.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.43.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.43.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.43.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.43.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.43.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.43.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.43.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.44.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.44.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.44.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.44.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.44.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.44.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.44.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.44.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.44.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.44.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.44.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.44.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.44.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.45.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.45.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.45.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.45.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.45.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.45.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.45.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.45.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.45.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.45.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.45.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.45.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.45.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.46.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.46.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.46.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.46.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.46.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.46.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.46.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.46.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.46.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.46.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.46.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.46.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.46.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.47.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.47.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.47.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.47.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.47.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.47.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.47.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.47.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.47.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.47.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.47.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.47.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.47.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.48.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.48.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.48.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.48.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.48.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.48.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.48.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.48.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.48.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.48.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.48.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.48.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.48.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.49.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.49.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.49.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.49.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.49.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.49.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.49.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.49.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.49.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.49.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.49.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.49.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.49.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.50.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.50.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.50.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.50.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.50.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.50.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.50.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.50.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.50.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.50.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.50.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.50.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.50.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.51.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.51.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.51.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.51.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.51.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.51.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.51.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.51.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.51.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.51.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.51.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.51.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.51.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.52.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.52.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.52.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.52.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.52.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.52.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.52.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.52.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.52.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.52.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.52.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.52.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.52.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.53.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.53.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.53.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.53.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.53.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.53.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.53.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.53.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.53.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.53.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.53.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.53.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.53.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.54.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.54.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.54.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.54.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.54.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.54.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.54.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.54.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.54.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.54.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.54.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.54.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.54.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.55.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.55.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.55.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.55.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.55.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.55.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.55.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.55.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.55.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.55.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.55.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.55.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.55.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.56.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.56.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.56.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.56.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.56.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.56.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.56.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.56.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.56.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.56.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.56.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.56.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.56.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.57.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.57.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.57.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.57.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.57.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.57.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.57.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.57.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.57.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.57.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.57.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.57.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.57.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.58.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.58.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.58.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.58.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.58.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.58.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.58.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.58.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.58.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.58.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.58.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.58.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.58.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.59.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.59.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.59.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.59.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.59.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.59.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.59.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.59.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.59.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.59.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.59.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.59.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.59.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:blk.60.attn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.exp_probs_b.bias, torch.float32 --> F32, shape = {384}
INFO:hf-to-gguf:blk.60.ffn_gate_inp.weight, torch.bfloat16 --> F32, shape = {7168, 384}
INFO:hf-to-gguf:blk.60.ffn_down_shexp.weight, torch.bfloat16 --> BF16, shape = {2048, 7168}
INFO:hf-to-gguf:blk.60.ffn_gate_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_up_shexp.weight, torch.bfloat16 --> BF16, shape = {7168, 2048}
INFO:hf-to-gguf:blk.60.ffn_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.60.attn_kv_a_norm.weight, torch.bfloat16 --> F32, shape = {512}
INFO:hf-to-gguf:blk.60.attn_kv_a_mqa.weight, torch.bfloat16 --> BF16, shape = {7168, 576}
INFO:hf-to-gguf:blk.60.attn_k_b.weight, torch.bfloat16 --> BF16, shape = {128, 512, 64}
INFO:hf-to-gguf:blk.60.attn_v_b.weight, torch.bfloat16 --> BF16, shape = {512, 128, 64}
INFO:hf-to-gguf:blk.60.attn_output.weight, torch.bfloat16 --> BF16, shape = {8192, 7168}
INFO:hf-to-gguf:blk.60.attn_q_a_norm.weight, torch.bfloat16 --> F32, shape = {1536}
INFO:hf-to-gguf:blk.60.attn_q_a.weight, torch.bfloat16 --> BF16, shape = {7168, 1536}
INFO:hf-to-gguf:blk.60.attn_q_b.weight, torch.bfloat16 --> BF16, shape = {1536, 12288}
INFO:hf-to-gguf:output.weight, torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> BF16, shape = {7168, 163840}
INFO:hf-to-gguf:output_norm.weight, torch.bfloat16 --> F32, shape = {7168}
INFO:hf-to-gguf:blk.1.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.1.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.1.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.2.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.2.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.3.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.3.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.4.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.4.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.5.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.5.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.6.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.6.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.7.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.7.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.8.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.8.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.9.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.9.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.10.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.10.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.11.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.11.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.12.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.12.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.13.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.13.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.14.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.14.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.15.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.15.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.16.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.16.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.17.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.17.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.18.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.18.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.19.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.19.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.20.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.20.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.21.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.21.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.22.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.22.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.23.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.23.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.24.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.24.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.25.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.25.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.26.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.26.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.27.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.27.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.28.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.28.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.29.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.29.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.30.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.30.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.31.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.31.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.32.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.32.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.33.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.33.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.34.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.34.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.35.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.35.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.36.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.36.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.37.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.37.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.38.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.38.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.39.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.39.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.40.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.40.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.41.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.41.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.42.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.42.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.43.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.43.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.44.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.44.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.45.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.45.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.46.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.46.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.47.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.47.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.48.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.48.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.49.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.49.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.50.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.50.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.51.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.51.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.52.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.52.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.53.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.53.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.54.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.54.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.55.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.55.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.56.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.56.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.57.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.57.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.58.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.58.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.59.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.59.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_down_exps.weight, torch.float32 --> BF16, shape = {2048, 7168, 384}
INFO:hf-to-gguf:blk.60.ffn_gate_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
INFO:hf-to-gguf:blk.60.ffn_up_exps.weight, torch.float32 --> BF16, shape = {7168, 2048, 384}
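(Side note: the expert tensors above log as torch.float32 while everything else is torch.bfloat16 — presumably because the new compressed-tensors dequant path materializes the INT4 experts in float32 before the BF16 write. For anyone curious, here's a rough sketch of what a grouped INT4 dequant looks like; the layout, nibble order, and names are my assumptions, not the actual code in this PR:)

```python
import numpy as np

def dequant_int4_grouped(packed: np.ndarray, scales: np.ndarray, group_size: int = 32) -> np.ndarray:
    """Illustrative only — unpack two signed 4-bit values per uint8 and
    apply per-group scales. Assumed shapes (may differ from the PR):
      packed: uint8, (rows, cols // 2)
      scales: float32, (rows, cols // group_size)
    Returns float32, (rows, cols).
    """
    lo = (packed & 0x0F).astype(np.int8)
    hi = (packed >> 4).astype(np.int8)
    # Map the 4-bit codes [0, 15] onto the signed range [-8, 7].
    lo = np.where(lo > 7, lo - 16, lo)
    hi = np.where(hi > 7, hi - 16, hi)
    # Interleave low/high nibbles back into the original column order.
    q = np.stack([lo, hi], axis=-1).reshape(packed.shape[0], -1).astype(np.float32)
    rows, cols = q.shape
    q = q.reshape(rows, cols // group_size, group_size)
    return (q * scales[:, :, None]).reshape(rows, cols)
```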
INFO:hf-to-gguf:Set meta model
INFO:hf-to-gguf:Set model parameters
INFO:hf-to-gguf:gguf: context length = 262144
INFO:hf-to-gguf:gguf: embedding length = 7168
INFO:hf-to-gguf:gguf: feed forward length = 18432
INFO:hf-to-gguf:gguf: head count = 64
INFO:hf-to-gguf:gguf: key-value head count = 1
INFO:hf-to-gguf:gguf: rope theta = 50000.0
INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05
INFO:hf-to-gguf:gguf: experts used count = 8
INFO:hf-to-gguf:gguf: expert groups count = 1
INFO:hf-to-gguf:gguf: expert groups used count = 1
INFO:hf-to-gguf:gguf: file type = 32
INFO:hf-to-gguf:Set model quantization version
INFO:hf-to-gguf:Set model tokenizer
The repository /mnt/data/models/moonshotai/Kimi-K2-Thinking contains custom code which must be executed to correctly load the model. You can inspect the repository content at /mnt/data/models/moonshotai/Kimi-K2-Thinking .
You can inspect the repository content at https://hf.co//mnt/data/models/moonshotai/Kimi-K2-Thinking.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/data/models/moonshotai/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:Reloaded tiktoken model from /mnt/data/models/moonshotai/Kimi-K2-Thinking/tiktoken.model
INFO:transformers_modules.Kimi_hyphen_K2_hyphen_Thinking.tokenization_kimi:#words: 163842 - BOS ID: 163584 - EOS ID: 163585
INFO:gguf.vocab:Setting special token type bos to 163584
INFO:gguf.vocab:Setting special token type eos to 163586
INFO:gguf.vocab:Setting special token type pad to 163839
INFO:gguf.vocab:Setting chat_template to {%- macro render_content(msg) -%}
{%- set c = msg.get('content') -%}
{%- if c is string -%}
{{ c }}
{%- elif c is not none -%}
{% for content in c -%}
{% if content['type'] == 'image' or 'image' in content or 'image_url' in content -%}
<|media_start|>image<|media_content|><|media_pad|><|media_end|>
{% else -%}
{{ content['text'] }}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- endmacro -%}
{% macro set_roles(message) -%}
{%- set role_name = message.get('name') or message['role'] -%}
{%- if message['role'] == 'user' -%}
<|im_user|>{{role_name}}<|im_middle|>
{%- elif message['role'] == 'assistant' -%}
<|im_assistant|>{{role_name}}<|im_middle|>
{%- else -%}
<|im_system|>{{role_name}}<|im_middle|>
{%- endif -%}
{%- endmacro -%}
{%- macro render_toolcalls(message) -%}
<|tool_calls_section_begin|>
{%- for tool_call in message['tool_calls'] -%}
{%- set formatted_id = tool_call['id'] -%}
<|tool_call_begin|>{{ formatted_id }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
{%- endfor -%}
<|tool_calls_section_end|>
{%- endmacro -%}
{# Find last non-tool-call assistant message #}
{%- set ns = namespace(last_non_tool_call_assistant_msg=-1) -%}
{%- for idx in range(messages|length-1, -1, -1) -%}
{%- if messages[idx]['role'] == 'assistant' and not messages[idx].get('tool_calls') -%}
{%- set ns.last_non_tool_call_assistant_msg = idx -%}
{%- break -%}
{%- endif -%}
{%- endfor -%}
{# split all messages into history & suffix, reasoning_content in suffix should be reserved.#}
{%- set hist_msgs = messages[:ns.last_non_tool_call_assistant_msg+1] -%}
{%- set suffix_msgs = messages[ns.last_non_tool_call_assistant_msg+1:] -%}
{%- if tools -%}
<|im_system|>tool_declare<|im_middle|>{{ tools | tojson(separators=(',', ':')) }}<|im_end|>
{%- endif -%}
{%- for message in hist_msgs -%}
{%- if loop.first and messages[0]['role'] != 'system' -%}
<|im_system|>system<|im_middle|>You are Kimi, an AI assistant created by Moonshot AI.<|im_end|>
{%- endif -%}
{{set_roles(message)}}
{%- if message['role'] == 'assistant' -%}
<think></think>{{render_content(message)}}
{%- if message.get('tool_calls') -%}
{{render_toolcalls(message)}}
{%- endif -%}
{%- elif message['role'] == 'tool' -%}
{%- set tool_call_id = message.tool_call_id -%}
## Return of {{ tool_call_id }}
{{render_content(message)}}
{%- elif message['content'] is not none -%}
{{render_content(message)}}
{%- endif -%}
<|im_end|>
{%- endfor -%}
{%- for message in suffix_msgs -%}
{{set_roles(message)}}
{%- if message['role'] == 'assistant' -%}
{%- set rc = message.get('reasoning_content', '') -%}
<think>{{rc}}</think>{{render_content(message)}}
{%- if message.get('tool_calls') -%}
{{render_toolcalls(message)}}
{%- endif -%}
{%- elif message['role'] == 'tool' -%}
{%- set tool_call_id = message.tool_call_id -%}
## Return of {{ tool_call_id }}
{{render_content(message)}}
{%- elif message['content'] is not none -%}
{{render_content(message)}}
{%- endif -%}
<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_assistant|>assistant<|im_middle|>
{%- endif -%}
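(If you want to sanity-check the embedded chat template before committing to a full quant run, rendering it through the HF tokenizer is a quick smoke test — a minimal sketch, assuming the checkpoint sits at the same local path used above and you accept the repo's custom code:)

```python
from transformers import AutoTokenizer

# The tokenizer class lives in the repo's custom code, hence trust_remote_code.
tok = AutoTokenizer.from_pretrained(
    "/mnt/data/models/moonshotai/Kimi-K2-Thinking",
    trust_remote_code=True,
)

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi!", "reasoning_content": "greet back"},
    {"role": "user", "content": "What model are you?"},
]

# Render without tokenizing to inspect the <|im_*|> wrapping and <think> tags.
print(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```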
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00001-of-00046.gguf: n_tensors = 918, total_size = 46.3G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00002-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00003-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00004-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00005-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00006-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00007-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00008-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00009-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00010-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00011-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00012-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00013-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00014-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00015-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00016-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00017-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00018-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00019-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00020-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00021-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00022-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00023-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00024-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00025-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00026-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00027-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00028-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00029-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00030-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00031-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00032-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00033-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00034-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00035-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00036-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00037-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00038-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00039-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00040-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00041-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00042-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00043-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00044-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00045-of-00046.gguf: n_tensors = 4, total_size = 45.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x14B-BF16-00046-of-00046.gguf: n_tensors = 2, total_size = 22.5G
Shard (0/46): 0.00byte [00:00, ?byte/s]
Writing: 1%| | 21.4G/2.05T [00:30<45:06, 751Mbyte/s]
Shard (1/46): 51%|█████▏ | 23.8G/46.3G [00:33<00:32, 684Mbyte/s]
Writing: 1%| | 23.8G/2.05T [00:33<49:27, 684Mbyte/s]
Traceback (most recent call last):
File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 10314, in <module>
main()
File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 10308, in main
model_instance.write()
File "/home/w/projects/llama.cpp/convert_hf_to_gguf.py", line 634, in write
self.gguf_writer.write_tensors_to_file(progress=True)
File "/home/w/projects/llama.cpp/gguf-py/gguf/gguf_writer.py", line 456, in write_tensors_to_file
ti.tensor.tofile(fout)
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 220, in tofile
eager = LazyNumpyTensor.to_eager(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 179, in to_eager
return cls._recurse_apply(t, simple_to_eager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
return fn(o)
^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
_t._args = cls._recurse_apply(_t._args, simple_to_eager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
L.append(LazyBase._recurse_apply(item, fn))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
return fn(o)
^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
_t._args = cls._recurse_apply(_t._args, simple_to_eager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
L.append(LazyBase._recurse_apply(item, fn))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
return fn(o)
^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
_t._args = cls._recurse_apply(_t._args, simple_to_eager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
L.append(LazyBase._recurse_apply(item, fn))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
L.append(LazyBase._recurse_apply(item, fn))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
return fn(o)
^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 169, in simple_to_eager
_t._args = cls._recurse_apply(_t._args, simple_to_eager)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 100, in _recurse_apply
L.append(LazyBase._recurse_apply(item, fn))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 105, in _recurse_apply
return fn(o)
^^^^^
File "/home/w/projects/llama.cpp/gguf-py/gguf/lazy.py", line 170, in simple_to_eager
_t._data = _t._func(*_t._args, **_t._kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 309, in _fn
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_compile.py", line 53, in inner
return disable_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/wrappers.py", line 149, in _fn
result = fn(**bound.arguments)
^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1139, in _ref
output = prim(a, b)
^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_refs/__init__.py", line 1746, in mul
return prims.mul(a, b)
^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_ops.py", line 841, in __call__
return self._op(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 109, in meta_kernel
return fake_impl_holder.kernel(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/utils.py", line 22, in __call__
return self.func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/library.py", line 1430, in inner
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_library/custom_ops.py", line 627, in fake_impl
return self._abstract_fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims/__init__.py", line 404, in _prim_elementwise_meta
utils.check_same_device(*args_, allow_cpu_scalar_tensors=True)
File "/home/w/projects/llama.cpp/venv/lib/python3.12/site-packages/torch/_prims_common/__init__.py", line 878, in check_same_device
raise RuntimeError(msg)
RuntimeError: Tensor on device cpu is not on the expected device meta!
Shard (1/46): 51%|█████▏ | 23.8G/46.3G [00:36<00:34, 655Mbyte/s]
Writing: 1%| | 23.8G/2.05T [00:36<51:36, 655Mbyte/s] |
Last commit should fix the error. I successfully converted the first layer of the model to GGUF. |
|
Yes, I'm trying another way: directly mapping the quantization to Q4_0. The only disadvantage is that this will downcast the bf16 scales to f16.
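For what it's worth, the downcast should usually be lossless: bf16 carries 7 mantissa bits versus f16's 10, so any bf16 value within f16's normal range (roughly 6e-5 to 65504 in magnitude) converts exactly. A quick check (a sketch, not code from the PR):
import torch

# typical block scales survive bf16 -> f16 -> bf16 unchanged
s = torch.tensor([0.0123, 3.25e-3, 1.5], dtype=torch.bfloat16)
assert torch.equal(s.to(torch.float16).to(torch.bfloat16), s)
|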
| ".scales", | ||
| ) | ||
| ] | ||
| elif quant_method == "compressed-tensors": |
Might want to check for quant_config["format"] == "pack-quantized" near here instead of in dequant_compressed_tensors, because the compressed-tensors method has multiple formats which could technically be supported eventually (notably, float-quantized seems relatively similar to (but not quite like) the fp8 method).
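Something like this, perhaps (a sketch, with quant_config standing for the parsed quantization_config dict; the error type is just illustrative):
if quant_method == "compressed-tensors":
    fmt = quant_config.get("format")
    if fmt != "pack-quantized":
        raise NotImplementedError(f"compressed-tensors format {fmt!r} is not supported yet")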
|
Q8: same as for @ubergarm, memory ballooned |
else:
    unpacked = unpacked.to(weight.device)  # is this needed?
    for i in range(pack_factor):
        unpacked[:, i::pack_factor] = (weight >> (num_bits * i)) & mask
Lazy tensors don't handle __setitem__ correctly, I think (or it causes eager evaluation). That's because the function returns None and so the change tree can't really be updated with how it's currently implemented.
Prefer explicit concatenation instead if possible (like with torch.cat, torch.stack, etc.). (this should help with memory usage)
Alternatively, there are other ways to unpack without concatenation, like the broadcasting shifts done in gguf-py/gguf/quants.py.
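For instance, the whole unpack can be one broadcasted expression with no in-place writes (a sketch, assuming weight is the packed int32 tensor with nibbles stored lowest-bits-first, matching the loop above):
import torch

def unpack_pack_quantized(weight: torch.Tensor, num_bits: int = 4) -> torch.Tensor:
    pack_factor = 32 // num_bits                                  # 8 nibbles per int32
    mask = (1 << num_bits) - 1
    shifts = torch.arange(0, 32, num_bits, device=weight.device)  # [0, 4, ..., 28]
    # element (..., j, i) is weight[..., j] >> (num_bits * i), which after the
    # reshape reproduces the unpacked[:, i::pack_factor] ordering lazily
    unpacked = (weight.unsqueeze(-1) >> shifts) & mask
    return unpacked.reshape(*weight.shape[:-1], weight.shape[-1] * pack_factor)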
Hmm yeah, I need to go offline in the next few minutes. Feel free to push directly to this branch if you have any suggestions!
This reverts commit caf0e42.
|
Made a hack for repacking int4 to Q4_0; I pushed it to another branch: https://github.com/ngxson/llama.cpp/tree/xsn/convert_kimi_k2_quant_repack IMPORTANT: This requires deleting the
|
|
Running
The output splits are missing the name maybe, which was the case on this PR branch too, pretty sure:
INFO:gguf.gguf_writer:Writing the following files:
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf: n_tensors = 99, total_size = 49.9G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00002-of-00013.gguf: n_tensors = 95, total_size = 49.2G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00003-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00004-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00005-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00006-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00007-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00008-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00009-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00010-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00011-of-00013.gguf: n_tensors = 90, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00012-of-00013.gguf: n_tensors = 89, total_size = 49.1G
INFO:gguf.gguf_writer:/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00013-of-00013.gguf: n_tensors = 3, total_size = 4.7G
Shard (3/13): 13%|█▎ | 6.34G/49.1G [00:08<00:55, 766Mbyte/s]
Writing: 18%|█▊ | 105G/595G [01:15<11:12, 727Mbyte/s]
Regarding casting bf16 -> f16 for the block scales, I added a quick
Have to go for now to play DND, will check later. If this finishes I'll try to generate an imatrix and see how the numbers look. Thanks for all the help! |
|
btw @ubergarm I've just pushed a small fix to the repack branch: ngxson@505f8be
What I worry about is the packed layout of
A fun story: I wrote the code to repack GPT-OSS to GGML's MXFP4 just 2 days before its release. Repacking the nibble layout was a real pain.
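To make the layout worry concrete: Q4_0 does not store nibbles sequentially; byte j of a 32-element block holds element j in its low nibble and element j+16 in its high nibble (see quantize_row_q4_0_ref below). A sketch of the per-block repack under that assumption:
import numpy as np

def repack_block_to_q4_0(nibbles: np.ndarray) -> np.ndarray:
    # nibbles: the 32 unsigned int4 codes (0..15) of one block, in logical order
    assert nibbles.shape == (32,)
    lo, hi = nibbles[:16], nibbles[16:]
    return (lo | (hi << 4)).astype(np.uint8)  # the 16 packed qs[] bytes
|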
|
I'm trying with your latest changes now |
|
Aye, it generates a roughly correct-size-looking output GGUF, but I got errors trying to start it up. Edit: to be clear, I was using
srv load_model: loading model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
gguf_init_from_file_impl: tensor 'blk.1.ffn_gate_exps.weight' has offset 4165955584, expected 13678637056
gguf_init_from_file_impl: failed to read tensor data
llama_model_load: error loading model: llama_model_loader: failed to load model from /mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf', try reducing --n-gpu-layers if you're running out of VRAM
srv load_model: failed to load model, '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
For funzies I tried to start it on ik's fork too, with errors there too:
llama_model_load: error loading model: tensor 'blk.5.ffn_down_exps.weight' data is not within the file bounds, model is corrupted or incomplete
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model '/mnt/data/models/ubergarm/Kimi-K2-Thinking-GGUF/-384x22B-BF16-00001-of-00013.gguf'
main : failed to init
Good run though! My impression is there is really one main quant of this: q8_0 for attn/shexp/the first dense layer, and q4_0 for all the routed experts. Maybe one could shrink the non-routed experts a little bit, but historically they were best left q8_0 imo. So I hope DevQuasar and bartowski have better luck with the more recent PR! Gotta run tho 🫶 |
|
Similar to what was mentioned above (with the Q4 hack): |
|
Conversion succeeded, but when loaded it doesn't give coherent responses, just endlessly repeats tokens |
|
@bartowski1182 you haven't had the memory ballooning issue? Or do you just have enough memory? |
|
I've got 768GB |
|
Closing this in favor of #17069 |
The block size table is in
QK_K = 256
GGML_QUANT_SIZES: dict[GGMLQuantizationType, tuple[int, int]] = {
GGMLQuantizationType.F32: (1, 4),
GGMLQuantizationType.F16: (1, 2),
GGMLQuantizationType.Q4_0: (32, 2 + 16),
GGMLQuantizationType.Q4_1: (32, 2 + 2 + 16),
GGMLQuantizationType.Q5_0: (32, 2 + 4 + 16),
GGMLQuantizationType.Q5_1: (32, 2 + 2 + 4 + 16),
GGMLQuantizationType.Q8_0: (32, 2 + 32),
GGMLQuantizationType.Q8_1: (32, 4 + 4 + 32),
GGMLQuantizationType.Q2_K: (256, 2 + 2 + QK_K // 16 + QK_K // 4),
GGMLQuantizationType.Q3_K: (256, 2 + QK_K // 4 + QK_K // 8 + 12),
GGMLQuantizationType.Q4_K: (256, 2 + 2 + QK_K // 2 + 12),
GGMLQuantizationType.Q5_K: (256, 2 + 2 + QK_K // 2 + QK_K // 8 + 12),
GGMLQuantizationType.Q6_K: (256, 2 + QK_K // 2 + QK_K // 4 + QK_K // 16),
GGMLQuantizationType.Q8_K: (256, 4 + QK_K + QK_K // 8),
GGMLQuantizationType.IQ2_XXS: (256, 2 + QK_K // 4),
GGMLQuantizationType.IQ2_XS: (256, 2 + QK_K // 4 + QK_K // 32),
GGMLQuantizationType.IQ3_XXS: (256, 2 + QK_K // 4 + QK_K // 8),
GGMLQuantizationType.IQ1_S: (256, 2 + QK_K // 8 + QK_K // 16),
GGMLQuantizationType.IQ4_NL: (32, 2 + 16),
GGMLQuantizationType.IQ3_S: (256, 2 + QK_K // 4 + QK_K // 8 + QK_K // 32 + 4),
GGMLQuantizationType.IQ2_S: (256, 2 + QK_K // 4 + QK_K // 16),
GGMLQuantizationType.IQ4_XS: (256, 2 + 2 + QK_K // 2 + QK_K // 64),
GGMLQuantizationType.I8: (1, 1),
GGMLQuantizationType.I16: (1, 2),
GGMLQuantizationType.I32: (1, 4),
GGMLQuantizationType.I64: (1, 8),
GGMLQuantizationType.F64: (1, 8),
GGMLQuantizationType.IQ1_M: (256, QK_K // 8 + QK_K // 16 + QK_K // 32),
GGMLQuantizationType.BF16: (1, 2),
GGMLQuantizationType.TQ1_0: (256, 2 + 4 * 13),
GGMLQuantizationType.TQ2_0: (256, 2 + 64),
GGMLQuantizationType.MXFP4: (32, 1 + 16),
}
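As a quick sanity check, each tuple above is (block_size, type_size in bytes), so the effective bits-per-weight falls straight out; for Q4_0:
# 32 weights packed into 2 (f16 scale) + 16 (nibbles) = 18 bytes
block_size, type_size = 32, 2 + 16
print(8 * type_size / block_size)  # 4.5 bits per weight
|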
Does that PR directly convert the |
|
If it doesn't, then looking at the source, it looks like if there is no
// reference implementation for deterministic creation of model files
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}
Can we prove this is going to convert back to the original values when we do a round trip via
What will happen in this case? I have a feeling the
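One way to probe the round-trip question empirically is a numpy sketch of the reference logic above (an assumed-equivalent reimplementation, not code from the PR):
import numpy as np

def q4_0_roundtrip_max_error(x: np.ndarray) -> float:
    x = x.reshape(-1, 32).astype(np.float32)           # one Q4_0 block per row
    idx = np.abs(x).argmax(axis=-1)
    d = x[np.arange(len(x)), idx] / -8.0               # scale, as in the reference
    id_ = np.where(d != 0.0, 1.0 / np.where(d == 0.0, 1.0, d), 0.0)
    q = np.minimum(15, (x * id_[:, None] + 8.5).astype(np.int8))
    x_hat = (q.astype(np.float32) - 8.0) * d[:, None]  # how dequantize_row_q4_0 decodes
    return float(np.abs(x - x_hat).max())

A block built from int4 codes in [-7, 7] times a scale comes back with nonzero error here, because d = max / -8 shifts the lattice off those points.
|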
|
I think I've made it work with an alternative way. Both the Q3 and Q2 GGUFs seem to be working:
Experimental quants are uploading (please allow some more time for the upload) here:
Feel free to test the quants and the converter |
This will work, but not losslessly, for the same reason as the other PR. The fundamental problem here is that the QAT-trained blocks of 32 nibbles might not take up the full range of values. If the original block has a range of
It's easier to see if you look at the 2-bit version, eg:
If the original quant only had the half-nibbles |
Is there any reason it cannot take the full range? IIUC from the original compressed-tensors dequant code, it should take up the full int4 range
|
Hmm nevermind, I think I understand what you're saying now. Did you mean that it's possible that the training code prevents using |
Yeah, if we were just to use something like this to first quantise:
// reference implementation for deterministic creation of model files
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}
turn these back into

const float d = max / -8;

is implicitly assuming that there will be a lower value of

But because the QAT was trained, it's quite likely that not every block of 32 will necessarily maintain a lower value of

It's not something you can change after the QAT training (which likely used some form of regularisation term on the intervals and/or stochastic rounding), so the only way to maintain the full range for all blocks would be to adjust it during training (which would probably be really hard/awkward to do for "Adam-like" optimisers with the extra "memory" parameters, as you would have to keep adjusting these too).

If it can be shown that somehow they have done this, and like the output of the

The maximum relative error from converting from
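A toy sketch of that range mismatch, assuming the QAT grid is symmetric int4 with scale s = absmax / 7 (as the compressed-tensors config later suggests):
s = 0.01
block = [q * s for q in range(-7, 8)]  # every value this block can contain
d = max(block, key=abs) / -8           # what the reference code would pick
# |d| = 7*s/8, so the Q4_0 lattice no longer lands exactly on the original
# points; a max / -7 style scale (15 of the 16 codes) would round-trip exactly
|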
It's definitely worth testing if this is the case, as it saves a lot of hassle if it is! I'm about another day away from getting the model at 4MB/s, sadly 😦 |
|
I've found it is using a symmetric quant (assuming #17069 is working correctly), as can be seen by the mirrored positive and negative
What seems (very) strange is that there are also zero values here though, so it almost looks like it is a symmetric quant with two 4-bit values that map to zero (which seems very odd IMO?). If this is the case, then there is a direct/lossless conversion to
llama.cpp/ggml/src/ggml-quants.c Line 55 in 655cddd
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
    static const int qk = QK4_0;

    assert(k % qk == 0);

    const int nb = k / qk;

    for (int i = 0; i < nb; i++) {
        float amax = 0.0f; // absolute max
        float max  = 0.0f;

        for (int j = 0; j < qk; j++) {
            const float v = x[i*qk + j];
            if (amax < fabsf(v)) {
                amax = fabsf(v);
                max  = v;
            }
        }

        const float d  = max / -8;
        const float id = d ? 1.0f/d : 0.0f;

        y[i].d = GGML_FP32_TO_FP16(d);

        for (int j = 0; j < qk/2; ++j) {
            const float x0 = x[i*qk + 0    + j]*id;
            const float x1 = x[i*qk + qk/2 + j]*id;

            const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
            const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));

            y[i].qs[j]  = xi0;
            y[i].qs[j] |= xi1 << 4;
        }
    }
}
This seems to give the lowest error for all blocks I have tested so far (equivalent to
and looking at this, I suspect this is quite close to lossless (and also seems to confirm the original quant might have used two zeros).
The epsilon for bfloat16 is
For anyone interested (or who wants to double check this!), here is my full hacked code I used:
Details
// reference implementation for deterministic creation of model files
/*
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
static const int qk = QK4_0;
assert(k % qk == 0);
const int nb = k / qk;
for (int i = 0; i < nb; i++) {
float amax = 0.0f; // absolute max
float max = 0.0f;
for (int j = 0; j < qk; j++) {
const float v = x[i*qk + j];
if (amax < fabsf(v)) {
amax = fabsf(v);
max = v;
}
}
const float d = max / -8;
const float id = d ? 1.0f/d : 0.0f;
y[i].d = GGML_FP32_TO_FP16(d);
for (int j = 0; j < qk/2; ++j) {
const float x0 = x[i*qk + 0 + j]*id;
const float x1 = x[i*qk + qk/2 + j]*id;
const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));
y[i].qs[j] = xi0;
y[i].qs[j] |= xi1 << 4;
}
}
}
*/
static void quantize_q4_0_block(const float * GGML_RESTRICT x, int i, int qk, float d, block_q4_0* out) {
const float id = d ? 1.0f/d : 0.0f;
out->d = GGML_FP32_TO_FP16(d);
for (int j = 0; j < qk/2; ++j) {
const float x0 = x[i*qk + 0 + j]*id;
const float x1 = x[i*qk + qk/2 + j]*id;
const uint8_t xi0 = MIN(15, (int8_t)(x0 + 8.5f));
const uint8_t xi1 = MIN(15, (int8_t)(x1 + 8.5f));
out->qs[j] = xi0;
out->qs[j] |= xi1 << 4;
}
}
static void dequantize_q4_0_block(const block_q4_0* block, float* dequant, int qk) {
dequantize_row_q4_0(block, dequant, qk);
}
static float measure_q4_0_error(const float * GGML_RESTRICT x, int i, int qk, const block_q4_0* block) {
float dequant[QK4_0];
dequantize_q4_0_block(block, dequant, qk);
float error = 0.0f;
for (int j = 0; j < qk; j++) {
error += fabsf(x[i*qk + j] - dequant[j]);
}
return error/qk;
}
static void print_q4_0_block_errors(const float * GGML_RESTRICT x, int i, int qk, const block_q4_0* block) {
float dequant[QK4_0];
dequantize_q4_0_block(block, dequant, qk);
printf(" Errors for each element:\n");
for (int j = 0; j < qk; j++) {
float error = x[i*qk + j] - dequant[j];
printf(" [%d] original=%.6f, dequant=%.6f, error=%.6f\n", j, x[i*qk + j], dequant[j], error);
}
}
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
static const int qk = QK4_0;
assert(k % qk == 0);
const int nb = k / qk;
for (int i = 0; i < nb; i++) {
float amax = 0.0f; // absolute max
float max = 0.0f;
for (int j = 0; j < qk; j++) {
const float v = x[i*qk + j];
if (amax < fabsf(v)) {
amax = fabsf(v);
max = v;
}
}
float best_error = FLT_MAX;
int best_lattice_offset = 0;
block_q4_0 best_block;
for (int lattice_offset = 0; lattice_offset <= 7; lattice_offset++) {
block_q4_0 temp_block;
const float d = max / -(8 - lattice_offset);
quantize_q4_0_block(x, i, qk, d, &temp_block);
float error = measure_q4_0_error(x, i, qk, &temp_block);
//printf("- Block %d: error=%.6f, lattice_offset=%d\n", i, error, lattice_offset);
if (error < best_error) {
best_error = error;
best_lattice_offset = lattice_offset;
best_block = temp_block;
}
}
if (best_error > 0.000001 && best_lattice_offset != 1) {
printf("Block %d: best_error=%.6f, best_lattice_offset=%d\n", i, best_error, best_lattice_offset);
print_q4_0_block_errors(x, i, qk, &best_block);
}
y[i] = best_block;
}
}
I will run this using the |
|
Yeah, it seems that with
"quantization_config": {
"config_groups": {
"group_0": {
"input_activations": null,
"output_activations": null,
"targets": [
"Linear"
],
"weights": {
"actorder": null,
"block_structure": null,
"dynamic": false,
"group_size": 32,
"num_bits": 4,
"observer": "minmax",
"observer_kwargs": {},
"strategy": "group",
"symmetric": true,
"type": "int"
}
}
},
https://huggingface.co/moonshotai/Kimi-K2-Thinking/blob/main/config.json
It really does create two zero bits:
# 1. Generate scale and zero-point
if quantization_args.symmetric:
max_val_pos = torch.max(torch.abs(min_vals), torch.abs(max_vals))
scales = max_val_pos / (float(bit_range) / 2)
zero_points = torch.zeros(scales.shape, device=device, dtype=min_vals.dtype)
So this is remarkably lucky for the |
|
I am a little suspicious of this now though, as I'm about 1/2 way through the tensors for
This looks suspiciously like they have taken their QAT floating-point values and passed them into that
If this is the case then it brings up a couple of worrying points:
|
|
Hopefully they will reply: https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/26
There could be some good reason for using two zeros to do with the gradients during QAT, but it seems a little odd and unexpected to me still... |
llama.cpp/ggml/src/ggml-quants.c Line 55 in 655cddd
~/llama.cpp/build/bin/llama-quantize \
--tensor-type attn_kv_a_mqa=q8_0 \
--tensor-type attn_k_b=q8_0 \
--tensor-type attn_v_b=q8_0 \
--tensor-type _exps=q4_0 \
Kimi-K2-Thinking-BF16.gguf Kimi-K2-Thinking-Q4_X.gguf Q6_K 44
~/llama.cpp/build/bin/llama-perplexity \
--model ./Kimi-K2-Thinking-Q4_X.gguf \
--n-gpu-layers 99 \
--numa distribute \
--threads "$(nproc)" \
--override-tensor exps=CPU \
--flash-attn 1 \
--no-op-offload \
--file ./wiki.test.raw
(I'll leave it running and update this post later...) |
|
It might be worth somebody like @compilade, who understands more about the inner workings of the quants, seeing if a similar hack can be applied to |
|
|
It seems we can do slightly better by using an iterative algorithm to refine the scale:
static void dequantize_q4_0_block(const block_q4_0* block, float* dequant, int qk) {
dequantize_row_q4_0(block, dequant, qk);
}
static float measure_q4_0_error(const float * GGML_RESTRICT x, int i, int qk, const block_q4_0* block) {
float dequant[QK4_0];
dequantize_q4_0_block(block, dequant, qk);
float error = 0.0f;
for (int j = 0; j < qk; j++) {
error += fabsf(x[i*qk + j] - dequant[j]);
}
return error/qk;
}
static void print_q4_0_block_errors(const float * GGML_RESTRICT x, int i, int qk, const block_q4_0* block) {
float dequant[QK4_0];
dequantize_q4_0_block(block, dequant, qk);
printf(" Errors for each element:\n");
for (int j = 0; j < qk; j++) {
float error = x[i*qk + j] - dequant[j];
printf(" [%d] original=%.6f, dequant=%.6f, error=%.6f\n", j, x[i*qk + j], dequant[j], error);
}
}
void quantize_row_q4_0_ref(const float * GGML_RESTRICT x, block_q4_0 * GGML_RESTRICT y, int64_t k) {
static const int qk = QK4_0;
assert(k % qk == 0);
const int nb = k / qk;
const int max_iter = 10; // note: seems to mostly converge after 1 iteration...
const float epsilon = 1e-6f; // note: float16_epsilon = 0.00097656, 0.00097656^2 = ~1e-6
for (int i = 0; i < nb; i++) {
// Find max absolute value for initialization
float amax = 0.0f;
for (int j = 0; j < qk; j++) {
const float v = fabsf(x[i*qk + j]);
if (v > amax) {
amax = v;
}
}
// Initialize scale for range -7 to +7 (15 quantization levels)
float s = amax / 7.0f;
if (s == 0.0f) {
s = 1.0f;
}
int8_t q[QK4_0];
for (int iter = 0; iter < max_iter; iter++) {
const float s_old = s;
// Step 1: Assignment - quantize each element to range -7 to +7
for (int j = 0; j < qk; j++) {
const float q_float = x[i*qk + j] / s;
int8_t q_int = (int8_t)roundf(q_float);
// Clip to range -7 to +7 (kimi-k2-thinking constraint)
if (q_int < -7) q_int = -7;
if (q_int > 7) q_int = 7;
q[j] = q_int;
}
// Step 2: Update scale
float numerator = 0.0f;
float denominator = 0.0f;
for (int j = 0; j < qk; j++) {
numerator += x[i*qk + j] * q[j];
denominator += (float)(q[j] * q[j]);
}
if (denominator > 0.0f) {
const float s_new = numerator / denominator;
if (s_new > 0.0f) {
s = s_new;
}
}
// Check convergence
float delta_s = fabsf(s - s_old);
//printf("- Iter %i: delta_s=%.6f\n", iter, delta_s);
if (delta_s < epsilon) {
break;
}
}
// Store the scale
y[i].d = GGML_FP32_TO_FP16(s);
// Pack quantized values: map -7..+7 to stored values 1..15
// (stored value 0 is unused due to kimi-k2-thinking's ±7 constraint)
for (int j = 0; j < qk/2; j++) {
const uint8_t q0 = (uint8_t)(q[j] + 8);
const uint8_t q1 = (uint8_t)(q[qk/2 + j] + 8);
y[i].qs[j] = q0 | (q1 << 4);
}
float error = measure_q4_0_error(x, i, qk, &y[i]);
if (error > 1e-3f) { // note: float16_epsilon = 0.00097656 = ~1e-3
printf("- Block %d: error=%.6f\n", i, error);
print_q4_0_block_errors(x, i, qk, &y[i]);
}
}
}
I think this is called the "Lloyd-Max" algorithm, although I'm not sure, as that seems to be for general split points. It's a fair bit slower, so it will be tomorrow by the time I have the results for this.
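Aside: the scale update in the sketch above is the least-squares optimum for a fixed assignment of codes q_j:

$$ s^{*} = \arg\min_{s} \sum_{j} (x_j - s\,q_j)^2 = \frac{\sum_j x_j q_j}{\sum_j q_j^2} $$

which is exactly the numerator/denominator accumulation in the update loop.
|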
This seems really good to me and very close to the "full Q8_0" that I measured while making https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF
It is better than the unpatched approach I started with: original safetensors -> bf16 w/ PR#17069 (same as you did) -> Q8_0-Q4_0 ~544GB model:
My perplexity command on ik matches yours, including the sha1sum of the wiki.test.raw corpus, and we both use the default 512 context, as is the convention.
$ wget https://huggingface.co/datasets/ikawrakow/validation-datasets-for-llama.cpp/resolve/main/wiki.test.raw.gz
$ gunzip wiki.test.raw.gz
$ du -h wiki.test.raw
1.3M wiki.test.raw
$ sha1sum wiki.test.raw
6f1fe2054a940eebfc76b284b09680763b37f5ea wiki.test.raw
$ numactl -N ${SOCKET} -m ${SOCKET} \
./build/bin/llama-perplexity \
-m "$model" \
-f wiki.test.raw \
-mla 3 \
--ctx-size 512 \
-ub 4096 -b 4096 \
--numa numactl \
--threads 96 \
--threads-batch 128 \
--no-mmap
So your patch seems promising to unlock more of the QAT quality potential. |
|
@ubergarm I would hold off releasing any versions using this just yet, as @compilade is going to test the raw safetensors files to make sure there really are only 15 out of 16 bit combinations getting used... If this is wrong then we can likely get a (much) better perplexity due to the most extreme/important bit (in terms of least-square error criteria) getting rounded down. |
|
Thanks! I was able to at least confirm that using your one-line patch improves perplexity to match the full Q8_0 I tested:
- const float d = max / -8;
+ const float d = max / -7;
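Why the one-line change can round-trip exactly (assuming the two-zeros observation above holds): if every value in a block is q·s for codes q in [-7, 7], then max = ±7s and d = max / -7 = ∓s. Taking d = s for concreteness:

$$ \mathrm{nib} = \lfloor q + 8.5 \rfloor = q + 8 \in [1, 15], \qquad \hat{x} = (\mathrm{nib} - 8)\,d = q\,s = x $$

The d = -s case just mirrors the codes, and storing d as f16 is exact for bf16 scales inside f16's normal range.
|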
This didn't gain anything worthwhile: |
compilade provided a script, and running it against the original moonshotai safetensors, it looks like there isn't any block-wise absmax other than 7.
The full log of that script is available at https://ubergarm.com/images/Kimi-K2-Thinking-safetensors-ranges.zip (~3.5MB zip, almost a 20MB log file including some histogram data).
And given your iterative algorithm didn't give anything worthwhile, I've uploaded the
Thanks, and curious to hear what moonshotai says on your https://huggingface.co/moonshotai/Kimi-K2-Thinking/discussions/26
Cheers!
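For reference, a minimal sketch of that kind of check (the weight_packed naming and lowest-nibble-first int4 layout are assumptions; this is not necessarily compilade's actual script):
import torch
from safetensors.torch import load_file

def uses_minus_eight(path: str) -> bool:
    shifts = torch.arange(0, 32, 4)
    for name, t in load_file(path).items():
        if not name.endswith("weight_packed"):
            continue
        q = ((t.unsqueeze(-1) >> shifts) & 0xF).to(torch.int8)
        q = torch.where(q >= 8, q - 16, q)  # sign-extend the int4 codes
        if int(q.min()) < -7:               # any -8 would break the max/-7 trick
            print(name, "contains -8")
            return True
    return False
|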
|
I'm still hoping to get a version for
Not sure if I will get round to it today, but will hopefully test the stock |
|
I still can't work out how the existing
// DEFINE THIS FOR KIMI-K2-THINKING INIT/CLIP LOGIC
#define IS_KIMI
// Helper: set scale and min in packed format
static inline void set_scale_min_k4(int j, uint8_t * GGML_RESTRICT q, uint8_t d, uint8_t m) {
assert(d < 64 && m < 64);
if (j < 4) {
q[j] = (q[j] & 0xC0) | (d & 0x3F);
q[j + 4] = (q[j + 4] & 0xC0) | (m & 0x3F);
} else {
const int j2 = j - 4;
q[j2] = (q[j2] & 0x3F) | ((d & 0x30) << 2);
q[j + 4] = (d & 0x0F) | ((m & 0x0F) << 4);
q[j] = (q[j] & 0x3F) | ((m & 0x30) << 2);
}
}
void quantize_row_q4_K_ref(const float * GGML_RESTRICT x, block_q4_K * GGML_RESTRICT y, int64_t k) {
assert(k % QK_K == 0);
#ifdef IS_KIMI
const int max_iter = 50; // note: takes more iterations to converge for this setup...
const float epsilon = 1e-7f; // note: use an even lower lambda as doesn't decrease as consistently...
#else
const int max_iter = 10;
const float epsilon = 1e-6f;
#endif
const int nb = k / QK_K;
const int num_subblocks = QK_K / 32;
for (int i = 0; i < nb; i++) {
memset(y[i].scales, 0, K_SCALE_SIZE);
float scales[num_subblocks];
float mins[num_subblocks];
// Initialization: compute initial scales and mins per sub-block
for (int j = 0; j < num_subblocks; j++) {
float xmin = x[i*QK_K + j*32];
float xmax = x[i*QK_K + j*32];
for (int l = 1; l < 32; l++) {
const float v = x[i*QK_K + j*32 + l];
xmin = v < xmin ? v : xmin;
xmax = v > xmax ? v : xmax;
}
#ifdef IS_KIMI
scales[j] = (xmax - xmin) / 14.0f;
mins[j] = -7.0f * scales[j];
#else
scales[j] = (xmax - xmin) / 15.0f;
mins[j] = xmin;
#endif
if (scales[j] == 0.0f) scales[j] = 1.0f;
}
// Initialize super-block scales
float d = 0.0f;
float dmin_abs = 0.0f;
for (int j = 0; j < num_subblocks; j++) {
d = scales[j] > d ? scales[j] : d;
const float mins_abs = fabsf(mins[j]);
dmin_abs = mins_abs > dmin_abs ? mins_abs : dmin_abs;
}
d = d / 63.0f;
float dmin = dmin_abs / 63.0f;
if (d == 0.0f) d = 1.0f;
if (dmin == 0.0f) dmin = 1.0f;
// Quantize initial sub-block scales and mins
uint8_t sc[num_subblocks];
uint8_t m[num_subblocks];
for (int j = 0; j < num_subblocks; j++) {
sc[j] = (uint8_t)(nearest_int(scales[j] / d));
sc[j] = sc[j] > 63 ? 63 : sc[j];
const int m_int = nearest_int(mins[j] / dmin);
m[j] = (uint8_t)(m_int < 0 ? -m_int : m_int);
m[j] = m[j] > 63 ? 63 : m[j];
set_scale_min_k4(j, y[i].scales, sc[j], m[j]);
}
// Adjust dmin sign based on typical min values
float avg_min = 0.0f;
for (int j = 0; j < num_subblocks; j++) avg_min += mins[j];
avg_min /= num_subblocks;
if (avg_min > 0.0f) dmin = -dmin;
// Temporary storage for 4-bit codes
uint8_t q[QK_K];
// Lloyd-Max iteration
for (int iter = 0; iter < max_iter; iter++) {
const float d_old = d;
const float dmin_old = dmin;
// Step 1: Assignment - quantize to 4-bit codes
for (int j = 0; j < num_subblocks; j++) {
const float scale = d * sc[j];
const float offset = -dmin * m[j];
if (scale == 0.0f) {
for (int l = 0; l < 32; ++l) {
q[j*32 + l] = 0;
}
continue;
}
for (int l = 0; l < 32; l++) {
const float v = x[i*QK_K + j*32 + l];
const int q_int = nearest_int((v - offset) / scale);
#ifdef IS_KIMI
q[j*32 + l] = (uint8_t)(q_int < 0 ? 0 : (q_int > 14 ? 14 : q_int));
#else
q[j*32 + l] = (uint8_t)(q_int < 0 ? 0 : (q_int > 15 ? 15 : q_int));
#endif
}
}
// Step 2: Update sub-block scales and mins (2D least squares per sub-block)
for (int j = 0; j < num_subblocks; j++) {
float sum_x = 0.0f;
float sum_q = 0.0f;
float sum_xq = 0.0f;
float sum_qq = 0.0f;
for (int l = 0; l < 32; l++) {
const float xv = x[i*QK_K + j*32 + l];
const float qv = (float)q[j*32 + l];
sum_x += xv;
sum_q += qv;
sum_xq += xv * qv;
sum_qq += qv * qv;
}
const float n = 32.0f;
const float det = n * sum_qq - sum_q * sum_q;
if (det > 0.0f) {
const float a = (n * sum_xq - sum_x * sum_q) / det;
const float b = (sum_x - a * sum_q) / n;
if (a > 0.0f && d > 0.0f) {
const int sc_new = nearest_int(a / d);
sc[j] = (uint8_t)(sc_new < 0 ? 0 : (sc_new > 63 ? 63 : sc_new));
}
if (dmin != 0.0f) {
const int m_new = nearest_int(-b / dmin);
m[j] = (uint8_t)(m_new < 0 ? 0 : (m_new > 63 ? 63 : m_new));
}
set_scale_min_k4(j, y[i].scales, sc[j], m[j]);
}
}
// Step 3: Update super-block scales (2D least squares across all sub-blocks)
float A = 0.0f; // Σ(sc*q)²
float B = 0.0f; // Σ(m*sc*q)
float C = 0.0f; // Σm²
float X_d = 0.0f; // Σ(x*sc*q)
float X_m = 0.0f; // Σ(x*m)
for (int j = 0; j < num_subblocks; j++) {
float sum_sq = 0.0f;
float sum_q = 0.0f;
float sum_xq = 0.0f;
float sum_x = 0.0f;
for (int l = 0; l < 32; l++) {
const float xv = x[i*QK_K + j*32 + l];
const float qv = (float)q[j*32 + l];
sum_sq += qv * qv;
sum_q += qv;
sum_xq += xv * qv;
sum_x += xv;
}
const float sc_f = (float)sc[j];
const float m_f = (float)m[j];
A += sc_f * sc_f * sum_sq;
B += m_f * sc_f * sum_q;
C += m_f * m_f * 32.0f;
X_d += sc_f * sum_xq;
X_m += m_f * sum_x;
}
const float det = A * C - B * B;
if (det > 0.0f) {
const float d_new = (C * X_d - B * X_m) / det;
const float dmin_new = (B * X_d - A * X_m) / det;
if (d_new > 0.0f) {
d = d_new;
}
if (dmin_new != 0.0f) {
dmin = dmin_new;
}
}
// Check convergence
const float delta_d = fabsf(d - d_old);
const float delta_dmin = fabsf(dmin - dmin_old);
//printf("- Iter %i: delta_d=%.6f, delta_dmin=%.6f\n", iter, delta_d, delta_dmin);
if (delta_d < epsilon && delta_dmin < epsilon) {
break;
}
}
// Final assignment with converged parameters
for (int j = 0; j < num_subblocks; j++) {
const float scale = d * sc[j];
const float offset = -dmin * m[j];
for (int l = 0; l < 32; l++) {
const float v = x[i*QK_K + j*32 + l];
const int q_int = scale != 0.0f ? nearest_int((v - offset) / scale) : 0;
#ifdef IS_KIMI
q[j*32 + l] = (uint8_t)(q_int < 0 ? 0 : (q_int > 14 ? 14 : q_int));
#else
q[j*32 + l] = (uint8_t)(q_int < 0 ? 0 : (q_int > 15 ? 15 : q_int));
#endif
}
}
// Store final super-block scales
y[i].d = GGML_FP32_TO_FP16(d);
y[i].dmin = GGML_FP32_TO_FP16(dmin);
// Pack 4-bit quantized values (layout expected by dequant)
uint8_t *qs = y[i].qs;
for (int base = 0, out = 0; base < QK_K; base += 64, out += 32) {
for (int l = 0; l < 32; ++l) {
qs[out + l] = (q[base + l] & 0x0F) | ((q[base + 32 + l] & 0x0F) << 4);
}
}
/*
// Dequantize and check error
float y_dequant[QK_K];
dequantize_row_q4_K(&y[i], y_dequant, QK_K);
printf("Block %d errors:\n", i);
float sum_error = 0.0f;
float sum_abs_error = 0.0f;
for (int j = 0; j < QK_K; j++) {
const float error = y_dequant[j] - x[i*QK_K + j];
printf(" [%d] original=%.6f dequant=%.6f error=%.6f\n", j, x[i*QK_K + j], y_dequant[j], error);
sum_error += error;
sum_abs_error += fabsf(error);
}
const float mean_error = sum_error / QK_K;
const float mean_abs_error = sum_abs_error / QK_K;
printf("- Mean error : %.6f\n", mean_error);
printf("- Mean absolute error: %.6f\n\n", mean_abs_error);
*/
}
}
It's a bit of a hacky mess currently, but it's definitely producing much lower errors than the stock Q4_K version:
It's gonna take a long time to do, as I bumped the iterations since it doesn't seem to converge as smoothly as the stock Lloyd-Max version... I should have the results tomorrow or late tonight though.
It is called the "Lloyd-Max" algorithm (even when the bins are equally spaced like this). The original papers are here: https://cs.nyu.edu/home/people/in_memoriam/roweis/csc2515-2006/readings/max60.pdf
It looks like Lloyd reinvented it 20 years later (as he doesn't reference Max). |
The code above using
|
|
Found an even better initialisation now:
#ifdef IS_KIMI
scales[j] = MAX(fabsf(xmin), fabsf(xmax)) / 7.0f;
mins[j] = -7.0f * scales[j];
#else
scales[j] = (xmax - xmin) / 15.0f;
mins[j] = xmin;
#endif |






Need help testing this
Model: https://huggingface.co/moonshotai/Kimi-K2-Thinking