
Research: imatrix on MLA DeepSeek/Kimi-K2 for attn_k_b and attn_v_b #651

@ubergarm

Description


Previous literature and research

We've had some discussions on how to use imatrix with DeepSeek/Kimi-K2 MLA models to make sure it applies to the attn_k_b and attn_v_b tensors.

When using an imatrix to quantize, these messages are expected:

====== llama_model_quantize_internal: did not find weights for token_embd.weight
====== llama_model_quantize_internal: did not find weights for blk.0.attn_kv_b.weight

This is fine: token_embd does not use imatrix data, and the strategy is to always set attn_kv_b to Q8_0 since it is only used for PP, assuming the end user runs with -mla 3.
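For context, here is a minimal sketch of how such overrides can be applied at quantization time. The --custom-q regex=type syntax is my recollection of ik_llama.cpp's override flag (the "Using custom type ..." lines in the logs below come from it), so treat the exact spelling as an assumption and check your build's --help; paths and the target type are examples:

# hedged sketch: pin the MLA attention tensors to q8_0 while quantizing
# the rest of the model to the target type
./build/bin/llama-quantize \
    --imatrix /path/to/imatrix.dat \
    --custom-q "attn_kv_b=q8_0,attn_k_b=q8_0,attn_v_b=q8_0" \
    /path/to/Model-BF16.gguf \
    /path/to/Model-Q8-attn.gguf \
    IQ4_XS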

However, the following messages are not okay and mean there may have been an issue with how the imatrix data was collected:

====== llama_model_quantize_internal: did not find weights for blk.0.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.0.attn_v_b.weight

Given these tensors are good candidates for quantization to speed up TG, ideally there should be imatrix data for them.

In the past I had mistakenly left off -mla 1 while creating the imatrix, and so was missing the data for attn_k_b and attn_v_b, oops!

More recently with Kimi-K2 I did use -mla 1 but was still missing data for both tensors. One tensor seems to be missed because it is logged as attn_k_b.weight (reshaped), and attn_v_b does not appear in the verbose logs at all. For now I am quantizing attn_kv_b to Q8_0 as designed, but now also attn_k_b and attn_v_b to Q8_0 on Kimi-K2-Instruct models to preserve perplexity at the cost of some TG speed.
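One way to confirm whether the data actually made it into the file is to dump the tensor names stored in the imatrix .dat itself. Below is a minimal C++ sketch assuming the legacy binary layout the collector writes (int32 entry count, then per entry: int32 name length, name bytes, int32 ncall, int32 nval, nval float32 values); if your build writes a different format, adjust accordingly:

#include <cstdint>
#include <cstdio>
#include <fstream>
#include <string>

// hedged sketch: list the entries stored in a legacy-format imatrix .dat
int main(int argc, char ** argv) {
    if (argc < 2) { std::fprintf(stderr, "usage: %s imatrix.dat\n", argv[0]); return 1; }
    std::ifstream in(argv[1], std::ios::binary);
    int32_t n_entries = 0;
    in.read((char *) &n_entries, sizeof(n_entries));
    for (int32_t i = 0; i < n_entries && in; ++i) {
        int32_t len = 0;
        in.read((char *) &len, sizeof(len));
        std::string name(len, '\0');
        in.read(name.data(), len);                  // tensor name
        int32_t ncall = 0, nval = 0;
        in.read((char *) &ncall, sizeof(ncall));    // number of calls accumulated
        in.read((char *) &nval, sizeof(nval));      // number of values
        in.seekg((std::streamoff) nval * sizeof(float), std::ios::cur); // skip values
        std::printf("%s  ncall=%d  nval=%d\n", name.c_str(), ncall, nval);
    }
    return 0;
}

If blk.*.attn_k_b.weight and blk.*.attn_v_b.weight never show up in that listing, llama-quantize has nothing to look up and prints the "did not find weights" messages above.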

See the attached log files, which are too long to put in a details fold:

imat-kimi-no-mla.log

imat-kimi-mla-1.log

A few things I can check would be:

1. Try it on DeepSeek-V2-Lite

So I just tried llama-imatrix on this model and it also uses that attn_k_b.weight (reshaped) name, and I never see attn_v_b at all, though when actually quantizing it doesn't throw a message about missing attn_v_b (see the name-normalization sketch after the logs below).

# llama-quantize
[  11/ 431]                blk.0.attn_k_b.weight - [  128,  8192,     1,     1], type =   bf16, Using custom type q4_0 for tensor blk.0.attn_k_b.weight
====== llama_model_quantize_internal: did not find weights for blk.0.attn_k_b.weight
converting to q4_0 .. size =     2.00 MiB ->     0.56 MiB
[  12/ 431]                blk.0.attn_v_b.weight - [  512,  2048,     1,     1], type =   bf16, Using custom type q4_0 for tensor blk.0.attn_v_b.weight
converting to q4_0 .. size =     2.00 MiB ->     0.56 MiB

# llama-imatrix
CUDA_VISIBLE_DEVICES="0" \
./build/bin/llama-imatrix \
    -mla 1 \
    --verbosity 2 \
    --layer-similarity \
    -m /mnt/raid/models/ubergarm/DeepSeek-V2-Lite-GGUF/DeepSeek-V2-Lite-64x1.6B-BF16.gguf \
    -f ubergarm-imatrix-calibration-corpus-v02.txt \
    -o /tmp/imatrix-tmp.dat \
    -ngl 99 \
    --ctx-size 512 \
    --threads 1

...

collect_imatrix[1]:     blk.20.ffn_down_shexp.weight, MUL_MAT,  2816 x   512, 0
collect_imatrix[1]:      blk.21.attn_kv_a_mqa.weight, MUL_MAT,  2048 x   512, 0
collect_imatrix[1]:             blk.21.attn_q.weight, MUL_MAT,  2048 x   512, 0
collect_imatrix[1]: blk.21.attn_k_b.weight (reshaped), MUL_MAT,   128 x   512, 0
collect_imatrix[1]:        blk.21.attn_output.weight, MUL_MAT,  2048 x   512, 0
collect_imatrix[1]:       blk.21.ffn_gate_inp.weight, MUL_MAT,  2048 x   512, 0
collect_imatrix[1]:      blk.21.ffn_gate_exps.weight, MUL_MAT_ID,  2048 x   512, 0
collect_imatrix[1]:        blk.21.ffn_up_exps.weight, MUL_MAT_ID,  2048 x   512, 0
collect_imatrix[1]:      blk.21.ffn_down_exps.weight, MUL_MAT_ID,  1408 x   512, 0
collect_imatrix[1]:     blk.21.ffn_gate_shexp.weight, MUL_MAT,  2048 x   512, 0
collect_imatrix[1]:       blk.21.ffn_up_shexp.weight, MUL_MAT,  2048 x   512, 0
collect_imatrix[1]:     blk.21.ffn_down_shexp.weight, MUL_MAT,  2816 x   512, 0
collect_imatrix[1]:      blk.22.attn_kv_a_mqa.weight, MUL_MAT,  2048 x   512, 0
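The (reshaped) suffix above comes from ggml itself: ggml_reshape renames the result tensor to "name (reshaped)", so the collector stores the data under a key that llama-quantize never looks up. One possible fix, sketched below as an illustration rather than the actual patch, is to normalize names before storing them in the collector:

#include <string>

// hedged sketch: strip the " (reshaped)" suffix that ggml_reshape
// appends, so imatrix entries are keyed by the plain tensor name
// (e.g. "blk.21.attn_k_b.weight") that llama-quantize looks up
static std::string normalize_tensor_name(const std::string & name) {
    const std::string suffix = " (reshaped)";
    if (name.size() > suffix.size() &&
        name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0) {
        return name.substr(0, name.size() - suffix.size());
    }
    return name;
}

This would explain attn_k_b, but not why attn_v_b never shows up in the collector output at all; that still needs investigation.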

We've discussed this topic across a number of discussions and PRs, hah. Here are the most recent relevant comments:

References
