Fixup kimi-k2 convert indentation #617
Conversation
Still running, 8 hours later at 50%. Why do you need attn_kv_b anyway?
Thanks for running this long job and testing! Check here for some more info: #601 (comment). Based on that discussion I've changed my recipes a bit for Kimi and future DeepSeek models.
Thanks, you already pointed to that PR. Looks like it's for imatrix. There is so much activity I'm having a hard time keeping up 😅
Ooops, I'm so scattered sometimes! I've been trying to understand it more clearly myself as well! tl;dr: given you use q8_0 for all attn tensors in your quants, it probably doesn't matter much to you. haha... also I think you are the source of the ramblings. The reason I pointed at that comment again is this specific bit, regarding "Why do you need attn_kv_b anyway":
Also from #477 (reply in thread)
Also this discussion on MLA and comments: #354 (reply in thread). There is a little bit about it too in one of the original mainline MLA PRs by fairydreaming, which was not merged but is, I think, a bit more similar to how it is done here: ggml-org/llama.cpp#11446. So all that to say, my limited understanding of having both the …
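For what it's worth, here's how I picture the relationship between attn_kv_b and the split attn_k_b / attn_v_b tensors. This is only a rough numpy sketch assuming DeepSeek-V3 / Kimi-K2 style MLA dimensions; the shapes, names, and the transpose are my own reading of it, not copied from the convert script:

```python
import numpy as np

# Assumed DeepSeek-V3 / Kimi-K2 style MLA shapes (illustrative only):
# 512-dim compressed KV latent, 128-dim nope/value head dims.
n_head = 8              # small for the demo; the real models use 128 heads
kv_lora_rank = 512
qk_nope_head_dim = 128
v_head_dim = 128

# attn_kv_b (kv_b_proj.weight) maps the compressed latent up to per-head
# K(nope) and V: shape [n_head * (qk_nope + v), kv_lora_rank].
kv_b = np.random.randn(n_head * (qk_nope_head_dim + v_head_dim),
                       kv_lora_rank).astype(np.float32)

# Split it per head into the K part and the V part.
kv_b = kv_b.reshape(n_head, qk_nope_head_dim + v_head_dim, kv_lora_rank)
k_b = kv_b[:, :qk_nope_head_dim, :]      # per head: [128, 512]
v_b = kv_b[:, qk_nope_head_dim:, :]      # per head: [128, 512]

# attn_k_b is stored transposed, so its fast (row) dimension becomes
# qk_nope_head_dim = 128, while attn_v_b keeps rows of kv_lora_rank = 512.
k_b_t = k_b.transpose(0, 2, 1)           # per head: [512, 128]

print(k_b_t.shape, v_b.shape)            # (8, 512, 128) (8, 128, 512)
```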
But yeah, as you use q8_0 for all of it, it's probably not a big deal for your quants, and it's also why mainline uses q8_0 for all of that, as compilade's new imatrix/GGUF work that properly handles those tensors is not yet merged. In my latest recipes I've been leaving attn_kv_b at q8_0 and only quantizing attn_k_b and attn_v_b. Unfortunately though, attn_k_b's row size is not divisible by 256, so I'm stuck with q5_0 or iq4_nl. I hope this is somewhat accurate 😅
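To make the divisibility point concrete: k-quants and most i-quants pack 256 elements per block, while q5_0 / iq4_nl (and q8_0) use 32-element blocks, so a tensor whose rows are only 128 wide can only take the 32-block types. A toy check (pick_quant is a made-up helper for illustration, not anything in the repo):

```python
def pick_quant(row_size: int, preferred: str = "q5_K") -> str:
    # Made-up helper, for illustration only.
    if row_size % 256 == 0:
        return preferred      # k-quants / most i-quants need 256-element blocks
    if row_size % 32 == 0:
        return "q5_0"         # or iq4_nl; the 32-element block types still fit
    return "f16"              # nothing block-quantized fits

print(pick_quant(128))        # attn_k_b rows (qk_nope_head_dim) -> 'q5_0'
print(pick_quant(512))        # attn_v_b rows (kv_lora_rank)     -> 'q5_K'
```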
It is. Basically, you don't need to have the …
I did some direct comparisons a long while back, and there was a measurable (but small) impact on my system (and this was with q8_0 attn tensors, which matches the size they are created at if not present). So when it comes to my system, it matters enough to be measured.
Fix up a copy-paste Python indentation bug in the convert_hf_to_gguf.py script for kimi-k2-instruct. Thanks @anikifoss for testing; if you have success, let me know here to confirm this patch is good.
#612 (comment)
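The diff itself isn't pasted here, but the flavor of the fix is the classic copy-paste indentation slip. A purely hypothetical Python sketch (the names below are made up, not from convert_hf_to_gguf.py) showing how one extra indent level changes what a loop does:

```python
# Purely hypothetical illustration of a copy-paste indentation bug.
names = ["blk.0.ffn_gate.weight", "blk.0.exp_probs_b.bias", "blk.1.exp_probs_b.bias"]

def collect_broken(tensor_names):
    found = []
    for name in tensor_names:
        if "exp_probs" in name:
            found.append(name)
            return found      # over-indented: bails out after the first match
    return found

def collect_fixed(tensor_names):
    found = []
    for name in tensor_names:
        if "exp_probs" in name:
            found.append(name)
    return found              # de-dented back to function level

print(collect_broken(names))  # ['blk.0.exp_probs_b.bias']                            (wrong)
print(collect_fixed(names))   # ['blk.0.exp_probs_b.bias', 'blk.1.exp_probs_b.bias']  (right)
```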