ggml : repack block_iq4_nlx8 (AVX) #14904

Conversation
@Srihari-mcw Since you have access to AVX512, could you run this branch with an IQ4_NL model?
Sure, I will check and get back on this. Thanks.
Force-pushed from e2661ed to d1788b7.
Hi @ggerganov , we tested perplexity with the Meta Llama 2 7B model quantized to IQ4_NL on an AVX512 machine (AMD Ryzen 5 7600X). The measured perplexity seems close enough.
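For context, the perplexity reported by `llama-perplexity` follows the standard definition (general background, not specific to this PR):

$$
\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\ln p(x_i \mid x_{<i})\right)
$$

A near-identical PPL between this branch and the reference path is a good sign that the repacked kernels compute matching results.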
ggml-ci

Force-pushed from d1788b7 to 0de01ed.
This reverts commit 00f35d5.
Hello @ggerganov , I believe this commit breaks CPU-repacked q4_0 models when compiled with gcc/g++ and w64devkit. Oddly, the CI binaries produced with MSVC seem perfectly fine. I can reproduce this when building from the latest version of llama.cpp, getting a segmentation fault at runtime. Model used: gemma-3-4b-it-Q4_0.gguf. Let me know if you need more information. There might be some small modifications you need to make to get it to compile (such as #14953), or you may need to set up CURL, but those are unrelated to this issue. Once built, running the model reproduces the segfault.
I can't reproduce on my Ryzen:

```
gcc --version
gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0

make -j && ./bin/llama-cli -hf ggml-org/gemma-3-4b-it-qat-GGUF -p "hello"

system_info: n_threads = 16 (n_threads_batch = 16) / 32 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
```
Can you try using q4_0? I think -hf defaults to q4_k_m. Edit: https://huggingface.co/unsloth/gemma-3-4b-it-GGUF/resolve/main/gemma-3-4b-it-Q4_0.gguf?download=true
It is q4_0.
Alright, let me see if I can figure it out. It might be a Windows thing, as it happens for me on the model you linked too.
Try running it with a Debug build to see if you hit any asserts. What CPU do you use? Does it have AVX512?
Hi @ggerganov , I am using a laptop with an i9-13980HX CPU. I don't think it has AVX512 support. I ran it again under gdb and obtained a backtrace. When run without gdb, the terminal output fails at the same place as in my previous comment. Somehow when using gdb it seems to be truncated earlier (maybe just a flush issue?)
On Occam's advice, I have created a standalone issue for this: #16479
Repack 8x `block_iq4_nl` into `block_iq4_nlx8` + add AVX implementation

- Reuses the `block_q4_0x8` GEMV/GEMM implementation (the logic is the same, just the lookup table for nibbles -> bytes is different)
- Removes `UNUSED` macros (not exhaustive)

TODOs:

- `__AVX512F__` path after the refactoring
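To illustrate why the GEMV/GEMM logic can be shared, here is a minimal C sketch (not the actual ggml kernels; the struct layout and helper names are assumptions modeled on `block_q4_0x8`). Both formats store 4-bit values, and only the nibble -> byte mapping differs: q4_0 decodes linearly, while IQ4_NL indexes a non-linear codebook (the `kvalues_iq4nl` values below are taken from ggml's quantization sources).

```c
#include <stdint.h>

// Non-linear codebook used by IQ4_NL (values from ggml-quants.c).
static const int8_t kvalues_iq4nl[16] = {
    -127, -104, -83, -65, -49, -35, -22, -10,
       1,   13,  25,  38,  53,  69,  89, 113,
};

// q4_0: a nibble decodes linearly to (nibble - 8).
static inline int8_t dequant_q4_0(uint8_t nib) {
    return (int8_t)((nib & 0x0F) - 8);
}

// iq4_nl: a nibble is an index into the non-linear table.
static inline int8_t dequant_iq4_nl(uint8_t nib) {
    return kvalues_iq4nl[nib & 0x0F];
}

// Hypothetical shape of the x8-repacked block, mirroring block_q4_0x8:
// 8 per-block fp16 scales followed by the interleaved nibbles of
// 8 blocks of 32 values (8 * 32 / 2 = 128 bytes).
typedef struct {
    uint16_t d[8];     // per-block scales (fp16 bits)
    uint8_t  qs[128];  // interleaved 4-bit quants of 8 blocks
} block_iq4_nlx8_sketch;
```

In a SIMD kernel, the 16-entry table lookup maps naturally to a byte shuffle (e.g. `_mm_shuffle_epi8` on the 128-bit lanes available with AVX), whereas the q4_0 path is a subtraction; the surrounding repacked GEMV/GEMM structure (interleaved loads, dot products, scale handling) can stay identical.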