swetha097

  • This implements the GGML_OP_GET_ROWS operation specifically for the repacked (block-interleaved) 4-bit quantized format (q4_Kx8)

  • The following gains were observed from the changes in this PR: the changes allow increased usage of the GEMM function (ggml_gemm_q4_K_8x8_q8_K) for the q4_K type (see the sketch after this list for the row de-interleaving idea)
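As a rough illustration of what GET_ROWS has to do for a block-interleaved layout, the sketch below pulls the 4-bit quants of a single row back out of a pack of eight interleaved rows. The struct and function names (`blk_q4_packed8`, `get_row_from_pack`), the block size, and the byte-interleaving pattern are illustrative assumptions, not the actual ggml q4_Kx8 definitions.

```c
#include <stdint.h>

/* Illustrative sketch only: the real ggml block_q4_Kx8 layout differs
 * (super-block scales/mins and the exact interleaving pattern are omitted). */
#define ROWS_PER_PACK 8   /* rows interleaved into one packed block   */
#define QK_SKETCH     32  /* quant elements per block in this sketch  */

/* One packed block: the 4-bit quants of 8 rows stored interleaved,
 * byte i*8 + r belonging to row r (hypothetical pattern). */
typedef struct {
    uint8_t qs[ROWS_PER_PACK * QK_SKETCH / 2];
} blk_q4_packed8;

/* GET_ROWS core idea: recover the quants of row `r` (0..7) from the
 * interleaved pack so the selected row can be returned in its plain,
 * non-repacked form. `dst` must hold QK_SKETCH/2 bytes. */
static void get_row_from_pack(const blk_q4_packed8 *src, int r, uint8_t *dst) {
    for (int i = 0; i < QK_SKETCH / 2; ++i) {
        dst[i] = src->qs[i * ROWS_PER_PACK + r];
    }
}
```

The actual kernel additionally has to handle the q4_K scales/mins and typically dequantizes the selected rows to the destination type; the de-interleaving step above is the part that is specific to the repacked layout.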

(screenshot: benchmark results)

master branch commit - 17bece1885047dcf338c557a49f393677e265a91

development (get_rows) branch commit - 70cf05ae71fbabb1f943e13e4d27f41f890197f7

(screenshot: benchmark results)

The PR was tested on an AMD Raphael 7600X, which supports the following flags:

system_info: n_threads = 4 / 12 | WHISPER : COREML = 0 | OPENVINO = 0 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | LLAMAFILE = 1 | REPACK = 1 |

The model used for performance tests was downloaded from https://huggingface.co/ggerganov/whisper.cpp/blob/main/ggml-base.en.bin and quantized to q4_0.
