@gcp
Contributor

@gcp gcp commented Feb 22, 2025

Modeled after the CUDA implementations.

Because the existing dequantize functions are written in terms of type4x4, I could not see how to reuse them, so they are repeated here in float form.

Fixes issue #10976.

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Feb 22, 2025
@gcp gcp force-pushed the cpy_metal_quants branch from a43f2fb to 9d00bc2 Compare February 22, 2025 00:08
@ggerganov ggerganov force-pushed the cpy_metal_quants branch 2 times, most recently from c642a56 to be1542e Compare February 22, 2025 09:51
@ggerganov
Member

The dequantize functions return a group of 16 elements from a given block of quants. The short il argument specifies the index of the group, i.e. il == 0 returns the first 16 elements, il == 1 returns the second 16 elements, and so on. For quantizations with a block size of 32, il = [0..1], while for quantizations with a block size of 256, il = [0..15].
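To make the group indexing concrete, here is a minimal CPU-side sketch in C++ of copying one quantized block to float by walking every il value. It is an illustration under stated assumptions, not the Metal kernel from this PR: BlockQ4, dequantize_group and copy_block_to_f32 are hypothetical names, the scale is stored as a plain float rather than a half, and the nibble layout (low nibbles hold the first 16 elements, high nibbles the second 16) is assumed to follow a Q4_0-style block of 32.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Hypothetical simplified 4-bit block: 32 values sharing one scale.
// Assumption: the scale is a float here; a real block would use half precision.
constexpr int kBlockSize = 32;
constexpr int kGroupSize = 16;

struct BlockQ4 {
    float d;                                 // per-block scale
    std::array<uint8_t, kBlockSize / 2> qs;  // two 4-bit quants per byte
};

// Return the il-th group of 16 dequantized values from one block.
// il == 0 -> elements 0..15 (low nibbles), il == 1 -> elements 16..31 (high nibbles).
std::array<float, kGroupSize> dequantize_group(const BlockQ4 & b, short il) {
    std::array<float, kGroupSize> out{};
    for (int j = 0; j < kGroupSize; ++j) {
        const int q = (il == 0) ? (b.qs[j] & 0x0F) : (b.qs[j] >> 4);
        out[j] = static_cast<float>(q - 8) * b.d;  // quants are stored with an offset of 8
    }
    return out;
}

// Copy a whole block to float by visiting every group index.
// For a block size of 32 this loops over il = 0..1; a block size of 256
// would loop over il = 0..15 in the same way.
std::vector<float> copy_block_to_f32(const BlockQ4 & b) {
    std::vector<float> dst;
    dst.reserve(kBlockSize);
    for (short il = 0; il < kBlockSize / kGroupSize; ++il) {
        const auto group = dequantize_group(b, il);
        dst.insert(dst.end(), group.begin(), group.end());
    }
    return dst;
}
```

The point of the sketch is only the loop bound: the number of il values is the block size divided by 16, so the same copy loop works for any quantization type once its dequantize function is plugged in.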

I pushed an implementation that uses the dequantize functions and also supports copy to F16. The F16 path is not yet implemented on the CPU, so it is currently untested.

metal: use dequantize_q templates

---------

Co-authored-by: Georgi Gerganov <[email protected]>
@gcp gcp force-pushed the cpy_metal_quants branch from be1542e to bfc305a Compare February 23, 2025 18:03
@gcp
Contributor Author

gcp commented Feb 23, 2025

All OK from my side.

@ggerganov ggerganov merged commit 58d07a8 into ggml-org:master Feb 25, 2025
47 checks passed
orca-zhang pushed a commit to orca-zhang/llama.cpp that referenced this pull request Feb 26, 2025
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Mar 19, 2025
mostlyuseful pushed a commit to mostlyuseful/llama.cpp that referenced this pull request May 12, 2025