@gcp
Contributor

@gcp gcp commented Feb 20, 2025

Fixes #10976.

@github-actions bot added the labels Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning), and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) on Feb 20, 2025
@ggerganov
Member

Could you make a separate PR just for the Metal changes? Btw, I think the copy kernels could be implemented by reusing the dequantize_qX_X functions, likely with a single template + 4 instantiations. That would result in a much smaller code change and would allow generalizing to other quantizations in the future.
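To illustrate the "single template + instantiations" idea, here is a minimal CPU-side sketch in C++. It is not the code from this PR: the block layouts (`BlockQ4`, `BlockQ8`), the float scale in place of fp16, and the names `dequant_q4`, `dequant_q8`, and `cpy_blocks_to_f32` are all hypothetical simplifications, and only two formats are shown rather than four.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified block layouts (float scale instead of fp16).
constexpr int QK = 32;                              // values per block
struct BlockQ4 { float d; uint8_t qs[QK / 2]; };    // two 4-bit values per byte, biased by 8
struct BlockQ8 { float d; int8_t  qs[QK]; };        // one signed 8-bit value per byte

// Per-format dequantizers: expand one block into QK floats.
inline void dequant_q4(const BlockQ4 & b, float * y) {
    for (int j = 0; j < QK / 2; ++j) {
        y[j]          = ((b.qs[j] & 0x0F) - 8) * b.d;   // low nibble
        y[j + QK / 2] = ((b.qs[j] >> 4)   - 8) * b.d;   // high nibble
    }
}
inline void dequant_q8(const BlockQ8 & b, float * y) {
    for (int j = 0; j < QK; ++j) {
        y[j] = b.qs[j] * b.d;
    }
}

// One copy routine shared by every format; the dequantizer is a template parameter.
template <typename Block, void (*dequant)(const Block &, float *)>
void cpy_blocks_to_f32(const Block * src, float * dst, int nblocks) {
    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; ++i) {
        dequant(src[i], dst + i * QK);
    }
}

// Explicit instantiations, one per supported format.
template void cpy_blocks_to_f32<BlockQ4, dequant_q4>(const BlockQ4 *, float *, int);
template void cpy_blocks_to_f32<BlockQ8, dequant_q8>(const BlockQ8 *, float *, int);
```

Supporting another quantization in this sketch means adding one dequantizer and one instantiation line, which is the generalization the comment above is after.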

@gcp
Contributor Author

gcp commented Feb 21, 2025

> Btw, I think the copy kernels could be implemented by reusing the dequantize_qX_X functions, likely with a single template + 4 instantiations. That would result in a much smaller code change and would allow generalizing to other quantizations in the future.

Reusing the dequantize_qX_Y functions works, but doing it with templates is a bit tricky because dequantize_q8_0 swizzles its results differently than all the others (boo!). It would've been nice if this had just been a ggml_get_to_fp32_cuda call but that doesn't deal with the permutations the CPY code is expected to handle 😢
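A sketch of why the differing swizzle complicates a shared template, written as plain C++ rather than CUDA. It rests on an assumption not stated in this thread: that each dequantize call yields two values, which for the 4-bit formats are the low and high nibble of one byte (elements `iqs` and `iqs + QK/2`), while for q8_0 they are two adjacent elements (`iqs` and `iqs + 1`). All names here (`Pair`, `dequant_pair_q4`, `dequant_pair_q8`, `cpy_block_to_f32`, the `y_offset` parameter) and the float-scale block layouts are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr int QK = 32;                              // values per block
struct BlockQ4 { float d; uint8_t qs[QK / 2]; };    // hypothetical layout, float scale
struct BlockQ8 { float d; int8_t  qs[QK]; };        // hypothetical layout, float scale
struct Pair    { float x, y; };

// Assumed 4-bit behavior: one byte gives elements iqs and iqs + QK/2.
inline Pair dequant_pair_q4(const BlockQ4 & b, int iqs) {
    return { ((b.qs[iqs] & 0x0F) - 8) * b.d, ((b.qs[iqs] >> 4) - 8) * b.d };
}

// Assumed 8-bit behavior: two adjacent elements iqs and iqs + 1.
inline Pair dequant_pair_q8(const BlockQ8 & b, int iqs) {
    return { b.qs[iqs] * b.d, b.qs[iqs + 1] * b.d };
}

// The shared template needs the output offset as an extra parameter,
// because the second value of each pair lands in a format-dependent slot.
template <typename Block, Pair (*dequant)(const Block &, int), int y_offset>
void cpy_block_to_f32(const Block & b, float * dst) {
    assert(dst != nullptr);
    for (int k = 0; k < QK / 2; ++k) {
        // Adjacent pairs advance by 2 per call; split pairs advance by 1.
        const int iqs = (y_offset == 1) ? 2 * k : k;
        const Pair v = dequant(b, iqs);
        dst[iqs]            = v.x;
        dst[iqs + y_offset] = v.y;
    }
}
```

In this sketch a 4-bit block would be copied with `cpy_block_to_f32<BlockQ4, dequant_pair_q4, QK / 2>` and an 8-bit block with `cpy_block_to_f32<BlockQ8, dequant_pair_q8, 1>`. Handling permuted (non-contiguous) destinations, which the CPY code must also support, is left out here.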


Development

Successfully merging this pull request may close these issues.

Misc. bug: Unsupported op "CPY" / Segmentation fault on Metal