@ggerganov ggerganov commented Sep 24, 2025

  • Support mul_mm and mul_mm_id with src1->type == GGML_TYPE_F16 (for use with IM2COL)
  • Remove the ne00 % 32 == 0 requirement
  • Compile-time bounds checks
  • Reduce shared memory in some cases
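
To illustrate what dropping the `ne00 % 32 == 0` requirement involves, here is a minimal CPU-side sketch in C++, not the Metal kernel from this PR. It assumes a tile depth of 32 along the shared dimension and zero-fills the partial last tile so the fixed-width inner loop still gives the right result. The names `TILE_K` and `mul_mm_tiled` are hypothetical, and the actual kernel may handle the tail differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical tile depth along the shared dimension. 32 mirrors the old
// `ne00 % 32 == 0` requirement, but it is an assumption of this sketch.
constexpr std::size_t TILE_K = 32;

// Computes C[m x n] = A * B^T, where A is row-major m x k and B is row-major
// n x k. The shared contiguous dimension k plays the role of ne00.
std::vector<float> mul_mm_tiled(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t m, std::size_t n, std::size_t k) {
    std::vector<float> c(m * n, 0.0f);
    float ta[TILE_K];
    float tb[TILE_K];
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t k0 = 0; k0 < k; k0 += TILE_K) {
                // Valid elements in this tile; smaller than TILE_K only on the tail.
                const std::size_t valid = std::min(TILE_K, k - k0);
                for (std::size_t t = 0; t < TILE_K; ++t) {
                    // Zero-fill past the end so the full-width product stays correct.
                    ta[t] = t < valid ? a[i * k + k0 + t] : 0.0f;
                    tb[t] = t < valid ? b[j * k + k0 + t] : 0.0f;
                }
                for (std::size_t t = 0; t < TILE_K; ++t) {
                    acc += ta[t] * tb[t];
                }
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

With `k = 33`, the second tile holds one valid element and 31 zeros, so the result matches an untiled product.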

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Sep 24, 2025
@ggerganov ggerganov force-pushed the gg/metal-mul-mm-extend branch from ea92828 to af28459 Compare September 25, 2025 17:24
@github-actions github-actions bot added the testing Everything test related label Sep 26, 2025
@ggerganov ggerganov marked this pull request as ready for review September 26, 2025 13:32
@ggerganov ggerganov requested a review from slaren as a code owner September 26, 2025 13:32
@ggerganov ggerganov force-pushed the gg/metal-mul-mm-extend branch from acac821 to 989a348 Compare September 27, 2025 07:45
@ggerganov ggerganov merged commit 6a2c614 into master Sep 28, 2025
62 of 67 checks passed
@ggerganov ggerganov deleted the gg/metal-mul-mm-extend branch September 28, 2025 06:34
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request Oct 15, 2025
* metal : support mul_mm with src1->type == GGML_TYPE_F16

* metal : support mul_mm_id with src1->type == GGML_TYPE_F16

[no ci]

* metal : mul_mm support ne00 % 32 != 0

* metal : support mul_mm_id with ne00 % 32 != 0

* cont : remove unnecessary unrolls

* cont : simplify data loading

* metal : optimize mul_mm when output bounds checks are not needed
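
The idea of skipping output bounds checks when they are not needed can be sketched in plain C++ with a compile-time flag. This is an illustration under assumptions, not the PR's Metal code: the source does not show how the kernel selects its variants, and `TILE`, `store_tile`, and `store_tile_auto` are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical output tile size used only for this illustration.
constexpr std::size_t TILE = 4;

// Writes one TILE x TILE block into the row-major m x n matrix `dst` at
// (r0, c0). When BoundsCheck is false, the per-element test is compiled out.
template <bool BoundsCheck>
void store_tile(std::vector<float>& dst, std::size_t m, std::size_t n,
                std::size_t r0, std::size_t c0,
                const float (&tile)[TILE][TILE]) {
    for (std::size_t r = 0; r < TILE; ++r) {
        for (std::size_t c = 0; c < TILE; ++c) {
            if constexpr (BoundsCheck) {
                // Skip elements that fall outside the destination matrix.
                if (r0 + r >= m || c0 + c >= n) continue;
            }
            dst[(r0 + r) * n + (c0 + c)] = tile[r][c];
        }
    }
}

// Host-side dispatch: use the unchecked variant only when the whole tile fits.
void store_tile_auto(std::vector<float>& dst, std::size_t m, std::size_t n,
                     std::size_t r0, std::size_t c0,
                     const float (&tile)[TILE][TILE]) {
    if (r0 + TILE <= m && c0 + TILE <= n) {
        store_tile<false>(dst, m, n, r0, c0, tile);
    } else {
        store_tile<true>(dst, m, n, r0, c0, tile);
    }
}
```

Interior tiles take the branch-free path, and only tiles that straddle the matrix edge pay for the checks. The sketch requires C++17 for `if constexpr`.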
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025