metal: optimize matrix multiplication kernel #13941

mikek498 · 2025-05-31T07:48:35Z

Fixed field access in ggml_metal_kargs_mul_mm struct to use correct dimensions
Improved shared memory access patterns in tiled matrix multiplication
Added proper bounds checking for edge cases
Enhanced thread synchronization for better performance
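
The description above does not include the kernel source, so the following is only a CPU-side C++ sketch of the general technique the bullets name: tiled matrix multiplication with bounds-checked tile loads. The names `TILE` and `tiled_matmul` are hypothetical and do not come from ggml. The local arrays stand in for threadgroup (shared) memory, and the comments mark where a Metal kernel would typically place `threadgroup_barrier(mem_flags::mem_threadgroup)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical tile edge; a real kernel sizes this to fit threadgroup memory.
constexpr std::size_t TILE = 4;

// Computes C (M x N) = A (M x K) * B (K x N), all row-major.
// Illustrative sketch only, not the ggml Metal kernel.
std::vector<float> tiled_matmul(const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::size_t M, std::size_t K, std::size_t N) {
    assert(A.size() == M * K && B.size() == K * N);
    std::vector<float> C(M * N, 0.0f);
    float tileA[TILE][TILE];  // stands in for threadgroup memory
    float tileB[TILE][TILE];

    for (std::size_t i0 = 0; i0 < M; i0 += TILE) {
        for (std::size_t j0 = 0; j0 < N; j0 += TILE) {
            for (std::size_t k0 = 0; k0 < K; k0 += TILE) {
                // Load both tiles, zero-padding any element past the matrix edge.
                for (std::size_t r = 0; r < TILE; ++r) {
                    for (std::size_t c = 0; c < TILE; ++c) {
                        const bool inA = (i0 + r < M) && (k0 + c < K);
                        const bool inB = (k0 + r < K) && (j0 + c < N);
                        tileA[r][c] = inA ? A[(i0 + r) * K + (k0 + c)] : 0.0f;
                        tileB[r][c] = inB ? B[(k0 + r) * N + (j0 + c)] : 0.0f;
                    }
                }
                // GPU: barrier here so every thread sees the fully loaded tiles.

                // Accumulate the partial product, skipping out-of-range outputs.
                for (std::size_t r = 0; r < TILE && i0 + r < M; ++r) {
                    for (std::size_t c = 0; c < TILE && j0 + c < N; ++c) {
                        float acc = 0.0f;
                        for (std::size_t k = 0; k < TILE; ++k) {
                            acc += tileA[r][k] * tileB[k][c];
                        }
                        C[(i0 + r) * N + (j0 + c)] += acc;
                    }
                }
                // GPU: barrier here before the next iteration overwrites the tiles.
            }
        }
    }
    return C;
}
```

Zero-padding the tiles keeps the inner loop free of branches, while the checks on `i0 + r` and `j0 + c` prevent writes outside `C` when `M`, `N`, or `K` is not a multiple of the tile size.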

Performance improvements on M1 Max:

Prompt processing (pp512): 437.30 → 5426.66 tokens/s (about 12.4× faster, a roughly 1140% increase)
Token generation (tg128): 58.58 → 56.56 tokens/s (about 3.4% lower, essentially unchanged)
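
For reference, the relative changes can be recomputed from the raw throughput figures. The helpers below are a small illustrative sketch (the names `speedup` and `percent_change` are hypothetical), not part of the PR or of any benchmark tool.

```cpp
#include <cassert>
#include <cmath>

// Ratio of new throughput to old throughput (values above 1 mean faster).
double speedup(double before, double after) {
    assert(before > 0.0);
    return after / before;
}

// Relative change in percent; negative values mean a slowdown.
double percent_change(double before, double after) {
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Calling `percent_change(437.30, 5426.66)` gives roughly 1141, and `percent_change(58.58, 56.56)` gives roughly -3.4.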

Build: eb39499 (5549)


github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels on May 31, 2025

mikek498 closed this May 31, 2025
