@mikek498

  • Fixed field access in ggml_metal_kargs_mul_mm struct to use correct dimensions
  • Improved shared memory access patterns in tiled matrix multiplication
  • Added proper bounds checking for edge cases
  • Enhanced thread synchronization for better performance
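
The bounds checking and tile staging described above can be illustrated with a small CPU sketch. This is not the Metal kernel from this PR: `tiled_matmul` and `TILE` are hypothetical names, and the locally staged tiles only stand in for the role threadgroup (shared) memory plays on the GPU. It shows one way partial tiles at the matrix edge can be clamped so they never read or write out of range.

```python
# Illustrative CPU sketch of tiled matrix multiplication with edge-tile
# bounds checks. NOT the Metal kernel from the PR; TILE and tiled_matmul
# are hypothetical names chosen for this example.

TILE = 4  # assumed tile edge; a GPU kernel would size this to fit threadgroup memory


def tiled_matmul(a, b, tile=TILE):
    """Multiply a (M x K) by b (K x N), both given as non-empty lists of rows."""
    m, k = len(a), len(a[0])
    if len(b) != k:
        raise ValueError("inner dimensions do not match")
    n = len(b[0])
    c = [[0.0] * n for _ in range(m)]

    for i0 in range(0, m, tile):
        for j0 in range(0, n, tile):
            for k0 in range(0, k, tile):
                # Clamp each tile to the matrix edge so partial tiles
                # never index past the last row or column.
                i1 = min(i0 + tile, m)
                j1 = min(j0 + tile, n)
                k1 = min(k0 + tile, k)

                # Stage both tiles locally (the analogue of loading them
                # into threadgroup memory on a GPU).
                a_tile = [row[k0:k1] for row in a[i0:i1]]
                b_tile = [row[j0:j1] for row in b[k0:k1]]

                # On a GPU, a barrier would be needed here so that every
                # thread sees fully loaded tiles before using them.
                for i in range(i1 - i0):
                    for j in range(j1 - j0):
                        acc = 0.0
                        for kk in range(k1 - k0):
                            acc += a_tile[i][kk] * b_tile[kk][j]
                        c[i0 + i][j0 + j] += acc
    return c
```

Calling `tiled_matmul(a, b, tile=4)` on a 5×7 by 7×3 input exercises the partial-tile path in all three dimensions, since none of 5, 7, or 3 is a multiple of 4.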

Performance on M1 Max (before → after):

  • Prompt processing (pp512): 437.30 → 5426.66 tokens/s (about 12.4× faster, a roughly 1141% increase)
  • Token generation (tg128): 58.58 → 56.56 tokens/s (about 3.4% lower, effectively unchanged)
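
For readers who want to check the arithmetic behind these figures, here is a minimal sketch. The helper names `percent_change` and `speedup` are made up for this example; the input numbers are the ones quoted above.

```python
def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def speedup(before, after):
    """Throughput ratio; values above 1.0 mean the new build is faster."""
    return after / before


# Throughput figures quoted in the description (tokens/s on M1 Max).
results = {
    "pp512": (437.30, 5426.66),
    "tg128": (58.58, 56.56),
}

for name, (before, after) in results.items():
    print(f"{name}: {speedup(before, after):.2f}x, "
          f"{percent_change(before, after):+.1f}%")
```

Reporting both the ratio and the percentage avoids the common confusion between "12.4 times as fast" and "a 1240% increase", which differ by 100 percentage points.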

Build: eb39499 (5549)

@github-actions (bot) added the labels "ggml" (changes relating to the ggml tensor library for machine learning) and "Apple Metal" (https://en.wikipedia.org/wiki/Metal_(API)) on May 31, 2025
@mikek498 closed this on May 31, 2025