@ggerganov ggerganov commented Jul 9, 2025

target #14629

Fuse GGML_OP_ADD and GGML_OP_MUL
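For readers unfamiliar with op fusion, here is a minimal sketch of the idea in plain Python. It is not the Metal kernel from this PR. It assumes that "fusing" means collapsing a chain of adjacent elementwise ADD nodes into a single pass over the data, so that no intermediate tensors are written out between nodes. The function names are made up for illustration.

```python
def add_chain_unfused(x, addends):
    """Apply each ADD as its own pass, materializing an intermediate list every time."""
    passes = 0
    cur = list(x)
    for a in addends:
        cur = [c + v for c, v in zip(cur, a)]  # one full pass per ADD node
        passes += 1
    return cur, passes


def add_chain_fused(x, addends):
    """Apply all ADDs in a single pass over the elements, with no intermediates."""
    out = []
    for i, xi in enumerate(x):
        acc = xi
        for a in addends:
            acc += a[i]  # same left-to-right order as the unfused chain
        out.append(acc)
    return out, 1


x = [1.0, 2.0, 3.0]
addends = [[0.5, 0.5, 0.5], [10.0, 20.0, 30.0]]
unfused, n_unfused = add_chain_unfused(x, addends)
fused, n_fused = add_chain_fused(x, addends)
```

Both variants produce the same values; the fused one touches memory once instead of once per node, which is the kind of saving that would matter most when each node is small and dispatch overhead dominates.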

LLAMA_SET_ROWS=1 ./scripts/compare-commits.sh master gg/metal-fuse-add -m ./models/qwen3-30b-a3b/ggml-model-q8_0.gguf -m models/gemma-3-4b/ggml-model-q8_0.gguf -fa 1 -t 1
| Model | Test | t/s master | t/s gg/metal-fuse-add | Speedup |
| --- | --- | --- | --- | --- |
| gemma3 4B Q8_0 | pp512 | 2444.84 | 2494.63 | 1.02 |
| gemma3 4B Q8_0 | tg128 | 90.39 | 96.76 | 1.07 |
| qwen3moe 30B.A3B Q8_0 | pp512 | 1362.92 | 1420.74 | 1.04 |
| qwen3moe 30B.A3B Q8_0 | tg128 | 70.12 | 76.68 | 1.09 |
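As a sanity check on the numbers above, the Speedup column can be reproduced from the two t/s columns, assuming it is the ratio of branch throughput to master throughput rounded to two decimals. The values are copied from the table; the variable names are illustrative.

```python
# (model, test, t/s master, t/s gg/metal-fuse-add), copied from the results table
rows = [
    ("gemma3 4B Q8_0", "pp512", 2444.84, 2494.63),
    ("gemma3 4B Q8_0", "tg128", 90.39, 96.76),
    ("qwen3moe 30B.A3B Q8_0", "pp512", 1362.92, 1420.74),
    ("qwen3moe 30B.A3B Q8_0", "tg128", 70.12, 76.68),
]

# Speedup = branch throughput / master throughput, rounded to 2 decimals
speedups = [round(branch / master, 2) for _, _, master, branch in rows]

for (model, test, _, _), s in zip(rows, speedups):
    print(f"{model:24s} {test:6s} {s:.2f}")
```

The gain is larger for token generation (tg128) than for prompt processing (pp512) on both models.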

Testing

make -j && GGML_METAL_FUSION_DEBUG=2 ./bin/test-backend-ops -o RMS_NORM_MUL_ADD -b Metal
Backend 1/3: Metal
  Device description: Apple M4 Max
  Device memory: 28753 MB (28747 MB free)

ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000001): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.000100): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=0.100000): OK
ggml_metal_encode_node: fuse: RMS_NORM + MUL + ADD
  RMS_NORM_MUL_ADD(type=f32,ne=[64,5,4,3],eps=1.000000): OK
  6543/6543 tests passed
  Backend Metal: OK
ggml_backend_metal_device_rel: fused ADD: 5
ggml_backend_metal_device_rel: fused MUL: 5
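To make the fused pattern in the log concrete, here is a small reference sketch of RMS_NORM + MUL + ADD on a single row, written in plain Python. It assumes the usual RMS norm definition, x / sqrt(mean(x^2) + eps), followed by an elementwise multiply and an elementwise add. It is a numerical illustration only, not the test code or the Metal kernel, and all names are hypothetical.

```python
import math


def rms_norm(x, eps):
    """Normalize one row by its root mean square."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def rms_norm_mul_add_unfused(x, w, b, eps):
    """Three separate steps, each producing an intermediate row."""
    y = rms_norm(x, eps)
    y = [yi * wi for yi, wi in zip(y, w)]     # MUL
    return [yi + bi for yi, bi in zip(y, b)]  # ADD


def rms_norm_mul_add_fused(x, w, b, eps):
    """Same math in one pass over the row after the reduction."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * wi + bi for v, wi, bi in zip(x, w, b)]


x = [1.0, -2.0, 3.0, -4.0]
w = [0.5, 1.0, 1.5, 2.0]
b = [0.1, 0.2, 0.3, 0.4]
# eps values taken from the test log above
results = {
    eps: (rms_norm_mul_add_unfused(x, w, b, eps), rms_norm_mul_add_fused(x, w, b, eps))
    for eps in (0.0, 1e-6, 1e-4, 0.1, 1.0)
}
```

The fused form still needs the reduction over the row first, but the scale, multiply and add are then applied together per element.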

  • Disable with env variable
  • Print fuse stats
  • Fuse with norms, cpys, etc.
  • Cleaner kernel impl?
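The first two items above could look roughly like the sketch below. This is a guess at the shape, not the implementation in this PR: the environment variable name `EXAMPLE_FUSION_DISABLE`, the pattern list, and the counting rule (each op absorbed into the leading kernel counts as one fused op) are all assumptions made for illustration. A real graph pass would also have to check data dependencies between nodes, which this linear scan ignores.

```python
import os

# Hypothetical patterns, longest first so the greedy scan prefers them
PATTERNS = [("RMS_NORM", "MUL", "ADD"), ("RMS_NORM", "MUL")]


def fusion_enabled(env=os.environ):
    """Fusion is on unless the (hypothetical) variable is set to a non-empty value."""
    return not env.get("EXAMPLE_FUSION_DISABLE")


def plan_fusion(ops, enabled=True):
    """Greedy left-to-right scan over a linear op list; returns (groups, stats)."""
    groups, stats = [], {}
    i = 0
    while i < len(ops):
        matched = None
        if enabled:
            for p in PATTERNS:
                if tuple(ops[i:i + len(p)]) == p:
                    matched = p
                    break
        if matched:
            groups.append(matched)
            for op in matched[1:]:
                stats[op] = stats.get(op, 0) + 1  # op absorbed into the leading kernel
            i += len(matched)
        else:
            groups.append((ops[i],))
            i += 1
    return groups, stats


ops = ["RMS_NORM", "MUL", "ADD"] * 5
groups_on, stats_on = plan_fusion(ops, enabled=True)
groups_off, stats_off = plan_fusion(ops, enabled=False)
for name, count in sorted(stats_on.items()):
    print(f"fused {name}: {count}")
```

Passing `enabled=fusion_enabled()` would give a switch for A/B comparisons without rebuilding.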

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels Jul 9, 2025
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 23bc8a3 to b61796c Compare July 11, 2025 11:05
@ggerganov ggerganov changed the base branch from master to gg/graph-context-refactor July 11, 2025 11:05
@ggerganov ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 5a220cc to bc0a20c Compare July 12, 2025 19:51
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 3 times, most recently from 6e07c3e to 067d04a Compare July 13, 2025 19:11
@github-actions github-actions bot added the testing (Everything test related) label Jul 13, 2025
@ggerganov ggerganov marked this pull request as ready for review July 14, 2025 10:28
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch from fc3a162 to 474041f Compare July 14, 2025 10:35
@ggerganov ggerganov changed the title from "metal : fuse add" to "metal : fuse add, mul" Jul 14, 2025
@ggerganov ggerganov force-pushed the gg/graph-context-refactor branch 2 times, most recently from 20010c4 to ae2fb57 Compare July 18, 2025 05:00
Base automatically changed from gg/graph-context-refactor to master July 18, 2025 05:29
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch 2 times, most recently from 012fb71 to 04d0349 Compare July 18, 2025 11:39
@ggerganov ggerganov force-pushed the gg/metal-fuse-add branch from 04d0349 to effa72e Compare July 18, 2025 11:46
@ggerganov ggerganov merged commit bf9087f into master Jul 18, 2025
53 of 55 checks passed
@ggerganov ggerganov deleted the gg/metal-fuse-add branch July 18, 2025 17:37
