
Conversation

@ggerganov (Member) commented:

Better threadgroup utilization for the floating-point mat-vec kernels:

| Model | Test | t/s (master) | t/s (gg/metal-mul-mv-opt-2) | Speedup |
| --- | --- | --- | --- | --- |
| qwen3 1.7B BF16 | tg32 | 127.14 | 141.96 | 1.12 |
| qwen3 1.7B F16 | tg32 | 127.44 | 142.26 | 1.12 |
| qwen3 1.7B all F32 | tg32 | 83.96 | 86.97 | 1.04 |

There are also some text-generation (TG) gains for MoE models of any quantization, since they use F32 matrix multiplication in the FFN (see the sketch after the table below):

| Model | Test | t/s (master) | t/s (gg/metal-mul-mv-opt-2) | Speedup |
| --- | --- | --- | --- | --- |
| gpt-oss 20B MXFP4 MoE | tg32 | 128.57 | 132.93 | 1.03 |
| qwen3moe 30B.A3B Q4_0 | tg32 | 100.39 | 102.86 | 1.02 |
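
For context, here is a minimal Metal sketch of the threadgroup-utilization idea behind these numbers. It is not the actual llama.cpp kernel; the kernel name, buffer layout, and fixed 32-lane SIMD width are illustrative assumptions. NSG SIMD-groups per threadgroup each reduce one row of the matrix, and NSG is the function constant introduced in this PR's second commit:

```metal
// Hypothetical sketch, not the llama.cpp source: y = A * x with A in half
// precision, using NSG SIMD-groups per threadgroup so every SIMD-group
// processes its own row instead of most of the threadgroup sitting idle.
#include <metal_stdlib>
using namespace metal;

constant ushort NSG [[function_constant(0)]]; // SIMD-groups per threadgroup

kernel void kernel_mul_mv_f16_f32(
        device const half  * A     [[buffer(0)]],
        device const float * x     [[buffer(1)]],
        device       float * y     [[buffer(2)]],
        constant     uint  & ncols [[buffer(3)]],
        constant     uint  & nrows [[buffer(4)]],
        uint   tgpig [[threadgroup_position_in_grid]],
        ushort tiisg [[thread_index_in_simdgroup]],
        ushort sgitg [[simdgroup_index_in_threadgroup]]) {
    // each of the NSG SIMD-groups in this threadgroup takes its own row
    const uint row = tgpig * NSG + sgitg;
    if (row >= nrows) {
        return;
    }
    device const half * a = A + (ulong) row * ncols;

    // the 32 lanes of the SIMD-group accumulate strided slices of the dot product
    float sumf = 0.0f;
    for (uint i = tiisg; i < ncols; i += 32) {
        sumf += (float) a[i] * x[i];
    }

    // reduce within the SIMD-group; lane 0 writes the result
    sumf = simd_sum(sumf);
    if (tiisg == 0) {
        y[row] = sumf;
    }
}
```

With this layout a threadgroup produces NSG output rows per dispatch instead of one, which is the kind of utilization gain the tables above measure.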

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels on Sep 17, 2025.

@ggerganov force-pushed the gg/metal-mul-mv-opt-2 branch from 5fbb485 to 64c6dcb on September 18, 2025 at 08:32.

@ggerganov merged commit b213fce into master on Sep 18, 2025, with 61 of 62 checks passed.
yael-works pushed a commit to yael-works/llama.cpp that referenced this pull request on Oct 15, 2025:

* metal : improve F32, F16 and BF16 mat-vec multiplication

ggml-ci

* metal : make the NSG a function constant in mul_mv kernels

ggml-ci
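
The second commit makes the SIMD-group count (NSG) a Metal function constant, i.e. a value bound when the pipeline is built rather than a runtime argument. A hedged host-side sketch of what that binding could look like, using metal-cpp for brevity (llama.cpp's Metal backend is Objective-C, and the helper name make_mul_mv_pipeline is hypothetical):

```cpp
// Hypothetical helper, not the llama.cpp API: build one pipeline
// specialization of the mat-vec kernel for a given SIMD-group count.
#include <cstdint>
#include <Metal/Metal.hpp>

MTL::ComputePipelineState * make_mul_mv_pipeline(
        MTL::Device * dev, MTL::Library * lib, uint16_t nsg) {
    // bind NSG at pipeline-creation time; the compiler specializes the
    // kernel for this SIMD-group count instead of branching at runtime
    MTL::FunctionConstantValues * cv = MTL::FunctionConstantValues::alloc()->init();
    cv->setConstantValue(&nsg, MTL::DataTypeUShort, NS::UInteger(0)); // index 0 == NSG

    NS::Error * err = nullptr;
    MTL::Function * fn = lib->newFunction(
            NS::String::string("kernel_mul_mv_f16_f32", NS::UTF8StringEncoding), cv, &err);
    MTL::ComputePipelineState * pso = fn ? dev->newComputePipelineState(fn, &err) : nullptr;

    if (fn) { fn->release(); }
    cv->release();
    return pso; // dispatch with threadgroups of nsg * 32 threads
}
```

The trade-off is one compiled pipeline per NSG value, in exchange for a SIMD-group count the compiler can see and unroll against.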