
Conversation

@lhez (Collaborator) commented Jul 23, 2025

This PR adds a fused rms_norm_mul operation, following the pattern in #14800.
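For readers unfamiliar with the optimization: the fusion folds the elementwise multiply that typically follows RMSNorm into the normalization kernel, saving one round trip through global memory. A minimal NumPy sketch of the arithmetic being fused (illustrative only; the function names and the `eps` value are assumptions, not the actual OpenCL kernel):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # RMSNorm: scale x by the reciprocal RMS over the last axis, then by weight
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def rms_norm_mul_unfused(x, weight, y, eps=1e-6):
    # Two kernels / two passes over memory: normalize, then elementwise multiply
    return rms_norm(x, weight, eps) * y

def rms_norm_mul_fused(x, weight, y, eps=1e-6):
    # One pass: the trailing multiply is folded into the normalization,
    # so the intermediate normalized tensor never hits global memory
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x * (weight * y / rms)

rng = np.random.default_rng(0)
x = rng.random((4, 8), dtype=np.float32)
w = rng.random(8, dtype=np.float32)
y = rng.random((4, 8), dtype=np.float32)
assert np.allclose(rms_norm_mul_unfused(x, w, y),
                   rms_norm_mul_fused(x, w, y), atol=1e-6)
```

The two paths are mathematically identical (elementwise ops commute with the scalar `1/rms` scaling); the speedup comes purely from reduced memory traffic, which is why the gain shows up in the memory-bound tg128 numbers below rather than pp512.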

Qwen2.5-1.5B on Adreno 750

with fusion

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 828.59 MiB | 1.54 B | OpenCL | 99 | pp512 | 252.30 ± 0.44 |
| qwen2 1.5B Q4_0 | 828.59 MiB | 1.54 B | OpenCL | 99 | tg128 | 24.10 ± 0.85 |

without fusion

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 1.5B Q4_0 | 828.59 MiB | 1.54 B | OpenCL | 99 | pp512 | 252.02 ± 0.51 |
| qwen2 1.5B Q4_0 | 828.59 MiB | 1.54 B | OpenCL | 99 | tg128 | 22.94 ± 0.76 |

Qwen2.5-3B on Adreno 750

with fusion

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 3B Q4_0 | 1.62 GiB | 3.09 B | OpenCL | 99 | pp512 | 136.21 ± 0.10 |
| qwen2 3B Q4_0 | 1.62 GiB | 3.09 B | OpenCL | 99 | tg128 | 21.41 ± 0.28 |

without fusion

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| qwen2 3B Q4_0 | 1.62 GiB | 3.09 B | OpenCL | 99 | pp512 | 135.85 ± 0.13 |
| qwen2 3B Q4_0 | 1.62 GiB | 3.09 B | OpenCL | 99 | tg128 | 19.39 ± 2.65 |

@github-actions bot added the **ggml** (changes relating to the ggml tensor library for machine learning) and **OpenCL** (issues specific to the OpenCL backend) labels Jul 23, 2025
@lhez lhez marked this pull request as ready for review July 24, 2025 03:52
@lhez lhez requested review from CISC, max-krasnyansky and rmatif July 24, 2025 04:16
@CISC (Collaborator) commented Jul 25, 2025

LGTM, but I can't test.

@CISC CISC removed their request for review July 25, 2025 10:53
@rmatif (Collaborator) commented Jul 25, 2025

Thank you! I was thinking about adding this one

LGTM. Results on Adreno 830:

| model | size | params | backend | ngl | test | t/s |
| --- | --- | --- | --- | --- | --- | --- |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL | 99 | pp512 | 170.73 ± 0.16 |
| llama 1B F16 | 2.30 GiB | 1.24 B | OpenCL | 99 | tg128 | 22.64 ± 0.05 |

@rmatif rmatif merged commit ce111d3 into ggml-org:master Jul 25, 2025
47 checks passed