opencl: transposed gemm/gemv moe kernel with mxfp4,f32 #16602

shawngu-quic · 2025-10-15T23:22:03Z

Added redesigned MoE MXFP4 kernels optimized for Adreno:

Added a pre-transpose for the expert weights, fused with the SOA convert kernel.
Preprocessed the router table on the CPU side.
Added separate new decode and prefill kernels for MoE MXFP4.

Achieved a large performance uplift for prefill, especially for long prompts.
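The description does not say how the router table is preprocessed on the CPU. The sketch below shows one common way to do it, and it rests on several assumptions. It assumes the router output is a row-major `ids` array of shape [n_tokens][n_used] holding expert indices from top-k routing. It also assumes the kernels want each expert's routed (token, slot) pairs stored contiguously. With those assumptions, a two-pass counting sort does the grouping. `router_table`, `build_router_table` and `free_router_table` are hypothetical names, not code from this PR.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical CPU-side router table (an assumed layout, not the PR's).
   Expert e owns entries [offsets[e], offsets[e + 1]) of token[] and slot[]. */
typedef struct {
    int *offsets; /* n_expert + 1 prefix-sum offsets */
    int *token;   /* token index of each routed (token, slot) pair */
    int *slot;    /* which top-k slot routed that token to the expert */
} router_table;

void free_router_table(router_table *rt) {
    free(rt->offsets);
    free(rt->token);
    free(rt->slot);
    rt->offsets = rt->token = rt->slot = NULL;
}

/* ids is row-major [n_tokens][n_used]; each entry is an expert index in [0, n_expert).
   Returns 0 on success, -1 on allocation failure. */
int build_router_table(const int *ids, int n_tokens, int n_used, int n_expert,
                       router_table *rt) {
    const size_t total = (size_t)n_tokens * (size_t)n_used;
    /* +1 so malloc never receives a size of 0 */
    rt->offsets = calloc((size_t)n_expert + 1, sizeof(int));
    rt->token   = malloc((total + 1) * sizeof(int));
    rt->slot    = malloc((total + 1) * sizeof(int));
    int *cursor = malloc(((size_t)n_expert + 1) * sizeof(int));
    if (!rt->offsets || !rt->token || !rt->slot || !cursor) {
        free(cursor);
        free_router_table(rt);
        return -1;
    }

    /* Pass 1: count how many (token, slot) pairs each expert receives. */
    for (size_t i = 0; i < total; ++i) {
        assert(ids[i] >= 0 && ids[i] < n_expert);
        rt->offsets[ids[i] + 1]++;
    }

    /* Prefix sum turns the counts into per-expert start offsets. */
    for (int e = 0; e < n_expert; ++e) {
        rt->offsets[e + 1] += rt->offsets[e];
        cursor[e] = rt->offsets[e];
    }

    /* Pass 2: scatter pairs into per-expert buckets, preserving token order. */
    for (int t = 0; t < n_tokens; ++t) {
        for (int s = 0; s < n_used; ++s) {
            const int e = ids[(size_t)t * n_used + s];
            const int pos = cursor[e]++;
            rt->token[pos] = t;
            rt->slot[pos]  = s;
        }
    }
    free(cursor);
    return 0;
}
```

For example, with 3 tokens, 2 experts per token, and routing `{0,2, 1,2, 2,0}`, expert 0 gets tokens 0 and 2, expert 1 gets token 1, and expert 2 gets tokens 0, 1 and 2. Grouping rows by expert this way would let a prefill kernel handle each expert's tokens as one dense GEMM against that expert's pre-transposed weights. Whether the PR uses this exact layout is an assumption.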

max-krasnyansky

Nice bump in perf on Gen5 devices!

max-krasnyansky · 2025-10-17T21:57:05Z

@shawngu-quic can you please fix the EditorConfig checker failures?
https://github.com/ggml-org/llama.cpp/actions/runs/18604195140/job/53054739378?pr=16602

lhez · 2025-10-17T22:13:31Z

@shawngu-quic There are also some compilation warnings that have to be fixed.

ggml/src/ggml-opencl/ggml-opencl.cpp

* opencl: transposed gemm/gemv moe kernel with mxfp4,f32
* add restore kernel for moe transpose
* fix trailing whitespaces
* resolve compilation warnings

shawngu-quic requested review from lhez and max-krasnyansky as code owners October 15, 2025 23:22

github-actions bot added the ggml and OpenCL labels Oct 15, 2025

shawngu-quic force-pushed the moe-mxfp4-trans-reorder branch 2 times, most recently from 0fb80c7 to 7e84cc9, October 17, 2025 19:13

shawngu-quic added 2 commits October 17, 2025 13:32

opencl: transposed gemm/gemv moe kernel with mxfp4,f32

50da777

add restore kernel for moe transpose

61dedfa

shawngu-quic force-pushed the moe-mxfp4-trans-reorder branch from 7e84cc9 to 61dedfa, October 17, 2025 20:34

max-krasnyansky approved these changes Oct 17, 2025

fix trailing whitespaces

8c4b648

lhez reviewed Oct 17, 2025

ggml/src/ggml-opencl/ggml-opencl.cpp (resolved)

ggml/src/ggml-opencl/ggml-opencl.cpp (outdated, resolved)

resolve compilation warnings

0ccc262

max-krasnyansky merged commit 8138785 into ggml-org:master Oct 18, 2025
69 of 70 checks passed

lhez mentioned this pull request Oct 20, 2025

opencl: fix warnings and clean up profiling #16688

Merged

3 participants