OpenCL: MoE MXFP4 kernel optimizations #16037

shawngu-quic · 2025-09-16T21:31:53Z

This PR optimizes the MXFP4 MoE and non-MoE kernels for the OpenCL backend.

- Q4_0 transpose fix for Adreno
- SOA support for mxfp4 & packed fp4->fp16 bit-wise convert function (replace lut)
- MoE kernel optimizations & clean up
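To illustrate what replacing a lookup table with a bit-wise fp4->fp16 conversion can look like, here is a minimal host-side sketch in C. It is not the kernel code from this PR: the function names are hypothetical, it assumes the standard E2M1 encoding (1 sign bit, 2 exponent bits, 1 mantissa bit, values 0 to 6), and it ignores the shared block scale entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reference table of the 16 E2M1 codes, used here only to check the bit-wise path. */
static const float fp4_e2m1_lut[16] = {
     0.0f,  0.5f,  1.0f,  1.5f,  2.0f,  3.0f,  4.0f,  6.0f,
    -0.0f, -0.5f, -1.0f, -1.5f, -2.0f, -3.0f, -4.0f, -6.0f,
};

/* Hypothetical helper: build the IEEE half bit pattern for one E2M1 code
 * using shifts and masks instead of a table lookup. */
static uint16_t fp4_e2m1_to_fp16_bits(uint8_t code) {
    assert(code < 16);
    uint16_t sign = (uint16_t)((code & 0x8u) << 12); /* bit 3 -> bit 15 */
    uint16_t exp  = (uint16_t)((code >> 1) & 0x3u);
    uint16_t man  = (uint16_t)(code & 0x1u);
    if (exp == 0) {
        /* E2M1 subnormal: either 0 or 0.5; 0.5 = 2^-1 has fp16 exponent field 14. */
        return (uint16_t)(sign | (man ? (14u << 10) : 0u));
    }
    /* Normal: rebias the exponent from 1 to 15 (add 14), mantissa bit -> fp16 bit 9. */
    return (uint16_t)(sign | ((exp + 14u) << 10) | (man << 9));
}

/* Decode fp16 bits to float. Handles zero and normal values only,
 * which is all the conversion above can produce. */
static float fp16_bits_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits = sign; /* stays +-0 when exp == 0 */
    if (exp != 0) {
        bits |= ((exp + 112u) << 23) | (man << 13); /* rebias 15 -> 127 */
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Count codes where the bit-wise result disagrees with the table. */
static int fp4_count_mismatches(void) {
    int bad = 0;
    for (uint8_t c = 0; c < 16; ++c) {
        if (fp16_bits_to_float(fp4_e2m1_to_fp16_bits(c)) != fp4_e2m1_lut[c]) {
            ++bad;
        }
    }
    return bad;
}
```

A packed byte holds two such codes, so a caller would apply the helper to `byte & 0x0F` and `byte >> 4`. On a GPU the appeal of this form is that it needs only integer ALU operations and no table reads, though whether that wins in practice depends on the device.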

- flatten mxfp4 and packed fp4->fp16 bit-wise convert function (replace lut)
- MoE kernel optimizations

Co-authored-by: Li He <[email protected]>
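The "flatten" (SOA) step mentioned above can be sketched on the host as follows. This is an illustration under stated assumptions, not the PR's OpenCL kernels: the struct and function names are hypothetical, and the block layout assumed here is one shared scale byte followed by 32 packed 4-bit codes, matching the MX block size of 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QK 32 /* elements per block (assumed) */

/* Array-of-structs block: one shared scale byte plus 32 packed 4-bit codes. */
typedef struct {
    uint8_t e;
    uint8_t qs[QK / 2];
} block_aos;

/* Flatten: split n blocks into one contiguous scale array and one
 * contiguous quant array (struct-of-arrays layout). */
static void flatten_blocks(const block_aos *src, size_t n,
                           uint8_t *scales, uint8_t *quants) {
    assert(src && scales && quants);
    for (size_t i = 0; i < n; ++i) {
        scales[i] = src[i].e;
        memcpy(quants + i * (QK / 2), src[i].qs, QK / 2);
    }
}

/* Restore: rebuild the original blocks from the two flat arrays. */
static void restore_blocks(const uint8_t *scales, const uint8_t *quants,
                           size_t n, block_aos *dst) {
    assert(scales && quants && dst);
    for (size_t i = 0; i < n; ++i) {
        dst[i].e = scales[i];
        memcpy(dst[i].qs, quants + i * (QK / 2), QK / 2);
    }
}
```

With the quants contiguous, neighbouring work-items can read neighbouring bytes, and the scales can be fetched separately; a restore step is needed whenever the tensor has to be read back in its original block layout.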

shawngu-quic and others added 14 commits September 8, 2025 20:27

Q4_0 fix and MXFP4 optimizations

9710ef4

- Q4_0 transpose fix for Adreno
- SOA support for mxfp4 & packed fp4->fp16 bit-wise convert function (replace lut)
- MoE kernel optimizations & clean up

SOA support for non-MoE mxfp4 gemm

9280010

clean up

29b73d4

Keep GGML_OPENCL_SOA_Q default

76d3e84

opencl: clean up

374c3b7

opencl: fix kernel_restore_block_mxfp4

464ebeb

opencl: fix non adreno GPU

36676c0

opencl: recover broadcast semantic for mul_mv_mxfp4_f32_flat

7184682

opencl: use broadcast semantic for mul_mv_id_mxfp4_f32_flat

7a15e0e

opencl: fix ndst for mul_mv_mxfp4_f32_flat for adreno

7aa67ce

opencl: use original mxfp4 mv for structs

fe12b20

opencl: fix whitespace

b742329

opencl: fix whitespace

a69591a

opencl: fix size calculation when creating image1d_buffer_t

dbe0c3b

github-actions bot added the ggml and OpenCL labels Sep 16, 2025

lhez requested review from lhez and max-krasnyansky September 16, 2025 21:32

opencl: fix unused variable

7eb7e0d

lhez approved these changes Sep 18, 2025


lhez merged commit 3edd87c into ggml-org:master Sep 18, 2025
48 checks passed
