
[CUDA] Merge fp_qmv into qmv#3239

Open
zcbenz wants to merge 1 commit into ml-explore:main from zcbenz:remove-fp-qmv

Conversation

@zcbenz (Collaborator) commented Mar 10, 2026

Use the QMV kernel for fp quantizations.

I ran a simple benchmark and it is about 9% faster on an A100.

Details
import time
import mlx.core as mx

M,N,K = (1, 16384, 16384)

x = mx.random.normal(shape=(M, K), dtype=mx.float16)
w = mx.random.normal(shape=(N, K), dtype=mx.float16)

w_q, scales = mx.quantize(w, mode='mxfp4')
y = mx.quantized_matmul(x, w_q, scales, transpose=True, mode='mxfp4')
mx.eval(y)

def fun():
    y = mx.quantized_matmul(x, w_q, scales, transpose=True, mode='mxfp4')
    mx.eval(y)

# Warm up before timing
for _ in range(100):
    fun()

iterations = 1000
tic = time.time()
for _ in range(iterations):
    fun()
toc = time.time()

s = toc - tic
# Total bytes read and written across all iterations, in GB
gb = iterations * (x.nbytes + w_q.nbytes + scales.nbytes + y.nbytes) / 1e9

# Effective memory throughput in GB/s
print("{:5.2f}".format(gb / s))
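As a sanity check on the throughput figure, the bytes counted per call can be estimated by hand. This is a back-of-envelope sketch, assuming mxfp4 packs two 4-bit weights per byte and stores one shared 8-bit scale per group of 32 weights (the exact scale layout is an assumption, not taken from the MLX source):

```python
# Estimate bytes moved per quantized_matmul call in the benchmark above.
M, N, K = 1, 16384, 16384
FP16 = 2  # bytes per float16 element

x_bytes = M * K * FP16
# Assumption: mxfp4 packs two 4-bit weights per byte
wq_bytes = N * K // 2
# Assumption: one shared 8-bit scale per group of 32 weights
scales_bytes = N * K // 32
y_bytes = M * N * FP16

per_iter_gb = (x_bytes + wq_bytes + scales_bytes + y_bytes) / 1e9
print(f"{per_iter_gb:.4f} GB moved per call")
```

At this size the weight matrix dominates (roughly 134 MB packed), so the benchmark is effectively measuring memory bandwidth, which is why a kernel merge can show up as a flat percentage gain.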

@angeloskath (Member) left a comment

Perfect! Do you think we should test on Hopper and Blackwell as well?
