Remove quantized_utils.cuh #3237

Open
zcbenz wants to merge 1 commit into ml-explore:main from zcbenz:one-less-utils-h

Conversation

@zcbenz
Collaborator

@zcbenz zcbenz commented Mar 10, 2026

We no longer really need quantized_utils.cuh after removing the __nv_fp8 classes in #3212:

  • Move get_pack_factor/get_bytes_per_pack to backend/common/quantized.h: this shares the code between the CPU and CUDA backends.
  • Move dispatch_groups/dispatch_bits to quantized/affine_quantize.cu: the CUDA backend has several QMM kernels optimized for different quantizations, so each kernel keeps its own version of dispatch_groups.
  • Move absmax_x2 to quantized/fp_quantize.cu: it is only used there.
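For context on the helpers being moved, here is a minimal sketch of what pack-factor helpers and a bits-dispatch utility typically look like. This is illustrative only: the function bodies are assumptions, not MLX's actual implementations (which, per this PR, live in backend/common/quantized.h and quantized/affine_quantize.cu respectively), and odd bit-widths like 3/5/6 would need special cases not shown here.

```cpp
#include <stdexcept>
#include <type_traits>

// Pack factor: number of b-bit quantized values stored per w-bit word.
// Assumes bits evenly divides wsize (power-of-two bit-widths).
constexpr int get_pack_factor(int bits, int wsize = 8) {
  return wsize / bits;
}

// Bytes occupied by one packed word of quantized values.
constexpr int get_bytes_per_pack(int bits, int wsize = 8) {
  return (get_pack_factor(bits, wsize) * bits) / 8;
}

// Sketch of a dispatch_bits-style helper: lifts a runtime bit-width into a
// compile-time constant so a kernel template can be instantiated per width.
template <typename F>
void dispatch_bits(int bits, F&& f) {
  switch (bits) {
    case 2: f(std::integral_constant<int, 2>{}); break;
    case 4: f(std::integral_constant<int, 4>{}); break;
    case 8: f(std::integral_constant<int, 8>{}); break;
    default: throw std::invalid_argument("unsupported bits");
  }
}
```

Keeping per-kernel copies of the dispatch helper, as the PR does, lets each QMM kernel restrict the switch to exactly the bit-widths it supports instead of sharing one lowest-common-denominator version.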

@nastya236 nastya236 self-requested a review March 10, 2026 10:46