Remove quantized_utils.cuh #3237

Open
zcbenz wants to merge 1 commit into ml-explore:main from zcbenz:one-less-utils-h

Conversation

@zcbenz
Collaborator

@zcbenz zcbenz commented Mar 10, 2026

We no longer really need quantized_utils.cuh after removing the __nv_fp8 classes in #3212:

  • Move get_pack_factor/get_bytes_per_pack to backend/common/quantized.h: this shares the code between the CPU and CUDA backends.
  • Move dispatch_groups/dispatch_bits to quantized/affine_quantize.cu: the CUDA backend has several QMM kernels optimized for different quantizations, so each kernel keeps its own version of dispatch_groups.
  • Move absmax_x2 to quantized/fp_quantize.cu: it is only used there.
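For context on the helpers being moved, here is a minimal sketch of what pack-factor helpers and a bits-dispatch utility typically look like. This is illustrative only: the function bodies are assumptions, not MLX's actual implementations (which, per this PR, live in backend/common/quantized.h and quantized/affine_quantize.cu respectively), and odd bit-widths like 3/5/6 would need special cases not shown here.

```cpp
#include <stdexcept>
#include <type_traits>

// Pack factor: number of b-bit quantized values stored per w-bit word.
// Assumes bits evenly divides wsize (power-of-two bit-widths).
constexpr int get_pack_factor(int bits, int wsize = 8) {
  return wsize / bits;
}

// Bytes occupied by one packed word of quantized values.
constexpr int get_bytes_per_pack(int bits, int wsize = 8) {
  return (get_pack_factor(bits, wsize) * bits) / 8;
}

// Sketch of a dispatch_bits-style helper: lifts a runtime bit-width into a
// compile-time constant so a kernel template can be instantiated per width.
template <typename F>
void dispatch_bits(int bits, F&& f) {
  switch (bits) {
    case 2: f(std::integral_constant<int, 2>{}); break;
    case 4: f(std::integral_constant<int, 4>{}); break;
    case 8: f(std::integral_constant<int, 8>{}); break;
    default: throw std::invalid_argument("unsupported bits");
  }
}
```

Keeping per-kernel copies of the dispatch helper, as the PR does, lets each QMM kernel restrict the switch to exactly the bit-widths it supports instead of sharing one lowest-common-denominator version.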

@nastya236 nastya236 self-requested a review March 10, 2026 10:46