Add tensor core shape support for integer quantization on RDNA3+ GPUs.
RDNA3 introduced WMMA (Wave Matrix Multiply-Accumulate) instructions that enable
efficient quantized model inference. This commit adds kernel-layer support for
the quantized operations enabled by previous stdlib commits.
Supported quantized operations (RDNA3+), illustrated by the sketch after this list:
- INT8/UINT8 with INT32 accumulation (16x16x16 shape)
- UINT4 with INT32 accumulation (16x16x16 shape)
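As a rough illustration of these two shapes, here is a minimal HIP sketch assuming
a gfx11 (RDNA3) target compiled in wave32 mode with hipcc/clang. The kernel names,
buffer layout, and lane-to-fragment mapping are simplified assumptions for
demonstration, not the kernel-layer code this commit adds:

```cpp
// Minimal HIP sketch (assumes gfx11 / RDNA3, wave32 mode).
#include <hip/hip_runtime.h>

typedef int v4i __attribute__((ext_vector_type(4)));  // 16 packed int8 values
typedef int v2i __attribute__((ext_vector_type(2)));  // 16 packed uint4 values
typedef int v8i __attribute__((ext_vector_type(8)));  // 8 int32 accumulators

// INT8 x INT8 -> INT32, 16x16x16 tile per wave32.
__global__ void wmma_int8_16x16x16(const v4i* a, const v4i* b, v8i* c) {
    int lane = threadIdx.x;  // lane-to-fragment mapping simplified here
    v8i acc = c[lane];
    acc = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w32(
        /*a_signed=*/true, a[lane], /*b_signed=*/true, b[lane], acc,
        /*clamp=*/false);
    c[lane] = acc;
}

// UINT4 x UINT4 -> INT32, 16x16x16 tile per wave32.
__global__ void wmma_uint4_16x16x16(const v2i* a, const v2i* b, v8i* c) {
    int lane = threadIdx.x;
    v8i acc = c[lane];
    acc = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w32(
        /*a_signed=*/false, a[lane], /*b_signed=*/false, b[lane], acc,
        /*clamp=*/false);
    c[lane] = acc;
}
```

Both intrinsics accumulate into eight int32 values per lane (256 outputs across a
wave32), which matches the INT32 accumulation listed above.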
The implementation adds shape definitions in get_mma_shape() to route quantized
dtypes to the correct WMMA intrinsics. No changes to the FP8 paths; FP8 support
for RDNA4+ will be added separately once proper loading code exists.
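For context, here is a hypothetical C++ analog of the routing described above.
The real get_mma_shape() lives in the Mojo stdlib, so the enum, names, and return
type below are illustrative assumptions, not the actual API:

```cpp
// Hypothetical sketch of dtype -> WMMA tile-shape routing on RDNA3+.
#include <array>
#include <stdexcept>

enum class DType { Int8, UInt8, UInt4, Float16, BFloat16 };

// Returns the {M, N, K} WMMA tile shape for the given input dtype;
// the integer paths accumulate in int32.
std::array<int, 3> get_mma_shape(DType in_type) {
    switch (in_type) {
        case DType::Int8:
        case DType::UInt8:
        case DType::UInt4:
            return {16, 16, 16};  // integer WMMA: 16x16x16, INT32 accumulation
        case DType::Float16:
        case DType::BFloat16:
            return {16, 16, 16};  // existing fp16/bf16 WMMA shape
        default:
            throw std::invalid_argument("unsupported dtype for WMMA");
    }
}
```

On RDNA3 every WMMA variant uses the same 16x16x16 tile, so the routing mainly
selects which intrinsic to emit rather than varying the shape.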