Add tensor core shape support for integer quantization on RDNA3+ GPUs.
RDNA3 introduced WMMA (Wave Matrix Multiply-Accumulate) instructions that enable
efficient quantized model inference. This commit adds kernel-layer support for
the quantized operations enabled by previous stdlib commits.
Supported quantized operations (RDNA3+), illustrated by the sketch after this list:
- INT8/UINT8 with INT32 accumulation (16x16x16 shape)
- UINT4 with INT32 accumulation (16x16x16 shape)
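As a rough illustration of these two shapes, here is a minimal HIP sketch assuming
a gfx11 (RDNA3) target compiled in wave32 mode with hipcc/clang. The kernel names,
buffer layout, and lane-to-fragment mapping are simplified assumptions for
demonstration, not the kernel-layer code this commit adds:

```cpp
// Minimal HIP sketch (assumes gfx11 / RDNA3, wave32 mode).
#include <hip/hip_runtime.h>

typedef int v4i __attribute__((ext_vector_type(4)));  // 16 packed int8 values
typedef int v2i __attribute__((ext_vector_type(2)));  // 16 packed uint4 values
typedef int v8i __attribute__((ext_vector_type(8)));  // 8 int32 accumulators

// INT8 x INT8 -> INT32, 16x16x16 tile per wave32.
__global__ void wmma_int8_16x16x16(const v4i* a, const v4i* b, v8i* c) {
    int lane = threadIdx.x;  // lane-to-fragment mapping simplified here
    v8i acc = c[lane];
    acc = __builtin_amdgcn_wmma_i32_16x16x16_iu8_w32(
        /*a_signed=*/true, a[lane], /*b_signed=*/true, b[lane], acc,
        /*clamp=*/false);
    c[lane] = acc;
}

// UINT4 x UINT4 -> INT32, 16x16x16 tile per wave32.
__global__ void wmma_uint4_16x16x16(const v2i* a, const v2i* b, v8i* c) {
    int lane = threadIdx.x;
    v8i acc = c[lane];
    acc = __builtin_amdgcn_wmma_i32_16x16x16_iu4_w32(
        /*a_signed=*/false, a[lane], /*b_signed=*/false, b[lane], acc,
        /*clamp=*/false);
    c[lane] = acc;
}
```

Both intrinsics accumulate into eight int32 values per lane (256 outputs across a
wave32), which matches the INT32 accumulation listed above.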
The implementation adds shape definitions in get_mma_shape() to route quantized
dtypes to the correct WMMA intrinsics. No changes to the FP8 paths; FP8 support
for RDNA4+ will be added separately once proper loading code exists.
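For context, here is a hypothetical C++ analog of the routing described above.
The real get_mma_shape() lives in the Mojo stdlib, so the enum, names, and return
type below are illustrative assumptions, not the actual API:

```cpp
// Hypothetical sketch of dtype -> WMMA tile-shape routing on RDNA3+.
#include <array>
#include <stdexcept>

enum class DType { Int8, UInt8, UInt4, Float16, BFloat16 };

// Returns the {M, N, K} WMMA tile shape for the given input dtype;
// the integer paths accumulate in int32.
std::array<int, 3> get_mma_shape(DType in_type) {
    switch (in_type) {
        case DType::Int8:
        case DType::UInt8:
        case DType::UInt4:
            return {16, 16, 16};  // integer WMMA: 16x16x16, INT32 accumulation
        case DType::Float16:
        case DType::BFloat16:
            return {16, 16, 16};  // existing fp16/bf16 WMMA shape
        default:
            throw std::invalid_argument("unsupported dtype for WMMA");
    }
}
```

On RDNA3 every WMMA variant uses the same 16x16x16 tile, so the routing mainly
selects which intrinsic to emit rather than varying the shape.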