Add MMA (Matrix Multiply-Accumulate) tensor core support for AMD RDNA3
and RDNA4 GPUs by splitting AMD GPU handling into separate RDNA and CDNA
code paths with architecture-appropriate shapes.
RDNA1 and RDNA2 are explicitly blocked with compile-time constraints
because they have limited tensor core capabilities and would need
fallback implementations that do not exist yet.
This enables RDNA3 GPUs (RX 7000 series, W7900) and RDNA4 GPUs to use
their tensor cores for matrix operations.
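
The split into RDNA and CDNA paths, with a compile-time block on RDNA1 and RDNA2, could look roughly like the C++17 sketch below. This is an illustration under stated assumptions, not the code from this commit: `AmdArch`, `MmaShape`, `is_rdna`, `has_mma_path`, and `mma_shape` are hypothetical names, and the shape values are placeholders rather than the shapes the commit selects.

```cpp
// Hypothetical architecture tags (illustrative names, not the commit's).
enum class AmdArch { RDNA1, RDNA2, RDNA3, RDNA4, CDNA };

// Tile dimensions of one matrix multiply-accumulate operation.
struct MmaShape { int m, n, k; };

// Every tag except CDNA belongs to the RDNA family.
constexpr bool is_rdna(AmdArch a) { return a != AmdArch::CDNA; }

// Only RDNA3, RDNA4, and CDNA have an MMA path in this sketch.
constexpr bool has_mma_path(AmdArch a) {
    return a == AmdArch::RDNA3 || a == AmdArch::RDNA4 || a == AmdArch::CDNA;
}

// Selects a shape per architecture family at compile time.
template <AmdArch A>
constexpr MmaShape mma_shape() {
    // Instantiating this for RDNA1 or RDNA2 fails to compile.
    static_assert(has_mma_path(A),
                  "RDNA1/RDNA2: no MMA path, fallback not implemented");
    if constexpr (is_rdna(A)) {
        return {16, 16, 16};  // placeholder RDNA shape (assumption)
    } else {
        return {32, 32, 8};   // placeholder CDNA shape (assumption)
    }
}

// Usage: the choice is resolved entirely at compile time.
static_assert(mma_shape<AmdArch::RDNA3>().k == 16, "RDNA path selected");
static_assert(mma_shape<AmdArch::CDNA>().k == 8, "CDNA path selected");
```

Writing `mma_shape<AmdArch::RDNA2>()` in this sketch stops the build with the `static_assert` message, which mirrors the intent of rejecting unsupported architectures before any kernel is generated.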