HGEMM Warp Swizzle/Reg Buffers
What's Changed
- [HGEMM] HGEMM MMA with Reg Double Buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/99
- [HGEMM] ldmatrix.x4.trans with reg double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/100
- [HGEMM] collective store via warp shfl & reg reuse by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/101
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.15...v2.4.16