v2.4.12 SGEMM TF32 Swizzle
What's Changed
- [SGEMM] SGEMM TF32 Thread Block Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/84
- [HGEMM] mma4x4_warp4x4_stages with swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/86
- [SWISH] Support Swish F32/F16 kernels by @wangzijian1010 in https://github.com/DefTruth/CUDA-Learn-Notes/pull/85
- [SGEMM] Update SGEMM TF32 Benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/87
New Contributors
- @wangzijian1010 made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/85
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.11...v2.4.12