v2.4.17
What's Changed
- [NMS] Add NMS f32 CUDA kernel by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/102
- [HGEMM] Add some notes on collective store by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/103
- [HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/104
- [HGEMM] Update HGEMM benchmark scripts by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/105
- [HGEMM] Add Warp Swizzle as template param by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/106
- [HGEMM] Add -Xptxas -v compile flag by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/107
- [HGEMM] Try to reduce register usage by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/108
- [HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/109
- [HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/110
- [HGEMM] Add M=N=K option for benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/111
- [HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/112
- [README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/113
- [Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/114
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.16...v2.4.17