v2.4.17

DefTruth released this 29 Oct 06:39

a65f1f6

What's Changed

[NMS] Add nms f32 cuda kernel. by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/102
[HGEMM] Add some note to collective store by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/103
[HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/104
[HGEMM] Update HGEMM benchmark scripts by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/105
[HGEMM] Add Warp Swizzle as template param by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/106
[HGEMM] add -Xptxas -v compile flag by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/107
[HGEMM] Try reduce registers usage by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/108
[HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/109
[HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/110
[HGEMM] Add M=N=K option for benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/111
[HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/112
[README] Update HGEMM/SGEMM Supported matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/113
[Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/114

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.16...v2.4.17

Contributors

DefTruth and bear-zd
