v2.4.1 Pack LayerNorm
What's Changed
- [Nsight] Add nsys/ncu usage, ptx/sass by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/44
- [DotProd][FP16] support f16x8_pack kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/45
- [LayerNorm][FP16] Add pack support for f16x8 LD/ST by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/46
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4...v2.4.1