Releases: xlite-dev/LeetCUDA
v2.3.1 f16x8 Pack Elementwise
What's Changed
- [FA2][Half] Add FA2 f16_mma_m16n8k16 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/35
- [Refactor][7/N] CUDA Learn Notes refactor Part-7 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/36
- Clamped input range in Sigmoid kernel to prevent overflow by @Phoenix8215 in https://github.com/DefTruth/CUDA-Learn-Notes/pull/37
- [Sigmoid][F16] Add f16x8_pack kernel, boost 1.5x ~ by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/39
- [Elementwise][Half] support f16x8_pack kernel, boost 1.1x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/40
- [FlashAttention] replace FLOAT4 with LDST128BITS macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/41
- [RELU][FP16] Add f16x8_pack kernel, boost 2.1x by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/42
New Contributors
- @Phoenix8215 made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/37
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.3...v2.3.1
v2.3 Refactor 6/N
What's Changed
- [Refactor][6/N] CUDA Learn Notes refactor Part-6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/17
- [Refactor][5/N] CUDA Learn Notes refactor Part-6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/18
- [LayerNorm][Half] support fp16x8 packed LayerNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/19
- [Reduce][Half] add HALF2 & BFLOAT2 macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/21
- [RMSNorm][Half] support fp16x8 packed RMSNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/22
- [Bugfix][Kernel] fixed some kernel blocks calculate errors by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/23
- [Elementwise][Half] support fp16x8 packed Elementwise by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/24
- [Elementwise][Half] support fp16x8 packed Elementwise by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/25
- [RELU][Half] support fp16x8 RELU kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/26
- [RMSNorm] support f16x8_f32 RMSNorm by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/28
- [RMSNorm][Kernel] Add FLOAT2/HALF2_VARIANCE macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/29
- [LayerNorm][Kernel] Add HALF2 SUM/SUB/VAR macro by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/30
- [HGEMM] Add sliced_k&t_8x8_sliced_k_f16x4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/31
- [HGEMV][Half] support hgemv k32/k128/f16 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/32
- [FlashAttention] Refactor flash_attn_1_fwd_f32 kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/33
- Bump up to v2.3 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/34
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.2...v2.3
v2.2 Refactor 5/N
What's Changed
- [Refactor][5/N] CUDA Learn Notes refactor Part-5 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/15
- Bump up to v2.2 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/16
Full Changelog: DefTruth/CUDA-Learn-Notes@2.1...v2.2
v2.1 Refactor 4/N Part-4
What's Changed
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/10
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/11
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/12
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/13
- [Refactor][4/N] CUDA Learn Notes refactor Part-4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/14
Full Changelog: DefTruth/CUDA-Learn-Notes@v2.0...2.1
v2.0 Refactor 4/N
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.8...v2.0
v0.8
What's Changed
- Bump up to v0.8 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/9
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.7...v0.8
CUDA Learn Note v0.7
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.5...v0.6
What's Changed
- Bump up to v0.7 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/8
New Contributors
- @DefTruth made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/8
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.6...v0.7
CUDA Learn Notes v0.5
Full Changelog: DefTruth/CUDA-Learn-Notes@v0.3...v0.5
v0.3 flash_attn-1 fwd f32
Full Changelog: v0.2...v0.3
CUDA Learn Note v0.2
Full Changelog: v0.1...v0.2