Releases · xlite-dev/LeetCUDA

14 Nov 11:50

DefTruth

v2.6

d53ab23

v2.6 Refactor 7/N

What's Changed

[HGEMM] Update NVIDIA L20/4090 Perf plots by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/126
[Blog] Illustrated guide to DeepSpeed-Ulysses & Megatron-LM TP/SP by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/127
[README] Add contents lists by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/128
[README] Update README by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/129
[README] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/130
Bump up to v2.6 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/131

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.5...v2.6

Contributors

DefTruth


05 Nov 02:41

DefTruth

v2.5

a66cc2f

v2.5

What's Changed

[HGEMM] Update HGEMM README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/120
[HGEMM] Add plot tflops function by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/121
[HGEMM] Add NVIDIA RTX 3090 Laptop perf plot by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/122
[PERF] Update HGEMM benchmark scripts by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/123
[HGEMM] Add HGEMM L20/4090 benchmark figures by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/124
Bump up to v2.5 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/125

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.18...v2.5

Contributors

DefTruth


01 Nov 01:20

DefTruth

v2.4.18

28c12bd

v2.4.18

What's Changed

Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/115
[HGEMM] Update HGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/116
[HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/117
[README] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/118
[HGEMM] Add NVIDIA RTX 4090 benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/119

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.17...v2.4.18

Contributors

DefTruth


29 Oct 06:39

DefTruth

v2.4.17

a65f1f6

v2.4.17

What's Changed

[NMS] Add nms f32 cuda kernel. by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/102
[HGEMM] Add some notes on collective store by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/103
[HGEMM] Add HGEMM MMA Col Major Kernel by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/104
[HGEMM] Update HGEMM benchmark scripts by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/105
[HGEMM] Add Warp Swizzle as template param by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/106
[HGEMM] add -Xptxas -v compile flag by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/107
[HGEMM] Try to reduce register usage by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/108
[HGEMM] Update HGEMM MMA/WMMA Usage by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/109
[HGEMM][Docs] Add HGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/110
[HGEMM] Add M=N=K option for benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/111
[HGEMM] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/112
[README] Update HGEMM/SGEMM Supported matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/113
[Docs] Update HGEMM/SGEMM Supported Matrix by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/114

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.16...v2.4.17

Contributors

DefTruth and bear-zd


25 Oct 05:59

DefTruth

v2.4.16

6c89595

HGEMM Warp Swizzle/Reg Buffers

What's Changed

[HGEMM] HGEMM MMA with Reg Double Buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/99
[HGEMM] ldmatrix.x4.trans with reg double buffers by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/100
[HGEMM] collective store via warp shfl&reg reuse by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/101

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.15...v2.4.16

Contributors

DefTruth


21 Oct 12:55

DefTruth

v2.4.15

a2934b9

HGEMM Up to 115 TFLOPS:L20

What's Changed

[HGEMM] Add MMA 16816 swizzle, Up to 115 TFLOPS by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/98
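For readers unfamiliar with how a GEMM throughput figure such as the one above is usually derived, here is a minimal sketch. It assumes the common convention of counting 2*M*N*K floating-point operations per GEMM (one multiply and one add per inner-product term). The function name `gemm_tflops` and the timing value are illustrative assumptions and are not taken from the repository's benchmark scripts.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Return achieved TFLOPS for an (m x k) @ (k x n) GEMM that took `seconds`.

    Hypothetical helper for illustration: assumes the usual 2*M*N*K FLOP count.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k           # one multiply + one add per inner-product term
    return flops / seconds / 1e12     # FLOP/s -> TFLOP/s


if __name__ == "__main__":
    # Example with a made-up timing of 1 ms for M = N = K = 4096.
    print(f"{gemm_tflops(4096, 4096, 4096, 1e-3):.2f} TFLOPS")
```

The timing should be an average over several runs after warm-up; a single cold launch tends to understate throughput.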

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.13...v2.4.15

Contributors

DefTruth


21 Oct 01:56

DefTruth

v2.4.13

0aeb450

HGEMM Up to 113 TFLOPS:L20

What's Changed

[Mat][Trans] Add f32/f32x4 row/col first kernel by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/89
[Docs][Contribute] Add How to contribute Notes by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/90
[HGEMM] optimize SMEM padding, up to 113 TFLOPS by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/92
[Mat][Trans] Add f32x4_shared/bcf row/col first kernel. by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/91
[Docs] rename mat_transpose -> mat-transpose by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/93
[HGEMM] Add GeForce RTX 3080 Laptop benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/94
[HGEMM] update HGEMM benchmark option by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/95
[HGEMM] Refactor HGEMM WMMA 161616 kernels by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/96
[HGEMM] Update HGEMM WMMA Benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/97

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.12...v2.4.13

Contributors

DefTruth and bear-zd


17 Oct 02:24

DefTruth

v2.4.12

8c6922b

v2.4.12 SGEMM TF32 Swizzle

What's Changed

[SGEMM] SGEMM TF32 Thread Block Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/84
[HGEMM] mma4x4_warp4x4_stages with swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/86
[SWISH] support Swish F32/F16 kernel by @wangzijian1010 in https://github.com/DefTruth/CUDA-Learn-Notes/pull/85
[SGEMM] Update SGEMM TF32 Benchmark by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/87
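As a reference point for the Swish kernel listed above, the activation itself is swish(x) = x * sigmoid(x). The following is a minimal CPU sketch in Python that could be used to check a GPU kernel's output. The function name `swish` and the branch used for numerical stability are assumptions made for illustration, not the repository's implementation.

```python
import math


def swish(x: float) -> float:
    """Swish activation, x * sigmoid(x), evaluated without overflow.

    Illustrative reference only; not taken from the repository.
    """
    if x >= 0:
        # exp(-x) <= 1 here, so the denominator cannot overflow.
        return x / (1.0 + math.exp(-x))
    # For negative x, rewrite sigmoid(x) as e^x / (1 + e^x) so exp() stays <= 1.
    e = math.exp(x)
    return x * e / (1.0 + e)


if __name__ == "__main__":
    for v in (-2.0, 0.0, 1.0, 3.0):
        print(v, swish(v))
```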

New Contributors

@wangzijian1010 made their first contribution in https://github.com/DefTruth/CUDA-Learn-Notes/pull/85

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.11...v2.4.12

Contributors

DefTruth and wangzijian1010


16 Oct 03:04

DefTruth

v2.4.11

bc3d78e

v2.4.11 HGEMM Block Swizzle

What's Changed

[Docs] Update README.md by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/81
[HGEMM] HGEMM WMMA Thread Block Swizzle by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/82
[HGEMM] make thread block swizzle stride as N/4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/83
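The general idea behind thread block swizzle can be sketched on the host side. The output tile columns are split into groups, and the grid's z dimension selects the group, so that blocks launched close together work on neighbouring tiles and are more likely to reuse data in L2 cache. The sketch below is a generic illustration under stated assumptions: the names `swizzled_tile` and `launch_order` are hypothetical, the x-fastest, then y, then z ordering is only a rough model of block scheduling (the hardware does not guarantee it), and the repository's kernels may use a different mapping.

```python
def swizzled_tile(block_x: int, block_y: int, block_z: int, grid_x: int):
    """Map a 3D block index to an output tile (row, col).

    Hypothetical mapping: the z index picks a group of `grid_x` tile columns.
    """
    tile_col = block_z * grid_x + block_x
    tile_row = block_y
    return tile_row, tile_col


def launch_order(tiles_m: int, tiles_n: int, n_swizzle: int):
    """List tiles in an assumed x-fastest, then y, then z block order."""
    grid_x = (tiles_n + n_swizzle - 1) // n_swizzle  # tile columns per group
    order = []
    for z in range(n_swizzle):
        for y in range(tiles_m):
            for x in range(grid_x):
                row, col = swizzled_tile(x, y, z, grid_x)
                if col < tiles_n:  # skip padding blocks in the last group
                    order.append((row, col))
    return order


if __name__ == "__main__":
    # 2 x 4 tiles split into 2 column groups: each group is finished
    # across all rows before the next group starts.
    print(launch_order(2, 4, 2))
```

With a single group (`n_swizzle = 1`) the order degenerates to the plain row-by-row sweep, which is a convenient way to compare the two schedules.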

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.10...v2.4.11

Contributors

DefTruth


15 Oct 02:04

DefTruth

v2.4.10

2906e78

v2.4.10 SGEMM TF32 Stage 2/3

What's Changed

[HGEMM] HGEMM WMMA Stage mma4x2+warp4x4 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/76
[SGEMM] Add SGEMM WMMA TF32 Stage2/3 by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/77
[SGEMM] Add cuBLAS SGEMM F32/TF32 baseline by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/78
[SGEMM] Add Kernel cudaFuncSetAttribute hint by @DefTruth in https://github.com/DefTruth/CUDA-Learn-Notes/pull/79
[RoPE] Add minimal RoPE f32/f32x4 pack impl by @bear-zd in https://github.com/DefTruth/CUDA-Learn-Notes/pull/80

Full Changelog: DefTruth/CUDA-Learn-Notes@v2.4.9...v2.4.10

Contributors

DefTruth and bear-zd

