Commit c56ed50
TinySemVer
Release: v0.6.0 [skip ci]
### Minor
- Add: Binary BMMA kernels for GPU (6a609a0)
- Add: Tensor Core intrinsic benchmarks (1bdb5df)
- Add: cuBLAS benchmarks (2f791fe)
- Add: Precompiled CUDA C++ kernels (c1a6f3e)
- Add: Using CUDA Driver API to JIT `.ptx` (82cb684)
- Add: PTX and `.cuh` kernels (824e473)
- Add: Sorting with `thrust` and `cub` (df3b2c1)
- Add: Thrust, CUB, CUDA sorting (551402d)
- Add: Thrust, CUB, CUDA sorting (8481114)
### Patch
- Make: Drop OpenBLAS (3c92c36)
- Fix: Use `f16` MMA (141d285)
- Fix: Lower PTX version for JIT (eff3854)
- Fix: Working PTX kernel (514db0f)
- Docs: Introduce Warp-Group-MMA on Hopper (400f294)
- Make: Build CUDA for multiple platforms (3283ab0)
- Fix: Avoid optimizing-out SASS code (986b8bc)
- Fix: Compiling `cuBLAS` calls (312409a)
- Make: Don't compile PTX (53202e6)
- Make: Silence NVCC warnings (a6cdc74)
- Fix: NVCC compilation issues (494e705)
- Make: Upgrade `fmt` for NVCC builds (88277bf)
- Fix: Ranges require `constexpr` on NVCC (c1d7b2f)
- Make: Switch to CUDA Toolkit for GPU libs (2589a40)
- Make: Options for CUDA & TBB in CMake (4d03c08)

1 parent 2aa088d commit c56ed50
2 files changed, +2 -2 lines changed