Commit 15ed377

Author: TinySemVer
Release: v0.8.0 [skip ci]
### Minor

- Add: Warp-Group Binary MMA (d6daf3a)
- Add: Larger `m64n256k8` WGMMA variant (3e3530e)
- Add: Warp-Group Async kernels (6cc7e34)
- Add: `f64` MMA PTX variant (ae450e5)
- Add: CuTe draft (fdea727)
- Add: CUTLASS placeholders (b1ab93d)
- Add: Hopper `sm90a` PTX kernels (4bcf74a)

### Patch

- Improve: `CUresult` error handling (d74d430)
- Improve: Logging CUDA errors (953a696)
- Fix: Synchronize TCs (494ba52)
- Improve: Impossible `%tid` condition against NVCC (8a9c9c5)
- Make: Temporarily block CUTLASS (df1b39c)
- Improve: Cleaner PTX code (71dea0c)
- Improve: Avoid NVCC-specific features (3d65c7f)
- Fix: Re-creating a CUDA stream (e831650)
- Make: Compile in parallel by default (8e671c6)
- Make: Separate host-only code (f751fbf)
- Docs: Counter-intuitive PTX facts (822fa2f)
- Docs: H200 vs MI 300X vs GB200 specs (cc36bcd)
- Make: CUTLASS dependency (f272c40)
- Fix: Synchronize cuBLAS for profiling (4077f26)
- Docs: Blackwell tensor cores (ec35b35)
- Fix: Missing `_Float16` in NVCC, use `half` (71cadca)
- Improve: Same size range for GEMM (d914fce)
- Fix: Different output size for `cublasGemmEx` (304c880)
1 parent 21cf516 commit 15ed377

2 files changed: 2 additions and 2 deletions

CMakeLists.txt

Lines changed: 1 addition & 1 deletion

@@ -8,7 +8,7 @@ cmake_minimum_required(VERSION 3.25.2 FATAL_ERROR)
 # Project Setup
 # ------------------------------------------------------------------------------
 project(less_slow
-VERSION 0.7.0
+VERSION 0.8.0
 LANGUAGES C CXX ASM
 DESCRIPTION "Learning how to write Less Slow code, from numerical micro-kernels and SIMD to coroutines, ranges, and polymorphic state machines"
 HOMEPAGE_URL "https://github.com/ashvardanian/less_slow.cpp")

VERSION

Lines changed: 1 addition & 1 deletion

@@ -1 +1 @@
-0.7.0
+0.8.0
