* Support ROCM builds from source distribution, and improve error handling (Dao-AILab#1446)
* Always update both submodules to include them in sdist
  Always update both submodules, irrespective of whether a CUDA
  or a ROCM build is being done, to ensure that the necessary files
  from both are present in the sdist. Otherwise, attempting a ROCM
  build from the sdist fails because of missing `composable_kernel` sources.
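In `setup.py` terms, the unconditional update might look like the sketch below. The submodule paths here are assumptions for illustration; the repository's actual layout may differ.

```python
# Submodule paths are illustrative, not the repository's exact paths.
SUBMODULES = ["csrc/cutlass", "csrc/composable_kernel"]

def submodule_update_cmds(paths=SUBMODULES):
    # Build one `git submodule update --init <path>` command per
    # submodule, for CUDA and ROCM builds alike, so both source trees
    # end up in the sdist.
    return [["git", "submodule", "update", "--init", p] for p in paths]

# In setup.py each command would run with error checking, e.g.:
# for cmd in submodule_update_cmds():
#     subprocess.run(cmd, check=True)
```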
* Include `*.py` files from composable_kernel in sdist
Include the `*.py` files from `csrc` in sdist, to ensure that
the `generate.py` script is present.
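  One conventional way to pull those files into the sdist is a `MANIFEST.in` rule; the rule below is a sketch of the idea, and the commit may instead adjust `setup.py`'s own file collection.

  ```
  # Ship the code-generation scripts (such as generate.py) in the sdist
  recursive-include csrc *.py
  ```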
* Replace the `os.system()` calls in `setup.py` with `subprocess.run()`
* Add error checking to `subprocess.run()` calls in `setup.py`
  Add error checking to ensure that `setup.py` fails immediately if one
  of the commands fails. Otherwise, failures result only in messages
  to stderr that could be missed and lead to more confusing errors
  later in the build process.
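The `os.system()` to `subprocess.run()` change can be sketched as follows (a minimal illustration, not the repository's exact code):

```python
import subprocess
import sys

def run(cmd):
    # Unlike os.system(), whose return value is easy to ignore,
    # check=True makes subprocess.run() raise CalledProcessError,
    # so setup.py aborts at the first failing command instead of
    # leaving only a message on stderr.
    subprocess.run(cmd, check=True)

run([sys.executable, "-c", "pass"])  # succeeds silently
```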
* Call git in `setup.py` only when working in a git repository
Call git commands in `setup.py` only when the `.git` directory is
present, indicating that we are working in a git checkout. Otherwise,
just assert that the needed files are there. With this, building
from a source distribution no longer attempts to call git
in an incorrect directory.
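A minimal sketch of that guard; the paths and the helper name are illustrative, and the real `setup.py` checks its own submodule layout:

```python
import os
import subprocess

def maybe_update_submodule(path, marker):
    """Run git only inside a git checkout; in an sdist, just verify
    that the submodule files shipped with the distribution."""
    if os.path.isdir(".git"):
        # Git checkout: fetch the submodule, failing loudly on error.
        subprocess.run(["git", "submodule", "update", "--init", path],
                       check=True)
    elif not os.path.exists(marker):
        raise RuntimeError(f"{path} is missing; the sdist appears incomplete")

# Example (illustrative paths, not the repository's actual layout):
# maybe_update_submodule("csrc/cutlass", "csrc/cutlass/include")
```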
* [Build] Update version of setuptools used to generate core package (Dao-AILab#1460)
* Don't compile for CUDA 11, compile for official PyTorch 2.6.0
* Bump to v2.7.4
* Drop PyTorch 2.1
* [FA3] Compile with nvcc 12.8 instead of 12.3
* Fix comment in assert
* [CE] Assert logit_scale > 0
* Implement HeadDim_V != HeadDim_QK, support hdimQK=192, hdimV=128
* Fix shape_O in epilogue params when kHeadDimV != kHeadDim
* Remove old combine.h
* Fix loading paged V when kHeadDimV != kHeadDim
* Fix shape_V for storing new KV when kHeadDimV != kHeadDim
* Implement the case of LargeHeadDimV
* Rename Mma0->MmaQK, Mma1->MmaPV, use Cluster only if hdimV >= 192
* Pass _1 or _0 to cute::aligned_struct
* Fix compilation for FP8 when kHeadDimV != kHeadDim
* Support Qv
* Test varlen_q=True by default for kvcache
* Fix num_splits heuristic being called before get_pack_gqa
* Fix num_splits heuristic again when PackGQA
* Tile fwd_combine kernel along headdim, don't need kBlockM > 128
* Use bf16 instead of fp16 in benchmark_gemm.py
* Update Cutlass to 3.7
* Use nvcc 12.6 but ptxas 12.8
* cicc uses the same version as ptxas
* Split hdimdiff into a separate translation unit
* Update benchmark script
* Update Cutlass to 3.8
* Adjust tile size for hdim 64
* Adjust ninja build file
* build head diff + fix build errors
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
---------
Signed-off-by: Lucas Wilkinson <lwilkinson@neuralmagic.com>
Co-authored-by: Michał Górny <mgorny@gentoo.org>
Co-authored-by: Aman Karmani <aman@tmm1.net>
Co-authored-by: Tri Dao <tridpq@gmail.com>