A minimal Flash Attention implementation in Triton, with both forward and backward kernels. This is a small demo project, not intended for real-world use.
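A minimal sketch of what calling the kernels might look like. The import path and function name (`flash_attention`, `attention`) are assumptions for illustration, not necessarily this project's actual API:

```python
# Hypothetical usage sketch; the import path and entry point are assumptions.
import torch
from flash_attention import attention  # assumed entry point

# Attention kernels typically expect (batch, heads, seq_len, head_dim) tensors on the GPU.
q = torch.randn(2, 4, 1024, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn_like(q, requires_grad=True)
v = torch.randn_like(q, requires_grad=True)

out = attention(q, k, v)  # forward Triton kernel
out.sum().backward()      # backward Triton kernel produces dq, dk, dv
```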
```bash
# Install dependencies using uv
uv sync

# Run all tests
uv run pytest
```
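Beyond the test suite, a quick way to sanity-check both passes by hand is to compare them against PyTorch's reference attention (again assuming the illustrative `attention` entry point from the sketch above):

```python
# Manual correctness check against PyTorch's built-in attention.
# `attention` is the assumed entry point from the sketch above, not a confirmed API.
import torch
import torch.nn.functional as F
from flash_attention import attention  # assumed entry point

torch.manual_seed(0)
q, k, v = (torch.randn(1, 2, 256, 64, device="cuda", dtype=torch.float16,
                       requires_grad=True) for _ in range(3))

out = attention(q, k, v)                       # Triton forward kernel
ref = F.scaled_dot_product_attention(q, k, v)  # PyTorch reference

# fp16 accumulation differences mean exact equality is not expected.
torch.testing.assert_close(out, ref, atol=2e-2, rtol=0.0)

# Backward: compare input gradients under the same upstream gradient.
grad = torch.randn_like(out)
dq, dk, dv = torch.autograd.grad(out, (q, k, v), grad)
dq_ref, dk_ref, dv_ref = torch.autograd.grad(ref, (q, k, v), grad)
torch.testing.assert_close(dq, dq_ref, atol=2e-2, rtol=0.0)
```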
Requirements:

- Python ≥ 3.10
- PyTorch ≥ 2.7.1
- Triton ≥ 3.3.1
- CUDA-compatible GPU