CUDA Examples

Collection of example kernels:

Tiled MatMul: A simple implementation of tiled matrix multiplication.
1D Softmax: Different implementations of 1D softmax, with some profiling.
Flash Attention: A fused matmul and softmax implementation, followed by flash attention.
Reduce: Simple implementations of the sum/reduce kernel.
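
To give a feel for the kind of kernel collected here, below is a minimal sketch of a shared-memory sum reduction. It is not copied from this repository's src directory: the names `reduce_sum_kernel`, `reduce_sum`, and `BLOCK_SIZE`, the block size of 256, and the cap of 1024 blocks are all assumptions made for illustration. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical block size; must be a power of two for the tree reduction below.
constexpr int BLOCK_SIZE = 256;

// Sums `n` floats from `in` and accumulates the total into `*out`.
// `*out` must be zeroed before launch, and the kernel must be launched
// with exactly BLOCK_SIZE threads per block.
__global__ void reduce_sum_kernel(const float* in, float* out, int n) {
    __shared__ float partial[BLOCK_SIZE];

    // Grid-stride loop: each thread first accumulates a private sum.
    float acc = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        acc += in[i];
    }
    partial[threadIdx.x] = acc;
    __syncthreads();

    // Tree reduction in shared memory: halve the active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride) {
            partial[threadIdx.x] += partial[threadIdx.x + stride];
        }
        __syncthreads();
    }

    // One atomic add per block combines the per-block results.
    if (threadIdx.x == 0) {
        atomicAdd(out, partial[0]);
    }
}

// Host wrapper (illustrative): copies data to the device, runs the kernel,
// and returns the sum.
float reduce_sum(const std::vector<float>& host_in) {
    const int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    // Cap the grid size; the grid-stride loop covers any remaining elements.
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    if (blocks > 1024) blocks = 1024;
    reduce_sum_kernel<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `reduce_sum(std::vector<float>(1000, 1.0f))` from a host program compiled with nvcc should return 1000. Because the final step uses floating-point `atomicAdd`, the order in which block results are combined is not fixed, so sums of arbitrary data may differ slightly between runs.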

Setup

make setup

Tested on:
