From_Zero_To_Sgemm

This project implements and optimizes several CUDA SGEMM kernels and compares their performance against cuBLAS.

Layout

All executables are built with nvcc through the following make targets:

make gemm_test
make bench_gemm
make profile_kernel
make query_gpu_properties

./gemm_test m n k

./bench_gemm 4096 4096 256,512,1024,2048,4096 gpu_tiling 50 benchmark.csv

python3 benchmark.py --impl gpu_tiling --csv benchmark.csv

./query_gpu_properties

bench_gemm benchmarks the requested implementation (gpu_tiling in the example above) and also runs gpu_cublas as the cuBLAS baseline.
If nvidia-smi is available, make picks the GPU compute capability automatically; otherwise it falls back to sm_70.
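
The fallback described above could be sketched in shell roughly as follows. This is an illustration only, not the project's actual Makefile logic: the function name detect_arch is hypothetical, and the sketch assumes a driver recent enough for nvidia-smi to support the compute_cap query field.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the project's Makefile):
# print an nvcc architecture string such as sm_86, or sm_70 as the fallback.
detect_arch() {
  cap=""
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Assumption: the driver supports the compute_cap query field.
    # The first GPU reports something like "8.6"; strip the dot and spaces.
    cap=$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader 2>/dev/null \
          | head -n 1 | tr -d '. ')
  fi
  # Discard anything that is not purely digits (for example an error message).
  case "$cap" in
    ''|*[!0-9]*) cap="" ;;
  esac
  if [ -n "$cap" ]; then
    echo "sm_$cap"
  else
    echo "sm_70"
  fi
}

echo "Detected architecture: $(detect_arch)"
```

The resulting string could then be passed to nvcc through its -arch option. On a machine without nvidia-smi, or with a driver that does not answer the query, the sketch prints sm_70.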