This project implements and optimizes several CUDA SGEMM kernels and compares their performance against cuBLAS.
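SGEMM optimizations usually start from a baseline that assigns one thread to each element of C. The sketch below is a generic example of that starting point, not code from this project. It assumes row-major storage, A of size M×K, B of size K×N, and the BLAS-style update C = αAB + βC. The names `sgemm_naive`, `run_sgemm_naive`, `sgemm_cpu` and `max_abs_diff` are hypothetical, and CUDA error checking is omitted for brevity.

```cuda
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical baseline kernel (not the project's): C = alpha * A * B + beta * C.
// Row-major; A is M x K, B is K x N, C is M x N. One thread per element of C.
__global__ void sgemm_naive(int M, int N, int K, float alpha,
                            const float* A, const float* B,
                            float beta, float* C) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < M && col < N) {  // guard threads that fall outside C
        float acc = 0.0f;
        for (int k = 0; k < K; ++k) {
            acc += A[row * K + k] * B[k * N + col];
        }
        C[row * N + col] = alpha * acc + beta * C[row * N + col];
    }
}

// Hypothetical host helper: copy inputs to the GPU, launch, copy C back.
void run_sgemm_naive(int M, int N, int K, float alpha,
                     const std::vector<float>& A, const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    assert(A.size() == std::size_t(M) * K);
    assert(B.size() == std::size_t(K) * N);
    assert(C.size() == std::size_t(M) * N);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, C.data(), C.size() * sizeof(float), cudaMemcpyHostToDevice);
    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (M + block.y - 1) / block.y);
    sgemm_naive<<<grid, block>>>(M, N, K, alpha, dA, dB, beta, dC);
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

// CPU reference with the same layout and update rule, for checking results.
void sgemm_cpu(int M, int N, int K, float alpha, const std::vector<float>& A,
               const std::vector<float>& B, float beta, std::vector<float>& C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}

// Largest element-wise absolute difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    float m = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) m = std::fmax(m, std::fabs(x[i] - y[i]));
    return m;
}
```

Each thread streams a full row of A and a full column of B from global memory, so this baseline is typically memory-bound; cutting that traffic is what tiling-style optimizations aim for.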
- `include/`: CUDA kernel declarations and implementations in `.cuh` files
- `src/`: kernel compilation units and CPU helpers
- `apps/`: runnable executables (tests, benchmarks, utilities)
- `benchmark.py`: plot performance from CSV output
All executables are built with nvcc:
```bash
make gemm_test
make bench_gemm
make profile_kernel
make query_gpu_properties
```

Then run, for example:

```bash
./gemm_test m n k
./bench_gemm 4096 4096 256,512,1024,2048,4096 gpu_tiling 50 benchmark.csv
python3 benchmark.py --impl gpu_tiling --csv benchmark.csv
./query_gpu_properties
```

- `bench_gemm` benchmarks the requested implementation plus `gpu_cublas`.
- If `nvidia-smi` is available, `make` picks the GPU compute capability automatically; otherwise it falls back to `sm_70`.
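The `gpu_tiling` implementation used in the benchmark command presumably refers to shared-memory tiling. What follows is a generic sketch of that technique under our own assumptions (row-major layout, a fixed `TILE` of 16, the same C = αAB + βC update), not the project's kernel. Each 16×16 thread block stages one tile of A and one tile of B in shared memory per step along K, so every element loaded from global memory is reused by 16 threads. The names `sgemm_tiled`, `run_sgemm_tiled`, `sgemm_cpu` and `max_abs_diff` are hypothetical, and error checking is omitted.

```cuda
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 16;  // assumed tile size; the thread block is TILE x TILE

// Hypothetical shared-memory tiled SGEMM: C = alpha * A * B + beta * C, row-major.
__global__ void sgemm_tiled(int M, int N, int K, float alpha,
                            const float* A, const float* B,
                            float beta, float* C) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE + ty;
    int col = blockIdx.x * TILE + tx;
    float acc = 0.0f;
    for (int t = 0; t < K; t += TILE) {
        // Each thread loads one element of each tile; out-of-range entries are
        // zero-filled so partial tiles at the matrix edges contribute nothing.
        As[ty][tx] = (row < M && t + tx < K) ? A[row * K + t + tx] : 0.0f;
        Bs[ty][tx] = (t + ty < K && col < N) ? B[(t + ty) * N + col] : 0.0f;
        __syncthreads();  // tiles fully loaded before anyone reads them
        for (int k = 0; k < TILE; ++k) acc += As[ty][k] * Bs[k][tx];
        __syncthreads();  // everyone done reading before the next overwrite
    }
    if (row < M && col < N) C[row * N + col] = alpha * acc + beta * C[row * N + col];
}

// Hypothetical host helper: copy inputs to the GPU, launch, copy C back.
void run_sgemm_tiled(int M, int N, int K, float alpha,
                     const std::vector<float>& A, const std::vector<float>& B,
                     float beta, std::vector<float>& C) {
    assert(A.size() == std::size_t(M) * K);
    assert(B.size() == std::size_t(K) * N);
    assert(C.size() == std::size_t(M) * N);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    cudaMalloc(&dA, A.size() * sizeof(float));
    cudaMalloc(&dB, B.size() * sizeof(float));
    cudaMalloc(&dC, C.size() * sizeof(float));
    cudaMemcpy(dA, A.data(), A.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, B.data(), B.size() * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, C.data(), C.size() * sizeof(float), cudaMemcpyHostToDevice);
    dim3 block(TILE, TILE);
    dim3 grid((N + TILE - 1) / TILE, (M + TILE - 1) / TILE);
    sgemm_tiled<<<grid, block>>>(M, N, K, alpha, dA, dB, beta, dC);
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    cudaMemcpy(C.data(), dC, C.size() * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
}

// CPU reference with the same layout and update rule, for checking results.
void sgemm_cpu(int M, int N, int K, float alpha, const std::vector<float>& A,
               const std::vector<float>& B, float beta, std::vector<float>& C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}

// Largest element-wise absolute difference between two equally sized vectors.
float max_abs_diff(const std::vector<float>& x, const std::vector<float>& y) {
    float m = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) m = std::fmax(m, std::fabs(x[i] - y[i]));
    return m;
}
```

Zero-filling out-of-range tile entries lets the kernel handle sizes that are not multiples of `TILE`. Keeping every thread alive until the final guarded store means all threads in a block reach each `__syncthreads()`, which avoids the undefined behavior of a divergent barrier.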