From_Zero_To_Sgemm

This project implements and optimizes several CUDA SGEMM kernels and compares their performance against cuBLAS.

Layout

include/: CUDA kernel declarations and implementations in .cuh
src/: kernel compilation units and CPU helpers
apps/: runnable executables (tests, benchmarks, utilities)
benchmark.py: plotting script for the CSV benchmark output

Build

All executables are built with nvcc:

make gemm_test
make bench_gemm
make profile_kernel
make query_gpu_properties

Run

Functional + performance test

./gemm_test m n k
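
A functional test for SGEMM usually compares the GPU result against a simple CPU reference. The sketch below illustrates that idea in plain Python and is not the project's code. It assumes m, n, k are the dimensions of C = alpha * A * B + beta * C with A of size m x k and B of size k x n, stored row-major; the names sgemm_reference and max_abs_diff are hypothetical. Python floats are double precision, so a real check against a single-precision kernel would need a tolerance.

```python
def sgemm_reference(m, n, k, a, b, alpha=1.0, beta=0.0, c=None):
    """Naive C = alpha * A @ B + beta * C on flat row-major lists."""
    if c is None:
        c = [0.0] * (m * n)
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            out[i * n + j] = alpha * acc + beta * c[i * n + j]
    return out


def max_abs_diff(x, y):
    """Largest element-wise absolute difference between two results."""
    return max(abs(u - v) for u, v in zip(x, y))


# Small 2x3 times 3x2 example (m=2, n=2, k=3).
a = [1.0, 2.0, 3.0,
     4.0, 5.0, 6.0]
b = [7.0, 8.0,
     9.0, 10.0,
     11.0, 12.0]
c = sgemm_reference(2, 2, 3, a, b)  # [58.0, 64.0, 139.0, 154.0]
```

A kernel would then pass the check if max_abs_diff(gpu_result, c) stays below a chosen tolerance.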

Benchmark sweep (CSV output)

./bench_gemm 4096 4096 256,512,1024,2048,4096 gpu_tiling 50 benchmark.csv
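
GEMM throughput is commonly reported in GFLOP/s, counting 2 * m * n * k floating-point operations per multiplication. The helper below is a minimal sketch of that conversion; whether bench_gemm writes this metric to the CSV is an assumption, and gemm_gflops is a hypothetical name.

```python
def gemm_gflops(m, n, k, seconds):
    """Throughput in GFLOP/s for one m x n x k GEMM that took `seconds`."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 2.0 * m * n * k  # one multiply and one add per inner-product term
    return flops / seconds / 1e9


# Hypothetical timing: a 4096 x 4096 x 1024 problem measured at 5 ms.
rate = gemm_gflops(4096, 4096, 1024, 5e-3)
```

Dividing the rate of a custom kernel by the rate of gpu_cublas for the same sizes gives a convenient relative-performance figure.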

Plot from CSV

python3 benchmark.py --impl gpu_tiling --csv benchmark.csv

Query GPU properties

./query_gpu_properties

Notes

bench_gemm benchmarks the requested implementation plus gpu_cublas.
If nvidia-smi is available, make picks the GPU compute capability automatically; otherwise it falls back to sm_70.
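
The fallback logic in the second note can be sketched as follows. This is an illustration, not the makefile's actual mechanism: it assumes a driver whose nvidia-smi supports the compute_cap query field, and the function names arch_flag and detect_arch are hypothetical.

```python
import shutil
import subprocess

FALLBACK_ARCH = "sm_70"  # fallback named in the note above


def arch_flag(compute_cap):
    """Turn a compute capability string such as '8.6' into 'sm_86'."""
    if not compute_cap:
        return FALLBACK_ARCH
    parts = compute_cap.strip().split(".")
    if len(parts) != 2 or not all(p.isdigit() for p in parts):
        return FALLBACK_ARCH
    return "sm_" + parts[0] + parts[1]


def detect_arch():
    """Query the first GPU via nvidia-smi, falling back when unavailable."""
    if shutil.which("nvidia-smi") is None:
        return FALLBACK_ARCH
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return FALLBACK_ARCH
    if result.returncode != 0:
        return FALLBACK_ARCH
    lines = result.stdout.splitlines()
    return arch_flag(lines[0] if lines else None)
```

The returned string is the kind of value typically passed to nvcc through its -arch option.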
