jin-yc10/sparse_gemm

A simple GEMM kernel for sparse convolution: mainly an implicit-GEMM CUDA kernel built on a naive use of tensor cores. The following tricks are used to improve overall runtime:

  1. tensor cores
  2. float16 arithmetic
  3. half2 intrinsics ( hfma2, hmul2, hadd2 )
  4. software pipelining
  5. combined (128-bit) memory access ( ldg128, stg128 ). Note that on A100 we should consider using the ldgsts intrinsic (asynchronous global-to-shared copy) instead
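
Tricks 2, 3 and 5 can be illustrated together with a small standalone sketch. It is not taken from this repository's kernel: `axpy_half8` and `axpy_half_host` are hypothetical names, and the operation (y = a * x + y) is just a stand-in for the inner loop of a real GEMM. Each thread reads eight float16 values through one `uint4` access, which the compiler would typically turn into a single 128-bit load, and processes them as four `half2` pairs with `__hfma2`. It assumes a GPU with compute capability 5.3 or newer, `n` a multiple of 8, and pointers returned by `cudaMalloc` (which are sufficiently aligned for 16-byte accesses).

```cuda
#include <cuda_fp16.h>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel (not from this repository): y = a * x + y on float16 data.
// Each thread handles 8 halves = 16 bytes, so loads and stores are 128 bits wide.
__global__ void axpy_half8(const half* __restrict__ x, half* __restrict__ y,
                           half a, int n_vec) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n_vec) return;

    // One 16-byte access per operand (requires 16-byte aligned pointers).
    uint4 xv = reinterpret_cast<const uint4*>(x)[i];
    uint4 yv = reinterpret_cast<const uint4*>(y)[i];

    // View the 16 bytes held in registers as four half2 pairs.
    const half2* x2 = reinterpret_cast<const half2*>(&xv);
    half2* y2 = reinterpret_cast<half2*>(&yv);
    const half2 a2 = __half2half2(a);  // broadcast a into both lanes

#pragma unroll
    for (int k = 0; k < 4; ++k) {
        y2[k] = __hfma2(a2, x2[k], y2[k]);  // two fused multiply-adds at once
    }

    // One 16-byte store for the result.
    reinterpret_cast<uint4*>(y)[i] = yv;
}

// Hypothetical host helper: fills x with x0 and y with y0, runs the kernel,
// and returns y as floats. n must be a positive multiple of 8.
// Error checking is omitted to keep the sketch short.
std::vector<float> axpy_half_host(float a, float x0, float y0, int n) {
    std::vector<half> hx(n, __float2half(x0));
    std::vector<half> hy(n, __float2half(y0));
    const size_t bytes = static_cast<size_t>(n) * sizeof(half);

    half* dx = nullptr;
    half* dy = nullptr;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy.data(), bytes, cudaMemcpyHostToDevice);

    const int n_vec = n / 8;  // number of 128-bit chunks
    const int threads = 128;
    const int blocks = (n_vec + threads - 1) / threads;
    axpy_half8<<<blocks, threads>>>(dx, dy, __float2half(a), n_vec);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(hy.data(), dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);

    std::vector<float> out(n);
    for (int i = 0; i < n; ++i) out[i] = __half2float(hy[i]);
    return out;
}
```

Calling `axpy_half_host(0.5f, 1.0f, 2.0f, 1024)` from a host `main` and compiling with something like `nvcc -arch=sm_70` should exercise the kernel. Whether the accesses really become `LDG.128` / `STG.128` depends on the compiler and can be checked in the generated SASS, for example with `cuobjdump -sass`.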
