This sample demonstrates the usage of the cuSPARSELt library and the cusparseMatMul APIs to perform structured matrix - dense matrix multiplication on NVIDIA Sparse Tensor Cores, where the structured matrix is compressed with a 50% sparsity ratio.
The sample also demonstrates the usage of batched computation, Split-K, ReLU activation function, and bias.
C_i = ReLU(A_i * B_i + C_i + bias)
where A is a structured matrix, and B, C, D are dense matrices. The index i ranges over the batch, and the formula as written stores the result back in C.
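To make the formula concrete, here is a minimal CPU reference sketch that could be used to check GPU results on the host. It is not taken from the sample: the function name `reference_matmul_relu_bias`, the row-major layout, the contiguous batch strides, and the choice of one bias value per output row (shared by all batches) are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cuSPARSELt): CPU reference for
//   C_i = ReLU(A_i * B_i + C_i + bias),  i = 0 .. batch-1
// Assumptions: row-major storage, batches stored back to back,
// A_i is m x k, B_i is k x n, C_i is m x n, bias has m entries
// (one per output row) and is shared by every batch.
void reference_matmul_relu_bias(int batch, int m, int n, int k,
                                const std::vector<float>& A,
                                const std::vector<float>& B,
                                std::vector<float>&       C,
                                const std::vector<float>& bias) {
    assert(A.size() == static_cast<std::size_t>(batch) * m * k);
    assert(B.size() == static_cast<std::size_t>(batch) * k * n);
    assert(C.size() == static_cast<std::size_t>(batch) * m * n);
    assert(bias.size() == static_cast<std::size_t>(m));

    for (int b = 0; b < batch; ++b) {
        // Pointers to the b-th matrix of each operand.
        const float* Ab = A.data() + static_cast<std::size_t>(b) * m * k;
        const float* Bb = B.data() + static_cast<std::size_t>(b) * k * n;
        float*       Cb = C.data() + static_cast<std::size_t>(b) * m * n;

        for (int i = 0; i < m; ++i) {
            for (int j = 0; j < n; ++j) {
                // Start from the existing C entry plus the row bias.
                float acc = Cb[i * n + j] + bias[i];
                for (int p = 0; p < k; ++p) {
                    acc += Ab[i * k + p] * Bb[p * n + j];
                }
                // ReLU activation: clamp negative values to zero.
                Cb[i * n + j] = std::max(acc, 0.0f);
            }
        }
    }
}
```

The reference ignores sparsity entirely: a pruned A simply contains zeros, so the dense loop produces the same values the structured product should. When comparing against reduced-precision GPU output, a tolerance-based comparison is likely more appropriate than exact equality.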
- Linux
    make CUSPARSELT_PATH=<cusparseLt_path> CUDA_TOOLKIT_PATH=<cuda_toolkit_path>
- Alternatively, with CMake:
    mkdir build
    cd build
    cmake -DCUSPARSELT_PATH=<cusparseLt_path> -DCMAKE_CUDA_COMPILER=<nvcc_path> ..
    make
- Supported SM Architectures: SM 8.0, SM 8.6, SM 8.9, SM 9.0
- Supported OSes: Linux, Windows
- Supported CPU Architectures: x86_64, arm64
- Supported Compilers: gcc, clang, Microsoft MSVC, NVIDIA HPC SDK nvc
- Language: C++14
- CUDA 12.0 toolkit (or above) and compatible driver (see CUDA Driver Release Notes).
- cusparseLt 0.6.1 or above
- CMake 3.18 or above