Commit 679deb8

debug

1 parent 9bf5e70 commit 679deb8

1 file changed: +24 −0 lines changed


chapter_accelerator/Programming_Methods.md

@@ -51,3 +51,27 @@
while the primitives provided by task-specific hardware units provide a
more detailed interface to hardware operations, and low-level assembly
languages like PTX ISA provide the most detailed, low-level control over
accelerator behavior.

## Programming Examples

We illustrate the different programming methods by implementing General
Matrix Multiplication (GEMM) with each approach. The implementations
target an NVIDIA Volta GPU. GEMM follows the equation
$\mathbf{C} = \alpha \mathbf{A}\mathbf{B} + \beta \mathbf{C}$, where
$\mathbf{A}\in\mathbb{R}^{M\times K}$, $\mathbf{B}\in\mathbb{R}^{K\times N}$, $\mathbf{C}\in\mathbb{R}^{M\times N}$,
and $\alpha$ and $\beta$ are scalar parameters provided by the user.
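
For reference, a minimal sketch of this equation as a plain CPU loop nest (the function name `gemm_reference` and the row-major storage convention are our assumptions, not part of the commit) could read:

```cpp
#include <vector>

// Reference GEMM: C = alpha * A * B + beta * C.
// A is M x K, B is K x N, C is M x N, all stored row-major.
void gemm_reference(int M, int N, int K, float alpha,
                    const std::vector<float> &A,
                    const std::vector<float> &B,
                    float beta, std::vector<float> &C) {
    for (int i = 0; i < M; ++i) {
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k)
                acc += A[i * K + k] * B[k * N + j];
            C[i * N + j] = alpha * acc + beta * C[i * N + j];
        }
    }
}
```

Each accelerated implementation in this section computes this same result; the methods differ in how much of the loop nest is delegated to a library or to hardware.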

### High-level Computation Operators {#sec-accelerator-use-cublas}

Using an operator acceleration library directly is the most
straightforward method. NVIDIA offers two operator libraries: cuBLAS
and cuDNN. cuBLAS provides an interface for leveraging Tensor Cores to
accelerate GEMM operations, while cuDNN offers an interface for
accelerating neural network operations. To perform GEMM on Tensor Cores
via cuBLAS, we can use the function `cublasGemmEx`, whose signature is
shown in Code `lst:cublasGemmEx`.

**lst:cublasGemmEx**
```cpp
cublasStatus_t cublasGemmEx(cublasHandle_t handle,
                            cublasOperation_t transa, cublasOperation_t transb,
                            int m, int n, int k,
                            const void *alpha,
                            const void *A, cudaDataType_t Atype, int lda,
                            const void *B, cudaDataType_t Btype, int ldb,
                            const void *beta,
                            void *C, cudaDataType_t Ctype, int ldc,
                            cublasComputeType_t computeType,
                            cublasGemmAlgo_t algo)
```
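
As a usage sketch (the wrapper `gemm_tensor_core` and its surrounding setup are our illustration, not part of the commit), a Volta-era call with FP16 inputs and FP32 accumulation might look like:

```cpp
#include <cublas_v2.h>
#include <cuda_fp16.h>

// Sketch: C = alpha * A * B + beta * C on Tensor Cores, with FP16
// inputs and FP32 accumulation. d_A, d_B, d_C are device buffers
// allocated and filled elsewhere; cuBLAS assumes column-major storage.
void gemm_tensor_core(cublasHandle_t handle,
                      const __half *d_A, const __half *d_B, float *d_C,
                      int M, int N, int K, float alpha, float beta) {
    cublasGemmEx(handle,
                 CUBLAS_OP_N, CUBLAS_OP_N,       // no transposes
                 M, N, K,
                 &alpha,
                 d_A, CUDA_R_16F, M,             // A: M x K, ld = M
                 d_B, CUDA_R_16F, K,             // B: K x N, ld = K
                 &beta,
                 d_C, CUDA_R_32F, M,             // C: M x N, ld = M
                 CUBLAS_COMPUTE_32F,             // accumulate in FP32
                 CUBLAS_GEMM_DEFAULT_TENSOR_OP); // permit Tensor Cores
}
```

In production code the handle would be created with `cublasCreate` beforehand, and the return status checked against `CUBLAS_STATUS_SUCCESS`.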
