Commit ba7e5fd

Update README.md
1 parent 72245e9 commit ba7e5fd

1 file changed (+1, -1 lines)

README.md

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 <img src='https://github.com/user-attachments/assets/c7d65fe5-9fb9-49a8-b962-a6c09bcc030a' height="225px" width="400px">
 </div>
 
-Currently, on NVIDIA L20, RTX 4090 and RTX 3090 Laptop, compared with cuBLAS's default Tensor Cores math algorithm `CUBLAS_GEMM_DEFAULT_TENSOR_OP`, the `HGEMM (WMMA/MMA)` implemented in this repo(`light blue`) can achieve `95%~98%` of its(`orange`) performance. Please check [hgemm benchmark](./hgemm) for more details.
+Currently, on NVIDIA L20, RTX 4090 and RTX 3090 Laptop, compared with cuBLAS's default Tensor Cores math algorithm `CUBLAS_GEMM_DEFAULT_TENSOR_OP`, the `HGEMM (WMMA/MMA)` implemented in this repo(`light blue`) can achieve `95%~99%` of its(`orange`) performance. Please check [hgemm benchmark](./hgemm) for more details.
 
 |CUDA Cores|Sliced K(Loop over K)|Tile Block|Tile Thread|
 |:---:|:---:|:---:|:---:|
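The changed line raises the upper bound of the HGEMM-to-cuBLAS ratio from 98% to 99%. As a rough illustration of what such a ratio usually means, here is a minimal sketch that converts measured kernel times into TFLOPS using the standard 2·M·N·K FLOP count for an M×K by K×N GEMM, then expresses one kernel's throughput as a percentage of the baseline. The function names and the timings in the usage lines are hypothetical; this is an assumption about the metric, not the repo's actual benchmark code (see the linked hgemm benchmark for that).

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput of one (M x K) @ (K x N) GEMM in TFLOPS.

    Assumes the usual count of 2 FLOPs (one multiply, one add) per
    multiply-accumulate, i.e. 2*M*N*K FLOPs in total.
    """
    if seconds <= 0:
        raise ValueError("kernel time must be positive")
    return 2.0 * m * n * k / seconds / 1e12


def percent_of_cublas(m: int, n: int, k: int,
                      hgemm_seconds: float, cublas_seconds: float) -> float:
    """Hypothetical helper: custom HGEMM throughput as a % of cuBLAS throughput.

    For the same problem shape this reduces to cublas_seconds / hgemm_seconds.
    """
    return 100.0 * gemm_tflops(m, n, k, hgemm_seconds) / gemm_tflops(m, n, k, cublas_seconds)


if __name__ == "__main__":
    # Hypothetical timings, not measured results from any GPU.
    m = n = k = 4096
    print(f"HGEMM: {gemm_tflops(m, n, k, 1.00e-3):.2f} TFLOPS")
    print(f"cuBLAS: {gemm_tflops(m, n, k, 0.98e-3):.2f} TFLOPS")
    print(f"ratio: {percent_of_cublas(m, n, k, 1.00e-3, 0.98e-3):.1f}%")
```

Because both kernels solve the same shape, the FLOP count cancels and the percentage is simply the inverse ratio of kernel times; computing TFLOPS first just makes the absolute throughput visible alongside the ratio.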