Commit c0bfc57
authored
CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32 (#16277)
* CUDA: mul_mat_id for mmf for bs <= 64 for f16 and bs <= 32 for f32
This commit adds mul_mat_id support for ncols_dst >= 16. It does this by
packing ncols_dst tiles into the blockDim.y.
My tests on a RTX 3090 show that this is faster than the cuBLAS fallback
for f16 till bs=64, and for f32 till bs=32
* Review: refactor if statement1 parent 75a3a6c commit c0bfc57
File tree
4 files changed
+92
-47
lines changed- ggml/src/ggml-cuda
- tests
4 files changed
+92
-47
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2031 | 2031 | | |
2032 | 2032 | | |
2033 | 2033 | | |
2034 | | - | |
| 2034 | + | |
2035 | 2035 | | |
2036 | 2036 | | |
2037 | 2037 | | |
2038 | 2038 | | |
2039 | 2039 | | |
2040 | 2040 | | |
2041 | 2041 | | |
2042 | | - | |
| 2042 | + | |
2043 | 2043 | | |
2044 | 2044 | | |
2045 | 2045 | | |
| |||
2111 | 2111 | | |
2112 | 2112 | | |
2113 | 2113 | | |
2114 | | - | |
| 2114 | + | |
2115 | 2115 | | |
2116 | 2116 | | |
2117 | 2117 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
84 | 84 | | |
85 | 85 | | |
86 | 86 | | |
87 | | - | |
| 87 | + | |
88 | 88 | | |
89 | 89 | | |
90 | 90 | | |
| |||
96 | 96 | | |
97 | 97 | | |
98 | 98 | | |
99 | | - | |
100 | | - | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
101 | 111 | | |
102 | 112 | | |
103 | 113 | | |
| |||
0 commit comments