Commit b6e067b
CUDA: Improve flash decoding kernel occupancy for BS=1 case
Adds the following optimizations to the CUDA flash decoding code:
- Query the number of active blocks per SM with the cudaOccupancyMaxActiveBlocksPerMultiprocessor API, and use that value to choose the parallel_blocks value.
- Prefer the vector flash attention kernels over the MMA kernel for BS=1.
This improves generation-phase throughput by up to 15% for large sequence lengths.
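The diff bodies are not reproduced on this page, so the following is only a minimal host-side sketch of how an occupancy figure could be turned into a parallel_blocks value. The helper name choose_parallel_blocks and its parameters base_blocks and kv_tiles are hypothetical, and the heuristic (fill one wave of resident blocks, capped by the number of KV tiles) is an assumption, not the commit's actual code.

```cpp
#include <algorithm>

// Hypothetical helper (not taken from the commit): decide how many blocks
// should split the KV sequence so that one wave of blocks fills the GPU.
//
// max_blocks_per_sm: in a CUDA build this would be obtained from
//     cudaOccupancyMaxActiveBlocksPerMultiprocessor(
//         &max_blocks_per_sm, kernel, block_size, dynamic_smem_bytes);
// num_sms:     cudaDeviceProp::multiProcessorCount of the device.
// base_blocks: blocks launched when parallel_blocks == 1 (assumed input).
// kv_tiles:    number of KV chunks the sequence can be split into (assumed input).
int choose_parallel_blocks(int max_blocks_per_sm, int num_sms,
                           int base_blocks, int kv_tiles) {
    // Total number of blocks that can be resident on the device at once.
    const int resident_capacity =
        std::max(max_blocks_per_sm, 1) * std::max(num_sms, 1);

    // Split each base block's work until one wave reaches that capacity,
    // but always keep at least one block.
    const int wanted = std::max(resident_capacity / std::max(base_blocks, 1), 1);

    // Never split further than the KV sequence allows.
    return std::min(wanted, std::max(kv_tiles, 1));
}
```

For example, with 108 SMs, 2 resident blocks per SM, and 32 base blocks, the sketch returns 6 as long as at least 6 KV tiles are available. A short sequence with only 4 tiles would cap the result at 4.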
Issue: #12182
1 parent f6711ce commit b6e067b
4 files changed (+21, -8 lines)
File tree:
- ggml/src/ggml-cuda
- vendors