Commit ac77045
Support pack_gqa for ffa fwd (#185)
* add packgqa template
* smem 3d copy done
* add 2d copy
* load with qhead_per_khead=4 correct
* support packgqa/nopackgqa done
* add qhead_per_khead as template arg
* update bench for packgqa
* support lse writeback for no q overlap
* support packgqa o write back without q overlap
* support fwd_epilogue with q overlap
* support pack_gqa with full attention
* update bench and test for uniform block sparse with packgqa
* add packgqa bench
* update profile_ffa
* fix profile_ffa for packgqa
* support pack_gqa with variable block sparse
* enhance test_block_sparse_attn without lse
* fix lse in test_block_sparse_attn
* support all mask type for packgqa
* support deterministic for packgqa fwd
* fix packgqa bench
* simple change tile_scheduler
* change fwd tile_scheduler
* separate fwd and bwd tilescheduler
* add bwd tilescheduler
* support deterministic for fwd tile_scheduler
* support deterministic for new tile_scheduler and packgqa
* fix lint
* format for python code
* combine packgqa with swapab
* format
* refactor fwd_tile_scheduler
* format
* fix copyright
* add more comments
* fix bench
* fix test_flex_flash_attn
* fix packgqa default value
* fix Jit param of packgqa
* format
* format
1 parent dec7246, commit ac77045
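The commit message names `qhead_per_khead` as a template argument but does not describe the packed layout. As a rough illustration of the general pack-GQA idea (not this commit's kernel code), the sketch below folds the query heads that share one KV head into the row dimension, so a single tile of rows covers several query heads for the same K/V. The head-fastest ordering and the helper names `unpack_row` and `pack_q` are assumptions made for the example.

```python
def unpack_row(packed_row: int, qhead_per_khead: int) -> tuple[int, int]:
    """Map a packed row index to (query position, head offset inside the KV group).

    Assumption: the head offset is the fastest-varying index of the packed row.
    """
    q_pos, head_in_group = divmod(packed_row, qhead_per_khead)
    return q_pos, head_in_group


def pack_q(q, qhead_per_khead: int):
    """Repack q[seqlen_q][num_q_heads][head_dim] into
    packed[num_kv_heads][seqlen_q * qhead_per_khead][head_dim].
    """
    seqlen_q = len(q)
    num_q_heads = len(q[0])
    assert num_q_heads % qhead_per_khead == 0, "query heads must split evenly"
    num_kv_heads = num_q_heads // qhead_per_khead

    packed = []
    for kv_head in range(num_kv_heads):
        rows = []
        for packed_row in range(seqlen_q * qhead_per_khead):
            q_pos, h = unpack_row(packed_row, qhead_per_khead)
            # Query head index in the original (unpacked) layout.
            q_head = kv_head * qhead_per_khead + h
            rows.append(q[q_pos][q_head])
        packed.append(rows)
    return packed


# Toy input: 2 query positions, 8 query heads, head_dim 1.
# Each element encodes its origin as position * 100 + head.
q = [[[s * 100 + h] for h in range(8)] for s in range(2)]
packed = pack_q(q, qhead_per_khead=4)
```

With `qhead_per_khead=4`, the 8 query heads collapse into 2 KV-head groups of 8 packed rows each, so each group can be processed against a single K/V head.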
File tree
23 files changed, +2107 -388 lines changed
- .github/workflows
- exps/attn
- profile_ffa
- magi_attention
- csrc/flexible_flash_attention
- functional
- testing
- utils
- tests/test_attn