Commit a97a266
[TLX] Initial warp-specialized flash attention bwd kernel (#623)
Summary:
```
fused-attention-ws-pipelined-persistent-batch4-head32-d128:
N_CTX Triton [FP16]
0 1024.0 322.140956
1 2048.0 394.705387
2 4096.0 440.500436
3 8192.0 460.003597
4 16384.0 472.748830
```
Next steps would be:
- Enable mma pipelining
- Enable cooperative compute
- Enable persistent
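The TLX kernel itself lives in the changed tutorial file. Purely as an illustration of the math any flash-attention backward pass must reproduce (this sketch is not from the commit; the function names and the plain-NumPy formulation are hypothetical, with no tiling or warp specialization):

```python
import numpy as np

def attention_fwd(Q, K, V, scale):
    """Reference softmax attention: O = softmax(Q K^T * scale) V."""
    S = Q @ K.T * scale
    P = np.exp(S - S.max(axis=-1, keepdims=True))  # row-wise stable softmax
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V, P

def attention_bwd(Q, K, V, P, dO, scale):
    """Gradients w.r.t. Q, K, V given upstream dO and saved probabilities P."""
    dV = P.T @ dO
    dP = dO @ V.T
    # softmax backward: dS = P * (dP - rowsum(dP * P))
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))
    dQ = dS @ K * scale
    dK = dS.T @ Q * scale
    return dQ, dK, dV
```

A fused kernel recomputes or re-reads tiles of P instead of materializing the full N_CTX x N_CTX matrix, but its outputs must match this dense reference.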
Pull Request resolved: #623
Reviewed By: manman-ren
Differential Revision: D86109519
Pulled By: htyu
fbshipit-source-id: 82af2c4b97ea62b49f4296ace607233df82d8c331
parent f10024b
File tree: third_party/tlx/tutorials
1 file changed, +383 -23 lines