Commit a97a266
[TLX] Initial warp-specialized flash attention bwd kernel (#623)
Summary:
```
fused-attention-ws-pipelined-persistent-batch4-head32-d128:
N_CTX Triton [FP16]
0 1024.0 322.140956
1 2048.0 394.705387
2 4096.0 440.500436
3 8192.0 460.003597
4 16384.0 472.748830
```
Next steps would be:
- Enable mma pipelining
- Enable cooperative compute
- Enable persistent
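The TLX kernel itself lives in the changed tutorial file. Purely as an illustration of the math any flash-attention backward pass must reproduce (this sketch is not from the commit; the function names and the plain-NumPy formulation are hypothetical, with no tiling or warp specialization):

```python
import numpy as np

def attention_fwd(Q, K, V, scale):
    """Reference softmax attention: O = softmax(Q K^T * scale) V."""
    S = Q @ K.T * scale
    P = np.exp(S - S.max(axis=-1, keepdims=True))  # row-wise stable softmax
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V, P

def attention_bwd(Q, K, V, P, dO, scale):
    """Gradients w.r.t. Q, K, V given upstream dO and saved probabilities P."""
    dV = P.T @ dO
    dP = dO @ V.T
    # softmax backward: dS = P * (dP - rowsum(dP * P))
    dS = P * (dP - (dP * P).sum(axis=-1, keepdims=True))
    dQ = dS @ K * scale
    dK = dS.T @ Q * scale
    return dQ, dK, dV
```

A fused kernel recomputes or re-reads tiles of P instead of materializing the full N_CTX x N_CTX matrix, but its outputs must match this dense reference.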
Pull Request resolved: #623
Reviewed By: manman-ren
Differential Revision: D86109519
Pulled By: htyu
fbshipit-source-id: 82af2c4b97ea62b49f4296ace607233df82d8c331
parent f10024b
File tree: third_party/tlx/tutorials
1 file changed, +383 -23 lines