Commit d9a171c
[Example] Add examples to support efficient attention sink forward process (#853)
* [Example] Add a new example to support attention sink for MHA
- Introduced a new example script for multi-head attention (MHA) with sliding window attention and sink tokens.
- Added a reference attention function to validate the implementation against PyTorch.
- Included argument parsing for command-line execution of the example.
* [Example] Replace MHA sink forward example with updated implementation
- Removed the old example script for multi-head attention (MHA) with sliding window attention and sink tokens.
- Introduced a new example script that modifies the attention mechanism to enhance performance and maintainability.
- Updated argument parsing and reference functions to align with the new implementation.
* Enhance MHA sink example with sliding window support
- Added a `window_size` parameter to the `flashattn` function to enable sliding window attention.
- Implemented assertions to ensure `window_size` is compatible with `block_N`.
- Updated the main function to include a `tune` option for performance tuning.
- Introduced a new test file to validate both full attention and sliding window scenarios.
- Adjusted FLOPS calculation to account for the sliding window configuration.
* lint
* [Fix] Add checkinf process to fix the sliding window attention (SWA) bug
* Migrate to BSHD layout to align with triton baselines
* lint
* fix typo
* Refactor MHA sink example to take separate `seq_q` and `seq_kv` sequence length parameters.
* Add GQA sink example for optimized attention mechanism & lint fix
* fix several typos and bugs
* lint
* fix speed issues of swa
* Update examples/attention_sink/example_gqa_sink_fwd_bhsd_wgmma_pipelined.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Update examples/attention_sink/example_mha_sink_fwd_bhsd_wgmma_pipelined.py
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
1 parent b448309 commit d9a171c
4 files changed under examples/attention_sink: +1233 -0 lines
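For readers unfamiliar with the technique named in the commit message, the following is a minimal pure-Python sketch of a reference attention with a sink and an optional sliding window. It is not the code added by this commit. It assumes a single head, equal query and key/value lengths, a causal mask, a window that covers the current position plus the `window_size - 1` positions before it, and a sink formulation in which one sink logit joins the softmax denominator but contributes no value vector. The name `ref_attention_sink` and its parameters are hypothetical.

```python
import math


def ref_attention_sink(q, k, v, sink, window_size=None):
    """Hypothetical single-head reference: causal attention with one sink logit.

    q, k, v: lists of seq_len vectors (lists of floats), each of length dim.
    sink: float logit that enters the softmax denominator only (assumption).
    window_size: None for full causal attention, otherwise the number of
        most recent keys (including the current position) each query sees.
    """
    seq_len, dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for i in range(seq_len):
        # First visible key index for query i under the assumed window rule.
        start = 0 if window_size is None else max(0, i - window_size + 1)
        scores = [
            scale * sum(a * b for a, b in zip(q[i], k[j]))
            for j in range(start, i + 1)
        ]
        # Subtract the running maximum (sink included) for numerical stability.
        m = max(scores + [sink])
        weights = [math.exp(s - m) for s in scores]
        # The sink adds probability mass to the denominator but has no value row.
        denom = sum(weights) + math.exp(sink - m)
        out.append([
            sum(w * v[start + t][d] for t, w in enumerate(weights)) / denom
            for d in range(dim)
        ])
    return out


if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    values = [[3.0, 6.0], [9.0, 12.0]]
    print(ref_attention_sink(zeros, zeros, values, sink=0.0))
    print(ref_attention_sink(zeros, zeros, values, sink=0.0, window_size=1))
```

Because the sink absorbs part of the softmax mass, each output row is a sub-convex combination of the visible value rows. Passing `float("-inf")` as `sink` recovers ordinary causal (or sliding window) softmax attention under the same assumptions.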