Commit 74f5dcc

yzh119, IwakuraRein, and yyihuang authored
refactor: refactor trtllm-gen attention kernel integration code (#1289)
## 📌 Description

Simplify and unify the interface for trtllm-gen decode/prefill/mla kernels, and add support for shared-kv (in MLA, #1273).

## 🔍 Related Issues

## 🚀 Pull Request Checklist

Thank you for contributing to FlashInfer! Before we review your pull request, please make sure the following items are complete.

### ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

> If you are unsure about how to set up `pre-commit`, see [the pre-commit documentation](https://pre-commit.com/).

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).

## Reviewer Notes

---------

Co-authored-by: siyuanf <[email protected]>
Co-authored-by: Avery Yingyi Huang <[email protected]>
Co-authored-by: Zihao <[email protected]>
1 parent 3f99f18 commit 74f5dcc

16 files changed: 483 additions & 834 deletions

csrc/trtllm_fmha_kernel_launcher.cu

Lines changed: 212 additions & 280 deletions

csrc/trtllm_mla_kernel_launcher.cu

Lines changed: 0 additions & 196 deletions
This file was deleted.
