<!-- .github/pull_request_template.md -->
## 📌 Description
1. Per discussion with @haochengxi and @Radioheading, this PR moves the
`plan` function of `VariableBlockSparseAttentionWrapper` to the GPU
side, avoiding expensive host-side operations (hundreds of milliseconds);
a rough sketch of the idea follows this list.
2. This PR also enlarges the default internal buffer size to accommodate
video DiT use cases.
3. This PR fixes an **int32 overflow** in the offset calculation of the
attention map, which causes errors in the `customized_mask` mode of the FA2
prefill template. E.g., with `kv_len = 128K`, the offset of the last element
of the attention map is roughly `128e3 * 128e3 ≈ 1.6e10`, which exceeds
`INT32_MAX` (about `2.1e9`); see the illustration below.
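
The sketch below is not the actual FlashInfer implementation (tensor and function names are made up for illustration); it only shows the general technique behind item 1: deriving the block-sparse metadata with vectorized torch ops directly on the GPU, so `plan` no longer needs slow host-side loops or device-to-host synchronizations.

```python
import torch

def build_block_sparse_indptr_indices(block_mask_map: torch.Tensor):
    """Derive CSR-style (indptr, indices) for the nonzero attention blocks.

    `block_mask_map` is a (num_block_rows, num_block_cols) boolean tensor that
    already lives on the GPU; everything below stays on-device.
    """
    nnz_per_row = block_mask_map.sum(dim=1)            # blocks kept per row
    indptr = torch.zeros(
        block_mask_map.shape[0] + 1, dtype=torch.int32, device=block_mask_map.device
    )
    indptr[1:] = torch.cumsum(nnz_per_row, dim=0)      # prefix sum on GPU
    # Column ids of the kept blocks, in row-major order.
    indices = torch.nonzero(block_mask_map, as_tuple=False)[:, 1].to(torch.int32)
    return indptr, indices

device = "cuda" if torch.cuda.is_available() else "cpu"
mask = torch.rand(64, 64, device=device) > 0.5         # toy block mask
indptr, indices = build_block_sparse_indptr_indices(mask)
```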
<!-- What does this PR do? Briefly describe the changes and why they’re
needed. -->
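
To make the overflow in item 3 concrete, here is a small, self-contained reproduction of the arithmetic (variable names are illustrative, not the kernel's): the flattened offset `qo_idx * kv_len + kv_idx` wraps around when computed in int32 but is correct in int64.

```python
import torch

kv_len = qo_len = 128 * 1024            # "128K" sequence length
last_qo, last_kv = qo_len - 1, kv_len - 1

# Offset of the last attention-map element, computed in int32: wraps around.
bad = torch.tensor(last_qo, dtype=torch.int32) * kv_len + last_kv
# Same offset in int64: the correct value, 2**34 - 1 ≈ 1.7e10 > INT32_MAX.
good = torch.tensor(last_qo, dtype=torch.int64) * kv_len + last_kv

print(bad.item(), good.item())          # e.g. -1 vs 17179869183
```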
## 🔍 Related Issues
This PR should resolve
#1271
<!-- Link any related issues here -->
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
<!-- Optional: anything you'd like reviewers to focus on, concerns, etc.
-->