[Perf] Cache device property functions to avoid recomputation (#1824)
## 📌 Description
Reduce CPU overhead between CUDA kernel launches by caching the results of
device property functions. With small batches, we observed GPU bubbles:
in certain cases, CPU-side work (e.g. CUDA launch preparation) delays GPU
kernel launches.
This PR caches the device property functions so that each property is
computed once instead of on every launch, which reduces that CPU overhead.
**Before**
<img width="2192" height="861" alt="Screenshot 2025-09-30 at 3 41 21 PM"
src="https://github.com/user-attachments/assets/762d9334-da03-4359-91a1-8af9368a8bb5"
/>
**After**
<img width="1910" height="231" alt="Screenshot 2025-09-30 at 3 54 54 PM"
src="https://github.com/user-attachments/assets/9c5389d4-eae8-4722-b117-ba6e822f1c43"
/>
## 🔍 Related Issues
N/A
## 🚀 Pull Request Checklist
Thank you for contributing to FlashInfer! Before we review your pull
request, please make sure the following items are complete.
### ✅ Pre-commit Checks
- [x] I have installed `pre-commit` by running `pip install pre-commit`
(or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files`
and fixed any reported issues.
> If you are unsure about how to set up `pre-commit`, see [the
pre-commit documentation](https://pre-commit.com/).
## 🧪 Tests
- [x] Tests have been added or updated as needed.
- [x] All tests are passing (`unittest`, etc.).
## Reviewer Notes
N/A
---------
Signed-off-by: Jialin Ouyang <[email protected]>