-
I've been trying to figure out how to use flashinfer with my vLLM instance on CUDA 12.8 and PyTorch 2.7 for about an hour now. Ideally, I would use the quick start method documented here, but that is clearly not available yet. I'm curious why not, since in my searching I came across this PR which, judging by the code changes, does add support for those versions and includes them in the pre-built wheels. I also see those changes live in the main branch's source, so when will that be released? TIA!
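For context, this is roughly what I'm hoping to end up running once flashinfer is installed. The backend selection via the VLLM_ATTENTION_BACKEND environment variable and the model name are just my assumptions for illustration, not something taken from the docs page above:

```bash
# Assumed usage: tell vLLM to use the FlashInfer attention backend once the
# flashinfer package is importable; the model below is only a placeholder example.
VLLM_ATTENTION_BACKEND=FLASHINFER vllm serve meta-llama/Llama-3.1-8B-Instruct
```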
-
The JIT package should be available to use with any CUDA + torch version.
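For example, you can install it from PyPI; I'm assuming here that the JIT build is published under the name flashinfer-python (treat the exact package name as an assumption):

```bash
# Assumed package name for the JIT-only build; kernels are compiled at first use rather than shipped pre-built.
pip install flashinfer-python
```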
We plan to drop the AOT wheels and keep only the JIT package, because in the long term the binary size would exceed PyPI's wheel size limit even if we apply for an increase. Currently, users can compile all kernels ahead of time via the AOT build step (a sketch is below). We are also building an artifactory so that users can run python -m flashinfer --download all to download all pre-built binaries (it's not ready yet, but should be available soon @yongwww).
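A minimal sketch of both steps; the flashinfer.aot module path is my assumption for how the ahead-of-time build is invoked, while the download command is quoted from the reply above and is not live yet:

```bash
# Assumed entry point: compile all kernels ahead of time instead of JIT-compiling them on first use.
python -m flashinfer.aot

# Once the artifactory is ready: download all pre-built binaries (not available yet).
python -m flashinfer --download all
```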
-
Thank you for the response, installing now! If I may, I would suggest adding a temporary banner or notice to the documentation page I referenced (possibly at the top) to save those who find themselves in the same position I was in the time and searching.