Commit 292f9be

fix: include fp8_blockscale_gemm_90 in AOT jit-cache (#2533)
## Summary

- Add `fp8_blockscale_gemm_sm90` (`gen_fp8_blockscale_gemm_sm90_module`) to the AOT build list when SM90 is enabled.
- Avoid runtime JIT compilation of `fp8_blockscale_gemm_sm90` in environments without CUDA development headers, where compilation can fail with `cublasLt.h` not found.

## Changes

- `flashinfer/aot.py`: append `gen_fp8_blockscale_gemm_sm90_module()` under the `add_moe` + `has_sm90` gating.

## Related Issues

- Fixes #2527

## ✅ Pre-commit Checks

- [x] I have installed `pre-commit` by running `pip install pre-commit` (or used your preferred method).
- [x] I have installed the hooks with `pre-commit install`.
- [x] I have run the hooks manually with `pre-commit run --all-files` and fixed any reported issues.

## 🧪 Tests

- [x] Tests have been added or updated as needed.
- [ ] All tests are passing (`unittest`, etc.).
1 parent (c5b8a2e) · commit 292f9be

File tree

1 file changed: +3 −0 lines changed


flashinfer/aot.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -52,6 +52,7 @@
 from .jit.gemm import (
     gen_gemm_module,
     gen_gemm_sm90_module,
+    gen_fp8_blockscale_gemm_sm90_module,
     gen_gemm_sm100_module,
     gen_gemm_sm100_module_cutlass_fp4,
     gen_gemm_sm100_module_cutlass_fp8,
@@ -477,6 +478,8 @@ def gen_all_modules(
     jit_specs.append(gen_gemm_module())
     if has_sm90:
         jit_specs.append(gen_gemm_sm90_module())
+        # fp8 blockscale GEMM (SM90)
+        jit_specs.append(gen_fp8_blockscale_gemm_sm90_module())
         jit_specs.append(gen_fp4_quantization_sm90_module())
         jit_specs.append(gen_cutlass_fused_moe_sm90_module())
     if has_sm100:
```
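The pattern in the second hunk can be sketched in isolation: module generators are collected into a list, and architecture-specific modules are only appended when the corresponding SM generation is enabled, so they get prebuilt ahead of time instead of JIT-compiled at runtime. This is an illustrative sketch, not the actual `flashinfer` code; the `gen_*` stand-ins here just return spec names rather than real JIT specs.

```python
# Hypothetical stand-ins for the real flashinfer generator functions,
# which return JitSpec objects; here they return plain names for clarity.
def gen_gemm_module():
    return "gemm"

def gen_gemm_sm90_module():
    return "gemm_sm90"

def gen_fp8_blockscale_gemm_sm90_module():
    return "fp8_blockscale_gemm_sm90"

def gen_all_modules(has_sm90: bool) -> list:
    """Collect the modules to compile ahead of time (sketch)."""
    jit_specs = [gen_gemm_module()]  # architecture-independent modules
    if has_sm90:
        jit_specs.append(gen_gemm_sm90_module())
        # fp8 blockscale GEMM (SM90) -- the module this commit adds,
        # so it no longer needs runtime JIT (and CUDA dev headers).
        jit_specs.append(gen_fp8_blockscale_gemm_sm90_module())
    return jit_specs
```

With `has_sm90=True` the fp8 blockscale module lands in the AOT list; without SM90 it is skipped entirely, which is why the fix is purely additive under the existing gate.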
