Pull requests: flashinfer-ai/flashinfer
- Fix TRTLLM NVFP4-out attention kernel scale factor dim issue (#1460), opened Aug 11, 2025 by elvischenv (4 of 5 tasks)
- feat(attention): add RoPE offset support for batch prefill (#1457), opened Aug 11, 2025 by MengAiDev (3 tasks done)
- Fix cuda-python v13.0 import compatibility (#1455), opened Aug 11, 2025 by yongwww (3 of 5 tasks)
- refactor: unify autotuner for fp4 gemm backends (#1439), opened Aug 8, 2025 by ttyio (3 of 5 tasks)
- gpt-oss: Add MXFP8 x MXFP4 CUTLASS MOE for SM100 and BF16 x MXFP4 CUTLASS for SM90 + SwigluBias Activation (#1396), opened Aug 6, 2025 by djmmoss (4 of 5 tasks)
- Allow BatchPrefillPagedWrapper to call cudnn API (#1384), opened Aug 5, 2025 by Anerudhan (4 tasks done)
- misc: Customize kv lens buffer size for sparse attention (#1383), opened Aug 5, 2025 by Edenzzzz (5 tasks)
- Remove MPI dependency from MNNVL AllReduce (#1379), opened Aug 4, 2025 by pranavm-nvidia (5 tasks)
- Unify and modularize decode and prefill tests (#1375), opened Aug 4, 2025 by weireweire (5 tasks done)
- feat: Support sliding window for persistent kernel (#1368), opened Aug 3, 2025 by Edenzzzz (5 tasks)
- refactor: Improve metainfo for trtllm-gen kernels (#1328), opened Jul 25, 2025 by cyx-6 (5 tasks)
- Add k_scale and v_scale to persistent attention (#1322), opened Jul 24, 2025 by Edenzzzz (5 tasks)