Open
Labels: deepseek (Related to DeepSeek models), feature request, performance (Performance-related issues)
Description
🚀 The feature, motivation and pitch
A non-exhaustive list of new FlashInfer features that may be integrated into vLLM:
- [DSV3] Optimized Router Gemm flashinfer-ai/flashinfer#2019
- [DSV3] Optimized routing kernels dsv3 flashinfer-ai/flashinfer#2099 - wip: [kernel] use fused_topk of flashinfer #33890
- feat: Fused RMSNorm + FP4 Quantization Kernels in CuTe-DSL flashinfer-ai/flashinfer#2233 - wip: [Feat] Add RMSNorm NvFp4 Quant Operator (#32612) #32957
- feat: Add flashinfer.rope.rope_quantize_fp8_append_paged_kv_cache (fused RoPE + Q + KV cache, supports MLA/GQA/MHA) flashinfer-ai/flashinfer#2037
- feat: cuteDSL fp4 moe for better DSR1 performance. flashinfer-ai/flashinfer#2398
- Added an initial implementation of Q and KV Cache in fp8 and to use t… flashinfer-ai/flashinfer#2035 and Added the cudnn backend Ragged KV Cache wrapper flashinfer-ai/flashinfer#2352
- Port TRT-LLM communication kernels to flashinfer flashinfer-ai/flashinfer#2102 - wip: [Draft][Kernel] Add new flashinfer A2A kernel #32217
- [comm] TRT-LLM's Multi-Node NVLink All-Reduce Kernel flashinfer-ai/flashinfer#1213 - wip: @hjjq
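For context on the routing items above: the fused routing kernels accelerate the standard MoE top-k gating step (softmax over router logits, select the top-k experts, renormalize their weights). A minimal pure-Python reference of that computation, for illustration only; the function name and calling convention are hypothetical, not FlashInfer's or vLLM's API:

```python
import math

def topk_softmax_routing(logits, k):
    """Unfused reference for MoE routing: softmax over router logits,
    then pick the top-k experts and renormalize their probabilities.
    The fused kernels compute this in a single GPU pass."""
    # Numerically stable softmax over the router logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k highest-probability experts.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize the selected weights so they sum to 1.
    norm = sum(probs[i] for i in top)
    weights = [probs[i] / norm for i in top]
    return top, weights

experts, weights = topk_softmax_routing([0.1, 2.0, -1.0, 1.5], k=2)
print(experts)   # → [1, 3]
```

DeepSeek-V3's routing adds grouping and bias terms on top of this basic scheme, which is part of why a dedicated fused kernel pays off.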
Alternatives
No response
Additional context
No response
Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.