Hybrid Linear Attention
- hybrid_linear_attn attention refactor: Remove hybrid_linear_attn attention backend and refactor attention registry (#10816)
GDN Kernels
Not arch-specific
- Optimize Triton GDN decode (1): Optimize GDN decode for Qwen3 Next (#17094)
- Optimize Triton GDN decode (2): [Qwen3Next] Optimize fused_sigmoid_gating_delta_rule_update_kernel (#18271)
SM90
- Prefill and decode in CUTLASS (from FlashInfer): feat(gdn): add FlashInfer K-last SSM layout support for GDN prefill and decode for Hopper (#18361)
SM100
- Prefill in Gluon: [Qwen3-Next] Optimize Prefill Kernel, add GDN Gluon kernel and optimize cumsum kernel (#17983)
- FlashInfer CuteDSL decode kernel: [jit-kernel] Add CuTe DSL GDN Decode Kernel (#15631)
- Another CuteDSL decode kernel: [Qwen3-Next] Add cutedsl decode/mtp kernel with transposed ssm_state and prefill gluon kernel for Blackwell (#17981)
- New, faster CuteDSL decode kernel upstream in FlashInfer: Ameyn/gdn decode cutedsl kernel (flashinfer-ai/flashinfer#2498)
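
For orientation, here is a minimal, unfused PyTorch sketch of the gated delta rule recurrence these GDN decode kernels implement, one token at a time. The shapes, gating conventions, and names (`S`, `alpha`, `beta`) are illustrative assumptions, not any kernel's actual signature; the kernels above fuse the gating, rank-1 update, and readout, and batch across heads and sequences.

```python
import torch

def gdn_decode_step(S, q, k, v, alpha, beta):
    """One token of the gated delta rule (reference, unfused).

    S:     (d_v, d_k) recurrent state for one head
    q, k:  (d_k,) query / L2-normalized key
    v:     (d_v,) value
    alpha: scalar decay gate in (0, 1), e.g. from a sigmoid
    beta:  scalar write strength in (0, 1), e.g. from a sigmoid
    """
    # S <- alpha * S @ (I - beta * k k^T) + beta * v k^T
    S = alpha * S
    S = S - beta * torch.outer(S @ k, k)  # erase the old value stored under k
    S = S + beta * torch.outer(v, k)      # write the new value under k
    return S, S @ q                       # readout for this token's query
```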
MoE
- BF16 CUTLASS fused MoE for Hopper: Add support for bf16 x bf16 cutlass fused MoE (#10275)
- BF16 TRTLLM MoE for Blackwell: [NVIDIA] Enable TRTLLM BF16 MoE on Blackwell GPUs (#13798)
- NVFP4 TRTLLM-GEN MoE for Blackwell: [Feat][NVFP4] Enable NVFP4 MoE for Qwen series models (e.g. Qwen3-Next) (#13761)
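
As a reminder of what these fused kernels compute, here is a naive top-k routed MoE reference in PyTorch. The SwiGLU activation and router renormalization follow Qwen-style MoE, but the weight names and `top_k` default are assumptions for illustration; the fused kernels replace the per-token Python loop with grouped/batched GEMMs in the listed precisions.

```python
import torch
import torch.nn.functional as F

def moe_reference(x, router_w, w_gate, w_up, w_down, top_k=8):
    """Naive top-k MoE forward pass (what a fused MoE kernel computes).

    x:            (n_tokens, d_model)
    router_w:     (d_model, n_experts)
    w_gate, w_up: (n_experts, d_model, d_ff)  SwiGLU expert weights
    w_down:       (n_experts, d_ff, d_model)
    """
    probs = F.softmax(x @ router_w, dim=-1)
    topk_p, topk_e = torch.topk(probs, top_k, dim=-1)
    topk_p = topk_p / topk_p.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
    out = torch.zeros_like(x)
    for t in range(x.size(0)):                          # fused kernels batch this loop
        for p, e in zip(topk_p[t], topk_e[t]):
            h = F.silu(x[t] @ w_gate[e]) * (x[t] @ w_up[e])
            out[t] += p * (h @ w_down[e])
    return out
```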
Full Attention
- Enable XQA for SM90 and SM120 (#17115)
- TRTLLM_MHA backend for Blackwell: Allow use of TRTLLM_MHA backend for hybrid attention on Blackwell (#11138)
- [Qwen3.5] Set full attn_backend to trtllm_mha on SM100 by default when possible (#19030)
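
For context, full attention runs on only a minority of layers in these hybrid models, which is where the backends above (XQA, TRTLLM_MHA) apply. A small sketch of the interleaving, assuming the 3:1 linear-to-full ratio Qwen3-Next uses; the period is illustrative, not a general rule:

```python
# Most layers use linear attention (GDN); every FULL_ATTN_PERIOD-th layer
# uses full attention. 4 matches Qwen3-Next's 3:1 ratio; treat the value
# as an assumption for other hybrid models.
FULL_ATTN_PERIOD = 4

def layer_type(layer_idx: int) -> str:
    if (layer_idx + 1) % FULL_ATTN_PERIOD == 0:
        return "full_attention"
    return "linear_attention"
```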
Communications
- Fix regression caused by enabling symmetric memory: Register tensors with symmetric memory for qwen (#18643)
NVFP4 kv cache
- NVFP4 KV Cache for SM120 (#18314)
- FP4 KV cache for SM100: [WIP] FP4 KV Cache on B200 (#17733)
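
Roughly what an NVFP4 KV cache stores per 16-element block: 4-bit e2m1 codes plus an FP8 (e4m3) block scale. A hedged sketch of the quantization step follows; bit-packing (two codes per byte) and NVFP4's second-level per-tensor scale are omitted, and the helper names are ours, not the kernels'.

```python
import torch

E2M1 = torch.tensor([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # fp4 magnitudes

def quantize_nvfp4_block(x):
    """Quantize one 16-element block to fp4 codes + an fp8 block scale."""
    scale = x.abs().max() / E2M1[-1]                          # map max |x| to 6.0
    scale = scale.to(torch.float8_e4m3fn).to(torch.float32)   # scale is stored as e4m3
    y = x / scale.clamp(min=1e-12)
    codes = (y.abs().unsqueeze(-1) - E2M1).abs().argmin(dim=-1)  # nearest fp4 magnitude
    return codes, y.signbit(), scale

def dequantize_nvfp4_block(codes, signs, scale):
    mags = E2M1[codes] * scale
    return torch.where(signs, -mags, mags)
```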
Runtime
- Piecewise CUDA graph for TRTLLM-GEN backends on Blackwell: Add piecewise cuda graph for Qwen3-Next FP8 flashinfer_trtllm moe backend (#18184)
- MTP v2: Add Spec V2 for Qwen 3 next (#15591); [Draft][Spec V2] Support specV2 for mamba hybrid attention (#18808)
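
For background on the piecewise-CUDA-graph item: "piecewise" means capturing only the graph-safe segments of the forward pass and running the rest eagerly between replays. The standard PyTorch capture/replay idiom it builds on is below; the segment function and shapes are stand-ins.

```python
import torch

static_input = torch.zeros(8, 4096, device="cuda")

def decode_segment(x):
    # Stand-in for one graph-safe segment of the decode forward pass.
    return x * 2.0

# Warm up on a side stream before capture, per the CUDA graphs contract.
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    decode_segment(static_input)
torch.cuda.current_stream().wait_stream(s)

g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g):                 # capture the segment once
    static_output = decode_segment(static_input)

static_input.copy_(torch.randn_like(static_input))
g.replay()                                # re-launch captured kernels on the new data
```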
Qwen3-Next
- [Qwen3-Next] Enable fused_qkvzba_split_reshape_cat also for prefill (#18917)
- Hugging Face NVFP4 model support: [feat] Support nvfp4 quantized model of Qwen3-Next (#17627)
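
A hypothetical sketch of the operation fused_qkvzba_split_reshape_cat fuses: the GDN block projects hidden states once, then splits the result into q, k, v, the output gate z, and the per-head write-strength (b) and decay (a) inputs. The split sizes and ordering below are assumptions for illustration, not the kernel's actual layout.

```python
import torch

def split_qkvzba(fused, d_qk, d_v, n_heads):
    # Hypothetical unfused equivalent: one fused projection output split
    # into the six GDN operands. Sizes and order are illustrative.
    q, k, v, z, b, a = torch.split(
        fused, [d_qk, d_qk, d_v, d_v, n_heads, n_heads], dim=-1)
    return q, k, v, z, b, a
```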
Qwen3.5