Roadmap (2025 Q4)

We would like feedback from the community on this rough plan for Q4. This is of course a work in progress, and we welcome feedback at any time. Please add comments below or on any specific issues. We will edit this description as plans change.

Also, if you're interested in contributing, feel free to dive into any of the unassigned issues!

# Soul

These are the broad areas of focus for the quarter. Items in the roadmap below are tagged by “soul item”.

- [Testing]: Better testing and CI infrastructure to prevent build breaks and accuracy issues at the framework level
- [Model Optimization]: DeepSeek-R1, GPT-OSS, Qwen3, Qwen3-Next, MiniCPM4.1-8B, and others
- [API Usability]: API cleanup and refactoring for better user experience

# October

- [Model Optimization] DeepSeek-R1 (DSR1) improvements (details TBD)
- [Model Optimization] #1831
- [Model Optimization] #1690
- [Model Optimization] GPT-OSS performance improvements for the max-throughput case
- [Model Optimization] Native Sparse Attention
- [Model Optimization] #1691
- [API Usability] #1722
- [Testing] Expanded CI coverage per-PR (ability to trigger tests on NVIDIA-internal test infrastructure, including various Blackwell devices)
- [Testing] Initial integration testing: e2e functional sanity checks
- [Testing] #1733
- [Model Optimization] Non-gated MoE with squared ReLU activation
- [Model Optimization] #1734
- [Model Optimization] #1732 
- [Model Optimization] Add RoPE, RoPE+Q, RoPE+Q+KVCacheUpdate fused kernels for MLA/GQA/MHA
- [API Usability] Uniform behavior when a (backend, target device, problem shape) is not supported
- [API Usability] Clearer specification of functional and performance support per SM version across interfaces/backends
- [API Usability] #1641
- [API Usability] Unify quantization-related modules (fp4 quantize/quantize)
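
For readers unfamiliar with the "Non-gated MoE with squared ReLU activation" item above, here is a minimal pure-Python sketch of what a single expert's feed-forward pass might compute, next to the more common gated form. The function names (`relu2`, `matvec`, `non_gated_expert`, `gated_expert`) and the row-major weight layout are illustrative assumptions, not FlashInfer APIs, and routing across experts is left out entirely.

```python
def relu2(x):
    """Squared ReLU: max(x, 0) squared."""
    return max(x, 0.0) ** 2


def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def non_gated_expert(x, w_up, w_down):
    """Non-gated expert: down(relu2(up(x))). One up-projection, no gate branch."""
    hidden = [relu2(h) for h in matvec(w_up, x)]
    return matvec(w_down, hidden)


def gated_expert(x, w_gate, w_up, w_down, act):
    """Gated expert, for contrast: down(act(gate(x)) * up(x))."""
    gate = [act(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


# Hypothetical toy weights: 2 inputs, 3 hidden units, 1 output.
x = [1.0, -2.0]
w_up = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w_down = [[2.0, 1.0, 1.0]]

out = non_gated_expert(x, w_up, w_down)
print(out)
```

The non-gated form needs only one up-projection per expert, so a fused kernel for it has a different weight layout and epilogue than the gated (for example SwiGLU-style) path.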

# November

- [Testing] [API Usability] #1811 
- [Model Optimization] Cosmos Reasoning 7B (details TBD)
- [Testing] Improved unit testing based on escape analysis
- [Testing] Improved integration testing based on escape analysis
- [Model Optimization] MXFP4 GEMM performance improvements
- [API Usability] Support trtllm-gen with FP8 QKV and FP8/FP4 output in the FlashInfer prefill/decode wrappers
- [API Usability] Unify qk_scale and o_scale behavior between trtllm-gen attention and flashinfer-jit attention
- [API Usability] Fused MoE general improvements, including (but not limited to):
  - [RFC: Flashinfer MoE Wrappers for vLLM Integration](https://docs.google.com/document/d/144ZQyWsuNWjqVAYGW1iT4PUo8v00I14xkCXCOo3UuKo/edit?tab=t.0)
  - #1669
- [API Usability] Attention API consolidation
- [API Usability] #1709



