
[Question]: Are there plans to support Qwen3-Next with MTP? #9901

@kjgfjlkj

Description


System Info

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s):
  • Driver version:
  • TensorRT-LLM version:

Detailed output:

Paste the output of the above commands here

How would you like to use TensorRT-LLM

I saw in the documentation that Qwen3-Next does not support MTP (Multi-Token Prediction). Are there any plans to add support for this feature?

Specific questions:

  • Model: qwen3-next
  • Use case (e.g., chatbot, batch inference, real-time serving): serving
  • Expected throughput/latency requirements:
  • Multi-GPU setup needed:
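
For context, on models where TensorRT-LLM already documents MTP support (e.g. DeepSeek-V3), speculative decoding is configured through an extra LLM API options YAML file. A hypothetical sketch of what the equivalent knob might look like for Qwen3-Next once supported — field names are assumed from the existing DeepSeek-V3 MTP example, not confirmed for this model:

```yaml
# Hypothetical extra-llm-api-options.yaml — MTP is NOT currently supported
# for Qwen3-Next; the fields below mirror the documented DeepSeek-V3 setup.
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 1
```

If support lands, this file would presumably be passed to serving via `trtllm-serve ... --extra_llm_api_options extra-llm-api-options.yaml`, as is done for other speculative decoding configurations.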

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • feature request — New feature or request. This includes new model, dtype, functionality support
  • question — Further information is requested
