
[Question]: Are there plans to support Qwen3-Next with MTP? #9901

@kjgfjlkj

Description


System Info

System Information:

  • OS:
  • Python version:
  • CUDA version:
  • GPU model(s):
  • Driver version:
  • TensorRT-LLM version:

Detailed output:

Paste the output of the above commands here

How would you like to use TensorRT-LLM

I saw in the documentation that Qwen3-Next does not support MTP (Multi-Token Prediction). Are there any plans to add support for this feature?

Specific questions:

  • Model: qwen3-next
  • Use case (e.g., chatbot, batch inference, real-time serving): serving
  • Expected throughput/latency requirements:
  • Multi-GPU setup needed:
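
For context, on models where TensorRT-LLM already documents MTP support (e.g. DeepSeek-V3), speculative decoding is configured through an extra LLM API options YAML file. A hypothetical sketch of what the equivalent knob might look like for Qwen3-Next once supported — field names are assumed from the existing DeepSeek-V3 MTP example, not confirmed for this model:

```yaml
# Hypothetical extra-llm-api-options.yaml — MTP is NOT currently supported
# for Qwen3-Next; the fields below mirror the documented DeepSeek-V3 setup.
speculative_config:
  decoding_type: MTP
  num_nextn_predict_layers: 1
```

If support lands, this file would presumably be passed to serving via `trtllm-serve ... --extra_llm_api_options extra-llm-api-options.yaml`, as is done for other speculative decoding configurations.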

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata

Labels

  • feature request — New feature or request. This includes new model, dtype, functionality support
  • question — Further information is requested
