Open
Labels
- feature request: New feature or request. This includes new model, dtype, functionality support
- question: Further information is requested
Description
System Info
System Information:
- OS:
- Python version:
- CUDA version:
- GPU model(s):
- Driver version:
- TensorRT-LLM version:
Detailed output:
Paste the output of the above commands here
How would you like to use TensorRT-LLM
I saw in the documentation that Qwen3 Next does not support MTP. Are there any plans to add MTP support for this model?
Specific questions:
- Model: qwen3-next
- Use case (e.g., chatbot, batch inference, real-time serving): serving
- Expected throughput/latency requirements:
- Multi-GPU setup needed:
Before submitting a new issue...
- Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.