+SGLang now supports the seamless combination of three advanced features: **Multi-Token Prediction (MTP)**, **Large-Scale Expert Parallelism (EP)**, and **Prefill-Decode disaggregation**. This integration delivers **up to 60% higher output throughput** through a new decoding paradigm, better parallelism, and more efficient resource utilization, without sacrificing generation quality. If you are serving models such as DeepSeek V3, MTP is now available in SGLang as a plug-and-play feature that unlocks immediate performance gains. Instructions for reproducing these results are available [here](https://github.com/sgl-project/sglang/issues/7998).