Commit 7df8233

Update 2025-07-17-mtp.md
1 parent 978ef90 commit 7df8233

1 file changed

+1
-1
lines changed

blog/2025-07-17-mtp.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ previewImg: /images/blog/mtp/thumbnail_3.png
 
 ## TL;DR
 
-SGLang is the **first and only** open-source serving framework to support **Multiple Token Prediction (MTP)** in combination with **Large-Scale Expert Parallelism (EP)** and **Prefill-Decode disaggregation**. This integration delivers **up to 60% higher output throughput** through a new decoding paradigm, better parallelism, and more efficient resource utilization without sacrificing generation quality. If you are serving models, e.g., DeepSeek V3, SGLang now supports MTP as a plug-and-play feature, unlocking immediate performance gains. You can find instruction for reproduction [here](https://github.com/sgl-project/sglang/issues/7998).
+SGLang now supports smooth combination of these advanced features: **Multiple Token Prediction (MTP)**, **Large-Scale Expert Parallelism (EP)**, and **Prefill-Decode disaggregation**. This integration delivers **up to 60% higher output throughput** through a new decoding paradigm, better parallelism, and more efficient resource utilization without sacrificing generation quality. If you are serving models, e.g., DeepSeek V3, SGLang now supports MTP as a plug-and-play feature, unlocking immediate performance gains. You can find instruction for reproduction [here](https://github.com/sgl-project/sglang/issues/7998).
 
 SGLang’s inference framework running on NVIDIA GPUs enables AI practitioners to easily deliver inference at scale, empowering end users to “think smart” and harness the reasoning capabilities of state-of-the-art language models at the highest performance.
 
