new mtp update (#165)

yilian49 · web-flow · commit 8a75c04af146 · 2025-07-17T17:58:27.000-07:00
diff --git a/blog/2025-07-17-mtp.md b/blog/2025-07-17-mtp.md
@@ -28,7 +28,6 @@ MTP works by dividing the generation into two stages:
 -**Drafting:** The lightweight draft model predicts one or more short sequence candidate(s) of n tokens in a single fast pass. Here we use one sequence candidate as an example.  
    (1) *“Today is a sunny”* is the current prefix produced by the target model.  
    (2) *“day” is first generated by the target model's extend/prefill stage.*  
-
    (3) *“and” is the first draft token generated by the draft model's extend/prefill stage.*  
    (4) *“it’s so hot” are the three extra draft tokens generated by the draft model decoding iterations; In the example case, n=4 for “and it’s so hot”.*
 
@@ -76,7 +75,7 @@ The small-scale deployment configuration was selected based on production requir
 In this scenario, we deploy two decoding nodes across a total of 16 H200 GPUs, running 2 concurrent requests per rank with input sequence length of 65,536 tokens and output sequence length of 4,096 tokens.  As baseline, we tested the case with no MTP and no overlap scheduling, and the system achieves an output throughput of 51 tokens/sec per rank. Using overlap scheduling alone, a feature introduced in SGLang v0.4, we achieved 60.4 tokens/sec per rank, meeting the production threshold without the need for MTP. 
 When MTP is enabled, the system significantly surpasses this benchmark:
 * With a 3-token MTP window and topk=1, the system achieves a throughput of 81.5 tokens/sec per rank, with an average acceptance length of 2.18 tokens.
-*With a 4-token MTP window and topk=1, throughput increases to 82.0 tokens/sec per rank, with an average acceptance length of 2.44 tokens.
+* With a 4-token MTP window and topk=1, throughput increases to 82.0 tokens/sec per rank, with an average acceptance length of 2.44 tokens.
 
 ![Small-scale throughput graph](/images/blog/mtp/small_scale_throughput_hr.png)
 
@@ -117,9 +116,9 @@ You can monitor acceptance rates in logs to fine-tune this parameter over time.
 
 We would like to express our heartfelt gratitude to the following teams and collaborators. In particular, we extend our sincere thanks to the NVIDIA DGX Cloud team for providing powerful GPUs and for their exceptional support in ensuring operational excellence:
 
-**Eigen AI Team** - Jinglei Cheng, Jiaqi Gu, Yipin Guo, Di Jin, Uill Liu, Zhijian Liu, Zilin Shen, Ryan Hanrui Wang, Wei-Chen Wang, Junyao Zhang and many others.
+**Eigen AI Team** - Jinglei Cheng, Yipin Guo, Zilin Shen, Ryan Hanrui Wang, Wei-Chen Wang, Junyao Zhang and many others.
 
-**SGLang Team and Community** - Kavio Yu, Qiaolin Yu, Boxin Zhang, Shangming Cai, Jinfu Deng, Yineng Zhang and many others.
+**SGLang Team and Community** - Kavio Yu, Qiaolin Yu, Boxin Zhang, Shangming Cai, Jinfu Deng, Jiaqi Gu, Di Jin, Uill Liu, Yineng Zhang and many others.
 
 **xAI Team** - Sehoon Kim, Ying Sheng, Lianmin Zheng, Sangbin Cho, Hanming Lu, Byron Hsu, Pranjal Shankhdhar, Cheng Wan and many others.