Commit b53c7c0

update fig (#167)
1 parent ed01c30 commit b53c7c0

7 files changed (+2, -2 lines)

blog/2025-07-17-mtp.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: "Accelerating SGLang with Multiple Token Prediction"
 author: "Eigen AI Team"
 date: "July 17, 2025"
-previewImg: /images/blog/mtp/thumbnail_2.png
+previewImg: /images/blog/mtp/thumbnail_3.png
 ---
 
 ## TL;DR
@@ -77,7 +77,7 @@ When MTP is enabled, the system significantly surpasses this benchmark:
 * With a 3-token MTP window and topk=1, the system achieves a throughput of 81.5 tokens/sec per rank, with an average acceptance length of 2.18 tokens.
 * With a 4-token MTP window and topk=1, throughput increases to 82.0 tokens/sec per rank, with an average acceptance length of 2.44 tokens.
 
-![Small-scale throughput graph](/images/blog/mtp/small_scale_throughput_hr.png)
+![Small-scale throughput graph](/images/blog/mtp/small_scale_throughput_hr_v2.png)
 
 These results represent a +60% improvement in output throughput compared to the baseline (i.e., no overlap scheduling and no MTP). This case demonstrates that MTP yields substantial performance gains even in smaller cluster settings with modest concurrency levels, allowing for scalable performance even within constrained GPU resource budgets.
 
6 binary image files changed (contents not shown): -493 KB, -273 KB, -369 KB, 240 KB, -346 KB, 254 KB
