Commit b53c7c0

update fig (#167)
1 parent ed01c30 commit b53c7c0

7 files changed (+2, -2 lines)

blog/2025-07-17-mtp.md

Lines changed: 2 additions & 2 deletions
@@ -2,7 +2,7 @@
 title: "Accelerating SGLang with Multiple Token Prediction"
 author: "Eigen AI Team"
 date: "July 17, 2025"
-previewImg: /images/blog/mtp/thumbnail_2.png
+previewImg: /images/blog/mtp/thumbnail_3.png
 ---
 
 ## TL;DR
@@ -77,7 +77,7 @@ When MTP is enabled, the system significantly surpasses this benchmark:
 * With a 3-token MTP window and topk=1, the system achieves a throughput of 81.5 tokens/sec per rank, with an average acceptance length of 2.18 tokens.
 * With a 4-token MTP window and topk=1, throughput increases to 82.0 tokens/sec per rank, with an average acceptance length of 2.44 tokens.
 
-![Small-scale throughput graph](/images/blog/mtp/small_scale_throughput_hr.png)
+![Small-scale throughput graph](/images/blog/mtp/small_scale_throughput_hr_v2.png)
 
 These results represent a +60% improvement in output throughput compared to the baseline (i.e., no overlap scheduling and no MTP). This case demonstrates that MTP yields substantial performance gains even in smaller cluster settings with modest concurrency levels, allowing for scalable performance even within constrained GPU resource budgets.
 
6 binary image files changed (contents not shown): -493 KB, -273 KB, -369 KB, 240 KB, -346 KB, 254 KB
