Commit 21eebfc

authored
Update (#190)
* update
* update
* update amd day0
1 parent 6889851 commit 21eebfc

3 files changed: +107 -89 lines changed

blog/2025-08-27-gpt-oss.md

Lines changed: 13 additions & 7 deletions
@@ -26,21 +26,27 @@ To show the impact of our optimizations, we benchmarked SGLang across a range of
 
 ##### Low-Latency Performance (Batch Size = 1)
 
-For latency-sensitive applications, we measured single-batch decode throughput across NVIDIA and AMD GPUs, showcasing excellent performance.
+For latency-sensitive applications, we measured single-batch decode throughput across B200 and H100 GPUs, showcasing excellent performance.
 
-| Hardware / Precision | NVIDIA B200 | NVIDIA H100 | AMD MI350 |
-| -------------------- | ------------ | ------------ | ------------ |
-| MXFP4 | 416.02 tok/s | 318.53 tok/s | 200.84 tok/s |
-| BF16 | 315.63 tok/s | 293.12 tok/s | 220.06 tok/s |
+| Hardware / Precision | NVIDIA B200 | NVIDIA H100 |
+| -------------------- | ------------ | ------------ |
+| MXFP4 | 416.02 tok/s | 318.53 tok/s |
+| BF16 | 315.63 tok/s | 293.12 tok/s |
 
 <span style="color: grey; font-size: 12px;">
-B200 was tested with TP=4, H100 with TP=8 and triton attention, and MI350 with TP=8 and triton backend.
+B200 was tested with TP=4, H100 was tested with TP=8 and triton attention.
 </span>
 
 ##### High-Throughput Performance (Batch Size = 32)
 
 For high-throughput applications, SGLang delivers significant performance gains over our initial Day 0 support and have shown great performance on both prefill and decode on different hardwares.
 
+<!-- grey text -->
+
+<span style="color: grey; font-size: 12px;">
+The results of AMD MI350 were tested with triton backend which is not fully optimized yet, and more optimizations with AMD AITER will be released soon.
+</span>
+
 <img src="/images/blog/gpt_oss/combined_prefill_performance.svg" alt="combined_prefill_performance.svg" style="display:block; margin-left: auto; margin-right: auto; width: 75%"></img>
 
 <img src="/images/blog/gpt_oss/combined_decode_performance.svg" alt="combined_decode_performance.svg" style="display:block; margin-left: auto; margin-right: auto; width: 75%"></img>
@@ -127,6 +133,6 @@ print(response.output_text)
 
 None of the Day-0 support or the subsequent optimizations would have been possible without the collective effort of the SGLang community. Shout-out to the SGLang team, SpecForge team, FlashInfer team, Oracle team, Eigen AI team, NVIDIA team and AMD team for pushing this forward together!
 
-We will continue pushing the boundaries of LLM inference. On our roadmap are further explorations into SWA (Sliding Window Attention) optimizations, along with new advances in speculative decoding, to deliver even greater performance gains.
+We will continue pushing the boundaries of LLM inference. On our roadmap are further explorations into SWA (Sliding Window Attention) optimizations, AMD AITER integration, along with new advances in speculative decoding, to deliver even greater performance gains.
 
 We invite you to try the latest version of SGLang and share your feedback. Thank you for being an essential part of this journey!

public/images/blog/gpt_oss/combined_decode_performance.svg

Lines changed: 46 additions & 40 deletions
