For high-throughput applications, SGLang delivers significant performance gains over our initial Day 0 support and shows strong performance in both prefill and decode across different hardware.
+<!-- grey text -->
+
+<span style="color: grey; font-size: 12px;">
+The AMD MI350 results were obtained with the Triton backend, which is not yet fully optimized; further optimizations with AMD AITER will be released soon.
None of the Day-0 support or the subsequent optimizations would have been possible without the collective effort of the SGLang community. Shout-out to the SGLang team, SpecForge team, FlashInfer team, Oracle team, Eigen AI team, NVIDIA team and AMD team for pushing this forward together!
-We will continue pushing the boundaries of LLM inference. On our roadmap are further explorations into SWA (Sliding Window Attention) optimizations, along with new advances in speculative decoding, to deliver even greater performance gains.
+We will continue pushing the boundaries of LLM inference. On our roadmap are further explorations into SWA (Sliding Window Attention) optimizations and AMD AITER integration, along with new advances in speculative decoding, to deliver even greater performance gains.
We invite you to try the latest version of SGLang and share your feedback. Thank you for being an essential part of this journey!