## TPU Performance
After all the optimizations, SGLang-Jax matches or outperforms other TPU inference solutions, and it is also competitive with GPU-based solutions.
### Setup
To highlight SGLang-Jax’s efficiency on TPUs, we benchmarked it against vLLM-TPU using the `Qwen/Qwen3-32B` model on a TPU v6e-4 host. We tested SGLang-Jax (commit: `main-af32f095880ff676ed23eec19bc79584b5e20717`) and vLLM-TPU (version `vllm-tpu==0.11.1`).
We also compared TPU v6e-4 against H100 GPUs using SGLang on both.
Full benchmark instructions are available [here](https://github.com/sgl-project/sglang-jax/issues/270).
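
For a quick sanity check rather than the full harness, the sketch below streams a single completion from a locally running server and derives TTFT (which reflects prefill), ITL (which reflects decode), and output throughput from per-chunk timestamps. It assumes the server exposes an OpenAI-compatible `/v1/chat/completions` route on port 30000, as upstream SGLang does by default; the URL, port, and request parameters here are illustrative assumptions, and the linked issue remains the authoritative benchmark procedure.

```python
# Minimal single-request latency sketch (not the full benchmark harness).
# Assumes an OpenAI-compatible /v1/chat/completions endpoint on port 30000;
# adjust the URL and model name to match your deployment.
import json
import time

import requests

URL = "http://127.0.0.1:30000/v1/chat/completions"  # assumed server address
payload = {
    "model": "Qwen/Qwen3-32B",
    "messages": [{"role": "user", "content": "Explain TPUs in one paragraph."}],
    "max_tokens": 256,
    "stream": True,
}

token_times = []
start = time.perf_counter()
with requests.post(URL, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        # Server-sent events: each payload line looks like "data: {...}".
        if not line or not line.startswith(b"data: "):
            continue
        chunk = line[len(b"data: "):]
        if chunk == b"[DONE]":
            break
        delta = json.loads(chunk)["choices"][0]["delta"].get("content")
        if delta:
            # Treat each streamed chunk as one token (an approximation).
            token_times.append(time.perf_counter())

assert token_times, "no tokens were streamed back"
ttft = token_times[0] - start
itl = [b - a for a, b in zip(token_times, token_times[1:])]
print(f"TTFT:              {ttft * 1000:.1f} ms")
print(f"mean ITL:          {sum(itl) / len(itl) * 1000:.1f} ms")
print(f"output throughput: {len(token_times) / (token_times[-1] - start):.1f} tok/s")
```

The numbers in the next section come from the full multi-request benchmark described in the linked issue, not from this single-request sketch; it is included only to make the metrics concrete.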
### Results
On Qwen3-32B, SGLang-Jax and vLLM-TPU deliver nearly identical prefill performance, but SGLang-Jax pulls ahead slightly during decoding. Both use similar kernels, resulting in comparable input throughput. However, SGLang-Jax supports an overlap scheduler, which reduces inter-token latency (ITL) and boosts output throughput.
<p style="color:gray; text-align: center;">SGLang-Jax vs. vLLM-TPU on TPU v6e</p>
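
To make the overlap-scheduler effect concrete, here is a toy model of the idea (it is not the sglang-jax implementation, and the millisecond costs are made up for illustration): while the accelerator executes decode step N, the host prepares the batch for step N+1, so CPU-side scheduling work is hidden behind device time instead of adding to every token's latency.

```python
# Toy illustration of an overlap scheduler (not the sglang-jax code):
# host-side batch preparation for step N+1 runs while the device executes
# step N, hiding CPU scheduling time behind device time.
import concurrent.futures
import time

HOST_SCHEDULE_MS = 2   # hypothetical host-side batching/scheduling cost
DEVICE_STEP_MS = 10    # hypothetical device decode-step cost
NUM_STEPS = 50

def prepare_batch(step):
    time.sleep(HOST_SCHEDULE_MS / 1000)  # build the next batch on the CPU
    return step

def run_device_step(batch):
    time.sleep(DEVICE_STEP_MS / 1000)    # stand-in for the device computation

def serial_loop():
    start = time.perf_counter()
    for step in range(NUM_STEPS):
        batch = prepare_batch(step)      # device idles while the CPU schedules
        run_device_step(batch)
    return time.perf_counter() - start

def overlapped_loop():
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        next_batch = pool.submit(prepare_batch, 0)
        for step in range(NUM_STEPS):
            batch = next_batch.result()
            if step + 1 < NUM_STEPS:
                # Kick off preparation of step N+1 before running step N.
                next_batch = pool.submit(prepare_batch, step + 1)
            run_device_step(batch)
    return time.perf_counter() - start

print(f"serial:     {serial_loop():.3f} s")     # ~ (2 + 10) ms per step
print(f"overlapped: {overlapped_loop():.3f} s")  # ~ 10 ms per step
```

In the serial loop every decode step pays both the host and device cost (about 12 ms per step in this toy), while the overlapped loop pays only the device cost (about 10 ms). That is the same mechanism the results above credit for lower ITL and higher output throughput at comparable kernel performance.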
We also compared TPUs to GPUs by pitting four v6e chips against two H100s—a configuration that roughly aligns in price, HBM capacity, and peak bf16 TFLOPS. The TPU consistently achieves higher input throughput and outperforms on output throughput in several scenarios.