Update 2024-12-04-sglang-v0-4.md

merrymercy · web-flow · commit e5d8d11cd1a0 · 2024-12-04T21:05:44.000-08:00
diff --git a/blog/2024-12-04-sglang-v0-4.md b/blog/2024-12-04-sglang-v0-4.md
@@ -50,10 +50,10 @@ SGLang v0.4 introduces a cache-aware load balancer for LLM inference engines. Th
 
 Here are some benchmark results. The new cache-aware router significantly improves throughput.
 
-|  | SGLang v0.4 | SGLang v0.3 |
+|  | SGLang v0.3 | SGLang v0.4 |
 | :---- | :---- | :---- |
-| Throughput (token/s) | 158596 | 82665 |
-| Cache hit rate | 75% | 20% |
+| Throughput (token/s) | 82665 | 158596 |
+| Cache hit rate | 20% | 75% |
 
 > The benchmark is conducted on a [workload](https://github.com/sgl-project/sglang/pull/1990) that has multiple long prefix groups, and each group is perfectly balanced. The performance might vary based on the characteristics of the workload, but it should improve the cache hit rate significantly