Commit e5d8d11

Update 2024-12-04-sglang-v0-4.md
1 parent c8238b6 commit e5d8d11

1 file changed: +3 -3 lines changed

blog/2024-12-04-sglang-v0-4.md

Lines changed: 3 additions & 3 deletions
@@ -50,10 +50,10 @@ SGLang v0.4 introduces a cache-aware load balancer for LLM inference engines. Th
 
 Here are some benchmark results. The new cache-aware router significantly improves throughput.
 
-| | SGLang v0.4 | SGLang v0.3 |
+| | SGLang v0.3 | SGLang v0.4 |
 | :---- | :---- | :---- |
-| Throughput (token/s) | 158596 | 82665 |
-| Cache hit rate | 75% | 20% |
+| Throughput (token/s) | 82665 | 158596 |
+| Cache hit rate | 20% | 75% |
 
 > The benchmark is conducted on a [workload](https://github.com/sgl-project/sglang/pull/1990) that has multiple long prefix groups, and each group is perfectly balanced. The performance might vary based on the characteristics of the workload, but it should improve the cache hit rate significantly
 
