
Commit 7e26b21

Minor
Signed-off-by: WoosukKwon <[email protected]>
1 parent 80f404a commit 7e26b21

File tree

1 file changed (+1, -1)


_posts/2025-01-27-v1.md renamed to _posts/2025-01-26-v1.md

Lines changed: 1 addition & 1 deletion
@@ -56,7 +56,7 @@ vLLM V1 introduces a simple yet flexible scheduler. It removes the traditional d
 
 ## 3. Fast Prefix Caching
 
-vLLM V1, like V0, uses hash-based prefix caching and LRU-based cache eviction. In V0, enabling prefix caching sometimes causes significant CPU overhead, leading to performance degradation with a low cache hit rate. As a result, it is disabled by default. In V1, we optimize the data structure for constant-time eviction and carefully minimize Python object creation overhead. This makes V1’s prefix caching introduce near-zero performance degradation, even when the cache hit rate is 0%. Prefix caching is now enabled by default in V1.
+vLLM V1, like V0, uses hash-based prefix caching and LRU-based cache eviction. In V0, enabling prefix caching sometimes causes significant CPU overhead, leading to rather decreased performance when the cache hit rate is low. As a result, it is disabled by default. In V1, we optimize the data structure for constant-time cache eviction and carefully minimize Python object creation overhead. This makes V1’s prefix caching introduce near-zero performance degradation, even when the cache hit rate is 0%. **Thanks to this change, we now enable prefix caching by default in V1.**
 
 ## 4. Clean Architecture for Tensor-Parallel Inference
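For context on the "constant-time cache eviction" mentioned in the edited paragraph, below is a minimal sketch of hash-based prefix caching backed by an O(1) LRU structure. It is a hypothetical illustration built on Python's `collections.OrderedDict`; the `PrefixCache` class, the `block_hash` chaining scheme, and the block size are assumptions for the example, not vLLM V1's actual data structure.

```python
from collections import OrderedDict
from typing import Optional, Tuple

BLOCK_SIZE = 16  # tokens per cached KV block (illustrative value, not vLLM's)


class PrefixCache:
    """Sketch of hash-based prefix caching with O(1) LRU eviction."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # OrderedDict gives O(1) lookup, O(1) move-to-end, and O(1) pop of the
        # least-recently-used entry, so eviction never scans the whole cache.
        self._blocks: "OrderedDict[int, object]" = OrderedDict()

    @staticmethod
    def block_hash(parent_hash: Optional[int], block_tokens: Tuple[int, ...]) -> int:
        # Chain the parent prefix hash with this block's token IDs, so two
        # requests sharing the same prefix produce identical block hashes.
        return hash((parent_hash, block_tokens))

    def get(self, h: int) -> Optional[object]:
        """Return the cached block for hash `h`, marking it most recently used."""
        block = self._blocks.get(h)
        if block is not None:
            self._blocks.move_to_end(h)  # O(1) recency update
        return block

    def put(self, h: int, block: object) -> None:
        """Insert a block, evicting the least-recently-used one in O(1) if full."""
        if h in self._blocks:
            self._blocks.move_to_end(h)
            return
        if len(self._blocks) >= self.capacity:
            self._blocks.popitem(last=False)  # O(1) eviction
        self._blocks[h] = block
```

In this sketch, a request's token IDs would be chunked into BLOCK_SIZE-sized blocks, each hashed together with its parent prefix hash and looked up via `get()` before any KV recomputation. Because lookup, recency updates, and eviction are all constant-time and allocate few Python objects, the bookkeeping overhead stays small even when every lookup misses, which is the behavior the commit's edited paragraph describes for V1.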
0 commit comments
