vLLM V1, like V0, uses hash-based prefix caching and LRU-based cache eviction. In V0, enabling prefix caching sometimes causes significant CPU overhead, leading to noticeably degraded performance when the cache hit rate is low. As a result, it is disabled by default. In V1, we optimize the data structure for constant-time cache eviction and carefully minimize Python object creation overhead. As a result, V1's prefix caching introduces near-zero performance degradation, even when the cache hit rate is 0%. **Thanks to this change, we now enable prefix caching by default in V1.**
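To make the constant-time eviction idea concrete, here is a minimal, hypothetical sketch (not vLLM's actual implementation): each full block of tokens is keyed by a hash chained from the previous block's hash, so identical prefixes share cache entries, and free blocks sit in an ordered structure from which the least-recently-used block can be evicted in O(1). The names `PrefixCache`, `block_hash`, and `BLOCK_SIZE` are illustrative assumptions, not vLLM APIs.

```python
from collections import OrderedDict
from typing import Optional

BLOCK_SIZE = 16  # tokens per KV-cache block (illustrative)


class PrefixCache:
    """Toy hash-based prefix cache with O(1) LRU eviction of free blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.capacity = num_blocks
        self.cached: dict[int, list[int]] = {}                 # block hash -> tokens
        self.free_lru: "OrderedDict[int, None]" = OrderedDict()  # LRU order of free blocks

    @staticmethod
    def block_hash(prev_hash: Optional[int], tokens: tuple[int, ...]) -> int:
        # Chain the previous block's hash so the key encodes the entire prefix.
        return hash((prev_hash, tokens))

    def lookup(self, h: int) -> Optional[list[int]]:
        """Return a cached block on hit and pull it out of the free LRU (it is in use again)."""
        block = self.cached.get(h)
        if block is not None:
            self.free_lru.pop(h, None)
        return block

    def insert(self, h: int, tokens: list[int]) -> None:
        """Cache a newly computed block, evicting the least-recently-used free block if full."""
        if h in self.cached:
            return
        if len(self.cached) >= self.capacity:
            if not self.free_lru:
                raise RuntimeError("no free blocks to evict")
            evicted_hash, _ = self.free_lru.popitem(last=False)  # O(1) eviction
            del self.cached[evicted_hash]
        self.cached[h] = tokens

    def release(self, h: int) -> None:
        """Mark a block as free; it becomes the most-recently-used entry in the eviction queue."""
        if h in self.cached:
            self.free_lru[h] = None
            self.free_lru.move_to_end(h)
```

An `OrderedDict` gives constant-time pops from either end of the eviction queue, so misses never trigger a scan over all blocks; vLLM V1's actual data structures differ in detail, but the constant-time-eviction idea is the same.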