Commit e890f28

Fix
Signed-off-by: WoosukKwon <[email protected]>
1 parent 02d5e05 commit e890f28

File tree

1 file changed (+3, -1 lines changed)

_posts/2025-01-24-v1.md

Lines changed: 3 additions & 1 deletion
@@ -104,6 +104,8 @@ The final piece of the puzzle for vLLM V1 was integrating [FlashAttention 3](htt
 
 # Performance
 
+Thanks to the extensive improvements in vLLM V1, we have observed significant performance gains across various models and hardware backends. Here are some key highlights:
+
 # Limitations & Future Work
 
 While vLLM V1 shows promising results, it is still in its alpha stage and lacks several features from V0. Here’s a clarification:
@@ -115,7 +117,7 @@ V1 supports decoder-only Transformers like Llama, mixture-of-experts (MoE) model
 V1 currently lacks support for log probs, prompt log probs sampling parameters, pipeline parallelism, structured decoding, speculative decoding, prometheus metrics, and LoRA. We are actively working to close this feature gap and add new optimizations. Please stay tuned!
 
 **Hardware Support:**
-V1 currently supports only Ampere or later NVIDIA GPUs. We are working on support for other hardware backends such as TPU.
+V1 currently supports only Ampere or later NVIDIA GPUs. We are actively working to extend support to other hardware backends such as TPU.
 
 Finally, please note that you can continue using V0 and maintain backward compatibility by not setting `VLLM_USE_V1=1`.
 
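For context, the last changed line refers to the `VLLM_USE_V1` environment-variable toggle. Below is a minimal sketch of how that toggle is typically exercised, assuming the standard vLLM Python API; the model name, prompt, and sampling settings are illustrative only and are not part of this commit.

```python
# Minimal sketch (assumptions: standard vLLM Python API; model name and
# sampling settings are illustrative, not taken from the commit).
import os

# Opt in to the V1 engine. Leaving VLLM_USE_V1 unset keeps the
# backward-compatible V0 behavior, as the post notes.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # hypothetical model choice
outputs = llm.generate(
    ["The future of LLM inference is"],
    SamplingParams(temperature=0.8, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```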
