Commit e83a5be

Minor

Signed-off-by: WoosukKwon <[email protected]>

Parent: f90c180

_posts/2025-01-27-v1.md renamed to _posts/2025-01-26-v1-alpha-release.md

Lines changed: 19 additions & 19 deletions
@@ -112,7 +112,7 @@ The final piece of the puzzle for vLLM V1 was integrating [FlashAttention 3](htt
 
 # Performance
 
-Thanks to the extensive architectural enhancements, vLLM V1 achieves state-of-the-art throughput and latency, delivering up to **1.7x** higher throughput compared to V0 (*without multi-step scheduling*).
+Thanks to the extensive architectural enhancements, vLLM V1 achieves state-of-the-art throughput and latency, delivering up to **1.7x higher throughput** compared to V0 (*without multi-step scheduling*).
 These dramatic performance gains stem from comprehensive CPU overhead reductions across the entire stack.
 The improvements are even more pronounced for vision-language models (VLMs) like Qwen2-VL, thanks to V1's enhanced support for VLMs.
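
For context on the throughput claim in the hunk above: in the V1 alpha, the new engine is opt-in via the `VLLM_USE_V1=1` environment variable, so a like-for-like comparison against V0 amounts to running the same offline workload with that variable toggled. The sketch below is illustrative only; the model name, prompt set, and request count are placeholders and are not the configuration behind the 1.7x figure.

```python
import os
import time

# Assumption: the V1 alpha engine is opted into via VLLM_USE_V1;
# it must be set before vLLM is imported.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM, SamplingParams

# Placeholder model and workload, purely for illustration.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
prompts = ["Explain how paged attention works."] * 256
params = SamplingParams(temperature=0.0, max_tokens=128)

start = time.perf_counter()
outputs = llm.generate(prompts, params)
elapsed = time.perf_counter() - start

# Throughput in generated tokens per second across the whole batch.
num_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"Output throughput: {num_tokens / elapsed:.1f} tok/s")
```

Running the same script with `VLLM_USE_V1` unset gives the V0 baseline for comparison.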

@@ -176,25 +176,25 @@ We gratefully acknowledge that the design of vLLM V1 builds upon and enhances se
 
 The V1 re-architecture is a continued joint effort across the entire vLLM team and community. Below is an incomplete list of contributors to this milestone:
 
-- UC Berkeley, Neural Magic (now Red Hat), Anyscale, and Roblox mainly drove the effort together.
-- [Woosuk Kwon](https://github.com/WoosukKwon) initiated the project and implemented the scheduler and model runner.
-- [Robert Shaw](https://github.com/robertgshaw2-redhat) implemented the optimized execution loop and API server.
-- [Cody Yu](https://github.com/comaniac) implemented efficient prefix caching for text and image inputs.
-- [Roger Wang](https://github.com/ywang96) led the overall enhanced MLLM support in V1.
-- [Kaichao You](https://github.com/youkaichao) led the torch.compile integration and implemented the piecewise CUDA graphs.
-- [Tyler Michael Smith](https://github.com/tlrmchlsmth) implemented the tensor parallelism support with Python multiprocessing.
+- UC Berkeley, Neural Magic (now Red Hat), Anyscale, and Roblox mainly drove the effort together.
+- [Woosuk Kwon](https://github.com/WoosukKwon) initiated the project and implemented the scheduler and model runner.
+- [Robert Shaw](https://github.com/robertgshaw2-redhat) implemented the optimized execution loop and API server.
+- [Cody Yu](https://github.com/comaniac) implemented efficient prefix caching for text and image inputs.
+- [Roger Wang](https://github.com/ywang96) led the overall enhanced MLLM support in V1.
+- [Kaichao You](https://github.com/youkaichao) led the torch.compile integration and implemented the piecewise CUDA graphs.
+- [Tyler Michael Smith](https://github.com/tlrmchlsmth) implemented the tensor parallelism support with Python multiprocessing.
 - [Rui Qiao](https://github.com/ruisearch42) implemented the tensor parallelism support with Ray and is implementing pipeline parallelism support.
 - [Lucas Wilkinson](https://github.com/LucasWilkinson) added support for FlashAttention 3.
-- [Alexander Matveev](https://github.com/alexm-redhat) implemented the optimized preprocessor for multimodal inputs.
-- [Sourashis Roy](https://github.com/sroy745) implemented the logit penalties in the sampler.
-- [Cyrus Leung](https://github.com/DarkLight1337) led the MLLM input processing refactoring effort and helped its integration to V1.
-- [Russell Bryant](https://github.com/russellb) addressed several multiprocess-related issues.
-- [Nick Hill](https://github.com/njhill) optimized the engine loop and API server.
-- [Ricky Xu](https://github.com/rickyyx) and [Chen Zhang](https://github.com/heheda12345) helped refactor the KV cache manager.
-- [Jie Li](https://github.com/jeejeelee) and [Michael Goin](https://github.com/mgoin) helped with MLLM support and optimization.
+- [Alexander Matveev](https://github.com/alexm-redhat) implemented the optimized preprocessor for multimodal inputs and is implementing TPU support.
+- [Sourashis Roy](https://github.com/sroy745) implemented the logit penalties in the sampler.
+- [Cyrus Leung](https://github.com/DarkLight1337) led the MLLM input processing refactoring effort and helped its integration to V1.
+- [Russell Bryant](https://github.com/russellb) addressed several multiprocess-related issues.
+- [Nick Hill](https://github.com/njhill) optimized the engine loop and API server.
+- [Ricky Xu](https://github.com/rickyyx) and [Chen Zhang](https://github.com/heheda12345) helped refactor the KV cache manager.
+- [Jie Li](https://github.com/jeejeelee) and [Michael Goin](https://github.com/mgoin) helped with MLLM support and optimization.
 - [Aaron Pham](https://github.com/aarnphm) is implementing the structured decoding support.
-- [Varun Sundar Rabindranath](https://github.com/varun-sundar-rabindranath) is implementing the multi-LoRA support.
-- [Andrew Feldman](https://github.com/afeldman-nm) is implementing the log probs and prompt log probs support.
-- [Lily Liu](https://github.com/LiuXiaoxuanPKU) is implementing the speculative decoding support.
-- [Kuntai Du](https://github.com/KuntaiDu) is implementing the prefill disaggregation and KV Cache transfer support.
+- [Varun Sundar Rabindranath](https://github.com/varun-sundar-rabindranath) is implementing the multi-LoRA support.
+- [Andrew Feldman](https://github.com/afeldman-nm) is implementing the log probs and prompt log probs support.
+- [Lily Liu](https://github.com/LiuXiaoxuanPKU) is implementing the speculative decoding support.
+- [Kuntai Du](https://github.com/KuntaiDu) is implementing the prefill disaggregation and KV Cache transfer support.
 - [Simon Mo](https://github.com/simon-mo) and [Zhuohan Li](https://github.com/zhuohan123) contributed to the V1 system design.
