_posts/2025-01-26-v1-alpha-release.md
The final piece of the puzzle for vLLM V1 was integrating [FlashAttention 3](htt

# Performance

Thanks to the extensive architectural enhancements, vLLM V1 achieves state-of-the-art throughput and latency, delivering up to **1.7x higher throughput** compared to V0 (*without multi-step scheduling*).
These dramatic performance gains stem from comprehensive CPU overhead reductions across the entire stack.
The improvements are even more pronounced for vision-language models (VLMs) like Qwen2-VL, thanks to V1's enhanced support for VLMs.
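As a rough sketch of how such a comparison can be reproduced (the script path and flags are assumptions based on the benchmark utilities shipped in the vLLM source tree, not taken from this post, and may differ across versions), one can toggle the V1 engine with the `VLLM_USE_V1` environment variable and run the same offline throughput benchmark twice:

```shell
# Sketch only: assumes a checkout of the vLLM repository, an installed vLLM
# build, and a CUDA-capable GPU. Flag names may vary between releases.

# Baseline: run the offline throughput benchmark on the V0 engine.
VLLM_USE_V1=0 python benchmarks/benchmark_throughput.py \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --input-len 512 --output-len 256 --num-prompts 1000

# Same model and workload on the V1 engine; compare the reported tokens/s.
VLLM_USE_V1=1 python benchmarks/benchmark_throughput.py \
    --model meta-llama/Llama-3.1-8B-Instruct \
    --input-len 512 --output-len 256 --num-prompts 1000
```

Because both runs use an identical workload, the ratio of the two reported throughput numbers gives the end-to-end speedup on that hardware.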
We gratefully acknowledge that the design of vLLM V1 builds upon and enhances se

The V1 re-architecture is a continued joint effort across the entire vLLM team and community. Below is an incomplete list of contributors to this milestone:
- UC Berkeley, Neural Magic (now Red Hat), Anyscale, and Roblox mainly drove the effort together.
- [Woosuk Kwon](https://github.com/WoosukKwon) initiated the project and implemented the scheduler and model runner.
- [Robert Shaw](https://github.com/robertgshaw2-redhat) implemented the optimized execution loop and API server.
- [Cody Yu](https://github.com/comaniac) implemented efficient prefix caching for text and image inputs.
- [Roger Wang](https://github.com/ywang96) led the overall enhanced MLLM support in V1.
- [Kaichao You](https://github.com/youkaichao) led the torch.compile integration and implemented the piecewise CUDA graphs.
- [Tyler Michael Smith](https://github.com/tlrmchlsmth) implemented the tensor parallelism support with Python multiprocessing.
- [Rui Qiao](https://github.com/ruisearch42) implemented the tensor parallelism support with Ray and is implementing pipeline parallelism support.
- [Lucas Wilkinson](https://github.com/LucasWilkinson) added support for FlashAttention 3.
- [Alexander Matveev](https://github.com/alexm-redhat) implemented the optimized preprocessor for multimodal inputs and is implementing TPU support.
- [Sourashis Roy](https://github.com/sroy745) implemented the logit penalties in the sampler.
- [Cyrus Leung](https://github.com/DarkLight1337) led the MLLM input processing refactoring effort and helped with its integration into V1.
- [Russell Bryant](https://github.com/russellb) addressed several multiprocessing-related issues.
- [Nick Hill](https://github.com/njhill) optimized the engine loop and API server.
- [Ricky Xu](https://github.com/rickyyx) and [Chen Zhang](https://github.com/heheda12345) helped refactor the KV cache manager.
- [Jie Li](https://github.com/jeejeelee) and [Michael Goin](https://github.com/mgoin) helped with MLLM support and optimization.
- [Aaron Pham](https://github.com/aarnphm) is implementing the structured decoding support.
- [Varun Sundar Rabindranath](https://github.com/varun-sundar-rabindranath) is implementing the multi-LoRA support.
- [Andrew Feldman](https://github.com/afeldman-nm) is implementing the log probs and prompt log probs support.
- [Lily Liu](https://github.com/LiuXiaoxuanPKU) is implementing the speculative decoding support.
- [Kuntai Du](https://github.com/KuntaiDu) is implementing the prefill disaggregation and KV Cache transfer support.
- [Simon Mo](https://github.com/simon-mo) and [Zhuohan Li](https://github.com/zhuohan123) contributed to the V1 system design.