Commit 4577c6a

retro add images
1 parent 7588b8b commit 4577c6a

6 files changed (+6, −0 lines)

_posts/2023-06-20-vllm.md

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ layout: post
title: "vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention"
author: "Woosuk Kwon*, Zhuohan Li*, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Yu, Joey Gonzalez, Hao Zhang, and Ion Stoica (* Equal Contribution)"
extra: "<br><p align=\"center\"><picture><img src=\"/assets/logos/vllm-logo-text-light.png\" width=\"65%\"></picture></p><br>"
+image: /assets/logos/vllm-logo-text-light.png
---
<p align="center" style="margin-top:-15px">
<a href="https://github.com/vllm-project/vllm"><b>GitHub</b></a> | <a href="https://vllm.readthedocs.io/en/latest/"><b>Documentation</b></a> | <a href="https://arxiv.org/pdf/2309.06180.pdf"><b>Paper</b></a>

_posts/2023-11-14-notes-vllm-vs-deepspeed.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
layout: post
title: "Notes on vLLM v.s. DeepSpeed-FastGen"
author: "vLLM Team"
+image: /assets/figures/notes-vllm-vs-deepspeed/s2.png
---

---

_posts/2024-07-23-llama31.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
layout: post
title: "Announcing Llama 3.1 Support in vLLM"
author: "vLLM Team"
+image: /assets/figures/llama31/perf_llama3.png
---

Today, the vLLM team is excited to partner with Meta to announce the support for the Llama 3.1 model series. Llama 3.1 comes with exciting new features with longer context length (up to 128K tokens), larger model size (up to 405B parameters), and more advanced model capabilities. The vLLM community has added many enhancements to make sure the longer, larger Llamas run smoothly on vLLM, which includes chunked prefill, FP8 quantization, and pipeline parallelism. We will introduce these new enhancements in this blogpost.

_posts/2024-07-25-lfai-perf.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
layout: post
title: "vLLM’s Open Governance and Performance Roadmap"
author: "vLLM Team"
+image: /assets/figures/lfai/vllm-lfai-light.png
---

_posts/2024-09-05-perf-update.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
layout: post
title: "vLLM v0.6.0: 2.7x Throughput Improvement and 5x Latency Reduction"
author: "vLLM Team"
+image: /assets/figures/perf-v060/llama8B_comparison.png
---

**TL;DR:** vLLM achieves 2.7x higher throughput and 5x faster TPOT (time per output token) on Llama 8B model, and 1.8x higher throughput and 2x less TPOT on Llama 70B model.

_posts/2024-10-17-spec-decode.md

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@
layout: post
title: "How Speculative Decoding Boosts vLLM Performance by up to 2.8x"
author: "vLLM Team"
+image: /assets/figures/spec-decode/figure9.png
---

Speculative decoding in vLLM is a powerful technique that accelerates token generation by leveraging both small and large models in tandem. In this blog, we’ll break down speculative decoding in vLLM, how it works, and the performance improvements it brings.

0 commit comments

Comments
 (0)