
Commit 941fb04

Update 2025-01-21-stack-release.md
1 parent 46d4516 commit 941fb04

1 file changed: +9, -9 lines

_posts/2025-01-21-stack-release.md

Lines changed: 9 additions & 9 deletions
@@ -1,10 +1,10 @@
 ---
 layout: post
 title: "High Performance and Easy Deployment of vLLM in K8S with “vLLM production-stack”"
-thumbnail-img: /assets/figure/stack/stack-thumbnail.png
-share-img: /assets/figure/stack/stack-thumbnail.png
+thumbnail-img: /assets/figures/stack/stack-thumbnail.png
+share-img: /assets/figures/stack/stack-thumbnail.png
 author: LMCache Team
-image: /assets/figure/stack/stack-thumbnail.png
+image: /assets/figures/stack/stack-thumbnail.png
 ---
 <br>

@@ -27,7 +27,7 @@ image: /assets/figure/stack/stack-thumbnail.png
 How do we extend its power into a **full-stack** inference system that any organization can deploy at scale with *high reliability*, *high throughput*, and *low latency*? That’s precisely why the LMCache team and the vLLM team built **vLLM production-stack**.

 <div align="center">
-<img src="/assets/figure/stack/stack-thumbnail.png" alt="Icon" style="width: 60%; vertical-align:middle;">
+<img src="/assets/figures/stack/stack-thumbnail.png" alt="Icon" style="width: 60%; vertical-align:middle;">
 </div>

 # Introducing "*vLLM Production-Stack*"
@@ -41,7 +41,7 @@ How do we extend its power into a **full-stack** inference system that any organ

 Below is a quick snapshot comparing vLLM production-stack with its closest counterparts:
 <div align="center">
-<img src="/assets/figure/stack/stack-table.png" alt="Icon" style="width: 90%; vertical-align:middle;">
+<img src="/assets/figures/stack/stack-table.png" alt="Icon" style="width: 90%; vertical-align:middle;">
 </div>

 ### The Design
@@ -54,7 +54,7 @@ At a high level:
 - Observability modules gather metrics like TTFT (Time-To-First-Token), TBT (Time-Between-Tokens), and throughput, giving you real-time insights into your system’s health.

 <div align="center">
-<img src="/assets/figure/stack/stack-overview-2.png" alt="Icon" style="width: 90%; vertical-align:middle;">
+<img src="/assets/figures/stack/stack-overview-2.png" alt="Icon" style="width: 90%; vertical-align:middle;">
 </div>

 # Advantage #1: Easy Deployment
@@ -72,18 +72,18 @@ We conduct a benchmark of multi-round Q&A workload on vLLM production-stack and
 The results show vLLM stack outperforms other setups across key metrics (time to first token and inter token latency).

 <div align="center">
-<img src="/assets/figure/stack/stack-ttft.png" alt="Icon" style="width: 60%; vertical-align:middle;">
+<img src="/assets/figures/stack/stack-ttft.png" alt="Icon" style="width: 60%; vertical-align:middle;">
 </div>

 <div align="center">
-<img src="/assets/figure/stack/stack-itl.png" alt="Icon" style="width: 60%; vertical-align:middle;">
+<img src="/assets/figures/stack/stack-itl.png" alt="Icon" style="width: 60%; vertical-align:middle;">
 </div>

 # Advantage #3: Effortless Monitoring
 Keep real-time tracking of your LLM inference cluster with key metrics including latency distributions, number of requests over time, KV cache hit rate.

 <div align="center">
-<img src="/assets/figure/stack/stack-panel.png" alt="Icon" style="width: 70%; vertical-align:middle;">
+<img src="/assets/figures/stack/stack-panel.png" alt="Icon" style="width: 70%; vertical-align:middle;">
 </div>

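For context on the metrics the patched post benchmarks (time to first token and inter-token latency): a minimal measurement sketch against an OpenAI-compatible vLLM endpoint. The base URL, port, and model name are assumptions, not values from this commit, and streamed chunk arrival times are used as an approximation of per-token timing.

```python
import time
from openai import OpenAI

# Assumed endpoint and model; any OpenAI-compatible vLLM deployment works.
client = OpenAI(base_url="http://localhost:30080/v1", api_key="EMPTY")

start = time.perf_counter()
arrivals = []  # arrival time of each streamed content chunk (~ per-token)
stream = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize KV-cache reuse."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        arrivals.append(time.perf_counter())

if arrivals:
    ttft = arrivals[0] - start  # Time-To-First-Token
    gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
    tbt = sum(gaps) / len(gaps) if gaps else 0.0  # mean Time-Between-Tokens
    print(f"TTFT={ttft:.3f}s  mean TBT={tbt * 1000:.1f}ms  chunks={len(arrivals)}")
```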
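Similarly, for the monitoring panels shown in the patched post: dashboards like these are typically backed by Prometheus, so the series can also be pulled programmatically through the standard Prometheus HTTP API. A sketch under stated assumptions: the Prometheus address is hypothetical, and the vLLM histogram metric name varies across versions.

```python
import requests

PROM_URL = "http://localhost:9090"  # assumed Prometheus server address

# Hypothetical vLLM histogram; the exact metric name differs by vLLM version.
query = "histogram_quantile(0.95, rate(vllm:time_to_first_token_seconds_bucket[5m]))"

resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=10)
resp.raise_for_status()
for series in resp.json()["data"]["result"]:
    # Each result carries its label set and the latest (timestamp, value) pair.
    print(series["metric"], series["value"])
```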