* Add pre-commit workflow
* Add actionlint
* Add generic hooks
* Add black, isort, shellcheck
* Add requirements and markdown linting
* Add toml
* Add Dockerfile
* Add codespell
* Use Node.js version of `markdownlint`
* Add `requirements-lint.txt`
* Use CLI version of Node.js `markdownlint`
* Add `pre-commit` instructions to `Contributing`
* `pre-commit run -a` automatic fixes
* Exclude helm templates from `check-yaml`
* Comment hooks that require installed tools
* Make `codespell` happy
* Make `actionlint` happy
* Disable `shellcheck` until it can be installed properly
* Make `markdownlint` happy
* Add note about running pre-commit
---------
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
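The bullets above map naturally onto a `.pre-commit-config.yaml`. Below is a minimal sketch of such a config: the repo URLs are the hooks' canonical sources, but the `rev` pins and the helm-template exclude path are placeholders, not the values actually used in this PR.

```yaml
# Sketch of a .pre-commit-config.yaml matching the hooks listed above.
# Repo URLs are the hooks' canonical sources; the `rev` pins are
# placeholders, NOT the versions pinned by this PR.
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0  # placeholder pin
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-toml
      - id: requirements-txt-fixer
      - id: check-yaml
        exclude: ^helm/templates/  # assumed path; helm templates are not plain YAML
  - repo: https://github.com/psf/black
    rev: 24.1.1  # placeholder pin
    hooks:
      - id: black
  - repo: https://github.com/PyCQA/isort
    rev: 5.13.2  # placeholder pin
    hooks:
      - id: isort
  - repo: https://github.com/codespell-project/codespell
    rev: v2.2.6  # placeholder pin
    hooks:
      - id: codespell
  - repo: https://github.com/igorshubovych/markdownlint-cli
    rev: v0.39.0  # placeholder pin; the Node.js CLI version of markdownlint
    hooks:
      - id: markdownlint
  - repo: https://github.com/rhysd/actionlint
    rev: v1.6.26  # placeholder pin
    hooks:
      - id: actionlint
  # shellcheck is disabled for now, per the commit notes:
  # - repo: https://github.com/koalaman/shellcheck-precommit
  #   rev: v0.9.0  # placeholder pin
  #   hooks:
  #     - id: shellcheck
```

With a config like this in place, contributors enable the hooks once via `pip install pre-commit` and `pre-commit install`, and `pre-commit run -a` applies the automatic fixes across the whole tree, as the commit list above notes.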
Changed file: README.md (+18 −9)
````diff
@@ -1,13 +1,12 @@
-# vLLM Production Stack: reference stack for production vLLM deployment
-
+# vLLM Production Stack: reference stack for production vLLM deployment

 **vLLM Production Stack** project provides a reference implementation on how to build an inference stack on top of vLLM, which allows you to:

 - 🚀 Scale from single vLLM instance to distributed vLLM deployment without changing any application code
 - 💻 Monitor the stack through a web dashboard
 - 😄 Enjoy the performance benefits brought by request routing and KV cache offloading

-## Latest News:
+## Latest News

 - 🔥 vLLM Production Stack is released! Check out our [release blogs](https://blog.lmcache.ai/2025-01-21-stack-release) [01-22-2025]
 - ✨ Join us at the #production-stack channel of vLLM [slack](https://slack.vllm.ai/), LMCache [slack](https://join.slack.com/t/lmcacheworkspace/shared_invite/zt-2viziwhue-5Amprc9k5hcIdXT7XevTaQ), or fill out this [interest form](https://forms.gle/wSoeNpncmPVdXppg8) for a chat!
@@ -20,7 +19,6 @@ The stack is set up using [Helm](https://helm.sh/docs/), and contains the follow
 - **Request router**: Directs requests to appropriate backends based on routing keys or session IDs to maximize KV cache reuse.
 - **Observability stack**: monitors the metrics of the backends through [Prometheus](https://github.com/prometheus/prometheus) + [Grafana](https://grafana.com/)

-
 <img src="https://github.com/user-attachments/assets/8f05e7b9-0513-40a9-9ba9-2d3acca77c0c" alt="Architecture of the stack" width="800"/>

 ## Roadmap
@@ -42,6 +40,7 @@ We are actively working on this project and will release the following features
 ### Deployment

 vLLM Production Stack can be deployed via helm charts. Clone the repo locally and execute the following commands for a minimal deployment:
@@ -55,21 +54,18 @@ To validate the installation and send a query to the stack, refer to [this tut
 For more information about customizing the helm chart, please refer to [values.yaml](https://github.com/vllm-project/production-stack/blob/main/helm/values.yaml) and our other [tutorials](https://github.com/vllm-project/production-stack/tree/main/tutorials).

-
 ### Uninstall

 ```bash
 sudo helm uninstall vllm
 ```

-
 ## Grafana Dashboard

 ### Features

 The Grafana dashboard provides the following insights:

-
 1. **Available vLLM Instances**: Displays the number of healthy instances.
````
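The Deployment hunk above references install commands that the truncated diff does not show. As a rough sketch only: the chart path `./helm` is inferred from the `helm/values.yaml` link in the diff, and the release name `vllm` from the uninstall command; the repo's tutorials are authoritative.

```bash
# Rough sketch of a minimal deployment, inferred from the README excerpts above.
# The chart path ./helm and the release name vllm are assumptions, not verified values.
git clone https://github.com/vllm-project/production-stack.git
cd production-stack
sudo helm install vllm ./helm

# To customize, override chart values (see helm/values.yaml in the repo), e.g.:
# sudo helm install vllm ./helm -f my-values.yaml   # my-values.yaml is hypothetical
```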