site-src/guides/index.md
Lines changed: 8 additions & 8 deletions
@@ -4,10 +4,6 @@
     This project is still in an alpha state and breaking changes may occur in the future.

-???+ warning
-
-    This page is out of date with the v1.0.0 release candidate. Updates under active development
-
 This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get an Inference Gateway up and running!
@@ -53,6 +49,10 @@ Tooling:

 === "CPU-Based Model Server"

+    ???+ warning
+
+        CPU deployment can be unreliable, i.e. the pods may crash/restart because of resource constraints.
+
     This setup uses the official `vllm-cpu` image, which, according to the documentation, can run vLLM on x86 CPU platforms.

     For this setup, we use approximately 9.5GB of memory and 12 CPUs for each replica.
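
The memory and CPU figures in the last context line above would normally surface as resource requests/limits on the vLLM container. The snippet below is only an illustrative sketch of that sizing, not the quickstart's actual manifest: the Deployment name, labels, image reference, and replica count are assumptions introduced here for the example.

```yaml
# Illustrative sizing for a CPU-only vLLM replica (~9.5GB memory, 12 CPUs per
# replica, per the guide text). Names and image are placeholders, not values
# copied from the quickstart's manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-cpu            # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vllm-cpu
  template:
    metadata:
      labels:
        app: vllm-cpu
    spec:
      containers:
        - name: vllm
          image: vllm-cpu:placeholder   # substitute the vllm-cpu image the guide references
          resources:
            requests:
              cpu: "12"
              memory: 9.5Gi
            limits:
              cpu: "12"
              memory: 9.5Gi
```

Setting requests equal to limits gives the pod Guaranteed QoS, which makes it less likely to be evicted under node memory pressure; the warning's crash/restart behavior can still occur if the workload exceeds the memory limit and is OOM-killed.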