solutions/security/ai/connect-to-vLLM.md (+6 −1)
@@ -73,9 +73,12 @@ vllm/vllm-openai:v0.9.1 \
 --tensor-parallel-size 2
 ```
 
-.**Click to expand a full explanation of the command**
+
+.Click to expand a full explanation of the command
 [%collapsible]
 =====
+
+```
 `--gpus all`: Exposes all available GPUs to the container.
 `--name`: Defines a name for the container.
 `-v /root/.cache/huggingface:/root/.cache/huggingface`: Mounts the Hugging Face cache directory (optional if used with `HUGGING_FACE_HUB_TOKEN`).
@@ -89,6 +92,8 @@ vllm/vllm-openai:v0.9.1 \
 `--enable-auto-tool-choice`: Enables automatic function calling.
 `--gpu-memory-utilization 0.90`: Limits the maximum share of GPU memory used by vLLM (may vary depending on the machine resources available).
 `--tensor-parallel-size 2`: This value should match the number of available GPUs (in this case, 2). This is critical for performance on multi-GPU systems.
+```
+
 =====
 
 3. Verify the container's status by running the `docker ps -a` command. The output should show the value you specified for the `--name` parameter.
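The verification step above can be sketched as follows. This is a minimal example, not part of the documented procedure: the container name `vllm` and host port `8000` are assumptions (the full `docker run` invocation, including any `-p` port mapping, is outside this diff), and `/v1/models` is vLLM's standard OpenAI-compatible listing endpoint.

```shell
# Check the container status; the NAMES column should show the value
# passed to --name (assumed here to be "vllm").
docker ps -a --filter "name=vllm"

# Query the OpenAI-compatible API to confirm the server is up and the
# model is loaded (assumes vLLM's default port 8000 is published).
curl http://localhost:8000/v1/models
```

If the container exited instead of staying up, `docker logs vllm` usually shows whether the model download or GPU initialization failed.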