
Commit e526500

adds collapsible explanation section

1 parent ada1c84

1 file changed: +8 −4 lines

solutions/security/ai/connect-to-vLLM.md

Lines changed: 8 additions & 4 deletions
````diff
@@ -44,8 +44,9 @@ The process involves four main steps:
 1. (Optional) If you plan to use a gated model (like Llama 3.1) or a private model, you need to create a [Hugging Face user access token](https://huggingface.co/docs/hub/en/security-tokens).
     1. Log in to your Hugging Face account.
     2. Navigate to **Settings > Access Tokens**.
-    3. Create a new token with at least `read` permissions. Copy it to a secure location.
+    3. Create a new token with at least `read` permissions. Save it in a secure location.
 2. Create an OpenAI-compatible secret token. Generate a strong, random string and save it in a secure location. You need the secret token to authenticate communication between {{ecloud}} and your Nginx reverse proxy.
+3. Install any necessary GPU drivers.
 
 ## Step 2: Run your vLLM container
 
````
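The "strong, random string" in step 2 of the list above can be generated with, for example, `openssl` (a sketch; any cryptographically secure generator works):

```shell
# Generate a 32-byte (64 hex character) random string to use as the
# OpenAI-compatible secret token
openssl rand -hex 32
```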
````diff
@@ -72,9 +73,11 @@ vllm/vllm-openai:v0.9.1 \
 --tensor-parallel-size 2
 ```
 
-::::{admonition} Explanation of command
+.**Click to expand an explanation of the command**
+[%collapsible]
+=====
 `--gpus all`: Exposes all available GPUs to the container.
-`--name`: Set predefined name for the container, otherwise it’s going to be generated
+`--name`: Defines a name for the container.
 `-v /root/.cache/huggingface:/root/.cache/huggingface`: Hugging Face cache directory (optional if used with `HUGGING_FACE_HUB_TOKEN`).
 `-e HUGGING_FACE_HUB_TOKEN`: Sets the environment variable for your Hugging Face token (only required for gated models).
 `--env VLLM_API_KEY`: vLLM API key used for authentication between {{ecloud}} and vLLM.
@@ -86,4 +89,5 @@ vllm/vllm-openai:v0.9.1 \
 `--enable-auto-tool-choice`: Enables automatic function calling.
 `--gpu-memory-utilization 0.90`: Limits max GPU memory used by vLLM (may vary depending on the machine resources available).
 `--tensor-parallel-size 2`: This value should match the number of available GPUs (in this case, 2). This is critical for performance on multi-GPU systems.
-::::
+=====
+
````
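Once the container is running, the OpenAI-compatible endpoint can be smoke-tested with `curl`. A sketch, assuming vLLM's default port 8000 and that the `VLLM_API_KEY` environment variable holds the same value passed to the container via `--env VLLM_API_KEY`:

```shell
# List the models served by vLLM (default port 8000 assumed);
# the Authorization header must carry the same key given to the container
curl -s http://localhost:8000/v1/models \
  -H "Authorization: Bearer $VLLM_API_KEY"
```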