solutions/security/ai/connect-to-vLLM.md (+61 −2)
@@ -41,7 +41,7 @@ The process involves four main steps:
## Step 1: Configure your host server
- 1. (Optional) If you plan to use a gated model (like Llama 3.1) or a private model, you need to create a [Hugging Face user access token](https://huggingface.co/docs/hub/en/security-tokens).
+ 1. (Optional) If you plan to use a gated model (such as Llama 3.1) or a private model, create a [Hugging Face user access token](https://huggingface.co/docs/hub/en/security-tokens).
1. Log in to your Hugging Face account.
2. Navigate to **Settings > Access Tokens**.
3. Create a new token with at least `read` permissions. Save it in a secure location.
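The token created above is typically handed to the container through an environment variable. A minimal shell sketch (the variable name `HUGGING_FACE_HUB_TOKEN` is the one vLLM and the Hugging Face Hub client conventionally read; the token value is a placeholder, not part of the original guide):

```shell
# Store the Hugging Face token in an environment variable on the host.
# "hf_xxx_placeholder" is a placeholder; substitute your real token.
export HUGGING_FACE_HUB_TOKEN="hf_xxx_placeholder"

# When you run the vLLM container in Step 2, pass the token through, e.g.:
#   docker run --env "HUGGING_FACE_HUB_TOKEN=${HUGGING_FACE_HUB_TOKEN}" ...
```

Keeping the token in an environment variable (rather than pasting it into the command line of the run script) makes it easier to rotate without editing the deployment command.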
@@ -73,7 +73,7 @@ vllm/vllm-openai:v0.9.1 \
--tensor-parallel-size 2
```
- .**Click to expand an explanation of the command**
+ .**Click to expand a full explanation of the command**
[%collapsible]
=====
`--gpus all`: Exposes all available GPUs to the container.
@@ -91,3 +91,62 @@ vllm/vllm-openai:v0.9.1 \
`--tensor-parallel-size 2`: This value should match the number of available GPUs (in this case, 2). This is critical for performance on multi-GPU systems.
=====
+ 3. Verify the containers were created by running `docker ps -a`. The output should show the value you specified for the `--name` parameter.
+
+ ## Step 3: Expose the API with a reverse proxy
+
+ This example uses Nginx to create a reverse proxy. This improves stability and enables monitoring through Elastic's native Nginx integration. The following example configuration forwards traffic to the vLLM container and uses a secret token for authentication.
+
+ 1. Install Nginx on your server.
+ 2. Create a configuration file, for example at `/etc/nginx/sites-available/default`. Give it the following content:
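The configuration file itself is not included in this excerpt. As a hedged sketch only, a configuration of the kind described (forwarding to the vLLM container and checking a secret bearer token) might look like the following; the listen port, the upstream port (vLLM's Docker image defaults to 8000), and the token are placeholders, not values from the original guide:

```nginx
server {
    listen 80;

    location / {
        # Reject requests that do not present the expected bearer token.
        # "REPLACE_WITH_SECRET" is a placeholder; use a long random value.
        if ($http_authorization != "Bearer REPLACE_WITH_SECRET") {
            return 401;
        }

        # Forward traffic to the vLLM container (assumed default port 8000).
        proxy_pass http://localhost:8000;
        proxy_set_header Host $host;
        proxy_read_timeout 300s;  # allow for long-running generations
    }
}
```

After editing, validate and apply the configuration with `sudo nginx -t && sudo systemctl reload nginx`.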