Commit 4ce881b

Update connect-to-vLLM.md
1 parent e526500 commit 4ce881b

File tree: 1 file changed, +61 −2 lines


solutions/security/ai/connect-to-vLLM.md

Lines changed: 61 additions & 2 deletions
@@ -41,7 +41,7 @@ The process involves four main steps:
 
 ## Step 1: Configure your host server
 
-1. (Optional) If you plan to use a gated model (like Llama 3.1) or a private model, you need to create a [Hugging Face user access token](https://huggingface.co/docs/hub/en/security-tokens).
+1. (Optional) If you plan to use a gated model (such as Llama 3.1) or a private model, create a [Hugging Face user access token](https://huggingface.co/docs/hub/en/security-tokens).
 1. Log in to your Hugging Face account.
 2. Navigate to **Settings > Access Tokens**.
 3. Create a new token with at least `read` permissions. Save it in a secure location.
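If you created a token, it is typically passed to the vLLM container at launch so it can download gated or private weights. A hedged sketch, not part of the diff above, assuming the `HUGGING_FACE_HUB_TOKEN` environment variable (the name the Hugging Face Hub libraries read) and reusing the image, port, GPU, and model settings that appear elsewhere in this guide:

```shell
# Sketch only: supply the Hugging Face token to the container.
# Flag values are taken from elsewhere in this guide; adjust to your setup.
export HF_TOKEN="<your Hugging Face token>"

docker run --gpus all \
  --env "HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}" \
  -p 8000:8000 \
  vllm/vllm-openai:v0.9.1 \
  --model mistralai/Mistral-Small-3.2-24B-Instruct-2506 \
  --tensor-parallel-size 2
```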
@@ -73,7 +73,7 @@ vllm/vllm-openai:v0.9.1 \
 --tensor-parallel-size 2
 ```
 
-.**Click to expand an explanation of the command**
+.**Click to expand a full explanation of the command**
 [%collapsible]
 =====
 `--gpus all`: Exposes all available GPUs to the container.
@@ -91,3 +91,62 @@ vllm/vllm-openai:v0.9.1 \
 `--tensor-parallel-size 2`: This value should match the number of available GPUs (in this case, 2). This is critical for performance on multi-GPU systems.
 =====
 
+3. Verify the containers were created by running `docker ps -a`. The output should show the value you specified for the `--name` parameter.
+
+## Step 3: Expose the API with a reverse proxy
+
+This example uses Nginx to create a reverse proxy. This improves stability and enables monitoring by means of Elastic's native Nginx integration. The following example configuration forwards traffic to the vLLM container and uses a secret token for authentication.
+
+1. Install Nginx on your server.
+2. Create a configuration file, for example at `/etc/nginx/sites-available/default`. Give it the following content:
+
+```
+server {
+    listen 80;
+    server_name <yourdomainname.com>;
+    return 301 https://$server_name$request_uri;
+}
+
+server {
+    listen 443 ssl http2;
+    server_name <yourdomainname.com>;
+
+    ssl_certificate /etc/letsencrypt/live/<yourdomainname.com>/fullchain.pem;
+    ssl_certificate_key /etc/letsencrypt/live/<yourdomainname.com>/privkey.pem;
+
+    location / {
+        if ($http_authorization != "Bearer <secret token>") {
+            return 401;
+        }
+        proxy_pass http://localhost:8000/;
+    }
+}
+```
+
+3. Enable and restart Nginx to apply the configuration.
+
+:::{note}
+For quick testing, you can use [ngrok](https://ngrok.com/) as an alternative to Nginx, but it is not recommended for production use.
+:::
+
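The `$http_authorization` guard in the configuration above can be smoke-tested from any machine. A minimal sketch, assuming the placeholders are filled in and that vLLM's OpenAI-compatible `/v1/models` endpoint is reachable behind the proxy:

```shell
# Without the token, the proxy itself should answer 401.
curl -s -o /dev/null -w "%{http_code}\n" https://<yourdomainname.com>/v1/models

# With the token, the request should reach vLLM (200 once the model is up).
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Authorization: Bearer <secret token>" \
  https://<yourdomainname.com>/v1/models
```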
+## Step 4: Configure the connector in your Elastic deployment
+
+Finally, create the connector within your Elastic deployment to link it to your vLLM instance.
+
+1. Log in to {{kib}}.
+2. Navigate to the **Connectors** page, click **Create Connector**, and select **OpenAI**.
+3. Give the connector a descriptive name, such as `vLLM - Mistral Small 3.2`.
+4. In **Connector settings**, configure the following:
+    * For **Select an OpenAI provider**, select **Other (OpenAI Compatible Service)**.
+    * For **URL**, enter your server's public URL followed by `/v1/chat/completions`.
+5. For **Default Model**, enter `mistralai/Mistral-Small-3.2-24B-Instruct-2506` or the model ID you used during setup.
+6. For **Authentication**, configure the following:
+    * For **API key**, enter the secret token you created in Step 1 and specified in your Nginx configuration file.
+    * If your chosen model supports tool use, turn on **Enable native function calling**.
+7. Click **Save**.
+
+Setup is now complete. The model served by your vLLM container can now power Elastic's generative AI features, such as the AI Assistant.
+
+:::{note}
+To run a different model, stop the current container and run a new one with an updated `--model` parameter.
+:::
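Once the connector is saved, the same endpoint it targets can be exercised directly, which helps separate proxy problems from connector problems. A minimal sketch, assuming the placeholders from the Nginx configuration and the model ID from Step 4:

```shell
# Sketch only: send one chat completion through the reverse proxy,
# exactly as the OpenAI connector will. Replace the placeholders first.
curl -s https://<yourdomainname.com>/v1/chat/completions \
  -H "Authorization: Bearer <secret token>" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```

A JSON response with a `choices` array indicates the full path (TLS, token check, proxy, vLLM) is working.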
