---
title: OpenWebUI + Olla (OpenAI API) Integration
description: Configure OpenWebUI to talk to multiple OpenAI‑compatible backends via Olla’s /olla/openai/v1 proxy. Load‑balancing, failover, streaming, and model unification for vLLM, SGLang, and other OpenAI‑compatible servers.
keywords: OpenWebUI, Olla, OpenAI API, vLLM, SGLang, LM Studio, load balancing, model unification
---

# OpenWebUI Integration with OpenAI

OpenWebUI can speak to any OpenAI‑compatible endpoint. Olla sits in front as a smart proxy, exposing a single **OpenAI API base** that merges multiple backends (e.g. vLLM, SGLang) and handles load‑balancing and failover.

**Set in OpenWebUI:**

```bash
export OPENAI_API_BASE_URL="http://localhost:40114/olla/openai/v1"
```

**What you get via Olla:**

* One stable OpenAI base URL for all backends
* Priority/least‑connections load‑balancing and health checks
* Streaming passthrough
* Unified `/v1/models` across providers

## Overview

<table>
<tr>
<th>Project</th>
<td><a href="https://github.com/open-webui/open-webui">github.com/open-webui/open-webui</a></td>
</tr>
<tr>
<th>Integration Type</th>
<td>Frontend UI</td>
</tr>
<tr>
<th>Connection Method</th>
<td>OpenAI API compatibility</td>
</tr>
<tr>
<th>
Features Supported <br/>
<small>(via Olla)</small>
</th>
<td>
<ul>
<li>Chat Interface</li>
<li>Model Selection</li>
<li>Streaming Responses</li>
</ul>
</td>
</tr>
<tr>
<th>Configuration</th>
<td>
Set <code>OPENAI_API_BASE_URL</code> to Olla's OpenAI endpoint: <br/>
<code>export OPENAI_API_BASE_URL="http://localhost:40114/olla/openai/v1"</code>
</td>
</tr>
<tr>
<th>Example</th>
<td>
A full working example (using Ollama) is in <code>examples/ollama-openwebui</code>; to use it here, change <code>OPENAI_API_BASE_URL</code> to the endpoint above.
</td>
</tr>
</table>

## Architecture

```
┌─────────────┐      ┌───────── Olla (40114) ──────┐      ┌─────────────────────┐
│  OpenWebUI  │ ───▶ │ /olla/openai/v1 (proxy)     │ ───▶ │ vLLM :8000 (/v1/*)  │
│   (3000)    │      │ • LB + failover             │      └─────────────────────┘
└─────────────┘      │ • health checks             │      ┌─────────────────────┐
                     │ • model unification (/v1)   │ ───▶ │ SGLang :30000 (/v1) │
                     └─────────────────────────────┘      └─────────────────────┘
```

## Quick Start (Docker Compose)

Create **`compose.yaml`**:

```yaml
services:
  olla:
    image: ghcr.io/thushan/olla:latest
    container_name: olla
    restart: unless-stopped
    ports:
      - "40114:40114"
    volumes:
      - ./olla.yaml:/app/config.yaml:ro
      - ./logs:/app/logs
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--tries=1", "--spider", "http://localhost:40114/internal/health"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s

  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    container_name: openwebui
    restart: unless-stopped
    ports:
      - "3000:8080"
    volumes:
      - openwebui_data:/app/backend/data
    environment:
      - OPENAI_API_BASE_URL=http://olla:40114/olla/openai/v1
      - WEBUI_NAME=Olla + OpenWebUI
      - WEBUI_URL=http://localhost:3000
    depends_on:
      olla:
        condition: service_healthy

volumes:
  openwebui_data:
    driver: local
```

Create **`olla.yaml`** (static discovery example):

```yaml
server:
  host: 0.0.0.0
  port: 40114

proxy:
  engine: sherpa           # or: olla (lower overhead); test both
  load_balancer: priority  # or: least-connections

  # Optional timeouts & streaming profile:
  # response_timeout: 1800s
  # read_timeout: 600s
  # profile: streaming

# Static discovery of OpenAI-compatible backends.
# (Each backend must expose /v1/*; Olla translates paths as needed.)
discovery:
  type: static
  static:
    endpoints:
      - url: http://192.168.1.100:8000
        name: gpu-vllm
        type: vllm
        priority: 100

      - url: http://192.168.1.101:30000
        name: gpu-sglang
        type: sglang
        priority: 50
```

Bring it up:

```bash
docker compose up -d
```

Then open OpenWebUI at [http://localhost:3000](http://localhost:3000).

## Verifying via cURL

List unified models:

```bash
curl http://localhost:40114/olla/openai/v1/models | jq
```

Simple completion (non‑streaming):

```bash
curl -s http://localhost:40114/olla/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-oss-120b",
    "messages": [{"role":"user","content":"Hello from Olla"}]
  }' | jq
```
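
The response follows the standard OpenAI chat-completions shape; extracting the assistant's reply from a trimmed sample payload (fields besides `choices` omitted for brevity):

```python
import json

# Trimmed example of an OpenAI-style chat-completions response body.
body = json.loads("""
{
  "choices": [
    {"index": 0, "message": {"role": "assistant", "content": "Hello! How can I help?"}}
  ]
}
""")

reply = body["choices"][0]["message"]["content"]
print(reply)  # Hello! How can I help?
```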

Streaming (SSE):

```bash
curl -N http://localhost:40114/olla/openai/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-oss-120b",
    "stream": true,
    "messages": [{"role":"user","content":"Stream test"}]
  }'
```
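
Each streamed chunk is an SSE `data:` line carrying a JSON delta, and the stream ends with `data: [DONE]`. A minimal sketch of reassembling the text from a captured sample (this is the standard OpenAI streaming shape, not Olla-specific):

```python
import json

# Sample SSE lines as emitted by an OpenAI-compatible streaming endpoint.
sample_stream = [
    'data: {"choices":[{"delta":{"content":"Str"}}]}',
    'data: {"choices":[{"delta":{"content":"eam"}}]}',
    'data: [DONE]',
]

def collect_text(lines):
    """Concatenate the content deltas from SSE 'data:' lines."""
    parts = []
    for line in lines:
        payload = line.removeprefix("data: ").strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)

print(collect_text(sample_stream))  # Stream
```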

Inspect Olla's response headers to see which backend served a call:

```bash
curl -s -D - -o /dev/null http://localhost:40114/olla/openai/v1/models
# Look for: X-Olla-Endpoint, X-Olla-Backend-Type, X-Olla-Response-Time
```

## OpenWebUI Configuration Notes

* **Env var:** `OPENAI_API_BASE_URL` must point to Olla’s **/olla/openai/v1** endpoint.
* **Model picker:** OpenWebUI sources its model list from `/v1/models` (via Olla). If it is empty, see Troubleshooting.
* **API keys:** If OpenWebUI prompts for an OpenAI key but your backends don’t require one, leave it blank.

## Multiple Backends (vLLM, SGLang, LM Studio)

Add as many OpenAI‑compatible servers as you like; priorities control routing order.

```yaml
discovery:
  static:
    endpoints:
      - url: http://vllm-a:8000
        name: vllm-a
        type: vllm
        priority: 100

      - url: http://sglang-b:30000
        name: sglang-b
        type: sglang
        priority: 80

      - url: http://lmstudio-c:1234
        name: lmstudio-c
        type: openai-compatible  # generic OpenAI-compatible server
        priority: 60
```

> Tip: Use `least-connections` when all nodes are similar; use `priority` to prefer local/cheaper nodes.
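
Least-connections, the alternative strategy mentioned in the tip, routes each request to the endpoint with the fewest in-flight requests. A small sketch (field names are illustrative, not Olla's internals):

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    active: int = 0  # in-flight request count

def least_connections(nodes):
    """Pick the node currently handling the fewest requests."""
    return min(nodes, key=lambda n: n.active)

nodes = [
    Node("vllm-a", active=3),
    Node("sglang-b", active=1),
    Node("lmstudio-c", active=2),
]
chosen = least_connections(nodes)
chosen.active += 1  # request dispatched to the chosen node
print(chosen.name)  # sglang-b
```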

## Authentication (Front‑door keying via Nginx)

Olla doesn’t issue or validate API keys (yet). To expose Olla publicly, front it with Nginx to enforce simple static keys.

**`/etc/nginx/conf.d/olla.conf`**

```nginx
map $http_authorization $api_key_valid {
    default 0;
    ~*"Bearer (sk-thushan-XXXXXXXX|sk-yolo-YYYYYYYY)" 1;
}

server {
    listen 80;
    server_name ai.example.com;

    location /api/ {
        if ($api_key_valid = 0) { return 401; }
        # Trailing slash strips the /api/ prefix, so
        # /api/olla/openai/v1/... reaches Olla as /olla/openai/v1/...
        proxy_pass http://localhost:40114/;
        proxy_set_header Host $host;
        proxy_http_version 1.1;
        proxy_buffering off;  # keep SSE streaming responsive
    }
}
```

Then point external users to `http://ai.example.com/api/olla/openai/v1` and give them a matching `Authorization: Bearer ...` header.

> For more robust auth (rate limits, per‑key quotas, logs), put an API gateway (Traefik/Envoy/Kong) in front of Olla.

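For reference, the same static-key check the Nginx `map` performs, expressed in Python (the keys are the placeholder values from the config above):

```python
import re

# Mirrors the nginx map: ~* is a case-insensitive regex match
# against the Authorization header.
VALID = re.compile(r"Bearer (sk-thushan-XXXXXXXX|sk-yolo-YYYYYYYY)", re.IGNORECASE)

def is_authorized(authorization_header: str) -> bool:
    return bool(VALID.search(authorization_header or ""))

print(is_authorized("Bearer sk-thushan-XXXXXXXX"))  # True
print(is_authorized("Bearer sk-unknown"))           # False
```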
## Monitoring & Health

**Olla health:**

```bash
curl http://localhost:40114/internal/health
```

**Endpoint status:**

```bash
curl http://localhost:40114/internal/status/endpoints | jq
```

**Unified models:**

```bash
curl http://localhost:40114/olla/openai/v1/models | jq
```

**Logs:**

```bash
docker logs -f olla
docker logs -f openwebui
```

## Troubleshooting

**Models not appearing in OpenWebUI**

1. Is Olla up?

    ```bash
    curl http://localhost:40114/internal/health
    ```

2. Are backends discovered?

    ```bash
    curl http://localhost:40114/internal/status/endpoints | jq
    ```

3. Are models resolvable?

    ```bash
    curl http://localhost:40114/olla/openai/v1/models | jq
    ```

4. Does OpenWebUI see the correct base URL?

    ```bash
    docker exec openwebui env | grep OPENAI_API_BASE_URL
    ```

**Connection refused from OpenWebUI → Olla**

* Verify compose service names and ports
* Test from inside the container: `docker exec openwebui curl -sS http://olla:40114/internal/health`

**Slow responses**

* Switch to `proxy.engine: olla` or `profile: streaming`
* Use `least-connections` for fairer distribution
* Increase `proxy.response_timeout` for long generations

**Docker networking (Linux)**

* To reach services on the host: `http://172.17.0.1:<port>` (default bridge gateway)
* For remote nodes, use their actual LAN IPs

## Standalone (no compose)

Run Olla locally:

```bash
olla --config ./olla.yaml
```

Run OpenWebUI:

```bash
docker run -d --name openwebui \
  -p 3000:8080 \
  -v openwebui_data:/app/backend/data \
  --add-host=host.docker.internal:host-gateway \
  -e OPENAI_API_BASE_URL=http://host.docker.internal:40114/olla/openai/v1 \
  ghcr.io/open-webui/open-webui:main
```

On Linux, `--add-host=host.docker.internal:host-gateway` is needed so the container can resolve `host.docker.internal`.