
Commit 7de1895

mzegla and Huanli-Gong authored
Demonstrating integration of Open WebUI with OpenVINO Model Server (#3627)
* Demonstrating integration of Open WebUI with OpenVINO Model Server (#3612)
* sphinx integration and additional docs updates
* typo fix
* fix sphinx build and agentic demo doc

Co-authored-by: Huanli-Gong <[email protected]>
1 parent 52c9e04 commit 7de1895

19 files changed (+308 −4 lines)

demos/README.md

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@ maxdepth: 1
 hidden:
 ---
 ovms_demos_continuous_batching_agent
+ovms_demos_integration_with_open_webui
 ovms_demos_rerank
 ovms_demos_embeddings
 ovms_demos_continuous_batching
@@ -49,6 +50,7 @@ OpenVINO Model Server demos have been created to showcase the usage of the model
 | Demo | Description |
 |---|---|
 |[AI Agents with MCP servers and serving language models](./continuous_batching/agentic_ai/README.md)|OpenAI agents with MCP servers and serving LLM models|
+|[Integration with Open WebUI](integration_with_OpenWebUI/README.md)|Using Open WebUI with OVMS as the inference provider. Shows text and image generation as well as usage with RAG and tools|
 |[LLM Text Generation with continuous batching](continuous_batching/README.md)|Generate text with LLM models and continuous batching pipeline|
 |[VLM Text Generation with continuous batching](continuous_batching/vlm/README.md)|Generate text with VLM models and continuous batching pipeline|
 |[OpenAI API text embeddings](embeddings/README.md)|Get text embeddings via endpoint compatible with OpenAI API|

demos/continuous_batching/agentic_ai/README.md

Lines changed: 2 additions & 2 deletions
@@ -459,13 +459,13 @@ python openai_agent.py --query "What is the current weather in Tokyo?" --model O
 :::{tab-item} OpenVINO/Mistral-7B-Instruct-v0.3-int4-ov
 :sync: OpenVINO/Mistral-7B-Instruct-v0.3-int4-ov
 ```bash
-python openai_agent.py --query "What is the current weather in Tokyo?" --model OpenVINO/Mistral-7B-Instruct-v0.3-int4-ov --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --tool_choice required
+python openai_agent.py --query "What is the current weather in Tokyo?" --model OpenVINO/Mistral-7B-Instruct-v0.3-int4-ov --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --tool-choice required
 ```
 :::
 :::{tab-item} Phi-4-mini-instruct-int4-ov
 :sync: Phi-4-mini-instruct-int4-ov
 ```bash
-python openai_agent.py --query "What is the current weather in Tokyo?" --model OpenVINO/Phi-4-mini-instruct-int4-ov --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --tool_choice required
+python openai_agent.py --query "What is the current weather in Tokyo?" --model OpenVINO/Phi-4-mini-instruct-int4-ov --base-url http://localhost:8000/v3 --mcp-server-url http://localhost:8080/sse --mcp-server weather --tool-choice required
 ```
 :::
 ::::
Lines changed: 293 additions & 0 deletions
@@ -0,0 +1,293 @@
# Demonstrating integration of Open WebUI with OpenVINO Model Server {#ovms_demos_integration_with_open_webui}

## Description

[Open WebUI](https://github.com/open-webui/open-webui) is a very popular component that provides a user interface for generative models. It supports text generation, RAG, image generation, and many other use cases. It can also integrate with remote serving endpoints compatible with standard APIs, such as the OpenAI API for chat completions and image generation.

The goal of this demo is to integrate Open WebUI with [OpenVINO Model Server](https://github.com/openvinotoolkit/model_server). It includes instructions for deploying the model server with a set of models and configuring Open WebUI to delegate generation to the serving endpoints.

---

## Setup

### Prerequisites

In this demo, OpenVINO Model Server is deployed on Linux with CPU using Docker, and Open WebUI is installed via Python pip. Requirements to follow this demo:

* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows via [WSL](https://learn.microsoft.com/en-us/windows/wsl/)
* Python 3.11 with pip
* Hugging Face account to download models

There are other options to fulfill the prerequisites, such as [OpenVINO Model Server deployment on baremetal Linux or Windows](https://docs.openvino.ai/nightly/model-server/ovms_docs_deploying_server_baremetal.html) and [Open WebUI installation with Docker](https://docs.openwebui.com/#quick-start-with-docker-). The steps in this demo can be reused across these options, and the references for each step cover both deployments.

This demo was tested on CPU, but most of the models can also run on Intel accelerators such as GPU and NPU.

### Step 1: Preparation

Download the export script, install its dependencies, and create a directory for the models:

```bash
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/3/demos/common/export_models/export_model.py -o export_model.py
pip install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/3/demos/common/export_models/requirements.txt
mkdir models
```

### Step 2: Export Model

The text generation model used in this demo is [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct). If the model has not been downloaded before, access must be requested on its Hugging Face page. Run the export script to download and quantize the model:

```bash
python export_model.py text_generation --source_model meta-llama/Llama-3.2-1B-Instruct --weight-format int8 --kv_cache_precision u8 --config_file_path models/config.json
```

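Because the model is gated, the export script needs Hugging Face credentials to download it. One way to provide them from Python is sketched below; it assumes the `huggingface_hub` package is available and that a valid access token is stored in the `HF_TOKEN` environment variable.

```python
# Minimal sketch: authenticate with Hugging Face before running export_model.py.
# Assumes the huggingface_hub package is installed and HF_TOKEN holds a token
# with access granted to meta-llama/Llama-3.2-1B-Instruct.
import os
from huggingface_hub import login

login(token=os.environ["HF_TOKEN"])
```
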
### Step 3: Server Deployment

Deploy the model server with Docker:

```bash
docker run -d -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server --rest_port 8000 --config_path /workspace/config.json
```

Here is a basic call to check if it works:

```bash
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"meta-llama/Llama-3.2-1B-Instruct\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"Say this is a test\"}]}"
```

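The same check can be done with the official `openai` Python client, which talks to the OpenAI-compatible endpoint exposed by the model server under `/v3`. A minimal sketch, assuming the `openai` package is installed:

```python
# Minimal sketch: query the OVMS OpenAI-compatible endpoint with the openai client.
# Assumes the openai package is installed and the server started above is running.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")  # the key is not verified

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say this is a test"},
    ],
)
print(response.choices[0].message.content)
```
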
### Step 4: Start Open WebUI

Install Open WebUI:

```bash
pip install open-webui
```

Run Open WebUI:

```bash
open-webui serve
```

Go to [http://localhost:8080](http://localhost:8080) and create an admin account to get started.

![get started with Open WebUI](./get_started_with_Open_WebUI.png)

### Reference

[https://docs.openvino.ai/2025/model-server/ovms_demos_continuous_batching.html](https://docs.openvino.ai/2025/model-server/ovms_demos_continuous_batching.html#model-preparation)

[https://docs.openwebui.com](https://docs.openwebui.com/#installation-with-pip)

---

## Chat

### Step 1: Connections Setting

1. Go to **Admin Panel** → **Settings** → **Connections** ([http://localhost:8080/admin/settings/connections](http://localhost:8080/admin/settings/connections))
2. Click **+Add Connection** under **OpenAI API**
   * URL: `http://localhost:8000/v3`
   * Model IDs: put `meta-llama/Llama-3.2-1B-Instruct` and click **+** to add the model, or leave empty to include all models
3. Click **Save**

![connection setting](./connection_setting.png)

### Step 2: Start Chatting

Click **New Chat** and select the model to start chatting.

![chat demo](./chat_demo.png)

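Behind the scenes, Open WebUI streams the response from the connection configured above. The equivalent direct request can be reproduced with the `openai` client; a minimal sketch, assuming the package is installed:

```python
# Minimal sketch: stream a chat completion directly from OVMS,
# mirroring what Open WebUI does for the connection configured above.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

stream = client.chat.completions.create(
    model="meta-llama/Llama-3.2-1B-Instruct",
    messages=[{"role": "user", "content": "Write a haiku about model serving"}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```
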
### Reference

[https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible](https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible/#step-2-connect-your-server-to-open-webui)

---

## RAG

### Step 1: Model Preparation

In addition to text generation, endpoints for embedding and reranking in Retrieval Augmented Generation can also be deployed with OpenVINO Model Server. In this demo, the embedding model is [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and the reranking model is [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base). Run the export script to download and quantize the models:

```bash
python export_model.py embeddings_ov --source_model sentence-transformers/all-MiniLM-L6-v2 --weight-format int8 --config_file_path models/config.json
python export_model.py rerank_ov --source_model BAAI/bge-reranker-base --weight-format int8 --config_file_path models/config.json
```

Keep the model server running or restart it. Here are the basic calls to check if the endpoints work:

```bash
curl http://localhost:8000/v3/embeddings -H "Content-Type: application/json" -d "{\"model\":\"sentence-transformers/all-MiniLM-L6-v2\",\"input\":\"hello world\"}"
curl http://localhost:8000/v3/rerank -H "Content-Type: application/json" -d "{\"model\":\"BAAI/bge-reranker-base\",\"query\":\"welcome\",\"documents\":[\"good morning\",\"farewell\"]}"
```

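The same endpoints can be exercised from Python: embeddings through the `openai` client and reranking with a plain HTTP request, since the OpenAI client has no rerank method. A minimal sketch, assuming the `openai` and `requests` packages are installed:

```python
# Minimal sketch: call the OVMS embeddings and rerank endpoints used for RAG.
# Assumes the openai and requests packages are installed and the server runs on port 8000.
import requests
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

# Embeddings via the OpenAI-compatible endpoint
embedding = client.embeddings.create(
    model="sentence-transformers/all-MiniLM-L6-v2",
    input="hello world",
)
print(len(embedding.data[0].embedding), "dimensions")

# Reranking via the rerank endpoint shown in the curl call above
rerank = requests.post(
    "http://localhost:8000/v3/rerank",
    json={
        "model": "BAAI/bge-reranker-base",
        "query": "welcome",
        "documents": ["good morning", "farewell"],
    },
    timeout=30,
)
print(rerank.json())
```
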
### Step 2: Documents Setting

1. Go to **Admin Panel** → **Settings** → **Documents** ([http://localhost:8080/admin/settings/documents](http://localhost:8080/admin/settings/documents))
2. Select **OpenAI** for **Embedding Model Engine**
   * URL: `http://localhost:8000/v3`
   * Embedding Model: `sentence-transformers/all-MiniLM-L6-v2`
   * Put anything in the API key field
3. Enable **Hybrid Search**
4. Select **External** for **Reranking Engine**
   * URL: `http://localhost:8000/v3/rerank`
   * Reranking Model: `BAAI/bge-reranker-base`
5. Click **Save**

![embedding and retrieval setting](./embedding_and_retrieval_setting.png)

### Step 3: Knowledge Base

1. Prepare the documentation

   The documentation used in this demo is [https://github.com/open-webui/docs/archive/refs/heads/main.zip](https://github.com/open-webui/docs/archive/refs/heads/main.zip). Download and extract it to get the folder.

2. Go to **Workspace** → **Knowledge** → **+Create a Knowledge Base** ([http://localhost:8080/workspace/knowledge/create](http://localhost:8080/workspace/knowledge/create))
3. Name and describe the knowledge base
4. Click **Create Knowledge**
5. Click **+Add Content** → **Upload directory**, then select the extracted folder. This will upload all files with suitable extensions.

![create a knowledge base](./create_a_knowledge_base.png)

### Step 4: Chat with RAG

1. Click **New Chat** and enter the `#` symbol
2. Select documents that appear above the chat box for retrieval. Document icons will appear above **Send a message**
3. Enter a query and send it

![chat with RAG demo](./chat_with_RAG_demo.png)

### Step 5: RAG-enabled Model

1. Go to **Workspace** → **Models** → **+Add New Model** ([http://localhost:8080/workspace/models/create](http://localhost:8080/workspace/models/create))
2. Configure the model:
   * Name the model
   * Select a base model from the list
   * Click **Select Knowledge** and select a knowledge base for retrieval

   ![create and configure the RAG-enabled model](./create_and_configure_the_RAG-enabled_model.png)

3. Click **Save & Create**
4. Click the created model and start chatting

![RAG-enabled model demo](./RAG-enabled_model_demo.png)

### Reference

[https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_rag.html](https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_rag.html#export-models-from-huggingface-hub-including-conversion-to-openvino-format)

[https://docs.openwebui.com/tutorials/tips/rag-tutorial](https://docs.openwebui.com/tutorials/tips/rag-tutorial/#setup)

---

## Image Generation

### Step 1: Model Preparation

The image generation model used in this demo is [dreamlike-art/dreamlike-anime-1.0](https://huggingface.co/dreamlike-art/dreamlike-anime-1.0). Run the export script to download and quantize the model:

```bash
python export_model.py image_generation --source_model dreamlike-art/dreamlike-anime-1.0 --weight-format int8 --config_file_path models/config.json
```

Keep the model server running or restart it. Here is a basic call to check if it works:

```bash
curl http://localhost:8000/v3/images/generations -H "Content-Type: application/json" -d "{\"model\":\"dreamlike-art/dreamlike-anime-1.0\",\"prompt\":\"anime\",\"num_inference_steps\":1,\"size\":\"256x256\",\"response_format\":\"b64_json\"}"
```

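The same request can be made with the `openai` client; a minimal sketch that saves the returned base64 image, assuming the `openai` package is installed (`num_inference_steps` is an extra server-side parameter passed via `extra_body`):

```python
# Minimal sketch: generate an image through the OVMS images endpoint and save it to disk.
# Assumes the openai package is installed and the server runs on port 8000.
import base64

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

result = client.images.generate(
    model="dreamlike-art/dreamlike-anime-1.0",
    prompt="anime",
    size="256x256",
    response_format="b64_json",
    extra_body={"num_inference_steps": 1},  # extra parameter accepted by the server, as in the curl call above
)
with open("generated.png", "wb") as f:
    f.write(base64.b64decode(result.data[0].b64_json))
```
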
### Step 2: Image Generation Setting

1. Go to **Admin Panel** → **Settings** → **Images** ([http://localhost:8080/admin/settings/images](http://localhost:8080/admin/settings/images))
2. Configure **OpenAI API**:
   * URL: `http://localhost:8000/v3`
   * Put anything in the API key field
3. Enable **Image Generation (Experimental)**
   * Set Default Model: `dreamlike-art/dreamlike-anime-1.0`
   * Set Image Size. It must be in WxH format, for example `256x256`
4. Click **Save**

![image generation setting](./image_generation_setting.png)

### Step 3: Generate Image

Method 1:
1. Toggle the **Image** switch on
2. Enter a query and send it

![image generation method 1 demo](./image_generation_method_1_demo.png)

Method 2:
1. Send a query, with or without the **Image** switch on
2. After the response has finished generating, it can be edited into a prompt
3. Click the **Picture icon** to generate an image

![image generation method 2 demo](./image_generation_method_2_demo.png)

### Reference

[https://docs.openvino.ai/nightly/model-server/ovms_demos_image_generation.html](https://docs.openvino.ai/nightly/model-server/ovms_demos_image_generation.html#export-model-for-cpu)

[https://docs.openwebui.com/tutorials/images](https://docs.openwebui.com/tutorials/images/#using-image-generation)

---

## VLM

### Step 1: Model Preparation

The vision language model used in this demo is [OpenGVLab/InternVL2-2B](https://huggingface.co/OpenGVLab/InternVL2-2B). Run the export script to download and quantize the model:

```bash
python export_model.py text_generation --source_model OpenGVLab/InternVL2-2B --weight-format int4 --pipeline_type VLM --model_name OpenGVLab/InternVL2-2B --config_file_path models/config.json
```

Keep the model server running or restart it. Here is a basic call to check if it works:

```bash
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{ \"model\": \"OpenGVLab/InternVL2-2B\", \"messages\":[{\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \"Describe what is on the picture.\"},{\"type\": \"image_url\", \"image_url\": {\"url\": \"http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg\"}}]}], \"max_completion_tokens\": 100}"
```

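The same request can be issued with the `openai` client, which accepts the mixed text and image content format used above; a minimal sketch, assuming the `openai` package is installed:

```python
# Minimal sketch: ask the VLM served by OVMS to describe an image referenced by URL.
# Assumes the openai package is installed and the server runs on port 8000.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v3", api_key="unused")

image_url = "http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg"
response = client.chat.completions.create(
    model="OpenGVLab/InternVL2-2B",
    max_completion_tokens=100,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what is on the picture."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```
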
### Step 2: Chat with VLM

1. Start a **New Chat** with the model set to `OpenGVLab/InternVL2-2B`.
2. Click **+more** to upload images, either by capturing the screen or uploading files. The image used in this demo is [http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg](http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg).

   ![upload images](./upload_images.png)

3. Enter a query and send it

![chat with VLM demo](./chat_with_VLM_demo.png)

### Reference

[https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_vlm.html](https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_vlm.html#model-preparation)

---

## AI Agent with Tools

### Step 1: Start Tool Server

Start an OpenAPI tool server from the [openapi-servers repo](https://github.com/open-webui/openapi-servers). The server used in this demo is [https://github.com/open-webui/openapi-servers/tree/main/servers/time](https://github.com/open-webui/openapi-servers/tree/main/servers/time). Run it locally at `http://localhost:18000`:

```bash
git clone https://github.com/open-webui/openapi-servers
cd openapi-servers/servers/time
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 18000 --reload
```

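Before wiring the tool server into Open WebUI, it can be useful to confirm that it is reachable and to list the operations it exposes. A minimal sketch with the `requests` package, assuming the server publishes its OpenAPI schema at the default `/openapi.json` path:

```python
# Minimal sketch: verify the tool server is up and list its tool operations.
# Assumes the requests package is installed and the server runs on port 18000.
import requests

spec = requests.get("http://localhost:18000/openapi.json", timeout=5).json()
print(spec["info"]["title"])
for path, methods in spec["paths"].items():
    for method, operation in methods.items():
        print(f"{method.upper()} {path} - {operation.get('summary', '')}")
```
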
### Step 2: Tools Setting

1. Go to **Admin Panel** → **Settings** → **Tools** ([http://localhost:8080/admin/settings/tools](http://localhost:8080/admin/settings/tools))
2. Click **+Add Connection**
   * URL: `http://localhost:18000`
   * Name the tool
3. Click **Save**

![tools setting](./tools_setting.png)

### Step 3: Chat with AI Agent

1. Click **+more** and toggle on the tool
2. Enter a query and send it

![chat with AI Agent demo](./chat_with_AI_Agent_demo.png)

### Reference

[https://docs.openwebui.com/openapi-servers/open-webui](https://docs.openwebui.com/openapi-servers/open-webui/#step-2-connect-tool-server-in-open-webui)
