# Demonstrating integration of Open WebUI with OpenVINO Model Server {#ovms_demos_integration_with_open_webui}

## Description

[Open WebUI](https://github.com/open-webui/open-webui) is a popular user interface for generative AI models. It supports use cases such as text generation, RAG, and image generation, and it can delegate execution to remote model servers compatible with standard APIs like the OpenAI API for chat completions and image generation.

The goal of this demo is to integrate Open WebUI with [OpenVINO Model Server](https://github.com/openvinotoolkit/model_server). It includes instructions for deploying the model server with a set of models and for configuring Open WebUI to delegate generation to the serving endpoints.

---

## Setup

### Prerequisites

In this demo, OpenVINO Model Server is deployed on Linux (CPU) using Docker, and Open WebUI is installed via Python pip. Requirements to follow this demo:

* [Docker Engine](https://docs.docker.com/engine/) installed
* Host with x86_64 architecture
* Linux, macOS, or Windows via [WSL](https://learn.microsoft.com/en-us/windows/wsl/)
* Python 3.11 with pip
* HuggingFace account to download models

There are other ways to fulfill the prerequisites, such as [OpenVINO Model Server deployment on baremetal Linux or Windows](https://docs.openvino.ai/nightly/model-server/ovms_docs_deploying_server_baremetal.html) and [Open WebUI installation with Docker](https://docs.openwebui.com/#quick-start-with-docker-). The steps in this demo can be reused across the different options, and the references under each step cover both deployments.

This demo was tested on CPU, but most of the models could also run on Intel accelerators such as GPU and NPU.

### Step 1: Preparation

Download the export script, install its dependencies, and create a directory for the models:

```bash
curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/3/demos/common/export_models/export_model.py -o export_model.py
pip install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/3/demos/common/export_models/requirements.txt
mkdir models
```

### Step 2: Export Model

The text generation model used in this demo is [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct). It is a gated model, so if you have not downloaded it before, request access on its HuggingFace page first. Run the export script to download and quantize the model:

```bash
python export_model.py text_generation --source_model meta-llama/Llama-3.2-1B-Instruct --weight-format int8 --kv_cache_precision u8 --config_file_path models/config.json
```
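
The export script writes the model files under `models/` and registers the model in `models/config.json`. A quick way to confirm the export succeeded (a convenience check, not part of the original flow):

```bash
# The config should now contain an entry for meta-llama/Llama-3.2-1B-Instruct
cat models/config.json
ls models
```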

### Step 3: Server Deployment

Deploy with Docker:

```bash
docker run -d -p 8000:8000 -v $(pwd)/models:/workspace:ro openvino/model_server --rest_port 8000 --config_path /workspace/config.json
```
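
To confirm that the server started and the model finished loading, check the container logs with `docker logs`, or query the configuration status endpoint of the model server REST API:

```bash
# Lists served models and their state (AVAILABLE once loading completes)
curl http://localhost:8000/v1/config
```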

Here is the basic call to check if it works:

```bash
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{\"model\":\"meta-llama/Llama-3.2-1B-Instruct\",\"messages\":[{\"role\":\"system\",\"content\":\"You are a helpful assistant.\"},{\"role\":\"user\",\"content\":\"Say this is a test\"}]}"
```
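
The response is a standard OpenAI-style chat completion object. If `jq` is installed, you can extract just the generated text; this is a convenience sketch, not a required step:

```bash
curl -s http://localhost:8000/v3/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"meta-llama/Llama-3.2-1B-Instruct","messages":[{"role":"user","content":"Say this is a test"}]}' \
  | jq -r '.choices[0].message.content'
```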

### Step 4: Start Open WebUI

Install Open WebUI:

```bash
pip install open-webui
```
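
Open WebUI pulls in many dependencies, so it can equally be installed into a dedicated virtual environment to keep them isolated (a minimal sketch):

```bash
python3.11 -m venv openwebui-env   # any Python 3.11 environment works
source openwebui-env/bin/activate
pip install open-webui
```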

Run Open WebUI:

```bash
open-webui serve
```
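
By default the UI listens on port 8080. If that port is already in use, the `serve` command accepts host and port options (flag names assumed from the Open WebUI CLI; check `open-webui serve --help`):

```bash
open-webui serve --port 8081
```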

Go to [http://localhost:8080](http://localhost:8080) and create an admin account to get started.

### Reference

[https://docs.openvino.ai/2025/model-server/ovms_demos_continuous_batching.html](https://docs.openvino.ai/2025/model-server/ovms_demos_continuous_batching.html#model-preparation)

[https://docs.openwebui.com](https://docs.openwebui.com/#installation-with-pip)

---

## Chat

### Step 1: Connections Setting

1. Go to **Admin Panel** → **Settings** → **Connections** ([http://localhost:8080/admin/settings/connections](http://localhost:8080/admin/settings/connections))
2. Click **+Add Connection** under **OpenAI API**
    * URL: `http://localhost:8000/v3`
    * Model IDs: enter `meta-llama/Llama-3.2-1B-Instruct` and click **+** to add the model, or leave empty to include all models
3. Click **Save**
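
Open WebUI populates its model picker by querying the connection's model-listing endpoint. Assuming your OpenVINO Model Server version exposes the OpenAI-style model list, you can preview what the UI will see:

```bash
curl http://localhost:8000/v3/models
```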

### Step 2: Start Chatting

Click **New Chat** and select the model to start chatting.

### Reference

[https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible](https://docs.openwebui.com/getting-started/quick-start/starting-with-openai-compatible/#step-2-connect-your-server-to-open-webui)

---

## RAG

### Step 1: Model Preparation

In addition to text generation, the embeddings and reranking endpoints used in Retrieval Augmented Generation can also be deployed with OpenVINO Model Server. In this demo, the embedding model is [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) and the reranking model is [BAAI/bge-reranker-base](https://huggingface.co/BAAI/bge-reranker-base). Run the export script to download and quantize the models:

```bash
python export_model.py embeddings_ov --source_model sentence-transformers/all-MiniLM-L6-v2 --weight-format int8 --config_file_path models/config.json
python export_model.py rerank_ov --source_model BAAI/bge-reranker-base --weight-format int8 --config_file_path models/config.json
```

Keep the model server running or restart it. Here are the basic calls to check if the endpoints work:

```bash
curl http://localhost:8000/v3/embeddings -H "Content-Type: application/json" -d "{\"model\":\"sentence-transformers/all-MiniLM-L6-v2\",\"input\":\"hello world\"}"
curl http://localhost:8000/v3/rerank -H "Content-Type: application/json" -d "{\"model\":\"BAAI/bge-reranker-base\",\"query\":\"welcome\",\"documents\":[\"good morning\",\"farewell\"]}"
```
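
Both endpoints follow the OpenAI-style response schemas. As a sanity check, the embedding returned by all-MiniLM-L6-v2 should have 384 dimensions (a convenience sketch using `jq`):

```bash
curl -s http://localhost:8000/v3/embeddings -H "Content-Type: application/json" \
  -d '{"model":"sentence-transformers/all-MiniLM-L6-v2","input":"hello world"}' \
  | jq '.data[0].embedding | length'   # expected output: 384
```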

### Step 2: Documents Setting

1. Go to **Admin Panel** → **Settings** → **Documents** ([http://localhost:8080/admin/settings/documents](http://localhost:8080/admin/settings/documents))
2. Select **OpenAI** for **Embedding Model Engine**
    * URL: `http://localhost:8000/v3`
    * Embedding Model: `sentence-transformers/all-MiniLM-L6-v2`
    * Enter any value for the API key; the form requires one, but the model server does not check it
3. Enable **Hybrid Search**
4. Select **External** for **Reranking Engine**
    * URL: `http://localhost:8000/v3/rerank`
    * Reranking Model: `BAAI/bge-reranker-base`
5. Click **Save**

### Step 3: Knowledge Base

1. Prepare the documentation

   The documentation used in this demo is [https://github.com/open-webui/docs/archive/refs/heads/main.zip](https://github.com/open-webui/docs/archive/refs/heads/main.zip). Download and extract it to get the folder.

2. Go to **Workspace** → **Knowledge** → **+Create a Knowledge Base** ([http://localhost:8080/workspace/knowledge/create](http://localhost:8080/workspace/knowledge/create))
3. Name and describe the knowledge base
4. Click **Create Knowledge**
5. Click **+Add Content** → **Upload directory**, then select the extracted folder. This will upload all files with suitable extensions.

### Step 4: Chat with RAG

1. Click **New Chat** and type the `#` symbol
2. Select documents that appear above the chat box for retrieval. Document icons will appear above **Send a message**
3. Enter a query and send it

### Step 5: RAG-enabled Model

1. Go to **Workspace** → **Models** → **+Add New Model** ([http://localhost:8080/workspace/models/create](http://localhost:8080/workspace/models/create))
2. Configure the model:
    * Name the model
    * Select a base model from the list
    * Click **Select Knowledge** and select a knowledge base for retrieval
3. Click **Save & Create**
4. Click the created model and start chatting

### Reference

[https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_rag.html](https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_rag.html#export-models-from-huggingface-hub-including-conversion-to-openvino-format)

[https://docs.openwebui.com/tutorials/tips/rag-tutorial](https://docs.openwebui.com/tutorials/tips/rag-tutorial/#setup)

---

## Image Generation

### Step 1: Model Preparation

The image generation model used in this demo is [dreamlike-art/dreamlike-anime-1.0](https://huggingface.co/dreamlike-art/dreamlike-anime-1.0). Run the export script to download and quantize the model:

```bash
python export_model.py image_generation --source_model dreamlike-art/dreamlike-anime-1.0 --weight-format int8 --config_file_path models/config.json
```

Keep the model server running or restart it. Here is the basic call to check if it works:

```bash
curl http://localhost:8000/v3/images/generations -H "Content-Type: application/json" -d "{\"model\":\"dreamlike-art/dreamlike-anime-1.0\",\"prompt\":\"anime\",\"num_inference_steps\":1,\"size\":\"256x256\",\"response_format\":\"b64_json\"}"
```
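
The image comes back base64-encoded in the OpenAI-style `data[0].b64_json` field. With `jq` and `base64` you can decode it to a file (a convenience sketch):

```bash
curl -s http://localhost:8000/v3/images/generations -H "Content-Type: application/json" \
  -d '{"model":"dreamlike-art/dreamlike-anime-1.0","prompt":"anime","num_inference_steps":1,"size":"256x256","response_format":"b64_json"}' \
  | jq -r '.data[0].b64_json' | base64 -d > output.png
```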

### Step 2: Image Generation Setting

1. Go to **Admin Panel** → **Settings** → **Images** ([http://localhost:8080/admin/settings/images](http://localhost:8080/admin/settings/images))
2. Configure **OpenAI API**:
    * URL: `http://localhost:8000/v3`
    * Enter any value for the API key
3. Enable **Image Generation (Experimental)**
    * Set Default Model: `dreamlike-art/dreamlike-anime-1.0`
    * Set Image Size in WxH format, for example `256x256`
4. Click **Save**

### Step 3: Generate Image

Method 1:
1. Toggle the **Image** switch on
2. Enter a query and send it

Method 2:
1. Send a query, with or without the **Image** switch on
2. After the response has finished generating, it can be edited into a prompt
3. Click the **Picture icon** to generate an image
### Reference

[https://docs.openvino.ai/nightly/model-server/ovms_demos_image_generation.html](https://docs.openvino.ai/nightly/model-server/ovms_demos_image_generation.html#export-model-for-cpu)

[https://docs.openwebui.com/tutorials/images](https://docs.openwebui.com/tutorials/images/#using-image-generation)

---

## VLM

### Step 1: Model Preparation

The vision language model used in this demo is [OpenGVLab/InternVL2-2B](https://huggingface.co/OpenGVLab/InternVL2-2B). Run the export script to download and quantize the model:

```bash
python export_model.py text_generation --source_model OpenGVLab/InternVL2-2B --weight-format int4 --pipeline_type VLM --model_name OpenGVLab/InternVL2-2B --config_file_path models/config.json
```

Keep the model server running or restart it. Here is the basic call to check if it works:

```bash
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d "{ \"model\": \"OpenGVLab/InternVL2-2B\", \"messages\":[{\"role\": \"user\", \"content\": [{\"type\": \"text\", \"text\": \"Describe what is in the picture.\"},{\"type\": \"image_url\", \"image_url\": {\"url\": \"http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg\"}}]}], \"max_completion_tokens\": 100}"
```
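
Remote URLs work because the model server fetches the image itself. To send a local file instead, a base64 data URL can be used, assuming the server accepts them like the OpenAI API does; a sketch with a local `zebra.jpeg` (on macOS use `base64 -i zebra.jpeg` instead of `-w0`):

```bash
IMG_B64=$(base64 -w0 zebra.jpeg)
curl http://localhost:8000/v3/chat/completions -H "Content-Type: application/json" -d @- <<EOF
{"model": "OpenGVLab/InternVL2-2B",
 "messages": [{"role": "user", "content": [
    {"type": "text", "text": "Describe what is in the picture."},
    {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,${IMG_B64}"}}]}],
 "max_completion_tokens": 100}
EOF
```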

### Step 2: Chat with VLM

1. Start a **New Chat** with the model set to `OpenGVLab/InternVL2-2B`.
2. Click **+more** to upload images, either by capturing the screen or by uploading files. The image used in this demo is [http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg](http://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/2/demos/common/static/images/zebra.jpeg).
3. Enter a query and send it

### Reference

[https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_vlm.html](https://docs.openvino.ai/nightly/model-server/ovms_demos_continuous_batching_vlm.html#model-preparation)

---

## AI Agent with Tools

### Step 1: Start Tool Server

Start an OpenAPI tool server from the [openapi-servers repo](https://github.com/open-webui/openapi-servers). The server used in this demo is [https://github.com/open-webui/openapi-servers/tree/main/servers/time](https://github.com/open-webui/openapi-servers/tree/main/servers/time). Run it locally at `http://localhost:18000`:

```bash
git clone https://github.com/open-webui/openapi-servers
cd openapi-servers/servers/time
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 18000 --reload
```
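
The time server is a FastAPI application, so once it is running you can inspect the OpenAPI schema that Open WebUI will consume, and browse the interactive docs at [http://localhost:18000/docs](http://localhost:18000/docs):

```bash
curl http://localhost:18000/openapi.json
```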

### Step 2: Tools Setting

1. Go to **Admin Panel** → **Settings** → **Tools** ([http://localhost:8080/admin/settings/tools](http://localhost:8080/admin/settings/tools))
2. Click **+Add Connection**
    * URL: `http://localhost:18000`
    * Name the tool
3. Click **Save**

### Step 3: Chat with AI Agent

1. Click **+more** and toggle on the tool
2. Enter a query and send it

### Reference

[https://docs.openwebui.com/openapi-servers/open-webui](https://docs.openwebui.com/openapi-servers/open-webui/#step-2-connect-tool-server-in-open-webui)