diff --git a/docs/faq.mdx b/docs/faq.mdx
index dbc64a8d41..494bb1bede 100644
--- a/docs/faq.mdx
+++ b/docs/faq.mdx
@@ -128,6 +128,10 @@ Everything you need to run Open WebUI, including your data, remains within your
docker run -d -p 3000:8080 -e HF_ENDPOINT=https://hf-mirror.com/ --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
+### Q: Why are my reasoning model's thinking blocks showing as raw text instead of being hidden?
+
+**A:** This happens if the model's thinking tags are not recognized by Open WebUI. You can customize the tags in the model's Advanced Parameters. For more details, see the **[Reasoning & Thinking Models](/features/chat-features/reasoning-models)** guide.
+
### Q: RAG with Open WebUI is very bad or not working at all. Why?
**A:** If you're using **Ollama**, be aware that Ollama sets the context length to **2048 tokens by default**. This means that none of the retrieved data might be used because it doesn't fit within the available context window.
@@ -136,10 +140,66 @@ To improve the performance of Retrieval-Augmented Generation (**RAG**) with Open
To do this, configure your **Ollama model params** to allow a larger context window. You can check and modify this setting in your chat directly or from model editor page to enhance the RAG experience significantly.
+### Q: I asked the model what it is and it gave the wrong answer. Is Open WebUI routing to the wrong model?
+
+**A:** No—**LLMs do not reliably know their own identity.** When you ask a model "What model are you?" or "Are you GPT-4?", the response is not a system diagnostic. It's simply the model generating text based on patterns in its training data.
+
+Models frequently:
+- Claim to be a different model (e.g., a Llama model claiming to be ChatGPT)
+- Give outdated information about themselves
+- Hallucinate version numbers or capabilities
+- Change their answer depending on how you phrase the question
+
+**To verify which model you're actually using:**
+1. Check the model selector in the Open WebUI interface
+2. Look at the **Admin Panel > Settings > Connections** to confirm your API endpoints
+3. Check your provider's dashboard/logs for the actual API calls being made
+
+Asking the model itself is **not** a valid way to diagnose routing issues. If you suspect a configuration problem, check your connection settings and API keys instead.
+
+### Q: But why can models on official chat interfaces (like ChatGPT or Claude.ai) correctly identify themselves?
+
+**A:** Because the provider **injects a system prompt** that explicitly tells the model what it is. When you use ChatGPT, OpenAI's interface includes a hidden system message like "You are ChatGPT, a large language model trained by OpenAI..." before your conversation begins.
+
+The model isn't "aware" of itself—it's simply been instructed to claim a specific identity. You can do the same thing in Open WebUI by adding a system prompt to your model configuration (e.g., "You are Llama 3.3 70B..."). The model will then confidently repeat whatever identity you've told it to claim.
+
+This is also why the same model accessed through different interfaces might give different answers about its identity—it depends entirely on what system prompt (if any) was provided.
+
+### Q: Why am I seeing multiple API requests when I only send one message? Why is my token usage higher than expected?
+
+**A:** Open WebUI uses **Task Models** to power background features that enhance your chat experience. When you send a single message, additional API calls may be made for:
+
+- **Title Generation**: Automatically generating a title for new chats
+- **Tag Generation**: Auto-tagging chats for organization
+- **Query Generation**: Creating optimized search queries for RAG (when you attach files or knowledge)
+- **Web Search Queries**: Generating search terms when web search is enabled
+- **Autocomplete Suggestions**: If enabled
+
+By default, these tasks use the **same model** you're chatting with. If you're using an expensive API model (like GPT-4 or Claude), this can significantly increase your costs.
+
+**To reduce API costs:**
+1. Go to **Admin Panel > Settings > Interface**, where title generation, tag generation, and other background task features are configured
+2. Set the **Task Model** (in the same **Interface** tab) to a smaller, cheaper model (like GPT-4o-mini) or a local model for background tasks
+3. Disable features you don't need (auto-title, auto-tags, etc.)
+
+:::tip Cost-Saving Recommendation
+Set your Task Model to a fast, inexpensive model (or a local model via Ollama) while keeping your primary chat model as a more capable one. This gives you the best of both worlds: smart responses for your conversations, cheap/free processing for background tasks.
+:::
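+
+If you deploy with Docker Compose, the task model can also be pinned at startup via environment variables. A minimal sketch, assuming the `TASK_MODEL` (Ollama models) and `TASK_MODEL_EXTERNAL` (OpenAI-compatible connections) variables from the environment configuration reference; model names are illustrative:
+
+```yaml
+services:
+  open-webui:
+    image: ghcr.io/open-webui/open-webui:main
+    environment:
+      - TASK_MODEL=llama3.2:3b            # used when the main model comes from Ollama
+      - TASK_MODEL_EXTERNAL=gpt-4o-mini   # used when the main model comes from an OpenAI-compatible connection
+```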
+
+For more optimization tips, see the **[Performance Tips Guide](/tutorials/tips/performance)**.
+
### Q: Is MCP (Model Context Protocol) supported in Open WebUI?
**A:** Yes, Open WebUI now includes **native support for MCP Streamable HTTP**, enabling direct, first-class integration with MCP tools that communicate over the standard HTTP transport. For any **other MCP transports or non-HTTP implementations**, you should use our official proxy adapter, **MCPO**, available at 👉 [https://github.com/open-webui/mcpo](https://github.com/open-webui/mcpo). MCPO provides a unified OpenAPI-compatible layer that bridges alternative MCP transports into Open WebUI safely and consistently. This architecture ensures maximum compatibility, strict security boundaries, and predictable tool behavior across different environments while keeping Open WebUI backend-agnostic and maintainable.
+### Q: Why doesn't Open WebUI support [Specific Provider]'s latest API (e.g. OpenAI Responses API)?
+
+**A:** Open WebUI is built around **universal protocols**, not specific providers. Our core philosophy is to support standard, widely-adopted APIs like the **OpenAI Chat Completions protocol**.
+
+This protocol-centric design ensures that Open WebUI remains backend-agnostic and compatible with dozens of providers (like OpenRouter, LiteLLM, vLLM, and Groq) simultaneously. We avoid implementing proprietary, provider-specific APIs (such as OpenAI's stateful Responses API or Anthropic's Messages API) to prevent unsustainable architectural bloat and to maintain a truly open ecosystem.
+
+If you need functionality exclusive to a proprietary API (like OpenAI's hidden reasoning traces), we recommend using a proxy like **LiteLLM** or **OpenRouter**, which translate those proprietary features into the standard Chat Completions protocol that Open WebUI supports.
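+
+As a sketch (the model name and key handling are placeholders), a LiteLLM proxy `config.yaml` could map a proprietary model onto the standard protocol:
+
+```yaml
+model_list:
+  - model_name: my-gpt          # name Open WebUI will see
+    litellm_params:
+      model: openai/gpt-4o      # upstream provider/model handled by LiteLLM
+      api_key: os.environ/OPENAI_API_KEY
+```
+
+Open WebUI then connects to the proxy as a regular OpenAI-compatible connection (typically `http://localhost:4000/v1` for a default LiteLLM proxy).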
+
### Q: Why is the frontend integrated into the same Docker image? Isn't this unscalable or problematic?
The assumption that bundling the frontend with the backend is unscalable comes from a misunderstanding of how modern Single-Page Applications work. Open WebUI’s frontend is a static SPA, meaning it consists only of HTML, CSS, and JavaScript files with no runtime coupling to the backend. Because these files are static, lightweight, and require no separate server, including them in the same image has no impact on scalability. This approach simplifies deployment, ensures every replica serves the exact same assets, and eliminates unnecessary moving parts. If you prefer, you can still host the SPA on any CDN or static hosting service and point it to a remote backend, but packaging both together is the standard and most practical method for containerized SPAs.
diff --git a/docs/features/audio/speech-to-text/env-variables.md b/docs/features/audio/speech-to-text/env-variables.md
index 1d4fd9264a..ee16d5b1a3 100644
--- a/docs/features/audio/speech-to-text/env-variables.md
+++ b/docs/features/audio/speech-to-text/env-variables.md
@@ -11,16 +11,120 @@ For a complete list of all Open WebUI environment variables, see the [Environmen
:::
-The following is a summary of the environment variables for speech to text (STT).
-
-# Environment Variables For Speech To Text (STT)
-
-| Variable | Description |
-|----------|-------------|
-| `WHISPER_MODEL` | Sets the Whisper model to use for local Speech-to-Text |
-| `WHISPER_MODEL_DIR` | Specifies the directory to store Whisper model files |
-| `WHISPER_LANGUAGE` | Specifies the ISO 639-1 (ISO 639-2 for Hawaiian and Cantonese) Speech-to-Text language to use for Whisper (language is predicted unless set) |
-| `AUDIO_STT_ENGINE` | Specifies the Speech-to-Text engine to use (empty for local Whisper, or `openai`) |
-| `AUDIO_STT_MODEL` | Specifies the Speech-to-Text model for OpenAI-compatible endpoints |
-| `AUDIO_STT_OPENAI_API_BASE_URL` | Sets the OpenAI-compatible base URL for Speech-to-Text |
-| `AUDIO_STT_OPENAI_API_KEY` | Sets the OpenAI API key for Speech-to-Text |
+The following is a summary of the environment variables for speech-to-text (STT) and text-to-speech (TTS).
+
+:::tip UI Configuration
+Most of these settings can also be configured in the **Admin Panel → Settings → Audio** tab. Environment variables set the initial values at startup; values changed later in the UI are persisted and take precedence.
+:::
+
+## Speech To Text (STT) Environment Variables
+
+### Local Whisper
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `WHISPER_MODEL` | Whisper model size | `base` |
+| `WHISPER_MODEL_DIR` | Directory to store Whisper model files | `{CACHE_DIR}/whisper/models` |
+| `WHISPER_COMPUTE_TYPE` | Compute type for inference (see note below) | `int8` |
+| `WHISPER_LANGUAGE` | ISO 639-1 language code (empty = auto-detect) | empty |
+| `WHISPER_MULTILINGUAL` | Use the multilingual Whisper model | `false` |
+| `WHISPER_MODEL_AUTO_UPDATE` | Auto-download model updates | `false` |
+| `WHISPER_VAD_FILTER` | Enable Voice Activity Detection filter | `false` |
+
+:::info WHISPER_COMPUTE_TYPE Options
+- `int8` — CPU default, fastest but may not work on older GPUs
+- `float16` — **Recommended for CUDA/GPU**
+- `int8_float16` — Hybrid mode (int8 weights, float16 computation)
+- `float32` — Maximum compatibility, slowest
+
+If using the `:cuda` Docker image with an older GPU, set `WHISPER_COMPUTE_TYPE=float16` to avoid errors.
+:::
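+
+For example, a Docker Compose snippet combining these variables for a GPU deployment might look like this (the values are illustrative):
+
+```yaml
+environment:
+  - WHISPER_MODEL=small
+  - WHISPER_COMPUTE_TYPE=float16   # recommended for CUDA
+  - WHISPER_LANGUAGE=en            # skip auto-detection
+  - WHISPER_VAD_FILTER=true
+```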
+
+### OpenAI-Compatible STT
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_STT_ENGINE` | STT engine: empty (local Whisper), `openai`, `azure`, `deepgram`, `mistral` | empty |
+| `AUDIO_STT_MODEL` | STT model for external providers | empty |
+| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI-compatible API base URL | `https://api.openai.com/v1` |
+| `AUDIO_STT_OPENAI_API_KEY` | OpenAI API key | empty |
+| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Comma-separated list of supported audio MIME types | empty (falls back to `audio/*,video/webm`) |
+
+### Azure STT
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_STT_AZURE_API_KEY` | Azure Cognitive Services API key | empty |
+| `AUDIO_STT_AZURE_REGION` | Azure region | `eastus` |
+| `AUDIO_STT_AZURE_LOCALES` | Comma-separated locales (e.g., `en-US,de-DE`) | auto |
+| `AUDIO_STT_AZURE_BASE_URL` | Custom Azure base URL (optional) | empty |
+| `AUDIO_STT_AZURE_MAX_SPEAKERS` | Max speakers for diarization | `3` |
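+
+A minimal Docker Compose sketch for Azure STT (the region and locales are example values):
+
+```yaml
+environment:
+  - AUDIO_STT_ENGINE=azure
+  - AUDIO_STT_AZURE_API_KEY=your-azure-key
+  - AUDIO_STT_AZURE_REGION=westeurope
+  - AUDIO_STT_AZURE_LOCALES=en-US,de-DE
+```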
+
+### Deepgram STT
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `DEEPGRAM_API_KEY` | Deepgram API key | empty |
+
+### Mistral STT
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_STT_MISTRAL_API_KEY` | Mistral API key | empty |
+| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
+| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
+
+## Text To Speech (TTS) Environment Variables
+
+### General TTS
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_TTS_ENGINE` | TTS engine: empty (browser Web API), `openai`, `elevenlabs`, `azure`, `transformers` | empty |
+| `AUDIO_TTS_MODEL` | TTS model | `tts-1` |
+| `AUDIO_TTS_VOICE` | Default voice | `alloy` |
+| `AUDIO_TTS_SPLIT_ON` | Split text on: `punctuation`, `paragraphs`, or `none` | `punctuation` |
+| `AUDIO_TTS_API_KEY` | API key for ElevenLabs or Azure TTS | empty |
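+
+As an example, a minimal ElevenLabs configuration using the shared `AUDIO_TTS_API_KEY` variable (a sketch; the key value is a placeholder):
+
+```yaml
+environment:
+  - AUDIO_TTS_ENGINE=elevenlabs
+  - AUDIO_TTS_API_KEY=your-elevenlabs-api-key   # shared key variable for ElevenLabs and Azure TTS
+```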
+
+### OpenAI-Compatible TTS
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_TTS_OPENAI_API_BASE_URL` | OpenAI-compatible TTS API base URL | `https://api.openai.com/v1` |
+| `AUDIO_TTS_OPENAI_API_KEY` | OpenAI TTS API key | empty |
+| `AUDIO_TTS_OPENAI_PARAMS` | Additional JSON params for OpenAI TTS | empty |
+
+### Azure TTS
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_TTS_AZURE_SPEECH_REGION` | Azure Speech region | `eastus` |
+| `AUDIO_TTS_AZURE_SPEECH_BASE_URL` | Custom Azure Speech base URL (optional) | empty |
+| `AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT` | Audio output format | `audio-24khz-160kbitrate-mono-mp3` |
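+
+A minimal Docker Compose sketch for Azure TTS (the region is an example value):
+
+```yaml
+environment:
+  - AUDIO_TTS_ENGINE=azure
+  - AUDIO_TTS_API_KEY=your-azure-speech-key
+  - AUDIO_TTS_AZURE_SPEECH_REGION=westeurope
+  - AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT=audio-24khz-160kbitrate-mono-mp3
+```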
+
+## Tips for Configuring Audio
+
+### Using Local Whisper STT
+
+For GPU acceleration issues or older GPUs, try setting:
+```yaml
+environment:
+ - WHISPER_COMPUTE_TYPE=float16
+```
+
+### Using External TTS Services
+
+When running Open WebUI in Docker with an external TTS service:
+
+```yaml
+environment:
+ - AUDIO_TTS_ENGINE=openai
+ - AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:5050/v1
+ - AUDIO_TTS_OPENAI_API_KEY=your-api-key
+```
+
+:::tip
+Use `host.docker.internal` on Docker Desktop (Windows/Mac) to access services on the host. On Linux, use the host IP or container networking.
+:::
+
+For troubleshooting audio issues, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
diff --git a/docs/features/audio/speech-to-text/mistral-voxtral-integration.md b/docs/features/audio/speech-to-text/mistral-voxtral-integration.md
new file mode 100644
index 0000000000..f844d0ec2b
--- /dev/null
+++ b/docs/features/audio/speech-to-text/mistral-voxtral-integration.md
@@ -0,0 +1,125 @@
+---
+sidebar_position: 2
+title: "Mistral Voxtral STT"
+---
+
+# Using Mistral Voxtral for Speech-to-Text
+
+This guide covers how to use Mistral's Voxtral model for Speech-to-Text with Open WebUI. Voxtral is Mistral's speech-to-text model that provides accurate transcription.
+
+## Requirements
+
+- A Mistral API key
+- Open WebUI installed and running
+
+## Quick Setup (UI)
+
+1. Click your **profile icon** (bottom-left corner)
+2. Select **Admin Panel**
+3. Click **Settings** → **Audio** tab
+4. Configure the following:
+
+| Setting | Value |
+|---------|-------|
+| **Speech-to-Text Engine** | `MistralAI` |
+| **API Key** | Your Mistral API key |
+| **STT Model** | `voxtral-mini-latest` (or leave empty for default) |
+
+5. Click **Save**
+
+## Available Models
+
+| Model | Description |
+|-------|-------------|
+| `voxtral-mini-latest` | Default transcription model (recommended) |
+
+## Environment Variables Setup
+
+If you prefer to configure via environment variables:
+
+```yaml
+services:
+ open-webui:
+ image: ghcr.io/open-webui/open-webui:main
+ environment:
+ - AUDIO_STT_ENGINE=mistral
+ - AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
+ - AUDIO_STT_MODEL=voxtral-mini-latest
+ # ... other configuration
+```
+
+### All Mistral STT Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_STT_ENGINE` | Set to `mistral` | empty (uses local Whisper) |
+| `AUDIO_STT_MISTRAL_API_KEY` | Your Mistral API key | empty |
+| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
+| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
+| `AUDIO_STT_MODEL` | STT model | `voxtral-mini-latest` |
+
+## Transcription Methods
+
+Mistral supports two transcription methods:
+
+### Standard Transcription (Default)
+Uses the dedicated transcription endpoint. This is the recommended method.
+
+### Chat Completions Method
+Set `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true` to use Mistral's chat completions API for transcription. This method:
+- Requires audio in mp3 or wav format (automatic conversion is attempted)
+- May provide different results than the standard endpoint
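+
+To opt into this method in a Docker Compose deployment, using the variables listed above:
+
+```yaml
+environment:
+  - AUDIO_STT_ENGINE=mistral
+  - AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
+  - AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true
+```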
+
+## Using STT
+
+1. Click the **microphone icon** in the chat input
+2. Speak your message
+3. Click the microphone again or wait for silence detection
+4. Your speech will be transcribed and appear in the input box
+
+## Supported Audio Formats
+
+Voxtral accepts common audio formats. The system defaults to accepting `audio/*` and `video/webm`.
+
+If using the chat completions method, audio is automatically converted to mp3.
+
+## Troubleshooting
+
+### API Key Errors
+
+If you see "Mistral API key is required":
+1. Verify your API key is entered correctly
+2. Check the API key hasn't expired
+3. Ensure your Mistral account has API access
+
+### Transcription Not Working
+
+1. Check container logs: `docker logs open-webui -f`
+2. Verify the STT Engine is set to `MistralAI`
+3. Try the standard transcription method (disable chat completions)
+
+### Audio Format Issues
+
+If using chat completions method and audio conversion fails:
+- Ensure FFmpeg is available in the container
+- Try recording in a different format (wav or mp3)
+- Switch to the standard transcription method
+
+For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
+
+## Comparison with Other STT Options
+
+| Feature | Mistral Voxtral | OpenAI Whisper | Local Whisper |
+|---------|-----------------|----------------|---------------|
+| **Cost** | Per-minute pricing | Per-minute pricing | Free |
+| **Privacy** | Audio sent to Mistral | Audio sent to OpenAI | Audio stays local |
+| **Model Options** | voxtral-mini-latest | whisper-1 | tiny → large |
+| **GPU Required** | No | No | Recommended |
+
+## Cost Considerations
+
+Mistral charges per minute of audio for STT. Check [Mistral's pricing page](https://mistral.ai/products/la-plateforme#pricing) for current rates.
+
+:::tip
+For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
+:::
diff --git a/docs/features/audio/speech-to-text/openai-stt-integration.md b/docs/features/audio/speech-to-text/openai-stt-integration.md
new file mode 100644
index 0000000000..12dc9e60fb
--- /dev/null
+++ b/docs/features/audio/speech-to-text/openai-stt-integration.md
@@ -0,0 +1,136 @@
+---
+sidebar_position: 0
+title: "OpenAI STT Integration"
+---
+
+# Using OpenAI for Speech-to-Text
+
+This guide covers how to use OpenAI's Whisper API for Speech-to-Text with Open WebUI. This provides cloud-based transcription without needing local GPU resources.
+
+:::tip Looking for TTS?
+See the companion guide: [Using OpenAI for Text-to-Speech](/features/audio/text-to-speech/openai-tts-integration)
+:::
+
+## Requirements
+
+- An OpenAI API key with access to the Audio API
+- Open WebUI installed and running
+
+## Quick Setup (UI)
+
+1. Click your **profile icon** (bottom-left corner)
+2. Select **Admin Panel**
+3. Click **Settings** → **Audio** tab
+4. Configure the following:
+
+| Setting | Value |
+|---------|-------|
+| **Speech-to-Text Engine** | `OpenAI` |
+| **API Base URL** | `https://api.openai.com/v1` |
+| **API Key** | Your OpenAI API key |
+| **STT Model** | `whisper-1` |
+| **Supported Content Types** | Leave empty for defaults, or set `audio/wav,audio/mpeg,audio/webm` |
+
+5. Click **Save**
+
+## Available Models
+
+| Model | Description |
+|-------|-------------|
+| `whisper-1` | OpenAI's Whisper large-v2 model, hosted in the cloud |
+
+:::info
+OpenAI currently only offers `whisper-1`. For more model options, use Local Whisper (built into Open WebUI) or other providers like Deepgram.
+:::
+
+## Environment Variables Setup
+
+If you prefer to configure via environment variables:
+
+```yaml
+services:
+ open-webui:
+ image: ghcr.io/open-webui/open-webui:main
+ environment:
+ - AUDIO_STT_ENGINE=openai
+ - AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
+ - AUDIO_STT_OPENAI_API_KEY=sk-...
+ - AUDIO_STT_MODEL=whisper-1
+ # ... other configuration
+```
+
+### All STT Environment Variables (OpenAI)
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_STT_ENGINE` | Set to `openai` | empty (uses local Whisper) |
+| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI API base URL | `https://api.openai.com/v1` |
+| `AUDIO_STT_OPENAI_API_KEY` | Your OpenAI API key | empty |
+| `AUDIO_STT_MODEL` | STT model | `whisper-1` |
+| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Allowed audio MIME types | `audio/*,video/webm` |
+
+### Supported Audio Formats
+
+By default, Open WebUI accepts `audio/*` and `video/webm` for transcription. If you need to restrict or expand supported formats, set `AUDIO_STT_SUPPORTED_CONTENT_TYPES`:
+
+```yaml
+environment:
+ - AUDIO_STT_SUPPORTED_CONTENT_TYPES=audio/wav,audio/mpeg,audio/webm
+```
+
+OpenAI's Whisper API supports: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`
+
+## Using STT
+
+1. Click the **microphone icon** in the chat input
+2. Speak your message
+3. Click the microphone again or wait for silence detection
+4. Your speech will be transcribed and appear in the input box
+
+## OpenAI vs Local Whisper
+
+| Feature | OpenAI Whisper API | Local Whisper |
+|---------|-------------------|---------------|
+| **Latency** | Network dependent | Faster for short clips |
+| **Cost** | Per-minute pricing | Free (uses your hardware) |
+| **Privacy** | Audio sent to OpenAI | Audio stays local |
+| **GPU Required** | No | Recommended for speed |
+| **Model Options** | `whisper-1` only | tiny, base, small, medium, large |
+
+Choose **OpenAI** if:
+- You don't have a GPU
+- You want consistent performance
+- Privacy isn't a concern
+
+Choose **Local Whisper** if:
+- You want free transcription
+- You need audio to stay private
+- You have a GPU for acceleration
+
+## Troubleshooting
+
+### Microphone Not Working
+
+1. Ensure you're using HTTPS or localhost
+2. Check browser microphone permissions
+3. See [Microphone Access Issues](/troubleshooting/audio#microphone-access-issues)
+
+### Transcription Errors
+
+1. Check your OpenAI API key is valid
+2. Verify the API Base URL is correct
+3. Check container logs for error messages
+
+### Language Issues
+
+OpenAI's Whisper API automatically detects language. If you need to force a specific language, consider using Local Whisper with the `WHISPER_LANGUAGE` environment variable.
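+
+For example, to force German transcription with local Whisper (a sketch; the language code is illustrative):
+
+```yaml
+environment:
+  - AUDIO_STT_ENGINE=      # leave empty to use local Whisper
+  - WHISPER_LANGUAGE=de    # ISO 639-1 code
+```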
+
+For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
+
+## Cost Considerations
+
+OpenAI charges per minute of audio for STT. See [OpenAI Pricing](https://platform.openai.com/docs/pricing) for current rates.
+
+:::tip
+For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
+:::
diff --git a/docs/features/audio/speech-to-text/stt-config.md b/docs/features/audio/speech-to-text/stt-config.md
index fcc3d6fc95..11117a3650 100644
--- a/docs/features/audio/speech-to-text/stt-config.md
+++ b/docs/features/audio/speech-to-text/stt-config.md
@@ -3,22 +3,25 @@ sidebar_position: 1
title: "Configuration"
---
-Open Web UI supports both local, browser, and remote speech to text.
+Open WebUI supports local, browser, and remote speech-to-text.


-## Cloud / Remote Speech To Text Proivders
+## Cloud / Remote Speech To Text Providers
-The following cloud speech to text providers are currently supported. API keys can be configured as environment variables (OpenAI) or in the admin settings page (both keys).
+The following speech-to-text providers are supported:
- | Service | API Key Required |
- | ------------- | ------------- |
- | OpenAI | ✅ |
- | DeepGram | ✅ |
+| Service | API Key Required | Guide |
+|---------|------------------|-------|
+| Local Whisper (default) | ❌ | Built-in, see [Environment Variables](/features/audio/speech-to-text/env-variables) |
+| OpenAI (Whisper API) | ✅ | [OpenAI STT Guide](/features/audio/speech-to-text/openai-stt-integration) |
+| Mistral (Voxtral) | ✅ | [Mistral Voxtral Guide](/features/audio/speech-to-text/mistral-voxtral-integration) |
+| Deepgram | ✅ | — |
+| Azure | ✅ | — |
- WebAPI provides STT via the built-in browser STT provider.
+**Web API** provides STT via the browser's built-in speech recognition (no API key needed, configured in user settings).
## Configuring Your STT Provider
@@ -59,3 +62,39 @@ Once your recording has begun you can:
- If you wish to abort the recording (for example, you wish to start a fresh recording) you can click on the 'x' icon to scape the recording interface

+
+## Troubleshooting
+
+### Common Issues
+
+#### "int8 compute type not supported" Error
+
+If you see an error like `Error transcribing chunk: Requested int8 compute type, but the target device or backend do not support efficient int8 computation`, this usually means your GPU doesn't support the requested `int8` compute operations.
+
+**Solutions:**
+- **Upgrade to the latest version** — persistent configuration for compute type has been improved in recent updates to resolve known issues with CUDA compatibility.
+- **Switch to the standard Docker image** instead of the `:cuda` image — older GPUs (Maxwell architecture, ~2014-2016) may not be supported by modern CUDA accelerated libraries.
+- **Change the compute type** using the `WHISPER_COMPUTE_TYPE` environment variable:
+ ```yaml
+ environment:
+ - WHISPER_COMPUTE_TYPE=float16 # or float32
+ ```
+
+:::tip
+For smaller Whisper models, CPU mode often provides comparable performance without GPU compatibility issues. The `:cuda` image primarily accelerates RAG embeddings and won't significantly impact STT speed for most users.
+:::
+
+#### Microphone Not Working
+
+1. **Check browser permissions** — ensure your browser has microphone access
+2. **Use HTTPS** — some browsers require secure connections for microphone access
+3. **Try another browser** — Chrome typically has the best support for web audio APIs
+
+#### Poor Recognition Accuracy
+
+- **Set the language explicitly** using `WHISPER_LANGUAGE=en` (uses ISO 639-1 codes)
+- **Toggle multilingual support** — use `WHISPER_MULTILINGUAL=true` if you need to support languages other than English. When disabled (default), the English-only model variant is used for better performance on English tasks.
+- **Use a larger Whisper model** — options: `tiny`, `base`, `small`, `medium`, `large`; larger models are more accurate but slower
+
+For more detailed troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
diff --git a/docs/features/audio/text-to-speech/Kokoro-FastAPI-integration.md b/docs/features/audio/text-to-speech/Kokoro-FastAPI-integration.md
index 101ab230f9..248d2c9412 100644
--- a/docs/features/audio/text-to-speech/Kokoro-FastAPI-integration.md
+++ b/docs/features/audio/text-to-speech/Kokoro-FastAPI-integration.md
@@ -138,3 +138,42 @@ docker compose up --build
**That's it!**
For more information on building the Docker container, including changing ports, please refer to the [Kokoro-FastAPI](https://github.com/remsky/Kokoro-FastAPI) repository
+
+## Troubleshooting
+
+### NVIDIA GPU Not Detected
+
+If the GPU version isn't using your GPU:
+
+1. **Install NVIDIA Container Toolkit:**
+ ```bash
+ # Ubuntu/Debian
+ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
+ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
+ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
+ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
+ sudo systemctl restart docker
+ ```
+
+2. **Verify GPU access:**
+ ```bash
+ docker run --rm --gpus all nvidia/cuda:12.2.0-base nvidia-smi
+ ```
+
+### Connection Issues from Open WebUI
+
+If Open WebUI can't reach Kokoro:
+
+- Use `host.docker.internal:8880` instead of `localhost:8880` (Docker Desktop)
+- If both are in Docker Compose, use `http://kokoro-fastapi-gpu:8880/v1`
+- Verify the service is running: `curl http://localhost:8880/health`
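+
+For reference, a sketch of the Open WebUI side of a shared Compose network, assuming the Kokoro service is named `kokoro-fastapi-gpu` as above (the API key value is a placeholder; adjust it if your deployment enforces one):
+
+```yaml
+services:
+  open-webui:
+    image: ghcr.io/open-webui/open-webui:main
+    environment:
+      - AUDIO_TTS_ENGINE=openai
+      - AUDIO_TTS_OPENAI_API_BASE_URL=http://kokoro-fastapi-gpu:8880/v1
+      - AUDIO_TTS_OPENAI_API_KEY=not-needed   # placeholder; set a real key if required
+```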
+
+### CPU Version Performance
+
+The CPU version uses ONNX optimization and performs well for most use cases. If speed is a concern:
+
+- Consider upgrading to the GPU version
+- Ensure no other heavy processes are running on the CPU
+- The CPU version is recommended for systems without compatible NVIDIA GPUs
+
+For more troubleshooting tips, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
diff --git a/docs/features/audio/text-to-speech/chatterbox-tts-api-integration.md b/docs/features/audio/text-to-speech/chatterbox-tts-api-integration.md
index 1b2c3c8d1a..c65bf61868 100644
--- a/docs/features/audio/text-to-speech/chatterbox-tts-api-integration.md
+++ b/docs/features/audio/text-to-speech/chatterbox-tts-api-integration.md
@@ -234,3 +234,31 @@ For more information on `chatterbox-tts-api`, you can visit the [GitHub repo](ht
- 📖 **Documentation**: See [API Documentation](https://github.com/travisvn/chatterbox-tts-api/blob/main/docs/API_README.md) and [Docker Guide](https://github.com/travisvn/chatterbox-tts-api/blob/main/docs/DOCKER_README.md)
- 💬 **Discord**: [Join the Discord for this project](http://chatterboxtts.com/discord)
+
+## Troubleshooting
+
+### Memory Requirements
+
+Chatterbox has higher memory requirements than other TTS solutions:
+- **Minimum:** 4GB RAM
+- **Recommended:** 8GB+ RAM
+- **GPU:** NVIDIA CUDA or Apple M-series (MPS) recommended
+
+If you experience memory issues, consider using a lighter alternative like [OpenAI Edge TTS](/features/audio/text-to-speech/openai-edge-tts-integration) or [Kokoro-FastAPI](/features/audio/text-to-speech/Kokoro-FastAPI-integration).
+
+### Docker Networking
+
+If Open WebUI can't connect to Chatterbox:
+
+- **Docker Desktop:** Use `http://host.docker.internal:4123/v1`
+- **Docker Compose:** Use `http://chatterbox-tts-api:4123/v1`
+- **Linux:** Use your host machine's IP address
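+
+For example, the Open WebUI audio settings for a Compose deployment might look like this sketch (assuming the service is named `chatterbox-tts-api`; the key is a placeholder):
+
+```yaml
+environment:
+  - AUDIO_TTS_ENGINE=openai
+  - AUDIO_TTS_OPENAI_API_BASE_URL=http://chatterbox-tts-api:4123/v1
+  - AUDIO_TTS_OPENAI_API_KEY=your-api-key   # placeholder; match your Chatterbox configuration
+```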
+
+### First-Time Startup
+
+The first TTS request takes significantly longer as the model loads. Check logs with:
+```bash
+docker logs chatterbox-tts-api -f
+```
+
+For more troubleshooting tips, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
diff --git a/docs/features/audio/text-to-speech/kokoro-web-integration.md b/docs/features/audio/text-to-speech/kokoro-web-integration.md
index 5801618404..7c66f61bd2 100644
--- a/docs/features/audio/text-to-speech/kokoro-web-integration.md
+++ b/docs/features/audio/text-to-speech/kokoro-web-integration.md
@@ -89,4 +89,27 @@ Visit the [**Kokoro Web Demo**](https://voice-generator.pages.dev) to preview al
For additional options, voice customization guides, and advanced settings, visit the [GitHub repository](https://github.com/eduardolat/kokoro-web).
+## Troubleshooting
+
+### Connection Issues
+
+If Open WebUI can't reach Kokoro Web:
+
+- **Docker Desktop (Windows/Mac):** Use `http://host.docker.internal:3000/api/v1`
+- **Docker Compose (same network):** Use `http://kokoro-web:3000/api/v1`
+- **Linux Docker:** Use your host machine's IP address
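+
+For example, the corresponding Open WebUI environment for a shared Compose network (a sketch; the key must match the secret configured in Kokoro Web):
+
+```yaml
+environment:
+  - AUDIO_TTS_ENGINE=openai
+  - AUDIO_TTS_OPENAI_API_BASE_URL=http://kokoro-web:3000/api/v1
+  - AUDIO_TTS_OPENAI_API_KEY=your-secret-api-key   # must match the Kokoro Web secret
+```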
+
+### Voice Not Working
+
+1. Verify the secret API key matches in both the Kokoro Web config and Open WebUI settings
+2. Test the API directly:
+ ```bash
+ curl -X POST http://localhost:3000/api/v1/audio/speech \
+ -H "Authorization: Bearer your-api-key" \
+ -H "Content-Type: application/json" \
+ -d '{"input": "Hello world", "voice": "af_heart"}'
+ ```
+
+For more troubleshooting tips, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
+
**Enjoy natural AI voices in your OpenWebUI conversations!**
diff --git a/docs/features/audio/text-to-speech/openai-edge-tts-integration.md b/docs/features/audio/text-to-speech/openai-edge-tts-integration.md
index 7bd30a307c..232a1993d2 100644
--- a/docs/features/audio/text-to-speech/openai-edge-tts-integration.md
+++ b/docs/features/audio/text-to-speech/openai-edge-tts-integration.md
@@ -261,3 +261,66 @@ For direct support, you can visit the [Voice AI & TTS Discord](https://tts.travi
## 🎙️ Voice Samples
[Play voice samples and see all available Edge TTS voices](https://tts.travisvn.com/)
+
+## Troubleshooting
+
+### Connection Issues
+
+#### "localhost" Not Working from Docker
+
+If Open WebUI runs in Docker and can't reach the TTS service at `localhost:5050`:
+
+**Solutions:**
+- Use `host.docker.internal:5050` instead of `localhost:5050` (Docker Desktop on Windows/Mac)
+- On Linux, use the host's IP address, or add `--network host` to your Docker run command
+- If both services are in Docker Compose, use the container name: `http://openai-edge-tts:5050/v1`
+
+**Example Docker Compose for both services on the same network:**
+
+```yaml
+services:
+ open-webui:
+ image: ghcr.io/open-webui/open-webui:main
+ environment:
+ - AUDIO_TTS_ENGINE=openai
+ - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
+ - AUDIO_TTS_OPENAI_API_KEY=your_api_key_here
+ networks:
+ - webui-network
+
+ openai-edge-tts:
+ image: travisvn/openai-edge-tts:latest
+ ports:
+ - "5050:5050"
+ environment:
+ - API_KEY=your_api_key_here
+ networks:
+ - webui-network
+
+networks:
+ webui-network:
+ driver: bridge
+```
+
+#### Testing the TTS Service
+
+Verify the TTS service is working independently:
+
+```bash
+curl -X POST http://localhost:5050/v1/audio/speech \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer your_api_key_here" \
+ -d '{"input": "Test message", "voice": "alloy"}' \
+ --output test.mp3
+```
+
+If this works but Open WebUI still can't connect, the issue is network-related between containers.
+
+### No Audio Output in Open WebUI
+
+1. Check that the API Base URL ends with `/v1`
+2. Verify the API key matches between both services (or remove the requirement)
+3. Check Open WebUI container logs: `docker logs open-webui`
+4. Check openai-edge-tts logs: `docker logs openai-edge-tts` (or your container name)
+
+For more troubleshooting tips, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
diff --git a/docs/features/audio/text-to-speech/openai-tts-integration.md b/docs/features/audio/text-to-speech/openai-tts-integration.md
new file mode 100644
index 0000000000..ec6958009c
--- /dev/null
+++ b/docs/features/audio/text-to-speech/openai-tts-integration.md
@@ -0,0 +1,162 @@
+---
+sidebar_position: 0
+title: "OpenAI TTS Integration"
+---
+
+# Using OpenAI for Text-to-Speech
+
+This guide covers how to use OpenAI's official Text-to-Speech API with Open WebUI. This is the simplest setup if you already have an OpenAI API key.
+
+:::tip Looking for STT?
+See the companion guide: [Using OpenAI for Speech-to-Text](/features/audio/speech-to-text/openai-stt-integration)
+:::
+
+## Requirements
+
+- An OpenAI API key with access to the Audio API
+- Open WebUI installed and running
+
+## Quick Setup (UI)
+
+1. Click your **profile icon** (bottom-left corner)
+2. Select **Admin Panel**
+3. Click **Settings** → **Audio** tab
+4. Configure the following:
+
+| Setting | Value |
+|---------|-------|
+| **Text-to-Speech Engine** | `OpenAI` |
+| **API Base URL** | `https://api.openai.com/v1` |
+| **API Key** | Your OpenAI API key |
+| **TTS Model** | `tts-1` or `tts-1-hd` |
+| **TTS Voice** | Choose from available voices |
+
+5. Click **Save**
+
+## Available Models
+
+| Model | Description | Best For |
+|-------|-------------|----------|
+| `tts-1` | Standard quality, lower latency | Real-time applications, faster responses |
+| `tts-1-hd` | Higher quality audio | Pre-recorded content, premium audio quality |
+
+## Available Voices
+
+OpenAI provides 6 built-in voices:
+
+| Voice | Description |
+|-------|-------------|
+| `alloy` | Neutral, balanced |
+| `echo` | Warm, conversational |
+| `fable` | Expressive, British accent |
+| `onyx` | Deep, authoritative |
+| `nova` | Friendly, upbeat |
+| `shimmer` | Soft, gentle |
+
+:::tip
+Try different voices to find the one that best suits your use case. You can preview voices in OpenAI's documentation.
+:::
+
+## Per-Model TTS Voice
+
+You can assign a specific TTS voice to individual models, allowing different AI personas to have distinct voices. This is configured in the Model Editor.
+
+### Setting a Model-Specific Voice
+
+1. Go to **Workspace > Models**
+2. Click the **Edit** (pencil) icon on the model you want to configure
+3. Scroll down to find the **TTS Voice** field
+4. Enter the voice name (e.g., `alloy`, `echo`, `shimmer`, `onyx`, `nova`, `fable`)
+5. Click **Save**
+
+### Voice Priority
+
+When playing TTS audio, Open WebUI uses the following priority:
+
+1. **Model-specific TTS voice** (if set in Model Editor)
+2. **User's personal voice setting** (if configured in user settings)
+3. **System default voice** (configured by admin)
+
+This allows admins to give each AI persona a consistent voice while still letting users override with their personal preference when no model-specific voice is set.
+
+### Use Cases
+
+- **Character personas**: Give a "British Butler" model the `fable` voice, while an "Energetic Assistant" uses `nova`
+- **Language learning**: Assign appropriate voices for different language tutors
+- **Accessibility**: Set clearer voices for models designed for accessibility use cases
+
+## Environment Variables Setup
+
+If you prefer to configure via environment variables:
+
+```yaml
+services:
+ open-webui:
+ image: ghcr.io/open-webui/open-webui:main
+ environment:
+ - AUDIO_TTS_ENGINE=openai
+ - AUDIO_TTS_OPENAI_API_BASE_URL=https://api.openai.com/v1
+ - AUDIO_TTS_OPENAI_API_KEY=sk-...
+ - AUDIO_TTS_MODEL=tts-1
+ - AUDIO_TTS_VOICE=alloy
+ # ... other configuration
+```
+
+### All TTS Environment Variables
+
+| Variable | Description | Default |
+|----------|-------------|---------|
+| `AUDIO_TTS_ENGINE` | Set to `openai` | empty |
+| `AUDIO_TTS_OPENAI_API_BASE_URL` | OpenAI API base URL | `https://api.openai.com/v1` |
+| `AUDIO_TTS_OPENAI_API_KEY` | Your OpenAI API key | empty |
+| `AUDIO_TTS_MODEL` | TTS model (`tts-1` or `tts-1-hd`) | `tts-1` |
+| `AUDIO_TTS_VOICE` | Voice to use | `alloy` |
+
+## Testing TTS
+
+1. Start a new chat
+2. Send a message to any model
+3. Click the **speaker icon** on the AI response to hear it read aloud
+
+## Response Splitting
+
+When reading long responses, Open WebUI can split text into chunks before sending them to the TTS engine. This is configured in **Admin Panel > Settings > Audio** under **Response Splitting**.
+
+| Option | Description |
+|--------|-------------|
+| **Punctuation** (default) | Splits at sentence boundaries: periods (`.`), exclamation marks (`!`), question marks (`?`), and newlines. Best for natural pacing. |
+| **Paragraphs** | Splits only at paragraph breaks (double newlines). Results in longer audio chunks. |
+| **None** | Sends the entire response as one chunk. May cause delays before audio starts on long responses. |
+
+:::tip
+**Punctuation** mode is recommended for most use cases. It provides the best balance of streaming performance (audio starts quickly) and natural speech pacing.
+:::
+
+## Troubleshooting
+
+### No Audio Plays
+
+1. Check your OpenAI API key is valid and has Audio API access
+2. Verify the API Base URL is correct (`https://api.openai.com/v1`)
+3. Check browser console (F12) for errors
+
+### Audio Quality Issues
+
+- Switch from `tts-1` to `tts-1-hd` for higher quality
+- Note: `tts-1-hd` has slightly higher latency
+
+### Rate Limits
+
+OpenAI has rate limits on the Audio API. If you're hitting limits:
+- Consider caching common phrases
+- Use `tts-1` instead of `tts-1-hd` (lower cost and latency)
+
+For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
+
+## Cost Considerations
+
+OpenAI charges per character for TTS. See [OpenAI Pricing](https://platform.openai.com/docs/pricing) for current rates. Note that `tts-1-hd` costs more than `tts-1`.
+
+:::info
+For a free alternative, consider [OpenAI Edge TTS](/features/audio/text-to-speech/openai-edge-tts-integration) which uses Microsoft's free Edge browser TTS.
+:::
diff --git a/docs/features/audio/text-to-speech/openedai-speech-integration.md b/docs/features/audio/text-to-speech/openedai-speech-integration.md
index b4813e71f9..f866fe605f 100644
--- a/docs/features/audio/text-to-speech/openedai-speech-integration.md
+++ b/docs/features/audio/text-to-speech/openedai-speech-integration.md
@@ -190,6 +190,40 @@ If you encounter any problems integrating `openedai-speech` with Open WebUI, fol
- If you're still experiencing issues, try restarting the `openedai-speech` service or the entire Docker environment.
- If the problem persists, consult the `openedai-speech` GitHub repository or seek help on a relevant community forum.
+### GPU Memory Issues (XTTS)
+
+If XTTS fails to load or causes out-of-memory errors:
+- XTTS requires approximately 4GB of GPU VRAM
+- Consider using the minimal Piper-only image (`docker-compose.min.yml`) which runs on CPU
+- Reduce other GPU memory usage before starting the container
+
+### AMD GPU (ROCm) Notes
+
+When using AMD GPUs:
+1. Uncomment `USE_ROCM=1` in your `speech.env` file
+2. Use the `docker-compose.rocm.yml` file
+3. Ensure ROCm drivers are properly installed on the host
+
+### ARM64 / Apple Silicon
+
+- XTTS has CPU-only support on ARM64 and will be **very slow**
+- Use the Piper-only image (`docker-compose.min.yml`) for acceptable performance on ARM devices
+- Apple M-series chips work but benefit from the minimal image
+
+### Container Networking
+
+If using Docker networks:
+```yaml
+# Add to your Docker Compose
+networks:
+ webui-network:
+ driver: bridge
+```
+
+Then reference `http://openedai-speech:8000/v1` instead of `localhost`.
+
+For more troubleshooting tips, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
+
## FAQ
**How can I control the emotional range of the generated audio?**
diff --git a/docs/features/auth/ldap.mdx b/docs/features/auth/ldap.mdx
index 0250006326..c0dbd40787 100644
--- a/docs/features/auth/ldap.mdx
+++ b/docs/features/auth/ldap.mdx
@@ -131,7 +131,11 @@ LDAP_APP_PASSWORD="admin"
LDAP_SEARCH_BASE="dc=example,dc=org"
LDAP_ATTRIBUTE_FOR_USERNAME="uid"
LDAP_ATTRIBUTE_FOR_MAIL="mail"
-LDAP_SEARCH_FILTER="(uid=%(user)s)" # More secure and performant
+# LDAP_SEARCH_FILTER is optional and used for additional filtering conditions.
+# The username filter is automatically added by Open WebUI, so do NOT include
+# user placeholder syntax like %(user)s or %s - these are not supported.
+# Leave empty for simple setups, or add group membership filters, e.g.:
+# LDAP_SEARCH_FILTER="(memberOf=cn=allowed-users,ou=groups,dc=example,dc=org)"
```
### UI Configuration
diff --git a/docs/features/channels/index.md b/docs/features/channels/index.md
index ef0d7a4397..63cf63e107 100644
--- a/docs/features/channels/index.md
+++ b/docs/features/channels/index.md
@@ -36,7 +36,7 @@ Direct Message (DM) channels enable private conversations:
- Display participant avatars instead of channel icons
- Can be hidden from sidebar while preserving message history
- Automatically reappear when new messages arrive
-- Show online/offline status indicator for participants
+- Show online/offline status indicator for participants (can be disabled via **Admin Panel > Settings > General > User Status**)
## Enabling Channels
@@ -164,6 +164,23 @@ Channels support granular access control:
* **Read-only access:** Users can view content but cannot contribute
* **Feature toggle:** Administrators can control channel access via `USER_PERMISSIONS_FEATURES_CHANNELS` environment variable or group permissions in the admin panel
+---
+
+## Native Channel Awareness (Agentic)
+
+When using a model with **Native Function Calling** enabled (see the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native)), models can navigate and search through your organization's channels autonomously.
+
+### Available Channel Tools:
+- **`search_channels`**: The model can find channels by name or description to identify where relevant discussions might be happening.
+- **`search_channel_messages`**: The model can search for specific keywords or topics across all channels it has access to.
+- **`view_channel_message`**: The model can read specific individual messages and their metadata.
+- **`view_channel_thread`**: The model can retrieve an entire conversation thread to understand the full context of a discussion.
+
+### Why use native tool calling for Channels?
+This removes the need for human users to manually bridge information between private chats and public channels. You can ask an AI: *"Check the #dev-team channel and summarize the latest updates on the deployment issue,"* or *"What was decided in the #marketing-strategy thread about the logo?"*
+
+The model will use its "Agentic" loop to find the channel, search for relevant messages, read the full thread, and provide you with a synthesized answer—all without you leaving your current chat.
+
## Use Cases
### 1. Team Development (`#dev-team`)
diff --git a/docs/features/chat-features/autocomplete.md b/docs/features/chat-features/autocomplete.md
index 8c8585e63e..bcf1da52ab 100644
--- a/docs/features/chat-features/autocomplete.md
+++ b/docs/features/chat-features/autocomplete.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 10
+sidebar_position: 5
title: "Autocomplete"
---
diff --git a/docs/features/chat-features/chat-params.md b/docs/features/chat-features/chat-params.md
index f2238cbb4b..e0ce7b4173 100644
--- a/docs/features/chat-features/chat-params.md
+++ b/docs/features/chat-features/chat-params.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 6
title: "Chat Parameters"
---
diff --git a/docs/features/chat-features/chatshare.md b/docs/features/chat-features/chatshare.md
index e77f5e1a61..619162d070 100644
--- a/docs/features/chat-features/chatshare.md
+++ b/docs/features/chat-features/chatshare.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 3
title: "Chat Sharing"
---
diff --git a/docs/features/chat-features/code-execution/artifacts.md b/docs/features/chat-features/code-execution/artifacts.md
index ab2ebf4115..5d2bcd2fbe 100644
--- a/docs/features/chat-features/code-execution/artifacts.md
+++ b/docs/features/chat-features/code-execution/artifacts.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 2
title: "Artifacts"
---
diff --git a/docs/features/chat-features/code-execution/index.md b/docs/features/chat-features/code-execution/index.md
index d2ef064980..74d2660ee0 100644
--- a/docs/features/chat-features/code-execution/index.md
+++ b/docs/features/chat-features/code-execution/index.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 5
+sidebar_position: 1
title: "Code Execution"
---
diff --git a/docs/features/chat-features/code-execution/mermaid.md b/docs/features/chat-features/code-execution/mermaid.md
index 40a4a0e800..6f4de5d8cf 100644
--- a/docs/features/chat-features/code-execution/mermaid.md
+++ b/docs/features/chat-features/code-execution/mermaid.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 3
+sidebar_position: 4
title: "MermaidJS Rendering"
---
diff --git a/docs/features/chat-features/code-execution/python.md b/docs/features/chat-features/code-execution/python.md
index 3b869ec5c6..b1aaa89bdd 100644
--- a/docs/features/chat-features/code-execution/python.md
+++ b/docs/features/chat-features/code-execution/python.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
title: "Python Code Execution"
---
diff --git a/docs/features/chat-features/conversation-organization.md b/docs/features/chat-features/conversation-organization.md
index aaa477223e..23fc18848c 100644
--- a/docs/features/chat-features/conversation-organization.md
+++ b/docs/features/chat-features/conversation-organization.md
@@ -1,62 +1,129 @@
---
sidebar_position: 4
-title: "Organizing Conversations"
+title: "Folders & Projects"
---
-Open WebUI provides powerful organization features that help users manage their conversations. You can easily categorize and tag conversations, making it easier to find and retrieve them later. The two primary ways to organize conversations are through **Folders** and **Tags**.
+# Folders & Projects
-## Folders: From Simple Organization to Powerful Projects
+Open WebUI provides powerful folder-based organization that turns simple chat containers into full-featured **project workspaces**. Folders allow you to not only group related conversations but also define specific contexts, system prompts, and knowledge bases that apply to all chats within them.
-Folders in Open WebUI have evolved from simple containers into powerful, project-like workspaces. They allow you to not only group related conversations but also to define specific contexts, instructions, and knowledge bases for those conversations.
+## Enabling Folders
-### Basic Folder Operations
+Folders are enabled by default. Administrators can control this feature via:
-At their core, folders still allow you to keep your chat list tidy:
+- **Admin Panel**: The folders feature is controlled globally alongside other features.
+- **Environment Variable**: [`ENABLE_FOLDERS`](/getting-started/env-configuration#enable_folders) - Set to `True` (default) to enable or `False` to disable.
-- **Creating a Folder**: You can create a new folder to store specific conversations. This is useful if you want to keep conversations of a similar topic or purpose together.
-- **Moving Conversations into Folders**: Existing conversations can be moved into folders by dragging and dropping them. This allows you to structure your workspace in a way that suits your workflow.
+## Core Features
-
+### Creating Folders
-### Starting a Chat within a Folder
+Create a new folder to organize your conversations:
-By simply clicking on a folder in the sidebar, you select the folder as your space to start a chat in. The main chat interface will then update to show that you selected that folder and any new chat you start will now automatically be created inside this folder, inheriting its unique settings.
+1. In the **sidebar**, click the **+ button** next to "Chats" or right-click in the chat list.
+2. Select **"New Folder"**.
+3. Enter a name for your folder.
+4. Click **Save**.
-### Editing Folder Settings: System Prompts & Knowledge
+### Moving Conversations into Folders
-You can give each folder a unique personality and context. By hovering over a folder, clicking the three-dot menu, and selecting **"Edit"**, you will open the folder's settings modal popup. Here, you can configure:
+Organize existing chats by moving them into folders:
-- **Folder Name**: Change the name of your folder to better reflect its purpose.
-- **System Prompt**: Optionally assign a dedicated System Prompt to the folder. This prompt is automatically prepended to every new conversation and message created within that folder, tailoring the AI's behavior for specific tasks. You can still use folders for organization without a system prompt.
-- **Attached Knowledge**: Link one or more knowledge bases to your folder. Any files attached here will automatically be included as context in all new chats within that project folder. This is also optional; you can still use folders for organization, without attaching extra knowledge bases.
+- **Drag and Drop**: Click and drag any conversation from the sidebar into a folder.
+- **Right-click Menu**: Right-click on a conversation and select "Move to Folder".
-### Example Use Case
+### Nested Folders
-:::tip
+Folders can be nested within other folders to create hierarchical organization:
-**Creating a 'Python Expert' Project**
-Imagine you are working on a Python project. You can create a folder called "Python Expert".
+- Drag a folder onto another folder to make it a subfolder.
+- Use the right-click menu to move folders between parent folders.
+- Folders can be expanded or collapsed to show/hide their contents.
-1. **Edit the folder** and set the System Prompt to something like: `You are an expert Python developer. You provide clean, efficient, and well-documented code. When asked for code, you prioritize clarity and adherence to PEP 8 standards.`
-2. **Attach Knowledge** by linking a knowledge base which contains a PDF of your project's technical specification, or a specific library's documentation.
-3. **Activate/Select the folder** by clicking on it.
-4. Now, any new chat you start will automatically have this expert persona, the context of your documents and is saved within the folder, ensuring you get highly relevant and specialized assistance for your project.
+### Starting a Chat in a Folder
+
+When you click on a folder in the sidebar, it becomes your **active workspace**:
+
+1. Click on any folder in the sidebar to select it.
+2. The chat interface will indicate that the folder is active.
+3. Any new chat you start will automatically be created inside this folder.
+4. New chats will **inherit the folder's settings** (system prompt and knowledge).
+
+## Folder Settings (Project Configuration)
+
+Folders can be configured as full project workspaces with their own AI behavior and context. To edit folder settings:
+
+1. Hover over a folder in the sidebar.
+2. Click the **three-dot menu** (⋯).
+3. Select **"Edit"** to open the folder settings modal.
+
+### Folder Name
+
+Change the name of your folder to better reflect its purpose or project.
+
+### Folder Background Image
+
+Customize the visual appearance of your folder by uploading a background image. This helps visually distinguish different projects in your workspace.
+
+### System Prompt
+
+Assign a dedicated **System Prompt** to the folder that automatically applies to all conversations within it:
+
+- The system prompt is **prepended to every new conversation** created in the folder.
+- This tailors the AI's behavior for specific tasks or personas.
+- System prompts are optional—you can use folders purely for organization without one.
+
+:::info
+
+The System Prompt field is only visible if you have permission to set system prompts (controlled by admin settings).
:::
-## Tagging Conversations
+### Attached Knowledge
+
+Link **knowledge bases and files** to your folder:
+
+- All attached files and knowledge bases are automatically included as **context** for every chat in the folder.
+- This enables RAG (Retrieval Augmented Generation) for all folder conversations.
+- Knowledge is optional—folders work for organization without any attached files.
+
+## Example Use Case
+
+:::tip Creating a "Python Expert" Project
-Tags provide an additional layer of organization by allowing you to label conversations with keywords or phrases.
+Imagine you're working on a Python development project:
-- **Adding Tags to Conversations**: Tags can be applied to conversations based on their content or purpose. Tags are flexible and can be added or removed as needed.
-
-- **Using Tags for Searching**: Tags make it easy to locate specific conversations by using the search feature. You can filter conversations by tags to quickly find those related to specific topics.
+1. **Create a folder** named "Python Expert".
+2. **Edit the folder** and set the System Prompt:
+ ```
+ You are an expert Python developer. You provide clean, efficient, and well-documented code. When asked for code, prioritize clarity and adherence to PEP 8 standards.
+ ```
+3. **Attach Knowledge** by linking your project's technical specification PDF or library documentation.
+4. **Click on the folder** to select it as your active workspace.
+5. **Start chatting** — every new conversation will have:
+ - The expert Python persona
+ - Access to your project documents
+ - Automatic organization in the folder
-### Example Use Case
+:::
+
+## Tags (Complementary Organization)
+
+In addition to folders, **tags** provide a flexible labeling system for conversations:
-:::tip
+- **Adding Tags**: Apply keyword labels to conversations based on content or purpose.
+- **Searching by Tags**: Filter conversations by tags using the search feature.
+- **Flexible Organization**: Tags can be added or removed at any time and don't affect folder structure.
-**Tagging by Topic**
-If you frequently discuss certain topics, such as "marketing" or "development," you can tag conversations with these terms. Later, when you search for a specific tag, all relevant conversations will be quickly accessible.
+:::tip Tagging by Topic
+
+If you frequently discuss topics like "marketing" or "development," tag conversations with these terms. When you search for a specific tag, all relevant conversations are quickly accessible regardless of which folder they're in.
:::
+
+## Related Configuration
+
+| Setting | Description |
+|---------|-------------|
+| [`ENABLE_FOLDERS`](/getting-started/env-configuration#enable_folders) | Enable/disable the folders feature globally (Default: `True`) |
+| [`USER_PERMISSIONS_FEATURES_FOLDERS`](/getting-started/env-configuration#user_permissions_features_folders) | Control user-level access to the folders feature (Default: `True`) |
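+
+Both flags can also be set at deployment time, for example in Docker Compose:
+
+```yaml
+environment:
+  - ENABLE_FOLDERS=True
+  - USER_PERMISSIONS_FEATURES_FOLDERS=True
+```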
diff --git a/docs/features/chat-features/history-search.mdx b/docs/features/chat-features/history-search.mdx
new file mode 100644
index 0000000000..106aaa9c7e
--- /dev/null
+++ b/docs/features/chat-features/history-search.mdx
@@ -0,0 +1,51 @@
+---
+sidebar_position: 10
+title: "History & Search"
+---
+
+# History & Search
+
+Open WebUI provides a powerful system for managing and navigating your previous conversations. Whether you are looking for a specific snippet of code from last month or trying to organize hundreds of chats, the history and search features ensure your data is always accessible.
+
+## Chat History Sidebar
+
+All your conversations are automatically saved in the **Sidebar**.
+
+* **Persistence**: Chats are saved to the internal database (`webui.db`) and are available across all your devices.
+* **Organization**: Chats are grouped by time period (Today, Yesterday, Previous 7 Days, etc.).
+* **Renaming**: Titles are automatically generated by a task model, but you can manually rename any chat by clicking the pencil icon next to its title.
+* **Archivial**: Instead of deleting, you can **Archive** chats to remove them from the main list while keeping them downloadable and searchable.
+
+## Searching Your History
+
+You can search through your conversations using the global search bar in the sidebar.
+
+1. Click the **Search** icon or use the keyboard shortcut `Cmd+K` / `Ctrl+K`.
+2. Type your query. Open WebUI performs a fuzzy search across:
+ * **Chat Titles**
+ * **Message Content**
+ * **Tags** (Search using `tag:my-tag-name`)
+3. Click on a result to immediately jump to that conversation.
+
+---
+
+## Native Conversation Search (Agentic)
+
+When using a model with **Native Function Calling** enabled (see the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native)), models can search through your chat history autonomously.
+
+### Available History Tools:
+- **`search_chats`**: The model can search all your previous conversations for a specific topic or fact.
+- **`view_chat`**: After finding a relevant chat ID, the model can retrieve and "read" the full message history of that session.
+
+### Why use native tool calling for Search?
+This allows the model to leverage its own previous work. You can ask: *"What was the feedback I got on my last email draft?"* or *"Find the Python script I wrote about image processing two weeks ago."*
+
+The model will search your history, identify the correct chat, read it, and provide you with an answer or the code you were looking for—saving you the effort of manual scrolling and copy-pasting.
+
+---
+
+## Data Management
+
+* **Export**: You can download individual chats or your entire history as **JSON**, **PDF**, or **Markdown**.
+* **Import**: Drag and drop a JSON chat export into the sidebar to restore it.
+* **Deletion**: You can delete individual chats or clear your entire history from **Settings > Chats**.
diff --git a/docs/features/chat-features/index.mdx b/docs/features/chat-features/index.mdx
index 437c10f9fd..93f191b8b5 100644
--- a/docs/features/chat-features/index.mdx
+++ b/docs/features/chat-features/index.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 800
+sidebar_position: 1
title: "Chat Features"
---
@@ -9,7 +9,7 @@ Open WebUI provides a comprehensive set of chat features designed to enhance you
## Core Chat Features
-- **[🗂️ Conversation Organization](./conversation-organization.md)**: Organize chats with folders and tags to keep your workspace tidy and structured.
+- **[📁 Folders & Projects](./conversation-organization.md)**: Transform folders into powerful project workspaces with custom system prompts and attached knowledge bases.
- **[🔗 URL Parameters](./url-params.md)**: Configure chat sessions through URL parameters, enabling quick setup of models, tools, and other features.
@@ -18,3 +18,9 @@ Open WebUI provides a comprehensive set of chat features designed to enhance you
- **[✨ Autocomplete](./autocomplete.md)**: AI-powered text prediction that helps you write prompts faster using a task model.
- **[🗨️ Chat Sharing](./chatshare.md)**: Share conversations locally or via the Open WebUI Community platform with controllable privacy settings.
+
+- **[🔍 History & Search](./history-search.mdx)**: Navigate and search your previous conversations, or allow models to search them autonomously via native tools.
+
+- **[🕒 Temporal Awareness](./temporal-awareness.mdx)**: How models understand time and date, including native tools for precise time calculations.
+
+- **[🧠 Reasoning & Thinking Models](./reasoning-models.mdx)**: Specialized support for models that generate internal chains of thought using thinking tags.
diff --git a/docs/features/chat-features/multi-model-chats.mdx b/docs/features/chat-features/multi-model-chats.mdx
new file mode 100644
index 0000000000..04d6d9e2fd
--- /dev/null
+++ b/docs/features/chat-features/multi-model-chats.mdx
@@ -0,0 +1,70 @@
+---
+sidebar_position: 2
+title: "Multi-Model Chats"
+---
+
+# Multi-Model Chats
+
+Open WebUI allows you to interact with **multiple models simultaneously** within a single chat interface. This powerful feature enables you to compare responses, verify facts, and leverage the unique strengths of different LLMs side-by-side.
+
+## Overview
+
+In a Multi-Model Chat, your prompt is sent to two or more selected models at the same time. Their responses are displayed in parallel columns (or stacked, depending on screen size), giving you immediate insight into how different AI architectures approach the same problem.
+
+## How to Use
+
+1. **Select Models**: In the chat header (Model Selector), click the **+ (Plus)** button to add more models to your current session.
+ * *Example Setup*: Select **GPT-5.1 Thinking** (for reasoning), **Gemini 3** (for creative writing), and **Claude Sonnet 4.5** (for overall performance).
+2. **Send Prompt**: Type your question as usual.
+3. **View Results**: Watch as all models generate their responses simultaneously in the chat window.
+
+## Usage Scenarios
+
+* **Model Comparison/Benchmarking**: Test which model writes better Python code or which one hallucinates less on niche topics.
+* **Fact Validation**: "Cross-examine" models. If two models say X and one says Y, you can investigate further.
+* **Diverse Perspectives**: Get a "Creative" take from one model and a "Technical" take from another for the same query.
+
+## Permissions
+
+Admins can control access to Multi-Model Chats on a per-role or per-group basis.
+
+* **Location**: Admin Panel > Settings > General > User Permissions > Chat > **Multiple Models**
+* **Environment Variable**: `USER_PERMISSIONS_CHAT_MULTIPLE_MODELS` (Default: `True`)
+
+If disabled, users will not see the "plus" button in the model selector and cannot initiate multi-model sessions.
+
+---
+
+## Merging Responses (Mixture of Agents)
+
+Once you have responses from multiple models, Open WebUI offers an advanced capability to **Merge** them into a single, superior answer. This implements a **Mixture of Agents (MOA)** workflow.
+
+### What is Merging?
+
+Merging takes the outputs from all your active models and sends them—along with your original prompt—to a "Synthesizer Model." This Synthesizer Model reads all the draft answers and combines them into one final, polished response.
+
+### How to Merge
+
+1. Start a **Multi-Model Chat** and get responses from your selected models.
+2. Look for the **Merge** (or "Synthesize") button in the response controls area (often near the regeneration controls).
+3. Open WebUI will generate a **new response** that aggregates the best parts of the previous outputs.
+
+### Advantages of Merging
+
+* **Higher Accuracy**: Research suggests that aggregating outputs from multiple models often outperforms any single model acting alone.
+* **Best of Both Worlds**: You might get the code accuracy of Model A combined with the clear explanations of Model B.
+* **Reduced Hallucinations**: The synthesizer model can filter out inconsistencies found in individual responses.
+
+### Configuration
+
+The merging process relies on the backend **Tasks** system.
+
+* **Task Model**: The specific model used to perform the merger can be configured in **Admin Panel > Settings > Tasks**. We recommend using a highly capable model (like GPT-5.1 or Claude Sonnet 4.5) as the task model for the best results.
+* **Prompt Template**: The system uses a specialized prompt template to instruct the AI on how to synthesize the answers.
+
+:::info Experimental
+The Merging/MOA feature is an advanced capability. While powerful, it requires a capable Task Model to work effectively.
+:::
+
+
+
diff --git a/docs/features/chat-features/reasoning-models.mdx b/docs/features/chat-features/reasoning-models.mdx
new file mode 100644
index 0000000000..5233cf596c
--- /dev/null
+++ b/docs/features/chat-features/reasoning-models.mdx
@@ -0,0 +1,393 @@
+---
+sidebar_position: 10
+title: "Reasoning & Thinking Models"
+---
+
+import { TopBanners } from "@site/src/components/TopBanners";
+
+
+
+# Reasoning & Thinking Models
+
+Open WebUI provides first-class support for models that exhibit "thinking" or "reasoning" behaviors (such as DeepSeek R1, OpenAI o1, and others). These models often generate internal chains of thought before providing a final answer.
+
+## How Thinking Tags Work
+
+When a model generates reasoning content, it typically wraps that content in specific XML-like tags (e.g., `...` or `...`).
+
+Open WebUI automatically:
+1. **Detects** these tags in the model's output stream.
+2. **Extracts** the content between the tags.
+3. **Renders** the extracted content in a collapsible UI element labeled "Thought" or "Thinking".
+
+This keeps the main chat interface clean while still giving you access to the model's internal processing.
+
+## The `reasoning_tags` Parameter
+
+You can customize which tags Open WebUI should look for using the `reasoning_tags` parameter. This can be set on a **per-chat** or **per-model** basis.
+
+### Default Tags
+By default, Open WebUI looks for several common reasoning tag pairs:
+- ``, ``
+- ``, ``
+- ``, ``
+- ``, ``
+- ``, ``
+- `<|begin_of_thought|>`, `<|end_of_thought|>`
+
+### Customization
+If your model uses different tags, you can provide a list of tag pairs in the `reasoning_tags` parameter. Each pair is a tuple or list of the opening and closing tag.
+
+## Configuration & Behavior
+
+- **Stripping from Payload**: The `reasoning_tags` parameter itself is an Open WebUI-specific control and is **stripped** from the payload before being sent to the LLM backend (OpenAI, Ollama, etc.). This ensures compatibility with providers that do not recognize this parameter.
+- **Chat History**: Thinking tags are **not** stripped from the chat history. If previous messages in a conversation contain thinking blocks, they are sent back to the model as part of the context, allowing the model to "remember" its previous reasoning steps.
+- **UI Rendering**: Internally, reasoning blocks are processed and rendered using a specialized UI component. When saved or exported, they may be represented as HTML `` tags.
+
+---
+
+## Open WebUI Settings
+
+Open WebUI provides several built-in settings to configure reasoning model behavior. These can be found in:
+
+- **Chat Controls** (sidebar) → **Advanced Parameters** — per-chat settings
+- **Workspace** → **Models** → **Edit Model** → **Advanced Parameters** — per-model settings (Admin only)
+- **Admin Panel** → **Settings** → **Models** → select a model → **Advanced Parameters** — alternative per-model settings location
+
+### Reasoning Tags Setting
+
+This setting controls how Open WebUI parses and displays thinking/reasoning blocks:
+
+| Option | Description |
+|--------|-------------|
+| **Default** | Uses the system default behavior |
+| **Enabled** | Explicitly enables reasoning tag detection using default `...` tags |
+| **Disabled** | Turns off reasoning tag detection entirely |
+| **Custom** | Allows you to specify custom start and end tags |
+
+#### Using Custom Tags
+
+If your model uses non-standard reasoning tags (e.g., `...` or `[思考]...[/思考]`), select **Custom** and enter:
+
+- **Start Tag**: The opening tag (e.g., ``)
+- **End Tag**: The closing tag (e.g., ``)
+
+This is useful for:
+- Models with localized thinking tags
+- Custom fine-tuned models with unique tag formats
+- Models that use XML-style reasoning markers
+
+### think (Ollama)
+
+This Ollama-specific setting enables or disables the model's built-in reasoning feature:
+
+| Option | Description |
+|--------|-------------|
+| **Default** | Uses Ollama's default behavior |
+| **On** | Explicitly enables thinking mode for the model |
+| **Off** | Disables thinking mode |
+
+:::note
+This setting sends the `think` parameter directly to Ollama. It's separate from how Open WebUI parses the response—you may need both this setting AND proper reasoning tags configuration for the full experience.
+:::
+
+### Reasoning Effort
+
+For models that support variable reasoning depth (like some API providers), this setting controls how much effort the model puts into reasoning:
+
+- Common values: `low`, `medium`, `high`
+- Some providers accept numeric values
+
+:::info
+Reasoning Effort is only applicable to models from specific providers that support this parameter. It has no effect on local Ollama models.
+:::
+
+---
+
+## Interleaved Thinking with Tool Calls
+
+When a model uses **native function calling** (tool use) within a single turn, Open WebUI preserves the reasoning content and sends it back to the API for subsequent calls within that turn. This enables true "interleaved thinking" where:
+
+1. Model generates reasoning → makes a tool call
+2. Tool executes and returns results
+3. Model receives: original messages + previous reasoning + tool call + tool result
+4. Model continues reasoning → may make more tool calls or provide final answer
+5. Process repeats until the turn completes
+
+### How It Works
+
+During a multi-step tool calling turn, Open WebUI:
+
+1. **Captures** reasoning content from the model's response (via `reasoning_content`, `reasoning`, or `thinking` fields in the delta)
+2. **Stores** it in content blocks alongside tool calls
+3. **Serializes** the reasoning with its original tags (e.g., `...`) when building messages for the next API call
+4. **Includes** the serialized content in the assistant message's `content` field
+
+This ensures the model has access to its previous thought process when deciding on subsequent actions within the same turn.
+
+### How Reasoning Is Sent Back
+
+When building the next API request during a tool call loop, Open WebUI serializes reasoning as **text wrapped in tags** inside the assistant message's `content` field:
+
+```text
+Let me search for the current weather data...
+```
+
+The message structure looks like:
+
+```json
+{
+ "role": "assistant",
+ "content": "reasoning content here",
+ "tool_calls": [...]
+}
+```
+
+### Provider Compatibility
+
+Open WebUI follows the **OpenAI Chat Completions API standard**. Reasoning content is serialized as text within the message content field, not as provider-specific structured blocks.
+
+| Provider Type | Compatibility |
+|--------------|---------------|
+| OpenAI-compatible APIs | ✅ Works — reasoning is in the content text |
+| Ollama | ✅ Works — Ollama processes the message content |
+| Anthropic (extended thinking) | ❌ Not supported — Anthropic requires structured `{"type": "thinking"}` blocks, use a pipe function |
+| OpenAI o-series (stateful) | ⚠️ Limited — reasoning is hidden/internal, nothing to capture |
+
+### Important Notes
+
+- **Within-turn preservation**: Reasoning is preserved and sent back to the API only within the same turn (while tool calls are being processed)
+- **Cross-turn behavior**: Between separate user messages, reasoning is **not** sent back to the API. The thinking content is displayed in the UI but stripped from the message content that gets sent in subsequent requests.
+- **Text-based serialization**: Reasoning is sent as text wrapped in tags (e.g., `thinking content`), not as structured content blocks. This works with most OpenAI-compatible APIs but may not align with provider-specific formats like Anthropic's extended thinking content blocks.
+
+---
+
+## Streaming vs Non-Streaming
+
+### Streaming Mode (Default)
+
+In streaming mode (`stream: true`), Open WebUI processes tokens as they arrive and can detect reasoning blocks in real-time. This generally works well without additional configuration.
+
+### Non-Streaming Mode
+
+In non-streaming mode (`stream: false`), the entire response is returned at once. **This is where most parsing issues occur** because:
+
+1. The response arrives as a single block of text
+2. Without the reasoning parser, no post-processing separates the `` content
+3. The raw response is displayed as-is
+
+:::info Important
+If you're using non-streaming requests (via API or certain configurations), **the reasoning parser is essential** for proper thinking block separation.
+:::
+
+---
+
+## API Usage
+
+When using the Open WebUI API with reasoning models:
+
+```json
+{
+ "model": "qwen3:32b",
+ "messages": [
+ {"role": "user", "content": "Solve: What is 234 * 567?"}
+ ],
+ "stream": true
+}
+```
+
+**Recommendation:** Use `"stream": true` for the most reliable reasoning block parsing.
+
+---
+
+## Troubleshooting
+
+### Thinking Content Merged with Final Answer
+
+**Symptom:** When using a reasoning model, the entire response (including `...` blocks) is displayed as the final answer, instead of being separated into a hidden/collapsible thinking section.
+
+**Example of incorrect display:**
+
+```text
+
+Okay, the user wants a code snippet for a sticky header using CSS and JavaScript.
+Let me think about how to approach this.
+...
+I think that's a solid approach. Let me write the code now.
+
+
+Here's a complete code snippet that demonstrates a sticky header using CSS and JavaScript...
+```
+
+**Expected behavior:** The thinking content should be hidden or collapsible, with only the final answer visible.
+
+### For Ollama Users
+
+The most common cause is that Ollama is not configured with the correct **reasoning parser**. When running Ollama, you need to specify the `--reasoning-parser` flag to enable proper parsing of thinking blocks.
+
+#### Step 1: Configure the Reasoning Parser
+
+When starting Ollama, add the `--reasoning-parser` flag:
+
+```bash
+# For DeepSeek-R1 style reasoning (recommended for most models)
+ollama serve --reasoning-parser deepseek_r1
+
+# Alternative parsers (if the above doesn't work for your model)
+ollama serve --reasoning-parser qwen3
+ollama serve --reasoning-parser deepseek_v3
+```
+
+:::tip Recommended Parser
+For most reasoning models, including Qwen3 and DeepSeek variants, use `--reasoning-parser deepseek_r1`. This parser handles the standard `...` format used by most reasoning models.
+:::
+
+#### Step 2: Restart Ollama
+
+After adding the flag, restart the Ollama service:
+
+```bash
+# Stop Ollama
+# On Linux/macOS:
+pkill ollama
+
+# On Windows (PowerShell):
+Stop-Process -Name ollama -Force
+
+# Start with the reasoning parser
+ollama serve --reasoning-parser deepseek_r1
+```
+
+#### Step 3: Verify in Open WebUI
+
+1. Go to Open WebUI and start a new chat with your reasoning model
+2. Ask a question that requires reasoning (e.g., a math problem or logic puzzle)
+3. The response should now show the thinking content in a collapsible section
+
+### Available Reasoning Parsers
+
+| Parser | Description | Use Case |
+|--------|-------------|----------|
+| `deepseek_r1` | DeepSeek R1 format | Most reasoning models, including Qwen3 |
+| `deepseek_v3` | DeepSeek V3 format | Some DeepSeek variants |
+| `qwen3` | Qwen3-specific format | If `deepseek_r1` doesn't work with Qwen |
+
+### Troubleshooting Checklist
+
+#### 1. Verify Ollama Is Running with Reasoning Parser
+
+Check if Ollama was started with the correct flag:
+
+```bash
+# Check the Ollama process
+ps aux | grep ollama
+# or on Windows:
+Get-Process -Name ollama | Format-List *
+```
+
+Look for `--reasoning-parser` in the command line arguments.
+
+#### 2. Check Model Compatibility
+
+Not all models output reasoning in the same format. Verify your model's documentation for:
+
+- What tags it uses for thinking content (e.g., ``, ``, etc.)
+- Whether it requires specific prompting to enable thinking mode
+
+#### 3. Test with Streaming Enabled
+
+If non-streaming isn't working, try enabling streaming in your chat:
+
+1. Go to **Chat Controls** (sidebar)
+2. Ensure streaming is enabled (this is the default)
+3. Test the model again
+
+#### 4. Check Open WebUI Version
+
+Ensure you're running the latest version of Open WebUI, as reasoning model support continues to improve:
+
+```bash
+docker pull ghcr.io/open-webui/open-webui:main
+```
+
+#### 5. Verify the Model Response Format
+
+Use the Ollama CLI directly to check what format your model outputs:
+
+```bash
+ollama run your-model:tag "Explain step by step: What is 15 + 27?"
+```
+
+Look for `` tags in the output. If they're not present, the model may require specific system prompts to enable thinking mode.
+
+### Reasoning Lost Between Tool Calls
+
+**Symptom:** The model seems to "forget" what it was thinking about after a tool call completes.
+
+**Possible Causes:**
+1. The model doesn't output reasoning in a captured format (`reasoning_content`, `reasoning`, or `thinking` delta fields)
+2. The model uses text-based thinking tags that aren't being parsed as reasoning blocks
+
+**Solution:** Check if your model outputs reasoning through:
+- Structured delta fields (`reasoning_content`, `reasoning`, `thinking`)
+- Text-based tags that Open WebUI detects (ensure reasoning tag detection is enabled)
+
+### Anthropic Extended Thinking Not Working with Tool Calls
+
+**Symptom:** Using Anthropic's Claude models with extended thinking enabled, but tool calls fail with errors like:
+
+```
+Expected `thinking` or `redacted_thinking`, but found `text`. When `thinking` is enabled,
+a final `assistant` message must start with a thinking block.
+```
+
+**Cause:** This is a fundamental architectural difference. Open WebUI follows the **OpenAI Chat Completions API standard** and does not natively support Anthropic's proprietary API format. Anthropic's extended thinking requires structured content blocks with `{"type": "thinking"}` or `{"type": "redacted_thinking"}`, which are Anthropic-specific formats that don't exist in the OpenAI standard.
+
+Open WebUI serializes reasoning as text wrapped in tags (e.g., `...`) inside the message content field. This works with OpenAI-compatible APIs but does not satisfy Anthropic's requirement for structured thinking blocks.
+
+**Why Open WebUI Doesn't Support This Natively:**
+
+There is no standard way for storing reasoning content as part of the API payload across different providers. If Open WebUI implemented support for one provider's format (Anthropic), it would likely break existing deployments for many other inference providers. Given the wide variety of backends Open WebUI supports, we follow the OpenAI Completions API as the common standard.
+
+**Workarounds:**
+
+1. **Use a Pipe Function**: Create a custom [pipe function](/features/pipelines/pipes) that converts Open WebUI's text-based thinking format to Anthropic's structured thinking blocks before sending requests to the Anthropic API.
+
+2. **Disable Extended Thinking**: If you don't need extended thinking for tool-calling workflows, disable it to avoid the format mismatch.
+
+:::note
+This limitation applies specifically to combining Anthropic's extended thinking with tool calls. Extended thinking works without tool calls, and tool calls work without extended thinking—the issue only occurs when using both features together via the Anthropic API.
+:::
+
+### Stateful Reasoning Models (GPT-5.2, etc.)
+
+**Symptom:** Using a model that hides its reasoning (stateful/internal reasoning), and reasoning is not being preserved.
+
+**Cause:** Some newer models (like GPT-5.2) keep their reasoning internal and don't expose it in the API response. Open WebUI can only preserve reasoning that is actually returned by the model.
+
+**Behavior:** If the model returns a reasoning summary instead of full reasoning content, that summary is what gets preserved and sent back.
+
+---
+
+## Frequently Asked Questions
+
+### Why is the thinking block showing as raw text?
+If the model uses tags that are not in the default list and have not been configured in `reasoning_tags`, Open WebUI will treat them as regular text. You can fix this by adding the correct tags to the `reasoning_tags` parameter in the Model Settings or Chat Controls.
+
+### Does the model see its own thinking?
+
+**It depends on the context:**
+
+- **Within the same turn (during tool calls)**: **Yes**. When a model makes tool calls, Open WebUI preserves the reasoning content and sends it back to the API as part of the assistant message. This enables the model to maintain context about what it was thinking when it made the tool call.
+
+- **Across different turns**: **No**. When a user message starts a fresh turn, the reasoning from previous turns is **not** sent back to the API. The thinking content is extracted and displayed in the UI but stripped from the message content before being sent in subsequent requests. This follows the design of reasoning models like OpenAI's `o1`, where the "chain of thought" is intended to be internal and ephemeral.
+
+### How is reasoning sent during tool calls?
+
+When tool calls are involved, reasoning is serialized as text with its original tags and included in the assistant message's `content` field. For example:
+
+```
+Let me search for the current weather...
+```
+
+This text-based format works with most OpenAI-compatible providers. However, some providers (like Anthropic) may expect structured thinking content blocks in a specific format—Open WebUI currently uses text-based serialization rather than provider-specific structured formats.
diff --git a/docs/features/chat-features/temporal-awareness.mdx b/docs/features/chat-features/temporal-awareness.mdx
new file mode 100644
index 0000000000..ff47b2f0f2
--- /dev/null
+++ b/docs/features/chat-features/temporal-awareness.mdx
@@ -0,0 +1,36 @@
+---
+sidebar_position: 11
+title: "Temporal Awareness"
+---
+
+# Temporal Awareness (Date & Time)
+
+For an AI to be truly helpful, it needs to understand the concept of time. Open WebUI ensures that models are aware of the current date, time, and timezone so they can provide contextually relevant answers (e.g., "What's on my schedule today?" or "Summarize my meetings from yesterday").
+
+## System-Level Awareness
+
+By default, Open WebUI injects temporal variables into the model's environment via the system prompt. Even without specialized tools, most models are aware of:
+- **`CURRENT_DATE`**: Injected as YYYY-MM-DD.
+- **`CURRENT_TIME`**: Injected as HH:MM.
+- **`CURRENT_WEEKDAY`**: (e.g., Monday, Tuesday).
+
+These variables can be manually used in [**Prompts**](/features/workspace/prompts) or [**Model Files**](/features/workspace/models) using the `{{CURRENT_DATE}}` syntax.
+
+---
+
+## Native Temporal Tools (Agentic)
+
+When using a model with **Native Function Calling** enabled (see the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native)), models gain granular control over time calculations and queries.
+
+### Available Time Tools:
+- **`get_current_timestamp`**: The model can retrieve the exact current Unix timestamp (UTC) and ISO date string.
+- **`calculate_timestamp`**: The model can perform relative time arithmetic (e.g., "Calculate the date for 3 days ago" or "When is next Friday?").
+
+### Why use native tool calling for Time?
+While static variables tell the model "when it is now," native tools allow the model to **reason about time**.
+
+If you ask: *"Find the notes I wrote last Tuesday,"* a model without tools might guess the date incorrectly. A tool-equipped model will:
+1. **Calculate** the exact date of "last Tuesday" using `calculate_timestamp`.
+2. **Search** your notes using that specific date as a filter via `search_notes`.
+
+This precision is essential for reliable agentic workflows that involve searching history, scheduling tasks, or analyzing time-sensitive data.
diff --git a/docs/features/chat-features/url-params.md b/docs/features/chat-features/url-params.md
index 50140d6a09..f0cb2c5995 100644
--- a/docs/features/chat-features/url-params.md
+++ b/docs/features/chat-features/url-params.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 5
+sidebar_position: 7
title: "URL Parameters"
---
diff --git a/docs/features/evaluation/index.mdx b/docs/features/evaluation/index.mdx
index 8f1a3e445c..88fcea2329 100644
--- a/docs/features/evaluation/index.mdx
+++ b/docs/features/evaluation/index.mdx
@@ -86,7 +86,15 @@ This is a sample leaderboard layout:

-### Topic-Based Reranking
+### Model Activity Tracking
+
+In addition to overall Elo ratings, you can now view a model's performance history through the **Model Activity Chart**. This feature provides a chronological view of how a model's evaluation has evolved over time.
+
+- **Diverging Chart**: The chart displays wins (positive) and losses (negative) daily or weekly, giving you a clear visual indicator of the model's reliability over time.
+- **Time Ranges**: You can toggle between different time horizons: **30 Days**, **1 Year**, or **All Time**.
+- **Weekly Aggregation**: For longer time ranges (1Y and All), the data is automatically aggregated by week to provide a smoother, more readable trend.
+
+To view the activity chart, click on a model in the Leaderboard to open its detailed evaluation modal.
When you rate chats, you can **tag them by topic** for more granular insights. This is especially useful if you’re working in different domains like **customer service, creative writing, technical support**, etc.
diff --git a/docs/features/experimental/direct-connections.mdx b/docs/features/experimental/direct-connections.mdx
new file mode 100644
index 0000000000..7e28993a4c
--- /dev/null
+++ b/docs/features/experimental/direct-connections.mdx
@@ -0,0 +1,37 @@
+---
+sidebar_position: 1510
+title: "Direct Connections"
+---
+
+**Direct Connections** is a feature that allows users to connect their Open WebUI client directly to OpenAI-compatible API endpoints, bypassing the Open WebUI backend for inference requests.
+
+## Overview
+
+In a standard deployment, Open WebUI acts as a proxy: the browser sends the prompt to the Open WebUI backend, which then forwards it to the LLM provider (Ollama, OpenAI, etc.).
+
+With **Direct Connections**, the browser communicates directly with the API provider.
+
+## Benefits
+
+* **Privacy & Control**: Users can use their own personal API keys without storing them on the Open WebUI server (keys are stored in the browser's local storage).
+* **Reduced Latency**: Removes the "middleman" hop through the Open WebUI backend, potentially speeding up response times.
+* **Server Load Reduction**: Offloads the network traffic and connection management from the Open WebUI server to the individual client browsers.
+
+## Prerequisites
+
+1. **Admin Enablement**: The administrator must enable this feature globally.
+ * **Admin Panel > Settings > Connections > Direct Connections**: Toggle **On**.
+ * Alternatively, set the environment variable: `ENABLE_DIRECT_CONNECTIONS=true`.
+2. **CORS Configuration**: Since the browser is making the request, the API provider must have **Cross-Origin Resource Sharing (CORS)** configured to allow requests from your Open WebUI domain.
+ * *Note: Many strict providers (like official OpenAI) might block direct browser requests due to CORS policies. This feature is often best used with flexible providers or internal API gateways.*
+
+## User Configuration
+
+Once enabled by the admin, users can configure their own connections:
+
+1. Go to **User Settings > Connections**.
+2. Click **+ (Add Connection)**.
+3. Enter the **Base URL** (e.g., `https://api.groq.com/openai/v1`) and your **API Key**.
+4. Click **Save**.
+
+The models from this direct connection will now appear in your model list, often indistinguishable from backend-provided models, but requests will flow directly from your machine to the provider.
diff --git a/docs/features/image-generation-and-editing/automatic1111.md b/docs/features/image-generation-and-editing/automatic1111.md
index a84a348829..ed6a1b6592 100644
--- a/docs/features/image-generation-and-editing/automatic1111.md
+++ b/docs/features/image-generation-and-editing/automatic1111.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 2
title: "AUTOMATIC1111"
---
diff --git a/docs/features/image-generation-and-editing/comfyui.md b/docs/features/image-generation-and-editing/comfyui.md
index eef4cc73b6..55b12dcdb7 100644
--- a/docs/features/image-generation-and-editing/comfyui.md
+++ b/docs/features/image-generation-and-editing/comfyui.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
title: "ComfyUI"
---
diff --git a/docs/features/image-generation-and-editing/gemini.md b/docs/features/image-generation-and-editing/gemini.mdx
similarity index 69%
rename from docs/features/image-generation-and-editing/gemini.md
rename to docs/features/image-generation-and-editing/gemini.mdx
index 58764f7946..ea954434ae 100644
--- a/docs/features/image-generation-and-editing/gemini.md
+++ b/docs/features/image-generation-and-editing/gemini.mdx
@@ -11,15 +11,15 @@ Open WebUI also supports image generation through the **Google AI Studio API** a
### Initial Setup
-1. Obtain an [API key](https://aistudio.google.com/api-keys) from Google AI Studio.
-2. You may need to create a project and enable the `Generative Language API` in addition to adding billing information.
+1. Obtain an [API key](https://aistudio.google.com/api-keys) from Google AI Studio - alternatively an API Key from Google Cloud and activate the `Generative Language API` for the project.
+2. You most likely need to create a project and enable the `Generative Language API` in addition to adding billing information, because the image generation API is not available for free.
:::warning
If you are utilizing a free API key, it is vital to have a payment method on file. The absence of a valid payment method is a frequent cause of errors during the setup process.
:::
:::tip
-Alternatively, if you are using Vertex AI, you can create an API key in Google Cloud instead of a service account. This key will function correctly, provided it is assigned the appropriate permissions.
+Alternatively, if you are using Vertex AI, you can create an API key in Google Cloud instead of a service account. This key will function correctly, provided it is assigned the appropriate permissions. And given the Generative Language API is enabled for the project.
:::
### Configuring Open WebUI
@@ -31,12 +31,34 @@ Alternatively, if you are using Vertex AI, you can create an API key in Google C
5. Enter the model you wish to use from these [available models](https://ai.google.dev/gemini-api/docs/imagen#model-versions).
6. Set the image size to one of the available [image sizes](https://ai.google.dev/gemini-api/docs/image-generation#aspect_ratios).
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
### Example Minimal Setup
-One minimalistic working setup for Gemini can look like this:
+
+
-#### Create Image
+**Create Image**
+- **Create Image Model**: `gemini-3-pro-image-preview`
+- **Image Size**: `2816x1536`
+- **Image Prompt Generation**: on
+- **Image Generation Engine**: `Gemini`
+- **Gemini Base URL**: `https://generativelanguage.googleapis.com/v1beta`
+- **Gemini API Key**: Enter your API Key
+- **Gemini Endpoint Method**: `generateContent`
+**Edit Image**
+- **Image Edit Engine**: `Gemini`
+- **Model**: `gemini-3-pro-image-preview`
+- **Image Size**: (can be left empty)
+- **Gemini Base URL**: `https://generativelanguage.googleapis.com/v1beta`
+- **Gemini API Key**: Enter your API Key
+
+
+
+
+**Create Image**
- **Create Image Model**: `gemini-2.5-flash-image`
- **Image Size**: `2816x1536`
- **Image Prompt Generation**: on
@@ -45,14 +67,16 @@ One minimalistic working setup for Gemini can look like this:
- **Gemini API Key**: Enter your API Key
- **Gemini Endpoint Method**: `generateContent`
-#### Edit Image
-
+**Edit Image**
- **Image Edit Engine**: `Gemini`
- **Model**: `gemini-2.5-flash-image`
- **Image Size**: (can be left empty)
- **Gemini Base URL**: `https://generativelanguage.googleapis.com/v1beta`
- **Gemini API Key**: Enter your API Key
+
+
+

:::info
@@ -69,12 +93,12 @@ Imagen model endpoint example:
Gemini model endpoint example:
-- `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash-image:generateContent`.
+- `https://generativelanguage.googleapis.com/v1beta/models/gemini-3-pro-image-preview:generateContent`.
- [Documentation for Gemini models](https://ai.google.dev/gemini-api/docs/image-generation)
-Trying to call a Gemini model, such as gemini-2.5-flash-image aka *Nano Banana* would result in an error due to the difference in supported endpoints for Image Generation.
+Trying to call a Gemini model, such as `gemini-3-pro-image-preview` would result in an error due to the difference in supported endpoints for Image Generation.
-`400: [ERROR: models/gemini-2.5-flash-image is not found for API version v1beta, or is not supported for predict. Call ListModels to see the list of available models and their supported methods.]`
+`400: [ERROR: models/gemini-3-pro-image-preview is not found for API version v1beta, or is not supported for predict. Call ListModels to see the list of available models and their supported methods.]`
:::
diff --git a/docs/features/image-generation-and-editing/image-router.md b/docs/features/image-generation-and-editing/image-router.md
index 2a6034ac4e..6386527569 100644
--- a/docs/features/image-generation-and-editing/image-router.md
+++ b/docs/features/image-generation-and-editing/image-router.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 6
title: "Image Router"
---
diff --git a/docs/features/image-generation-and-editing/openai.md b/docs/features/image-generation-and-editing/openai.md
index 016a6a6d8f..28d5cc306c 100644
--- a/docs/features/image-generation-and-editing/openai.md
+++ b/docs/features/image-generation-and-editing/openai.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 3
+sidebar_position: 4
title: "OpenAI"
---
@@ -27,22 +27,30 @@ Open WebUI also supports image generation through the **OpenAI APIs**. This opti
### Azure OpenAI
-Image generation with Azure OpenAI Dall-E or GPT-Image is supported with Open WebUI. Configure the Image Generation as follows:
+Image generation with Azure OpenAI (DALL·E or GPT-Image) is supported. Configure the Image Generation as follows:
1. In Open WebUI, navigate to the **Admin Panel** > **Settings** > **Images** menu.
2. Set the `Image Generation Engine` field to `Open AI` (Azure OpenAI uses the same syntax as OpenAI).
3. Change the API endpoint URL to `https://.cognitiveservices.azure.com/openai/deployments//`. Set the instance and model id as you find it in the settings of the Azure AI Foundry.
-4. Configure the API version to the value you find in the settings of the Azure AI Fountry.
+4. Configure the API version to the value you find in the settings of the Azure AI Foundry.
5. Enter your Azure OpenAI API key.

-:::tip
+:::tip Azure GPT-Image-1.5 Configuration
+For Azure OpenAI **gpt-image-1.5**, use the following settings for successful generation:
+- **Model**: `gpt-image-1.5`
+- **Image Size**: `1024x1024`
+- **API Version**: `2025-04-01-preview`
+- **API Endpoint URL**: `https://.openai.azure.com/openai/deployments//` (ensure the trailing slash is included)
+
+If you encounter the error `[ERROR: azure-openai error: Unknown parameter: 'response_format'.]`, double-check that your API Version is set to `2025-04-01-preview` or later.
+:::
+:::tip
Alternative API endpoint URL tutorial: `https://.openai.azure.com/openai/deployments//` - you can find your endpoint name on https://ai.azure.com/resource/overview, and model name on https://ai.azure.com/resource/deployments.
You can also copy Target URI from your deployment detailed page, but remember to delete strings after model name.
For example, if your Target URI is `https://test.openai.azure.com/openai/deployments/gpt-image-1/images/generations?api-version=2025-04-01-preview`, the API endpoint URL in Open WebUI should be `https://test.openai.azure.com/openai/deployments/gpt-image-1/`.
-
:::
### LiteLLM Proxy with OpenAI Endpoints
diff --git a/docs/features/image-generation-and-editing/usage.md b/docs/features/image-generation-and-editing/usage.md
index 2d777f113e..4a132db1db 100644
--- a/docs/features/image-generation-and-editing/usage.md
+++ b/docs/features/image-generation-and-editing/usage.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 6
+sidebar_position: 1
title: "Usage"
---
@@ -7,24 +7,66 @@ Before you can use image generation, you must ensure that the **Image Generation
## Using Image Generation
-### Method 1
-
1. Toggle the `Image Generation` switch to on.
2. Enter your image generation prompt.
3. Click `Send`.

-### Method 2
+## Native Tool-Based Generation (Agentic)
+
+If your model is configured with **Native Function Calling** (see the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native)), it can invoke image generation directly as a tool.
+
+### How it works:
+- **Requirement**: The **Image Generation** feature must be toggled **ON** for the chat or model. This grants the model "permission" to use the tool.
+- **Natural Language**: You can simply ask the model: *"Generate an image of a cybernetic forest."*
+- **Action**: If **Native Mode** is active and the feature is enabled, the model will invoke the `generate_image` tool.
+- **Display**: The generated image is displayed directly in the chat interface.
+- **Editing**: This also supports **Image Editing** (inpainting) via the `edit_image` tool (e.g., *"Make the sky in this image red"*).
+
+This approach allows the model to "reason" about the prompt before generating, or even generate multiple images as part of a complex request.
-
-1. First, use a text generation model to write a prompt for image generation.
-2. After the response has finished, you can click the Picture icon to generate an image.
-3. After the image has finished generating, it will be returned automatically in chat.
:::tip
You can also edit the LLM's response and enter your image generation prompt as the message to send off for image generation instead of using the actual response provided by the LLM.
+
+:::
+
+:::info
+**Legacy "Generate Image" Button:**
+As of Open WebUI v0.7.0, the native "Generate Image" button (which allowed generating an image directly from a message's content) was removed. If you wish to restore this functionality, you can use the community-built **[Generate Image Action](https://openwebui.com/posts/3fadc3ca-c955-4c9e-9582-7438f0911b62)**.
+:::
+
+## Restoring the "Generate Image" Button
+
+If you prefer the workflow where you can click a button on any message to generate an image from its content, you can easily restore it:
+
+1. Visit the **[Generate Image Action](https://openwebui.com/posts/3fadc3ca-c955-4c9e-9582-7438f0911b62)** on the Open WebUI Community site.
+2. Click **Get** to import it into your local instance (or copy the code and paste it into your local instance).
+3. Once imported, go to **Workspace** > **Functions** and ensure the **Generate Image** action is enabled.
+
+This action adds a "Generate Image" icon to the message action bar, allowing you to generate images directly from LLM responses - which is helpful if you want the assistant to first iterate on the image prompt and generate it once you are satisfied.
+
+
+:::info
+**Requirement:** To use **Image Editing** or **Image+Image Generation**, you must have an **Image Generation Model** configured in the Admin Settings that supports these features (e.g., OpenAI DALL-E, or a ComfyUI/Automatic1111 model with appropriate inpainting/img2img capabilities).
:::
+
+## Image Editing (Inpainting)
+
+You can edit an image by providing the image and a text prompt directly in the chat.
+
+1. **Upload an image** to the chat.
+2. **Enter a prompt** describing the change you want to make (e.g., "Change the background to a sunset" or "Add a hat").
+3. The model will generate a new version of the image based on your prompt.
+
+## Image Compositing (Multi-Image Fusion)
+
+Seamlessly combine multiple images into a single cohesive scene—a process professionally known as **Image Compositing** or **Multi-Image Fusion**. This allows you to merge elements from different sources (e.g., placing a subject from one image into the background of another) while harmonizing lighting, perspective, and style.
+
+1. **Upload images** to the chat (e.g., upload an image of a subject and an image of a background).
+2. **Enter a prompt** describing the desired composition (e.g., "Combine these images to show the cat sitting on the park bench, ensuring consistent lighting").
+3. The model will generate a new composite image that fuses the elements according to your instructions.
diff --git a/docs/features/index.mdx b/docs/features/index.mdx
index 6eba78e7e2..d757b5673b 100644
--- a/docs/features/index.mdx
+++ b/docs/features/index.mdx
@@ -9,15 +9,15 @@ import { TopBanners } from "@site/src/components/TopBanners";
## Key Features of Open WebUI ⭐
-- 🚀 **Effortless Setup**: Install seamlessly using Docker, Kubernetes, Podman, Helm Charts (`kubectl`, `kustomize`, `podman`, or `helm`) for a hassle-free experience with support for both `:ollama` image with bundled Ollama and `:cuda` with CUDA support.
+- 🚀 **Effortless Setup**: Install seamlessly using Docker, Kubernetes, Podman, Helm Charts (`kubectl`, `kustomize`, `podman`, or `helm`) for a hassle-free experience with support for both `:ollama` image with bundled Ollama and `:cuda` with CUDA support. [Learn more in our Quick Start Guide](/getting-started/quick-start).
- 🛠️ **Guided Initial Setup**: Complete the setup process with clarity, including an explicit indication of creating an admin account during the first-time setup.
-- 🤝 **OpenAI API Integration**: Effortlessly integrate OpenAI-compatible APIs for versatile conversations alongside Ollama models. The OpenAI API URL can be customized to integrate Open WebUI seamlessly with various third-party applications.
+- 🤝 **Universal API Compatibility**: Effortlessly integrate with any backend that follows the **OpenAI Chat Completions protocol**. This includes official OpenAI endpoints alongside dozens of third-party and local providers. The API URL can be customized to integrate Open WebUI seamlessly into your existing infrastructure. [See Setup Guide](/getting-started/quick-start).
-- 🛡️ **Granular Permissions and User Groups**: By allowing administrators to create detailed user roles, user groups, and permissions across the workspace, we ensure a secure user environment for all users involved. This granularity not only enhances security, but also allows for customized user experiences, fostering a sense of ownership and responsibility amongst users.
+- 🛡️ **Granular Permissions and User Groups**: By allowing administrators to create detailed user roles, user groups, and permissions across the workspace, we ensure a secure user environment for all users involved. This granularity not only enhances security, but also allows for customized user experiences, fostering a sense of ownership and responsibility amongst users. [Learn more about RBAC](/features/rbac).
-- 🔐 **SCIM 2.0 Provisioning**: Enterprise-grade user and group provisioning through SCIM 2.0 protocol, enabling seamless integration with identity providers like Okta, Azure AD, and Google Workspace for automated user lifecycle management.
+- 🔐 **SCIM 2.0 Provisioning**: Enterprise-grade user and group provisioning through SCIM 2.0 protocol, enabling seamless integration with identity providers like Okta, Azure AD, and Google Workspace for automated user lifecycle management. [Read the SCIM Guide](/features/auth/scim).
- 📱 **Responsive Design**: Enjoy a seamless experience across desktop PCs, laptops, and mobile devices.
@@ -29,37 +29,37 @@ import { TopBanners } from "@site/src/components/TopBanners";
:::
-- ✒️🔢 **Full Markdown and LaTeX Support**: Elevate your LLM experience with comprehensive Markdown, LaTex, and Rich Text capabilities for enriched interaction.
+- ✒️🔢 **Full Markdown and LaTeX Support**: Elevate your LLM experience with comprehensive Markdown, LaTex, and Rich Text capabilities for enriched interaction. [Explore Interface Features](/category/interface).
-- 🧩 **Model Builder**: Easily create custom models from base Ollama models directly from Open WebUI. Create and add custom characters and agents, customize model elements, and import models effortlessly through [Open WebUI Community](https://openwebui.com/) integration.
+- 🧩 **Model Builder**: Easily create custom models from base Ollama models directly from Open WebUI. Create and add custom characters and agents, customize model elements, and import models effortlessly through [Open WebUI Community](https://openwebui.com/) integration. [Learn more about Models](/features/workspace/models).
-- 📚 **Advanced RAG Integration with Multiple Vector Databases**: Dive into the future of chat interactions with cutting-edge Retrieval Augmented Generation (RAG) technology. Choose from 9 vector database options: ChromaDB (default), PostgreSQL with PGVector, Qdrant, Milvus, Elasticsearch, OpenSearch, Pinecone, S3Vector, and Oracle 23ai. Documents can be loaded into the `Documents` tab of the Workspace and accessed using the pound key [`#`] before a query, or by starting the prompt with [`#`] followed by a URL for webpage content integration.
+- 📚 **Advanced RAG Integration with Multiple Vector Databases**: Dive into the future of chat interactions with cutting-edge Retrieval Augmented Generation (RAG) technology. Choose from 9 vector database options: ChromaDB (default), PostgreSQL with PGVector, Qdrant, Milvus, Elasticsearch, OpenSearch, Pinecone, S3Vector, and Oracle 23ai. Documents can be loaded into the `Documents` tab of the Workspace and accessed using the pound key [`#`] before a query, or by starting the prompt with [`#`] followed by a URL for webpage content integration. [Learn more about RAG](/features/rag).
-- 📄 **Advanced Document Extraction with Multiple Engines**: Extract text and data from various document formats including PDFs, Word documents, Excel spreadsheets, PowerPoint presentations, and more using your choice of extraction engines: Apache Tika, Docling, Azure Document Intelligence, Mistral OCR, or external custom (self-built) content extraction engines/document loaders. Advanced document processing capabilities enable seamless integration with your knowledge base, preserving structure and formatting while supporting OCR for scanned documents and images.
+- 📄 **Advanced Document Extraction with Multiple Engines**: Extract text and data from various document formats including PDFs, Word documents, Excel spreadsheets, PowerPoint presentations, and more using your choice of extraction engines: Apache Tika, Docling, Azure Document Intelligence, Mistral OCR, or external custom (self-built) content extraction engines/document loaders. Advanced document processing capabilities enable seamless integration with your knowledge base, preserving structure and formatting while supporting OCR for scanned documents and images. [Read about Document Extraction](/features/rag/document-extraction).
-- 🔍 **Web Search for RAG with 15+ Providers**: Perform web searches using 15+ providers including SearXNG, Google PSE, Brave Search, Kagi, Mojeek, Bocha, Tavily, Perplexity (AI models and Search API), serpstack, serper, Serply, DuckDuckGo, SearchAPI, SerpApi, Bing, Jina, Exa, Sougou, Azure AI Search, and Ollama Cloud, injecting results directly into your local Retrieval Augmented Generation (RAG) experience.
+- 🔍 **Web Search for RAG & Agentic Research**: Perform web searches using 15+ providers including SearXNG, Google PSE, Brave Search, Kagi, Mojeek, Bocha, Tavily, Perplexity, and more. When using **Native Function Calling**, models can perform multiple searches sequentially and use the `fetch_url` tool to read full page content for deep research. [Learn about Agentic Search](/features/web-search/agentic-search).
-- 🌐 **Web Browsing Capabilities**: Integrate websites seamlessly into your chat experience by using the `#` command followed by a URL. This feature enables the incorporation of web content directly into your conversations, thereby enhancing the richness and depth of your interactions.
+- 🌐 **Web Browsing & URL Fetching**: Integrate websites by using the `#` command or allow the model to independently visit links using the `fetch_url` tool in Native Mode, extracting full text content for precise analysis.
-- 🎨 **Image Generation & Editing Integration**: Seamlessly create and edit images using multiple engines including OpenAI's DALL-E (generation and editing), Gemini (generation and editing), ComfyUI (local, generation and editing), and AUTOMATIC1111 (local, generation). Support for both text-to-image generation and prompt-based image editing workflows with dynamic visual content.
+- 🎨 **Image Generation & Editing Integration**: Seamlessly create and edit images using engines like DALL-E, Gemini, ComfyUI, and AUTOMATIC1111. Supports **Native Tool Calling**, allowing models to independently generate and refine images during a conversation. [Learn more about Image Gen](/category/create--edit-images).
- ⚙️ **Concurrent Model Utilization**: Effortlessly engage with multiple models simultaneously, harnessing their unique strengths for optimal responses. Leverage a diverse set of model modalities in parallel to enhance your experience.
-- 🔐 **Role-Based Access Control (RBAC)**: Ensure secure access with restricted permissions. Only authorized individuals can access your Ollama, while model creation and pulling rights are exclusively reserved for administrators.
+- 🔐 **Role-Based Access Control (RBAC)**: Ensure secure access with restricted permissions. Only authorized individuals can access your Ollama, while model creation and pulling rights are exclusively reserved for administrators. [Learn more about RBAC](/features/rbac).
- 🌐🌍 **Multilingual Support**: Experience Open WebUI in your preferred language with our internationalization (`i18n`) support. We invite you to join us in expanding our supported languages! We're actively seeking contributors!
-- 💾 **Persistent Artifact Storage**: Built-in key-value storage API for artifacts, enabling features like journals, trackers, leaderboards, and collaborative tools with both personal and shared data scopes that persist across sessions.
+- 💾 **Persistent Artifact Storage**: Built-in key-value storage API for artifacts, enabling features like journals, trackers, leaderboards, and collaborative tools with both personal and shared data scopes that persist across sessions. [Explore Chat Features](/features/chat-features).
-- ☁️ **Cloud Storage Integration**: Native support for cloud storage backends including Amazon S3 (with S3-compatible providers), Google Cloud Storage, and Microsoft Azure Blob Storage for scalable file storage and data management.
+- ☁️ **Cloud Storage Integration**: Native support for cloud storage backends including Amazon S3 (with S3-compatible providers), Google Cloud Storage, and Microsoft Azure Blob Storage for scalable file storage and data management. [See Storage Config](/getting-started/env-configuration#cloud-storage).
-- ☁️ **Enterprise Cloud Integration**: Seamlessly import documents from Google Drive and OneDrive/SharePoint directly through the file picker interface, enabling smooth workflows with enterprise cloud storage solutions.
+- ☁️ **Enterprise Cloud Integration**: Seamlessly import documents from Google Drive and OneDrive/SharePoint directly through the file picker interface, enabling smooth workflows with enterprise cloud storage solutions. [Learn more in Environment Config](/getting-started/env-configuration#onedrive) and check out the [SharePoint Guide](/tutorials/integrations/onedrive-sharepoint/).
-- 📊 **Production Observability with OpenTelemetry**: Built-in OpenTelemetry support for comprehensive monitoring with traces, metrics, and logs export to your existing observability stack (Prometheus, Grafana, Jaeger, etc.), enabling production-grade monitoring and debugging.
+- 📊 **Production Observability with OpenTelemetry**: Built-in OpenTelemetry support for comprehensive monitoring with traces, metrics, and logs export to your existing observability stack (Prometheus, Grafana, Jaeger, etc.), enabling production-grade monitoring and debugging. [See Observability Config](/getting-started/env-configuration/#opentelemetry-configuration).
-- 🔒 **Encrypted Database Support**: Optional at-rest encryption for SQLite databases using SQLCipher, providing enhanced security for sensitive data in smaller deployments without requiring PostgreSQL infrastructure.
+- 🔒 **Encrypted Database Support**: Optional at-rest encryption for SQLite databases using SQLCipher, providing enhanced security for sensitive data in smaller deployments without requiring PostgreSQL infrastructure. [See Database Encryption](/getting-started/env-configuration#encrypted-sqlite-with-sqlcipher).
-- ⚖️ **Horizontal Scalability for Production**: Redis-backed session management and WebSocket support enabling multi-worker and multi-node deployments behind load balancers for high-availability production environments.
+- ⚖️ **Horizontal Scalability for Production**: Redis-backed session management and WebSocket support enabling multi-worker and multi-node deployments behind load balancers for high-availability production environments. [See Advanced Topics](/getting-started/advanced-topics) and our [Multi-Replica Guide](/troubleshooting/multi-replica).
- 🌟 **Continuous Updates**: We are committed to improving Open WebUI with regular updates, fixes, and new features.
@@ -69,9 +69,11 @@ import { TopBanners } from "@site/src/components/TopBanners";
### 🔧 Pipelines Support
-- 🔧 **Pipelines Framework**: Seamlessly integrate and customize your Open WebUI experience with our modular plugin framework for enhanced customization and functionality (https://github.com/open-webui/pipelines). Our framework allows for the easy addition of custom logic and integration of Python libraries, from AI agents to home automation APIs.
+- 🔧 **Pipelines Framework**: Seamlessly integrate and customize your Open WebUI experience with our modular plugin framework for enhanced customization and functionality (https://github.com/open-webui/pipelines). Our framework allows for the easy addition of custom logic and integration of Python libraries, from AI agents to home automation APIs. Perfect for plugin and tool development, as well as creating custom functions and filters. [Learn more about Pipelines](/features/pipelines).
-- 📥 **Upload Pipeline**: Pipelines can be uploaded directly from the `Admin Panel` > `Settings` > `Pipelines` menu, streamlining the pipeline management process.
+- 🛠️ **Native Python Function Calling**: Access the power of Python directly within Open WebUI with native function calling. Easily integrate custom code to build unique features like custom RAG pipelines, web search tools, and even agent-like actions via a built-in code editor to seamlessly develop and integrate function code within the `Tools` and `Functions` workspace. [Learn more about Tools](/features/plugin/tools).
+
+- 📥 **Upload Pipeline**: Pipelines can be uploaded directly from the `Admin Panel` > `Settings` > `Pipelines` menu, streamlining the pipeline management process.
#### The possibilities with our Pipelines framework know no bounds and are practically limitless. A few pre-built pipelines are available to help you get started!
@@ -108,7 +110,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
- 🎨 **Splash Screen**: A simple loading splash screen for a smoother user experience.
-- 🌐 **Personalized Interface**: Choose between a freshly designed search landing page and the classic chat UI from Settings > Interface, allowing for a tailored experience.
+- 🌐 **Personalized Interface**: Choose between a freshly designed search landing page and the classic chat UI from Settings > Interface, allowing for a tailored experience. [Explore Interface Options](/category/interface).
- 📦 **Pip Install Method**: Installation of Open WebUI can be accomplished via the command `pip install open-webui`, which streamlines the process and makes it more accessible to new users. For further information, please visit: https://pypi.org/project/open-webui/.
@@ -116,7 +118,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
- 🖼️ **Custom Background Support**: Set a custom background from Settings > Interface to personalize your experience.
-- 📝 **Rich Banners with Markdown**: Create visually engaging announcements with markdown support in banners, enabling richer and more dynamic content.
+- 📝 **Rich Banners with Markdown**: Create visually engaging announcements with markdown support in banners, enabling richer and more dynamic content. [See Banners Documentation](/features/interface/banners).
- 💻 **Code Syntax Highlighting**: Our syntax highlighting feature enhances code readability, providing a clear and concise view of your code.
@@ -162,13 +164,13 @@ import { TopBanners } from "@site/src/components/TopBanners";
- 🔔 **Chat Completion Notifications**: Stay updated with instant in-UI notifications when a chat finishes in a non-active tab, ensuring you never miss a completed response.
-- 🌐 **Notification Webhook Integration**: Receive timely updates for long-running chats or external integration needs with configurable webhook notifications, even when your tab is closed.
+- 🌐 **Notification Webhook Integration**: Receive timely updates for long-running chats or external integration needs with configurable webhook notifications, even when your tab is closed. [Learn more about Webhooks](/features/interface/webhooks).
-- 📚 **Channels (Beta)**: Explore real-time collaboration between users and AIs with Discord/Slack-style chat rooms, build bots for channels, and unlock asynchronous communication for proactive multi-agent workflows.
+- 📚 **Channels (Beta)**: Explore real-time collaboration between users and AIs with Discord/Slack-style chat rooms, build bots for channels, and unlock asynchronous communication for proactive multi-agent workflows. [See Channels](/features/channels).
- 🖊️ **Typing Indicators in Channels**: Enhance collaboration with real-time typing indicators in channels, keeping everyone engaged and informed.
-- 👤 **User Status Indicators**: Quickly view a user's status by clicking their profile image in channels, providing better coordination and availability insights.
+- 👤 **User Status Indicators**: Quickly view a user's status by clicking their profile image in channels, providing better coordination and availability insights. This feature can be globally disabled by an administrator (Admin > Settings > General).
- 💬 **Chat Controls**: Easily adjust parameters for each chat session, offering more precise control over your interactions.
@@ -230,7 +232,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
- Please note that the `{{USER_LOCATION}}` prompt variable requires a secure connection over HTTPS. To utilize this particular prompt variable, please ensure that `{{USER_LOCATION}}` is toggled on from the `Settings` > `Interface` menu.
- Please note that the `{{CLIPBOARD}}` prompt variable requires access to your device's clipboard.
-- 🧠 **Memory Feature**: Manually add information you want your LLMs to remember via the `Settings` > `Personalization` > `Memory` menu. Memories can be added, edited, and deleted.
+- 🧠 **Memory Feature & Tools (Experimental)**: Manage information you want your LLMs to remember via `Settings` > `Personalization` > `Memory`. Capable models can now use `add_memory`, `search_memories`, and `replace_memory_content` tools to dynamically store, retrieve, and update facts about you during chat sessions. [Learn more about Memory](/features/memory).
---
@@ -293,7 +295,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
- ⚔️ **Model Evaluation Arena**: Conduct blind A/B testing of models directly from the Admin Settings for a true side-by-side comparison, making it easier to find the best model for your needs.
-- 🎯 **Topic-Based Rankings**: Discover more accurate rankings with our experimental topic-based re-ranking system, which adjusts leaderboard standings based on tag similarity in feedback.
+- 🎯 **Topic-Based Rankings**: Discover more accurate rankings with our experimental topic-based re-ranking system, which adjusts leaderboard standings based on tag similarity in feedback. [Learn more about Evaluation](/features/evaluation).
- 📂 **Unified and Collaborative Workspace**: Access and manage all your model files, prompts, documents, tools, and functions in one convenient location, while also enabling multiple users to collaborate and contribute to models, knowledge, prompts, or tools, streamlining your workflow and enhancing teamwork.
@@ -321,17 +323,17 @@ import { TopBanners } from "@site/src/components/TopBanners";
### 🎙️ Audio, Voice, & Accessibility
-- 🗣️ **Voice Input Support with Multiple Providers**: Engage with your model through voice interactions using multiple Speech-to-Text providers: Local Whisper (default, with VAD filtering), OpenAI-compatible endpoints, Deepgram, and Azure Speech Services. Enjoy the convenience of talking to your model directly with automatic voice input after 3 seconds of silence for a streamlined experience.
- - Microphone access requires manually setting up a secure connection over HTTPS to work, or [manually whitelisting your URL at your own risk](https://docs.openwebui.com/troubleshooting/microphone-error).
+- 🗣️ **Voice Input Support with Multiple Providers**: Engage with your model through voice interactions using multiple Speech-to-Text providers: Local Whisper (default, with VAD filtering), OpenAI-compatible endpoints, Deepgram, and Azure Speech Services. Enjoy the convenience of talking to your model directly with automatic voice input after 3 seconds of silence for a streamlined experience. [Explore Audio Features](/category/speech-to-text--text-to-speech).
+ - Microphone access requires manually setting up a secure connection over HTTPS to work, or [manually whitelisting your URL at your own risk](/troubleshooting/audio#solutions-for-non-https-connections).
- 😊 **Emoji Call**: Toggle this feature on from the `Settings` > `Interface` menu, allowing LLMs to express emotions using emojis during voice calls for a more dynamic interaction.
- - Microphone access requires manually setting up a secure connection over HTTPS to work, or [manually whitelisting your URL at your own risk](https://docs.openwebui.com/troubleshooting/microphone-error).
+ - Microphone access requires manually setting up a secure connection over HTTPS to work, or [manually whitelisting your URL at your own risk](/troubleshooting/audio#solutions-for-non-https-connections).
- 🎙️ **Hands-Free Voice Call Feature**: Initiate voice calls without needing to use your hands, making interactions more seamless.
- - Microphone access requires manually setting up a secure connection over HTTPS to work, or [manually whitelisting your URL at your own risk](https://docs.openwebui.com/troubleshooting/microphone-error).
+ - Microphone access requires manually setting up a secure connection over HTTPS to work, or [manually whitelisting your URL at your own risk](/troubleshooting/audio#solutions-for-non-https-connections).
- 📹 **Video Call Feature**: Enable video calls with supported vision models like LLaVA and GPT-4o, adding a visual dimension to your communications.
- - Both Camera & Microphone access is required using a secure connection over HTTPS for this feature to work, or [manually whitelisting your URL at your own risk](https://docs.openwebui.com/troubleshooting/microphone-error).
+ - Both Camera & Microphone access is required using a secure connection over HTTPS for this feature to work, or [manually whitelisting your URL at your own risk](/troubleshooting/audio#solutions-for-non-https-connections).
- 👆 **Tap to Interrupt**: Stop the AI’s speech during voice conversations with a simple tap on mobile devices, ensuring seamless control over the interaction.
@@ -362,15 +364,14 @@ import { TopBanners } from "@site/src/components/TopBanners";
### 🐍 Code Execution
- 🚀 **Versatile, UI-Agnostic, OpenAI-Compatible Plugin Framework**: Seamlessly integrate and customize [Open WebUI Pipelines](https://github.com/open-webui/pipelines) for efficient data processing and model training, ensuring ultimate flexibility and scalability.
-
-- 🛠️ **Native Python Function Calling**: Access the power of Python directly within Open WebUI with native function calling. Easily integrate custom code to build unique features like custom RAG pipelines, web search tools, and even agent-like actions via a built-in code editor to seamlessly develop and integrate function code within the `Tools` and `Functions` workspace.
-
- 🐍 **Python Code Execution**: Execute Python code locally in the browser via Pyodide with a range of libraries supported by Pyodide.
- 🌊 **Mermaid Rendering**: Create visually appealing diagrams and flowcharts directly within Open WebUI using the [Mermaid Diagramming and charting tool](https://mermaid.js.org/intro/), which supports Mermaid syntax rendering.
- 🔗 **Iframe Support**: Enables rendering HTML directly into your chat interface using functions and tools.
+
+
---
### 🔒 Integration & Security
@@ -379,7 +380,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
- 🔑 **Simplified API Key Management**: Easily generate and manage secret keys to leverage Open WebUI with OpenAI libraries, streamlining integration and development.
-- 🌐 **HTTP/S Proxy Support**: Configure network settings easily using the `http_proxy` or `https_proxy` environment variable. These variables, if set, should contain the URLs for HTTP and HTTPS proxies, respectively.
+- 🌐 **HTTP/S Proxy Support**: Configure network settings easily using the `http_proxy` or `https_proxy` environment variable. These variables, if set, should contain the URLs for HTTP and HTTPS proxies, respectively. For web search content fetching behind a proxy, enable **Trust Proxy Environment** in Admin Panel > Settings > Web Search (or set `WEB_SEARCH_TRUST_ENV=True`).
- 🌐🔗 **External Ollama Server Connectivity**: Seamlessly link to an external Ollama server hosted on a different address by configuring the environment variable.
@@ -427,7 +428,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
- 🔒 **Prevent Chat Deletion**: Ability for admins to toggle a setting that prevents all users from deleting their chat messages, ensuring that all chat messages are retained for audit or compliance purposes.
-- 🔗 **Webhook Integration**: Subscribe to new user sign-up events via webhook (compatible with `Discord`, `Google Chat`, `Slack` and `Microsoft Teams`), providing real-time notifications and automation capabilities.
+- 🔗 **Webhook Integration**: Subscribe to new user sign-up events via webhook (compatible with `Discord`, `Google Chat`, `Slack` and `Microsoft Teams`), providing real-time notifications and automation capabilities. [See Webhook Guide](/features/interface/webhooks).
- 📣 **Configurable Notification Banners**: Admins can create customizable banners with persistence in config.json, featuring options for content, background color (`info`, `warning`, `error`, or `success`), and dismissibility. Banners are accessible only to logged-in users, ensuring the confidentiality of sensitive information.
@@ -451,9 +452,9 @@ import { TopBanners } from "@site/src/components/TopBanners";
- 🔐 **Group-Based Access Control**: Set granular access to models, knowledge, prompts, and tools based on user groups, allowing for more controlled and secure environments.
-- 🛠️ **Granular User Permissions**: Easily manage workspace permissions, including file uploads, deletions, edits, and temporary chats, as well as model, knowledge, prompt, and tool creation.
+- 🛠️ **Granular User Permissions**: Easily manage workspace permissions, including file uploads, deletions, edits, and temporary chats, as well as model, knowledge, prompt, and tool creation. [See User Permissions Config](/getting-started/env-configuration/#user-permissions).
-- 🔑 **LDAP Authentication**: Enhance security and scalability with LDAP support for user management.
+- 🔑 **LDAP Authentication**: Enhance security and scalability with LDAP support for user management. [Learn more about LDAP](/features/auth/ldap).
- 🔐 **SCIM 2.0 Provisioning**: Automate user and group lifecycle management through SCIM 2.0 protocol integration with identity providers like Okta, Azure AD, and Google Workspace, reducing administrative overhead and ensuring synchronized user management across systems.
diff --git a/docs/features/interface/webhooks.md b/docs/features/interface/webhooks.md
index 16b017f488..b58e2eb0df 100644
--- a/docs/features/interface/webhooks.md
+++ b/docs/features/interface/webhooks.md
@@ -5,12 +5,13 @@ title: "Webhook Integrations"
## Overview
-Open WebUI offers two distinct webhook integrations to help you stay informed about events happening within your instance. These webhooks allow you to receive automated notifications in external services like Discord, Slack, or any other application that supports incoming webhooks.
+Open WebUI offers three distinct webhook integrations to help you stay informed about events happening within your instance and enable external integrations. These webhooks allow you to receive automated notifications in external services like Discord, Slack, or any other application that supports incoming webhooks, as well as post messages from external services into Open WebUI channels.
-There are two types of webhooks available:
+There are three types of webhooks available:
1. **Admin Webhook:** A system-wide webhook that notifies administrators about new user sign-ups.
2. **User Webhook:** A personal webhook that notifies individual users when a response to their chat is ready, especially useful for long-running tasks.
+3. **Channel Webhooks:** Incoming webhooks that allow external services to post messages into specific channels.
## 1. Admin Webhook: New User Notifications
@@ -102,6 +103,134 @@ When a chat response is ready and you are inactive, Open WebUI will send a `POST
}
```
+## 3. Channel Webhooks: External Message Integration
+
+Channel Webhooks allow external services, automation tools, or scripts to post messages directly into Open WebUI channels. This enables seamless integration with monitoring systems, CI/CD pipelines, notification services, or any custom automation.
+
+### Use Cases
+
+- **System Monitoring:** Post alerts from monitoring tools (Prometheus, Grafana, Nagios) directly into team channels.
+- **CI/CD Integration:** Send build status notifications from GitHub Actions, GitLab CI, or Jenkins to development channels.
+- **Custom Automation:** Integrate with n8n, Zapier, or custom scripts to automate message posting.
+- **External Notifications:** Forward notifications from external services into your Open WebUI workspace.
+
+### How it Works
+
+Each channel can have multiple webhooks. Each webhook has:
+- A unique **webhook URL** that external services can POST to
+- A **display name** shown as the message author
+- An optional **profile image** to visually identify the webhook source
+- A **last used timestamp** to track webhook activity
+
+Messages posted via webhooks appear in the channel with the webhook's identity, making it clear they came from an external source rather than a user.
+
+### Managing Channel Webhooks
+
+Only **channel managers** and **administrators** can create and manage webhooks for a channel.
+
+#### Creating a Webhook
+
+1. Navigate to the channel where you want to add a webhook.
+2. Click the channel menu (⋮) and select **Edit Channel**.
+3. In the channel settings modal, locate the **Webhooks** section.
+4. Click **Manage** to open the Webhooks modal.
+5. Click **New Webhook** to create a new webhook.
+6. Configure the webhook:
+ - **Name:** The display name that will appear as the message author
+ - **Profile Image:** (Optional) Upload an image to represent this webhook
+7. Click **Save** to create the webhook.
+8. Copy the generated webhook URL using the **Copy URL** button.
+
+#### Webhook URL Format
+
+```
+{WEBUI_API_BASE_URL}/channels/webhooks/{webhook_id}/{token}
+```
+
+This URL is unique and contains an authentication token. Anyone who has it can post messages to the channel, so treat it like a secret. See the cURL example below for what a concrete URL looks like.
+
+#### Updating a Webhook
+
+1. Open the **Webhooks** modal from the channel settings.
+2. Click on the webhook you want to edit to expand it.
+3. Modify the **Name** or **Profile Image** as needed.
+4. Click **Save** to apply changes.
+
+The webhook URL remains the same when you update the name or image. Messages posted after the update will show the new name/image, but existing messages retain the webhook identity from when they were posted.
+
+#### Deleting a Webhook
+
+1. Open the **Webhooks** modal from the channel settings.
+2. Click on the webhook you want to delete to expand it.
+3. Click the **Delete** (trash) icon.
+4. Confirm the deletion.
+
+Once deleted, the webhook URL will stop working immediately. Messages previously posted by the webhook will remain in the channel but show "Deleted Webhook" as the author.
+
+### Posting Messages via Webhook
+
+To post a message from an external service, send a `POST` request to the webhook URL with a JSON payload.
+
+#### Request Format
+
+**Endpoint:** `POST {webhook_url}`
+**Headers:** `Content-Type: application/json`
+**Body:**
+
+```json
+{
+ "content": "Your message content here"
+}
+```
+
+#### Example: Using cURL
+
+```bash
+curl -X POST "https://your-instance.com/api/channels/webhooks/{webhook_id}/{token}" \
+ -H "Content-Type: application/json" \
+ -d '{"content": "Deployment to production completed successfully! 🚀"}'
+```
+
+#### Example: Using Python
+
+```python
+import requests
+
+webhook_url = "https://your-instance.com/api/channels/webhooks/{webhook_id}/{token}"
+message = {
+ "content": "Build #1234 failed: Unit tests did not pass."
+}
+
+response = requests.post(webhook_url, json=message)
+print(response.json())
+```
+
+#### Response Format
+
+On success, the webhook will return:
+
+```json
+{
+ "success": true,
+ "message_id": "abc-123-def-456"
+}
+```
+
+### Security Considerations
+
+- **URL Protection:** Webhook URLs contain authentication tokens. Keep them secure and don't expose them in public repositories or logs.
+- **Channel Access:** Anyone with the webhook URL can post to the channel. Only share the URL with trusted services.
+- **Message Content:** Validate and sanitize message content on the sending side to prevent injection attacks.
+- **Regeneration:** If a webhook URL is compromised, delete the webhook and create a new one.
+
+### Webhook Identity
+
+Messages posted via webhooks have a special identity system:
+- They appear with the webhook's **name** and **profile image**
+- The user role is marked as **"webhook"** to distinguish from regular users
+- If a webhook is deleted, its messages remain visible but are attributed to "Deleted Webhook" instead of the original webhook name
+- Each message stores the webhook ID in its metadata, allowing proper attribution even if the webhook is later modified or deleted
+
## Troubleshooting
If you're not receiving webhook notifications, here are a few things to check:
diff --git a/docs/features/mcp.mdx b/docs/features/mcp.mdx
index 5583951152..f99959ce9b 100644
--- a/docs/features/mcp.mdx
+++ b/docs/features/mcp.mdx
@@ -23,6 +23,15 @@ You **MUST** set the `WEBUI_SECRET_KEY` environment variable in your Docker setu
You can now call tools exposed by your MCP server from Open WebUI.
+:::warning Common Mistake: Wrong Connection Type
+If you are adding an MCP server, make sure **Type** is set to **MCP (Streamable HTTP)**, not **OpenAPI**.
+
+Entering MCP-style configuration (with `mcpServers` in JSON) into an OpenAPI connection will cause the UI to crash or display an infinite loading screen. If you encounter this:
+
+1. Disable the problematic tool connection via Admin Settings
+2. Re-add it with the correct **Type** set to **MCP**
+:::
+
## 🧭 When to use MCP vs OpenAPI
:::tip
@@ -46,6 +55,51 @@ Choose **MCP (Streamable HTTP)** if you need:
Browser-based, multi-user deployments increase the surface area (CORS/CSRF, per-user isolation, reconnects). Review your org’s auth, proxy, and rate-limiting policies before exposing MCP externally.
:::
+## ⚙️ Configuration Best Practices
+
+### Authentication Modes
+
+* **None**: Use this for **local MCP servers** or internal networks where no token is required.
+ * **⚠️ Important**: Default to "None" unless your server strictly requires a token. Selecting "Bearer" without providing a key sends an empty Authorization header (`Authorization: Bearer`), which causes many servers to reject the connection immediately.
+* **Bearer**: Use this **only** if your MCP server requires a specific API token. You **must** populate the "Key" field.
+* **OAuth 2.1**: For secured, enterprise deployments requiring Identity Provider flows.
+
+### Connection URLs
+
+If you are running Open WebUI in **Docker** and your MCP server is on the **host machine**:
+* Use `http://host.docker.internal:<port>` (e.g., `http://host.docker.internal:3000/sse`) instead of `localhost`.
+
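+If a tool call fails and you suspect the container simply cannot reach your server, a quick reachability check like the sketch below can help. Run it from wherever the Open WebUI backend runs (for Docker, for example via `docker exec` into the container). The URL is only an example; any HTTP status code, even an error one, proves basic network reachability, while a connection error points to a wrong host or port.
+
+```python
+# Minimal reachability check for an MCP endpoint (URL and port are examples).
+import requests
+
+url = "http://host.docker.internal:3000/sse"
+try:
+    # stream=True returns as soon as the response headers arrive, so SSE endpoints don't block here
+    response = requests.get(url, stream=True, timeout=5)
+    print(f"Reachable: HTTP {response.status_code}")
+except requests.exceptions.RequestException as error:
+    print(f"Not reachable: {error}")
+```
+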
+### Function Name Filter List
+
+This field restricts which tools are exposed to the LLM.
+* **Default**: Leave this field empty to expose all tools, which is the right choice in most cases.
+* **Workaround**: If you encounter connection errors with an empty list, try adding a single comma (`,`) to this field. This forces the system to treat it as a valid (but empty) filter, potentially bypassing some parsing issues.
+
+## Troubleshooting
+
+### "Failed to connect to MCP server"
+
+**Symptom**:
+The chat shows "Failed to connect to MCP server" when using a tool, even if the **Verify Connection** button in settings says "Connected".
+
+**Solutions**:
+1. **Check Authentication**: Ensure you haven't selected `Bearer` without a key. Switch to `None` if no token is needed.
+2. **Filter List Bug**: If the "Function Name Filter List" is empty, try adding a comma (`,`) to it.
+
+### Infinite loading screen after adding External Tool
+
+**Symptom**:
+After adding an External Tool connection, the frontend gets stuck on a loading spinner. The browser console shows an error like `Cannot convert undefined or null to object at Object.entries`.
+
+**Cause**:
+You likely configured an **MCP server** using the **OpenAPI** connection type, or entered MCP-style JSON (containing `mcpServers`) into an OpenAPI connection.
+
+**Solution**:
+1. Open **Admin Settings → External Tools** (the sidebar still loads)
+2. **Disable** or **delete** the problematic tool connection
+3. Refresh the page (Ctrl+F5)
+4. Re-add the connection with the correct **Type** set to **MCP (Streamable HTTP)**
+
## ❓ FAQ
**Do you support stdio or SSE transports?**
diff --git a/docs/features/memory.mdx b/docs/features/memory.mdx
new file mode 100644
index 0000000000..24748ff73f
--- /dev/null
+++ b/docs/features/memory.mdx
@@ -0,0 +1,69 @@
+---
+sidebar_position: 700
+title: "Memory & Personalization"
+---
+
+# Memory & Personalization 🧠
+
+:::warning Experimental Feature
+The Memory system is currently in **Beta/Experimental** stage. You may encounter inconsistencies in how models store or retrieve information, and storage formats may change in future updates.
+:::
+
+Open WebUI includes a sophisticated memory system that allows models to remember facts, preferences, and context across different conversations.
+With the introduction of **Native Tool Calling**, this system has been upgraded from a passive injection mechanism to an active, model-managed "long-term memory."
+
+## How it Works
+
+The memory system stores snippets of information about you (e.g., "I prefer Python for backend tasks" or "I live in Vienna"). There are two ways these memories are used:
+
+### 1. Manual Management (Settings)
+
+Users can manually add, edit, or delete memories by navigating to:
+**Settings > Personalization > Memory**
+
+### 2. Native Memory Tools (Agentic Mode)
+
+When using a model with **Native Function Calling (Agentic Mode)** enabled, quality models can manage your memory autonomously using three built-in tools. For a detailed breakdown of how administrators can configure and manage these system-level tools, see the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native).
+
+:::tip Quality Models for Memory Management
+Autonomous memory management works best with frontier models (GPT-5, Claude 4.5+, Gemini 3+) that can intelligently decide what facts are worth saving and when to recall relevant memories. Small local models may struggle with appropriate memory selection.
+:::
+
+- **`add_memory`**: Allows the model to proactively save a new fact it learned about you during the conversation.
+- **`search_memories`**: Allows the model to search your memory bank for relevant context. Results include a unique `id` for each memory snippet. The model can optionally specify how many memories to return (default is 5).
+- **`replace_memory_content`**: Allows the model to update or correct a specific existing memory using its `id`.
+
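+To make this concrete, the snippet below is a purely illustrative sketch of the kind of structured tool call a model might emit when it decides to remember something. The shape is simplified and the argument name is an assumption; the exact wire format depends on your model provider and is handled by Open WebUI automatically.
+
+```python
+# Illustrative only: a simplified, OpenAI-style tool call a model might produce on its own.
+# The argument name ("content") is an assumption; real payloads nest this under a "function"
+# object and encode the arguments as a JSON string.
+tool_call = {
+    "name": "add_memory",
+    "arguments": {"content": "The user prefers Python for backend tasks."},
+}
+```
+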
+## Benefits of the New Memory System
+
+- **Proactive Learning**: Instead of you manually typing preferences, a model can say: *"I'll remember that you prefer dark mode for your UI projects"* and call `add_memory` behind the scenes.
+- **Contextual Retrieval**: If a conversation drifts into a topic mentioned months ago, the model can "search its brain" using `search_memories` to find those past details.
+- **Dynamic Correction**: If the model remembers something incorrectly, it can use `replace_memory_content` to fix the fact rather than creating a duplicate.
+- **User Control**: Even though models can add memories, users retain full control. Every memory added by a model can be reviewed and deleted in the Personalization settings.
+
+## Enabling Memory Tools
+
+1. **Administrative Enablement**: Ensure the Memory feature is [enabled globally](#administrative-controls) by an administrator and that you have the required permissions.
+2. **Native Mode (Agentic Mode)**: Enable **Native Function Calling** in the model's advanced parameters (**Admin Panel > Settings > Models > Model Specific Settings > Advanced Parameters**).
+3. **Quality Models Required**: For the best experience, use frontier models with strong reasoning capabilities (e.g., GPT-5, Claude 4.5 Sonnet, Gemini 3 Flash, MiniMax M2.1). Small local models may not effectively manage memories autonomously.
+
+:::info Central Tool Documentation
+For complete details on all built-in agentic tools (including memory, web search, and knowledge bases) and how to configure them, see the [**Native/Agentic Mode Tools Guide**](/features/plugin/tools#built-in-system-tools-nativeagentic-mode).
+:::
+
+## Administrative Controls
+
+Administrators have full control over the Memory feature, including the ability to disable it globally or restrict it to specific user groups.
+
+### Global Toggle
+The Memory feature can be toggled on or off for the entire instance. When disabled, the "Personalization" tab is hidden from all users, and the memory-related API endpoints are blocked.
+- **Admin UI**: Admin Panel > Settings > General > Features > **Memories**
+- **Environment Variable**: [`ENABLE_MEMORIES`](/getting-started/env-configuration#enable_memories) (Default: `True`)
+
+### Granular Permissions
+Administrators can also control Memory access on a per-role or per-group basis from the Permissions interface.
+- **Admin UI**: Admin Panel > Users > Permissions > Features > **Memories**
+- **Environment Variable**: [`USER_PERMISSIONS_FEATURES_MEMORIES`](/getting-started/env-configuration#user_permissions_features_memories) (Default: `True`)
+
+## Privacy & Security
+
+Memories are stored locally in your Open WebUI database and are specific to your user account. They are never shared across users, and you can clear your entire memory bank at any time.
diff --git a/docs/features/notes.md b/docs/features/notes.md
index da5293621a..88ec453ac6 100644
--- a/docs/features/notes.md
+++ b/docs/features/notes.md
@@ -161,6 +161,21 @@ These can also be configured in **Admin Panel > Settings > Users > Default Permi
---
+## Native Note Management (Agentic)
+
+If you are using a model with **Native Function Calling** enabled (see the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native)), the AI can interact with your Notes workspace autonomously using built-in system tools.
+
+### Available Note Tools:
+- **`search_notes`**: The model can search your entire library of notes by title and content.
+- **`view_note`**: After finding a note, the model can "read" its full markdown content.
+- **`write_note`**: The model can proactively create new notes for you (e.g., "I'll save this summary as a new note for you").
+- **`replace_note_content`**: The model can update existing notes (e.g., adding a new item to a todo list note).
+
+### Why use native tool calling for Notes?
+This transforms Notes from a static reference into a dynamic **Long-Term Memory** and **Task Management** system. Instead of manually copying and pasting, you can simply tell the model: *"Search my 'Project X' notes and find the database schema,"* or *"Add a new task to my 'Weekly Todo' note to review the PR."*
+
+---
+
## Use Cases
While Open WebUI has dedicated **Prompts** (for slash commands) and **Documents** (for RAG), **Notes** serves a unique middle ground for iterative work and precise control.
@@ -200,8 +215,8 @@ Because attaching a Note injects the full text into the chat:
* If you have a very large note (e.g., 10,000 words) and attach it to a model with a small context window (e.g., 8k tokens), the model may run out of memory or "forget" the beginning of your conversation.
-### Read-Only Context
+### Write Support (Native Mode)
-When you attach a Note to a standard chat, it is **read-only** for the AI.
+By default, when you manually attach a Note to a chat, it is **read-only**. However, in **Native Mode**, if the model has permission to use the `replace_note_content` tool, it can **actively modify** your notes.
-* The AI in the main chat cannot automatically update the text inside your Note file. If the AI suggests changes to your project, you must manually copy those changes back into the Note editor.
+* **Security Note**: This means a model could potentially overwrite content in your notes if instructed (or if it decides it's necessary for the task). Always review changes and utilize the **Undo/Redo** arrows in the Note editor if an AI makes an unwanted modification.
diff --git a/docs/features/pipelines/filters.md b/docs/features/pipelines/filters.md
index 05c48a7e0d..24c197c02b 100644
--- a/docs/features/pipelines/filters.md
+++ b/docs/features/pipelines/filters.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 2
title: "Filters"
---
diff --git a/docs/features/pipelines/index.mdx b/docs/features/pipelines/index.mdx
index 7e55807f8e..354ba4b62d 100644
--- a/docs/features/pipelines/index.mdx
+++ b/docs/features/pipelines/index.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 1000
+sidebar_position: 1
title: "Pipelines"
---
diff --git a/docs/features/pipelines/pipes.md b/docs/features/pipelines/pipes.md
index 4e66e73357..849b297b4c 100644
--- a/docs/features/pipelines/pipes.md
+++ b/docs/features/pipelines/pipes.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
title: "Pipes"
---
diff --git a/docs/features/pipelines/tutorials.md b/docs/features/pipelines/tutorials.md
index bbaed36d38..9d7302b78d 100644
--- a/docs/features/pipelines/tutorials.md
+++ b/docs/features/pipelines/tutorials.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 7
+sidebar_position: 5
title: "Tutorials"
---
diff --git a/docs/features/pipelines/valves.md b/docs/features/pipelines/valves.md
index f99aee731e..05c2abbb92 100644
--- a/docs/features/pipelines/valves.md
+++ b/docs/features/pipelines/valves.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 3
+sidebar_position: 4
title: "Valves"
---
diff --git a/docs/features/plugin/development/events.mdx b/docs/features/plugin/development/events.mdx
index 09f66ab11c..e8a9ec4e71 100644
--- a/docs/features/plugin/development/events.mdx
+++ b/docs/features/plugin/development/events.mdx
@@ -96,7 +96,8 @@ Below is a comprehensive table of **all supported `type` values** for events, al
| `notification` | Show a notification ("toast") in the UI | `{type: "info" or "success" or "error" or "warning", content: "..."}` |
| `confirmation`
(needs `__event_call__`) | Ask for confirmation (OK/Cancel dialog) | `{title: "...", message: "..."}` |
| `input`
(needs `__event_call__`) | Request simple user input ("input box" dialog) | `{title: "...", message: "...", placeholder: "...", value: ...}` |
-| `execute`
(needs `__event_call__`) | Request user-side code execution and return result | `{code: "...javascript code..."}` | |
+| `execute`
(needs `__event_call__`) | Request user-side code execution and return result | `{code: "...javascript code..."}` |
+| `chat:message:favorite` | Update the favorite/pin status of a message | `{"favorite": bool}` |
**Other/Advanced types:**
@@ -268,6 +269,33 @@ await __event_emitter__(
---
+### `chat:message:favorite`
+
+**Update the favorite/pin status of a message:**
+
+```python
+await __event_emitter__(
+ {
+ "type": "chat:message:favorite",
+ "data": {
+ "favorite": True # or False to unpin
+ }
+ }
+)
+```
+
+**What this does exactly:**
+This event forces the Open WebUI frontend to update the "favorite" state of a message in its local cache. Without this emitter, if an **Action Function** modifies the `message.favorite` field in the database directly, the frontend (which maintains its own state) might overwrite your change during its next auto-save cycle. This emitter ensures the UI and database stay perfectly in sync.
+
+**Where it appears:**
+* **Message Toolbar**: When set to `True`, the "Heart" icon beneath the message will fill in, indicating it is favorited.
+* **Chat Overview**: Favorited messages (pins) are highlighted in the conversation overview, making it easier for users to locate key information later.
+
+#### Example: "Pin Message" Action
+For a practical implementation of this event in a real-world plugin, see the **[Pin Message Action on Open WebUI Community](https://openwebui.com/posts/pin_message_action_143594d1)**. This action demonstrates how to toggle the favorite status in the database and immediately sync the UI using the `chat:message:favorite` event.
+
+---
+
### `confirmation` (**requires** `__event_call__`)
**Show a confirm dialog and get user response:**
@@ -421,4 +449,4 @@ Refer to this document for common event types and structures, and explore Open W
---
-**Happy event-driven coding in Open WebUI! 🚀**
\ No newline at end of file
+**Happy event-driven coding in Open WebUI! 🚀**
diff --git a/docs/features/plugin/functions/action.mdx b/docs/features/plugin/functions/action.mdx
index a17d8ee6d0..01ee144565 100644
--- a/docs/features/plugin/functions/action.mdx
+++ b/docs/features/plugin/functions/action.mdx
@@ -1,10 +1,14 @@
---
-sidebar_position: 3
+sidebar_position: 2
title: "Action Function"
---
Action functions allow you to write custom buttons that appear in the message toolbar for end users to interact with. This feature enables more interactive messaging, allowing users to grant permission before a task is performed, generate visualizations of structured data, download an audio snippet of chats, and many other use cases.
+:::warning Use Async Functions for Future Compatibility
+Action functions should always be defined as `async`. The backend is progressively moving toward fully async execution, and synchronous functions may block execution or cause issues in future releases.
+:::
+
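+As a minimal sketch, an async Action can look like the following. The parameter names mirror common community scaffolds; the complete structure, including the metadata docstring, is shown in the scaffold linked below.
+
+```python
+from pydantic import BaseModel
+
+class Action:
+    class Valves(BaseModel):
+        # Admin-configurable settings for this action (empty in this sketch)
+        pass
+
+    def __init__(self):
+        self.valves = self.Valves()
+
+    async def action(self, body: dict, __user__=None, __event_emitter__=None, __event_call__=None):
+        # Defined as async so it never blocks the backend's event loop
+        if __event_emitter__:
+            await __event_emitter__(
+                {"type": "status", "data": {"description": "Action ran", "done": True}}
+            )
+```
+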
Actions are admin-managed functions that extend the chat interface with custom interactive capabilities. When a message is generated by a model that has actions configured, these actions appear as clickable buttons beneath the message.
A scaffold of Action code can be found [in the community section](https://openwebui.com/f/hub/custom_action/). For more Action Function examples built by the community, visit [https://openwebui.com/search](https://openwebui.com/search).
diff --git a/docs/features/plugin/functions/filter.mdx b/docs/features/plugin/functions/filter.mdx
index 2f91efc1ae..aa365f3ec9 100644
--- a/docs/features/plugin/functions/filter.mdx
+++ b/docs/features/plugin/functions/filter.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
title: "Filter Function"
---
@@ -342,6 +342,72 @@ Here's the complete flow from admin configuration to filter execution:
---
+### ⚡ Filter Priority & Execution Order
+
+When multiple filters are active, they execute in a specific order determined by their **priority** value. Understanding this is crucial when building filter chains where one filter depends on another's changes.
+
+#### Setting Filter Priority
+
+Priority is configured via the `Valves` class using a `priority` field:
+
+```python
+from pydantic import BaseModel, Field
+
+class Filter:
+ class Valves(BaseModel):
+ priority: int = Field(
+ default=0,
+ description="Filter execution order. Lower values run first."
+ )
+
+ def __init__(self):
+ self.valves = self.Valves()
+
+ def inlet(self, body: dict) -> dict:
+ # This filter's execution order depends on its priority value
+ return body
+```
+
+#### Priority Ordering Rules
+
+| Priority Value | Execution Order |
+|---------------|-----------------|
+| `0` (default) | Runs first |
+| `1` | Runs after priority 0 |
+| `2` | Runs after priority 1 |
+
+:::tip Lower Priority = Earlier Execution
+Filters are sorted in **ascending** order by priority. A filter with `priority=0` runs **before** a filter with `priority=1`, which runs before `priority=2`, and so forth.
+:::
+
+---
+
+### 🔗 Data Passing Between Filters
+
+When multiple filters are active, each filter in the chain receives the **modified data from the previous filter**. The returned value from one filter becomes the input to the next filter in the priority order.
+
+```
+User Input
+ ↓
+Model Router Filter (priority=0) → changes parts of the body
+ ↓
+Context Manager Filter (priority=1) → receives modified body ✓
+ ↓
+Logging Filter (priority=2) → receives body with all previous changes ✓
+ ↓
+LLM Request (sends final modified body to OpenAI/Ollama API)
+```
+
+:::warning Important: Always Return the Body
+If your filter modifies the `body`, you **must** return it. The returned value is passed to the next filter. If you return `None`, subsequent filters will fail.
+
+```python
+async def inlet(self, body: dict, __event_emitter__) -> dict:
+ body["messages"].append({"role": "system", "content": "Hello"})
+ return body # Don't forget this!
+```
+:::
+
+---
+
### 🎨 UI Indicators & Visual Feedback
#### In the Admin Functions Panel
diff --git a/docs/features/plugin/functions/index.mdx b/docs/features/plugin/functions/index.mdx
index 227d697cd7..e3f719a762 100644
--- a/docs/features/plugin/functions/index.mdx
+++ b/docs/features/plugin/functions/index.mdx
@@ -130,4 +130,4 @@ Whether you’re customizing workflows for specific projects, integrating extern
- [Filter Functions Guide](./filter.mdx)
- [Action Functions Guide](./action.mdx)
-By leveraging Functions, you’ll bring entirely new capabilities to your Open WebUI setup. Start experimenting today! 🚀
\ No newline at end of file
+By leveraging Functions, you’ll bring entirely new capabilities to your Open WebUI setup. Start experimenting today! 🚀
diff --git a/docs/features/plugin/functions/pipe.mdx b/docs/features/plugin/functions/pipe.mdx
index 1919745ccc..5898da2d77 100644
--- a/docs/features/plugin/functions/pipe.mdx
+++ b/docs/features/plugin/functions/pipe.mdx
@@ -1,11 +1,15 @@
---
-sidebar_position: 1
+sidebar_position: 4
title: "Pipe Function"
---
# 🚰 Pipe Function: Create Custom "Agents/Models"
Welcome to this guide on creating **Pipes** in Open WebUI! Think of Pipes as a way of **adding** a new model to Open WebUI. In this document, we'll break down what a Pipe is, how it works, and how you can create your own to add custom logic and processing to your Open WebUI models. We'll use clear metaphors and go through every detail to ensure you have a comprehensive understanding.
+:::warning Use Async Functions for Future Compatibility
+Pipe functions should generally be defined as `async` to ensure compatibility with future Open WebUI versions. The backend is progressively moving toward fully async execution, and synchronous functions may block execution or cause issues in future releases. When in doubt, make your `pipe` function async.
+:::
+
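+As a quick illustration, a minimal async Pipe can be as small as the sketch below; it simply echoes the last user message back. The full structure, including valves, is covered in the rest of this guide.
+
+```python
+class Pipe:
+    async def pipe(self, body: dict) -> str:
+        # body is the incoming OpenAI-style request; returning a string sends it back as the model's reply
+        user_message = body["messages"][-1]["content"]
+        return f"You said: {user_message}"
+```
+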
## Introduction to Pipes
Imagine Open WebUI as a **plumbing system** where data flows through pipes and valves. In this analogy:
diff --git a/docs/features/plugin/tools/development.mdx b/docs/features/plugin/tools/development.mdx
index 1ae40d7cd9..0b315843b7 100644
--- a/docs/features/plugin/tools/development.mdx
+++ b/docs/features/plugin/tools/development.mdx
@@ -7,6 +7,10 @@ title: "Development"
Toolkits are defined in a single Python file, with a top level docstring with metadata and a `Tools` class.
+:::warning Use Async Functions for Future Compatibility
+Tool methods should generally be defined as `async` to ensure compatibility with future Open WebUI versions. The backend is progressively moving toward fully async execution, and synchronous functions may block execution or cause issues in future releases.
+:::
+
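+For example, combined with the top-level docstring described in the next section, a minimal async tool method could look like the sketch below (the method name and behavior are illustrative). Using `await asyncio.sleep()` instead of a blocking `time.sleep()` is exactly the kind of non-blocking pattern this recommendation is about.
+
+```python
+import asyncio
+
+class Tools:
+    async def wait_and_echo(self, text: str) -> str:
+        """
+        Echo the provided text back after a short delay.
+        :param text: The text to echo back.
+        """
+        # Non-blocking sleep: a synchronous time.sleep() here would stall other requests
+        await asyncio.sleep(1)
+        return text
+```
+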
### Example Top-Level Docstring
```python
@@ -118,26 +122,86 @@ Event Emitters are used to add additional information to the chat interface. Sim
Event Emitter behavior is **significantly different** depending on your function calling mode. The function calling mode is controlled by the `function_calling` parameter:
- **Default Mode**: Uses traditional function calling approach with wider model compatibility
-- **Native Mode**: Leverages model's built-in tool-calling capabilities for reduced latency
+- **Native Mode (Agentic Mode)**: Leverages model's built-in tool-calling capabilities for reduced latency and autonomous behavior
Before using event emitters, you must understand these critical limitations:
- **Default Mode** (`function_calling = "default"`): Full event emitter support with all event types working as expected
-- **Native Mode** (`function_calling = "native"`): **Limited event emitter support** - many event types don't work properly due to native function calling bypassing Open WebUI's custom tool processing pipeline
+- **Native Mode (Agentic Mode)** (`function_calling = "native"`): **Limited event emitter support** - many event types don't work properly due to native function calling bypassing Open WebUI's custom tool processing pipeline
**When to Use Each Mode:**
-- **Use Default Mode** when you need full event emitter functionality, complex tool interactions, or real-time UI updates
-- **Use Native Mode** when you need reduced latency and basic tool calling without complex UI interactions
+For a comprehensive guide on choosing a function calling mode, including model requirements and administrator setup, refer to the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native).
+
+In general:
+- **Use Default Mode** when you need full event emitter functionality, complex tool interactions, or real-time UI updates.
+- **Use Native Mode (Agentic Mode)** when you have a quality model and need reduced latency, autonomous tool selection, and system-level tools (Agentic Research, Knowledge Base exploration, Memory) without complex custom emitter requirements.
#### Function Calling Mode Configuration
You can configure the function calling mode in two places:
-1. **Model Settings**: Go to Model page → Advanced Params → Function Calling (set to "Default" or "Native")
-2. **Per-request basis**: Set `params.function_calling = "native"` or `"default"` in your request
+1. **Administrator Level**: Go to **Admin Panel > Settings > Models > Model Specific Settings > Advanced Parameters > Function Calling** (set to "Default" or "Native").
+2. **Per-request basis**: Set `params.function_calling = "native"` or `"default"` in Chat Controls > Advanced Params.
If the model seems to be unable to call the tool, make sure it is enabled (either via the Model page or via the `+` sign next to the chat input field).
+### Built-in System Tools (Native/Agentic Mode)
+
+When **Native Mode (Agentic Mode)** is enabled, Open WebUI automatically injects built-in system tools based on the features enabled for the chat. This enables powerful agentic behaviors where capable models (like GPT-5, Claude 4.5 Sonnet, Gemini 3 Flash, or MiniMax M2.1) can perform multi-step research, explore knowledge bases autonomously, or manage user memory dynamically.
+
+:::warning Quality Models Required for Agentic Behavior
+Agentic tool calling requires **high-quality frontier models** to work reliably. Small local models often struggle with the complex reasoning, proper JSON formatting, and multi-step planning required for effective tool use. For production agentic workflows, use models like **GPT-5**, **Claude 4.5+**, **Gemini 3+**, or **MiniMax M2.1**. Small local models may work better with **Default Mode** instead.
+:::
+
+#### Available Built-in Tools
+
+| Tool | Purpose | Requirements |
+|------|---------|--------------|
+| **Search & Web** | | |
+| `search_web` | Search the public web for information. Best for current events, external references, or topics not covered in internal documents. | `ENABLE_WEB_SEARCH` enabled. |
+| `fetch_url` | Visits a URL and extracts text content via the Web Loader. | Part of Web Search feature. |
+| **Knowledge Base** | | |
+| `list_knowledge_bases` | List the user's accessible knowledge bases with file counts. | Always available. |
+| `search_knowledge_bases` | Search knowledge bases by name and description. | Always available. |
+| `search_knowledge_files` | Search files across accessible knowledge bases by filename. | Always available. |
+| `view_knowledge_file` | Get the full content of a file from a knowledge base. | Always available. |
+| `query_knowledge_bases` | Search internal knowledge bases using semantic/vector search. Should be your first choice for finding information before searching the web. | Always available. |
+| **Image Gen** | | |
+| `generate_image` | Generates a new image based on a prompt (supports `steps`). | `ENABLE_IMAGE_GENERATION` enabled. |
+| `edit_image` | Edits an existing image based on a prompt and URL. | `ENABLE_IMAGE_EDIT` enabled. |
+| **Memory** | | |
+| `search_memories` | Searches the user's personal memory/personalization bank. | Memory feature enabled. |
+| `add_memory` | Stores a new fact in the user's personalization memory. | Memory feature enabled. |
+| `replace_memory_content` | Updates an existing memory record by its unique ID. | Memory feature enabled. |
+| **Notes** | | |
+| `search_notes` | Search the user's notes by title and content. | `ENABLE_NOTES` enabled. |
+| `view_note` | Get the full markdown content of a specific note. | `ENABLE_NOTES` enabled. |
+| `write_note` | Create a new private note for the user. | `ENABLE_NOTES` enabled. |
+| `replace_note_content` | Update an existing note's content or title. | `ENABLE_NOTES` enabled. |
+| **Chat History** | | |
+| `search_chats` | Search across the user's previous conversation history. | Always available. |
+| `view_chat` | Retrieve the full message history of a specific previous chat. | Always available. |
+| **Channels** | | |
+| `search_channels` | Find public or accessible channels by name/description. | `ENABLE_CHANNELS` enabled. |
+| `search_channel_messages` | Search for specific messages inside accessible channels. | `ENABLE_CHANNELS` enabled. |
+| `view_channel_message` | View a specific message or its details in a channel. | `ENABLE_CHANNELS` enabled. |
+| `view_channel_thread` | View a full message thread/replies in a channel. | `ENABLE_CHANNELS` enabled. |
+| **Time Tools** | | |
+| `get_current_timestamp` | Get the current UTC Unix timestamp and ISO date. | Always available. |
+| `calculate_timestamp` | Calculate relative timestamps (e.g., "3 days ago"). | Always available. |
+
+#### Why Use Built-in Tools?
+- **Agentic Research**: Models can invoke `search_web` multiple times to refine results, then use `fetch_url` to read specific deep-dive articles.
+- **Knowledge Base Search**: Models can use `query_knowledge_bases` to search internal documents semantically, or `list_knowledge_bases` and `view_knowledge_file` to browse and read specific files.
+- **Contextual Awareness**: Models can search your previous chat history or notes to find specific details without manual copy-pasting.
+- **Dynamic Personalization**: Models can proactively store important facts about the user using `add_memory` during the conversation.
+- **Improved Context Selection**: Instead of forcing a search before every prompt, the model decides *when* a search or retrieval is actually necessary.
+
+:::info Complete Tool Reference
+This table provides a quick reference for developers. For the complete user-facing guide on how to enable and use these tools, see the [**Tool Calling Modes Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native).
+:::
+
+
#### Complete Event Type Compatibility Matrix
Here's the comprehensive breakdown of how each event type behaves across function calling modes:
diff --git a/docs/features/plugin/tools/index.mdx b/docs/features/plugin/tools/index.mdx
index 18636359b4..fd8f7d1e2c 100644
--- a/docs/features/plugin/tools/index.mdx
+++ b/docs/features/plugin/tools/index.mdx
@@ -1,19 +1,19 @@
---
-sidebar_position: 2
+sidebar_position: 1
title: "Tools"
---
-# ⚙️ What are Tools?
+# What are Tools?
-Tools are the various ways you can extend an LLM's capabilities beyond simple text generation. When enabled, they allow your chatbot to do amazing things — like search the web, scrape data, generate images, talk back using AI voices, and more.
+⚙️ Tools are the various ways you can extend an LLM's capabilities beyond simple text generation. When enabled, they allow your chatbot to do amazing things — like search the web, scrape data, generate images, talk back using AI voices, and more.
Because there are several ways to integrate "Tools" in Open WebUI, it's important to understand which type you are using.
---
-## 🧩 Tooling Taxonomy: Which "Tool" are you using?
+## Tooling Taxonomy: Which "Tool" are you using?
-Users often encounter the term "Tools" in different contexts. Here is how to distinguish them:
+🧩 Users often encounter the term "Tools" in different contexts. Here is how to distinguish them:
| Type | Location in UI | Best For... | Source |
| :--- | :--- | :--- | :--- |
@@ -26,9 +26,13 @@ Users often encounter the term "Tools" in different contexts. Here is how to dis
### 1. Native Features (Built-in)
These are deeply integrated into Open WebUI and generally don't require external scripts.
- **Web Search**: Integrated via engines like SearXNG, Google, or Tavily.
+- **URL Fetching**: Extract text content directly from websites using `#` or native tools.
- **Image Generation**: Integrated with DALL-E, ComfyUI, or Automatic1111.
+- **Memory**: The ability for models to remember facts about you across chats.
- **RAG (Knowledge)**: The ability to query uploaded documents (`#`).
+In [**Native Mode**](#built-in-system-tools-nativeagentic-mode), these features are exposed as **Tools** that the model can call independently.
+
### 2. Workspace Tools (Custom Plugins)
These are **Python scripts** that run directly within the Open WebUI environment.
- **Capability**: Can do anything Python can do (web scraping, complex math, API calls).
@@ -36,8 +40,9 @@ These are **Python scripts** that run directly within the Open WebUI environment
- **Safety**: Always review code before importing, as these run on your server.
- **⚠️ Security Warning**: Normal or untrusted users should **not** be given permission to access the Workspace Tools section. This access allows a user to upload and execute arbitrary Python code on your server, which could lead to a full system compromise.
-### 3. MCP (Model Context Protocol) 🔌
-MCP is an open standard that allows LLMs to interact with external data and tools.
+### 3. MCP (Model Context Protocol)
+
+🔌 MCP is an open standard that allows LLMs to interact with external data and tools.
- **Native HTTP MCP**: Open WebUI can connect directly to any MCP server that exposes an HTTP/SSE endpoint.
- **MCPO (Proxy)**: Most community MCP servers use `stdio` (local command line). To use these in Open WebUI, you use the [**MCPO Proxy**](../../plugin/tools/openapi-servers/mcp.mdx) to bridge the connection.
@@ -46,9 +51,9 @@ Generic web servers that provide an OpenAPI (`.json` or `.yaml`) specification.
---
-## 📦 How to Install & Manage Workspace Tools
+## How to Install & Manage Workspace Tools
-Workspace Tools are the most common way to extend your instance with community features.
+📦 Workspace Tools are the most common way to extend your instance with community features.
1. Go to [Community Tool Library](https://openwebui.com/search)
2. Choose a Tool, then click the **Get** button.
@@ -61,9 +66,9 @@ Never import a Tool you don’t recognize or trust. These are Python scripts and
---
-## 🔧 How to Use Tools in Chat
+## How to Use Tools in Chat
-Once installed or connected, here’s how to enable them for your conversations:
+🔧 Once installed or connected, here’s how to enable them for your conversations:
### Option 1: Enable on-the-fly (Specific Chat)
While chatting, click the **➕ (plus)** icon in the input area. You’ll see a list of available Tools — you can enable them specifically for that session.
@@ -79,32 +84,158 @@ You can also let your LLM auto-select the right Tools using the [**AutoTool Filt
---
-## 🧠 Choosing How Tools Are Used: Default vs Native
+## Tool Calling Modes: Default vs. Native
+
+Open WebUI offers two distinct ways for models to interact with tools: a standard **Default Mode** and a high-performance **Native Mode (Agentic Mode)**. Choosing the right mode depends on your model's capabilities and your performance requirements.
+
+### 🟡 Default Mode (Prompt-based)
+In Default Mode, Open WebUI manages tool selection by injecting a specific prompt template that guides the model to output a tool request.
+- **Compatibility**: Works with **practically any model**, including older or smaller local models that lack native function-calling support.
+- **Flexibility**: Highly customizable via prompt templates.
+- **Caveat**: Can be slower (requires extra tokens) and less reliable for complex, multi-step tool chaining.
+
+### 🟢 Native Mode (Agentic Mode / System Function Calling)
+Native Mode (also called **Agentic Mode**) leverages the model's built-in capability to handle tool definitions and return structured tool calls (JSON). This is the **recommended mode** for high-performance agentic workflows.
+
+:::warning Model Quality Matters
+**Agentic tool calling requires high-quality models to work reliably.** While small local models may technically support function calling, they often struggle with the complex reasoning required for multi-step tool usage. For best results, use frontier models like **GPT-5**, **Claude 4.5 Sonnet**, **Gemini 3 Flash**, or **MiniMax M2.1**. Small local models may produce malformed JSON or fail to follow the strict state management required for agentic behavior.
+:::
+
+#### Why use Native Mode (Agentic Mode)?
+- **Speed & Efficiency**: Lower latency as it avoids bulky prompt-based tool selection.
+- **Reliability**: Higher accuracy in following tool schemas (with quality models).
+- **Multi-step Chaining**: Essential for **Agentic Research** and **Interleaved Thinking** where a model needs to call multiple tools in succession.
+- **Autonomous Decision-Making**: Models can decide when to search, which tools to use, and how to combine results.
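+
+For illustration, a native tool call in an OpenAI-compatible response looks roughly like the sketch below. The exact payload varies by provider, and the `search_web` call shown here is just one of the built-in system tools described later on this page:
+
+```json
+{
+  "role": "assistant",
+  "content": null,
+  "tool_calls": [
+    {
+      "id": "call_abc123",
+      "type": "function",
+      "function": {
+        "name": "search_web",
+        "arguments": "{\"query\": \"open webui native function calling\"}"
+      }
+    }
+  ]
+}
+```
+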
-Once Tools are enabled, Open WebUI gives you two different ways to let your LLM interact with them. You can switch this via the chat settings:
+#### How to Enable Native Mode (Agentic Mode)
+Native Mode can be enabled at two levels:
+
+1. **Global/Administrator Level (Recommended)**:
+ * Navigate to **Admin Panel > Settings > Models**.
+ * Scroll to **Model Specific Settings** for your target model.
+ * Under **Advanced Parameters**, find the **Function Calling** dropdown and select `Native`.
+2. **Per-Chat Basis**:
+ * Inside a chat, click the ⚙️ **Chat Controls** icon.
+ * Go to **Advanced Params** and set **Function Calling** to `Native`.

-1. Open a chat with your model.
-2. Click ⚙️ **Chat Controls > Advanced Params**.
-3. Look for the **Function Calling** setting and switch between:
-### 🟡 Default Mode (Prompt-based)
-Here, your LLM doesn’t need to natively support function calling. We guide the model using a smart tool-selection prompt template to select and use a Tool.
-- ✅ Works with **practically any model** (including smaller local models).
-- 💡 **Admin Note**: You can also toggle the default mode for each specific model in the **Admin Panel > Settings > Models > Advanced Parameters**.
-- ❗ Not as reliable as Native Mode when chaining multiple complex tools.
+#### Model Requirements & Caveats
+
+:::tip Recommended Models for Agentic Mode
+For reliable agentic tool calling, use high-tier frontier models:
+- **GPT-5** (OpenAI)
+- **Claude 4.5 Sonnet** (Anthropic)
+- **Gemini 3 Flash** (Google)
+- **MiniMax M2.1**
+
+These models excel at multi-step reasoning, proper JSON formatting, and autonomous tool selection.
+:::
+
+- **Large Local Models**: Some large local models (e.g., Qwen 3 32B, Llama 3.3 70B) can work with Native Mode, but results vary significantly by model quality.
+- **Small Local Models Warning**: **Small local models** (under 30B parameters) often struggle with Native Mode. They may produce malformed JSON, fail to follow strict state management, or make poor tool selection decisions. For these models, **Default Mode** is usually more reliable.
+
+| Feature | Default Mode | Native Mode |
+|:---|:---|:---|
+| **Latency** | Medium/High | Low |
+| **Model Compatibility** | Universal | Requires Tool-Calling Support |
+| **Logic** | Prompt-based (Open WebUI) | Model-native (API/Ollama) |
+| **Complex Chaining** | ⚠️ Limited | ✅ Excellent |
+
+### Built-in System Tools (Native/Agentic Mode)
+
+🛠️ When **Native Mode (Agentic Mode)** is enabled, Open WebUI automatically injects powerful system tools based on the features toggled for the chat. This unlocks truly agentic behaviors where capable models (like GPT-5, Claude 4.5 Sonnet, Gemini 3 Flash, or MiniMax M2.1) can perform multi-step research, explore knowledge bases, or manage user memory autonomously.
+
+| Tool | Purpose | Requirements |
+|------|---------|--------------|
+| **Search & Web** | | |
+| `search_web` | Search the public web for information. Best for current events, external references, or topics not covered in internal documents. | `ENABLE_WEB_SEARCH` enabled. |
+| `fetch_url` | Visits a URL and extracts text content via the Web Loader. | Part of Web Search feature. |
+| **Knowledge Base** | | |
+| `list_knowledge_bases` | List the user's accessible knowledge bases with file counts. | Always available. |
+| `query_knowledge_bases` | Search knowledge bases by semantic similarity to query. Finds KBs whose name/description match the meaning of your query. Use this to discover relevant knowledge bases before querying their files. | Always available. |
+| `search_knowledge_bases` | Search knowledge bases by name and description. | Always available. |
+| `query_knowledge_files` | Search internal knowledge base files using semantic/vector search. This should be your first choice for finding information before searching the web. | Always available. |
+| `search_knowledge_files` | Search files across accessible knowledge bases by filename. | Always available. |
+| `view_knowledge_file` | Get the full content of a file from a knowledge base. | Always available. |
+| **Image Gen** | | |
+| `generate_image` | Generates a new image based on a prompt (supports `steps`). | `ENABLE_IMAGE_GENERATION` enabled. |
+| `edit_image` | Edits an existing image based on a prompt and URL. | `ENABLE_IMAGE_EDIT` enabled.|
+| **Memory** | | |
+| `search_memories` | Searches the user's personal memory/personalization bank. Supports an optional `count` parameter to specify how many memories to return (default: 5). | Memory feature enabled. |
+| `add_memory` | Stores a new fact in the user's personalization memory. | Memory feature enabled. |
+| `replace_memory_content` | Updates an existing memory record by its unique ID. | Memory feature enabled. |
+| **Notes** | | |
+| `search_notes` | Search the user's notes by title and content. | `ENABLE_NOTES` enabled. |
+| `view_note` | Get the full markdown content of a specific note. | `ENABLE_NOTES` enabled. |
+| `write_note` | Create a new private note for the user. | `ENABLE_NOTES` enabled. |
+| `replace_note_content` | Update an existing note's content or title. | `ENABLE_NOTES` enabled. |
+| **Chat History** | | |
+| `search_chats` | Search across the user's previous conversation history. | Always available. |
+| `view_chat` | Retrieve the full message history of a specific previous chat. | Always available. |
+| **Channels** | | |
+| `search_channels` | Find public or accessible channels by name/description. | `ENABLE_CHANNELS` enabled. |
+| `search_channel_messages` | Search for specific messages inside accessible channels. | `ENABLE_CHANNELS` enabled. |
+| `view_channel_message` | View a specific message or its details in a channel. | `ENABLE_CHANNELS` enabled. |
+| `view_channel_thread` | View a full message thread/replies in a channel. | `ENABLE_CHANNELS` enabled. |
+| **Time Tools** | | |
+| `get_current_timestamp` | Get the current UTC Unix timestamp and ISO date. | Always available. |
+| `calculate_timestamp` | Calculate relative timestamps (e.g., "3 days ago"). | Always available. |
+
+:::info Automatic Timezone Detection
+Open WebUI automatically detects and stores your timezone when you log in. This allows time-related tools and features to provide accurate local times without any manual configuration. Your timezone is determined from your browser settings.
+:::
+
+**Why use these?** It allows for **Deep Research** (searching the web multiple times, or querying knowledge bases), **Contextual Awareness** (looking up previous chats or notes), **Dynamic Personalization** (saving facts), and **Precise Automation** (generating content based on existing notes or documents).
-### 🟢 Native Mode (Built-in Function Calling)
-If your model supports native function calling (like GPT-4o, Gemini, Claude, or GPT-5), use this for a faster, more accurate experience where the LLM decides exactly when and how to call tools.
-- ✅ Fast, accurate, and can chain multiple tools in one response.
-- ❗ Requires a model that explicitly supports tool-calling schemas.
+#### Disabling Builtin Tools (Per-Model)
-| Mode | Who it’s for | Pros | Cons |
-|----------|----------------------------------|-----------------------------------------|--------------------------------------|
-| **Default** | Practically any model (basic/local) | Broad compatibility, safer, flexible | May be less accurate or slower |
-| **Native** | GPT-4o, Gemini, Claude, GPT-5, etc. | Fast, smart, excellent tool chaining | Needs proper function call support |
+The **Builtin Tools** capability can be toggled on or off for each model in the **Workspace > Models** editor under **Capabilities**. When enabled (the default), all the system tools listed above are automatically injected when using Native Mode.
+
+**When to disable Builtin Tools:**
+
+| Scenario | Reason to Disable |
+|----------|-------------------|
+| **Model doesn't support function calling** | Smaller or older models may not handle the `tools` parameter correctly |
+| **Simpler/predictable behavior needed** | You want the model to work only with pre-injected context, no autonomous tool calls |
+| **Security/control concerns** | Prevents the model from actively querying knowledge bases, searching chats, accessing memories, etc. |
+| **Token efficiency** | Tool specifications consume tokens; disabling saves context window space |
+
+**What happens when Builtin Tools is disabled:**
+
+1. **No tool injection**: The model won't receive any of the built-in system tools, even in Native Mode.
+2. **RAG still works** (if File Context is enabled): Attached files are still processed via RAG and injected as context.
+3. **No autonomous retrieval**: The model cannot decide to search knowledge bases or fetch additional information—it works only with what's provided upfront.
+
+:::warning Builtin Tools vs File Context
+**Builtin Tools** controls whether the model gets *tools* for autonomous retrieval. It does **not** control whether file content is injected via RAG—that's controlled by the separate **File Context** capability.
+
+- **File Context** = Whether Open WebUI extracts and injects file content (RAG processing)
+- **Builtin Tools** = Whether the model gets tools to autonomously search/retrieve additional content
+
+See [File Context vs Builtin Tools](../../rag/index.md#file-context-vs-builtin-tools) for a detailed comparison.
+:::
+
+### Interleaved Thinking {#interleaved-thinking}
+
+🧠 When using **Native Mode (Agentic Mode)**, high-tier models can engage in **Interleaved Thinking**. This is a powerful "Thought → Action → Thought → Action → Thought → ..." loop where the model can reason about a task, execute one or more tools, evaluate the results, and then decide on its next move.
+
+:::info Quality Models Required
+Interleaved thinking requires models with strong reasoning capabilities. This feature works best with frontier models (GPT-5, Claude 4.5+, Gemini 3+) that can maintain context across multiple tool calls and make intelligent decisions about which tools to use when.
+:::
+
+This is fundamentally different from a single-shot tool call. In an interleaved workflow, the model follows a cycle:
+1. **Reason**: Analyze the user's intent and identify information gaps.
+2. **Act**: Call a tool (e.g., `query_knowledge_files` for internal docs or `search_web` and `fetch_url` for web research).
+3. **Think**: Read the tool's output and update its internal understanding.
+4. **Iterate**: If the answer isn't clear, call another tool (e.g., `view_knowledge_file` to read a specific document or `fetch_url` to read a specific page) or refine the search.
+5. **Finalize**: Only after completing this "Deep Research" cycle does the model provide a final, grounded answer.
+
+This behavior is what transforms a standard chatbot into an **Agentic AI** capable of solving complex, multi-step problems autonomously.
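+
+As a mental model only (not Open WebUI's internal implementation), the loop can be sketched in Python. `call_model` and `execute_tool` are hypothetical stand-ins for the provider API and the built-in tools listed above:
+
+```python
+# Conceptual sketch of the interleaved "Thought -> Action" loop.
+# The helpers below are hypothetical placeholders, not Open WebUI internals.
+
+def call_model(messages, tools):
+    """Stand-in for a chat-completion call that may return tool calls."""
+    raise NotImplementedError
+
+def execute_tool(tool_call):
+    """Stand-in for running a built-in tool such as search_web or fetch_url."""
+    raise NotImplementedError
+
+def agentic_turn(messages, tools):
+    while True:
+        response = call_model(messages, tools)       # Reason: analyze intent and gaps
+        if not response.get("tool_calls"):
+            return response["content"]               # Finalize: grounded answer
+        for tool_call in response["tool_calls"]:     # Act: run the requested tools
+            result = execute_tool(tool_call)
+            messages.append({"role": "tool", "content": result})
+        # Iterate: on the next pass the model reads the results and decides its next move.
+```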
+
---
diff --git a/docs/features/rag/document-extraction/apachetika.md b/docs/features/rag/document-extraction/apachetika.md
index af00ab6bab..24039f74b9 100644
--- a/docs/features/rag/document-extraction/apachetika.md
+++ b/docs/features/rag/document-extraction/apachetika.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4000
+sidebar_position: 2
title: "Apache Tika Extraction"
---
diff --git a/docs/features/rag/document-extraction/docling.md b/docs/features/rag/document-extraction/docling.md
index 2f6e50e8d3..014aa9a354 100644
--- a/docs/features/rag/document-extraction/docling.md
+++ b/docs/features/rag/document-extraction/docling.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4000
+sidebar_position: 3
title: "Docling Document Extraction"
---
diff --git a/docs/features/rag/document-extraction/index.md b/docs/features/rag/document-extraction/index.md
index edbf53ab41..74e5e15ac1 100644
--- a/docs/features/rag/document-extraction/index.md
+++ b/docs/features/rag/document-extraction/index.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 6
+sidebar_position: 1
title: "Document Extraction"
---
diff --git a/docs/features/rag/document-extraction/mistral-ocr.md b/docs/features/rag/document-extraction/mistral-ocr.md
index e5b6643964..b5d5cdcf26 100644
--- a/docs/features/rag/document-extraction/mistral-ocr.md
+++ b/docs/features/rag/document-extraction/mistral-ocr.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4000
+sidebar_position: 4
title: "Mistral OCR"
---
diff --git a/docs/features/rag/index.md b/docs/features/rag/index.md
index e3e838e6f7..816c1b7d96 100644
--- a/docs/features/rag/index.md
+++ b/docs/features/rag/index.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 200
+sidebar_position: 1
title: "Retrieval Augmented Generation (RAG)"
---
@@ -39,6 +39,101 @@ Web pages often contain extraneous information such as navigation and footer. Fo
Customize the RAG template from the `Admin Panel` > `Settings` > `Documents` menu.
+## Markdown Header Splitting
+
+When enabled, documents are first split by markdown headers (H1-H6). This preserves document structure and ensures that sections under the same header are kept together when possible. The resulting chunks are then further processed by the standard character or token splitter.
+
+:::tip
+
+Use the **Chunk Min Size Target** setting (found in **Admin Panel > Settings > Documents**) to intelligently merge small sections after markdown splitting, improving retrieval coherence and reducing the total number of vectors in your database.
+
+:::
+
+## Chunking Configuration
+
+Open WebUI allows you to fine-tune how documents are split into chunks for embedding. This is crucial for optimal retrieval performance.
+
+- **Chunk Size**: Sets the maximum number of characters (or tokens) per chunk.
+- **Chunk Overlap**: Specifies how much content is shared between adjacent chunks to maintain context.
+- **Chunk Min Size Target**: Although [Markdown Header Splitting](#markdown-header-splitting) is excellent for preserving structure, it can often create tiny, fragmented chunks (e.g., a standalone sub-header, a table of contents entry, a single-sentence paragraph, or a short list item) that lack enough semantic context for high-quality embedding. You can counteract this by setting the **Chunk Min Size Target** to intelligently merge these small pieces with their neighbors.
+
+### Why use a Chunk Min Size Target?
+
+Intelligently merging small sections after markdown splitting provides several key advantages:
+
+- **Improves RAG Quality**: Eliminates tiny, meaningless fragments, ensuring better semantic coherence in each retrieved chunk.
+- **Reduces Vector Database Size**: Fewer chunks mean fewer vectors to store, reducing storage costs and memory usage.
+- **Speeds Up Retrieval & Embedding**: A smaller index is faster to search, and fewer chunks require fewer embedding API calls (or less local compute). This significantly accelerates document processing when uploading files to chats or knowledge bases, as there is less data to vectorize.
+- **Efficiency & Impact**: Testing has shown that a well-configured threshold (e.g., 1000 for a chunk size of 2000) can reduce chunk counts by over 90% while **improving accuracy**, increasing embedding speed, and enhancing overall retrieval quality by maintaining semantic context.
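+
+These values can also be set in **Admin Panel > Settings > Documents**. As a rough starting point that matches the example numbers above (a sketch, not a universal recommendation; the environment variable names below are assumptions about your deployment):
+
+```bash
+CHUNK_SIZE=2000              # maximum characters (or tokens) per chunk
+CHUNK_OVERLAP=100            # shared content between adjacent chunks (assumed value)
+CHUNK_MIN_SIZE_TARGET=1000   # merge small markdown sections up to this size
+```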
+
+
+<details>
+<summary>How the merging algorithm works (technical details)</summary>
+
+For most users, the explanation above is all you need: small chunks get merged with their neighbors, resulting in better retrieval with fewer vectors, along with performance, cost, and storage benefits. But if you're curious about the exact logic and design rationale, here's how it works under the hood.
+
+### Why header-based splitting needs merging
+
+Markdown header splitting is one of the better structural approaches to chunking because headers are explicit semantic boundaries placed by the document author. You're leveraging human judgment about where one topic ends and another begins, which usually produces more coherent chunks than fixed-size windowing that might cut mid-paragraph or mid-thought.
+
+However, real documents often have structural quirks: tables of contents, short introductory sections, single-sentence paragraphs under their own headers, or deeply nested subheadings with minimal content. These produce tiny chunks that cause problems:
+
+- They lack sufficient context to be useful when retrieved in isolation
+- They can produce noisy retrieval results (matching on limited signal but contributing nothing useful)
+- Very short texts sometimes embed less reliably
+- They waste vector storage and slow down retrieval
+- Embedding many small chunks takes longer than embedding fewer, larger chunks covering the same content
+- More embedding operations mean more API calls (cost) or more local compute
+
+The merging algorithm addresses this by intelligently combining undersized chunks while respecting document structure and size limits.
+
+### The algorithm: a single forward pass
+
+The merging logic is deliberately simple—a single forward pass through all chunks:
+
+1. Start with the first chunk as the "current" accumulator.
+2. For each **subsequent** chunk, check if it can be absorbed into the current chunk.
+3. A chunk can be absorbed if **all three conditions** are met:
+ - The current accumulated content is still below `CHUNK_MIN_SIZE_TARGET`
+ - Merging wouldn't exceed `CHUNK_SIZE` (the maximum)
+ - Both chunks belong to the same source document
+4. If absorption is possible, merge them (with `\n\n` separation to preserve visual structure) and continue checking the next chunk.
+5. If absorption isn't possible, finalize the current chunk and start fresh with the next one as the new accumulator.
+6. Repeat until all chunks are processed.
+
+**Key point**: The size check is on the *accumulated* content, not individual chunks. This means multiple consecutive tiny chunks (like a table of contents with many small entries) will keep folding together until the combined size reaches the threshold or until merging the next chunk would exceed the maximum.
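+
+The following Python sketch mirrors the rules above. It is illustrative rather than Open WebUI's actual source, and assumes each chunk is a simple dict with `text`, `source`, and `metadata` keys:
+
+```python
+# Simplified sketch of the forward-pass merge; not the actual Open WebUI implementation.
+def merge_small_chunks(chunks, min_size_target, chunk_size):
+    if not chunks:
+        return []
+
+    merged = []
+    current = dict(chunks[0])                      # accumulator starts as the first chunk
+
+    for nxt in chunks[1:]:
+        can_absorb = (
+            len(current["text"]) < min_size_target                          # still undersized
+            and len(current["text"]) + len(nxt["text"]) + 2 <= chunk_size   # +2 for "\n\n"
+            and current["source"] == nxt["source"]                          # same document only
+        )
+        if can_absorb:
+            # Merge with a blank line; metadata stays from the first chunk in the sequence.
+            current["text"] += "\n\n" + nxt["text"]
+        else:
+            merged.append(current)                 # finalize and start a new accumulator
+            current = dict(nxt)
+
+    merged.append(current)
+    return merged
+```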
+
+### Design decisions and why they matter
+
+**Forward-only merging**: Small chunks always merge into the *next* chunk, never backward. This keeps the logic simple and predictable, and preserves the natural "this section introduces what follows" relationship common in documents. A brief intro section merging forward into the content it introduces makes semantic sense.
+
+**Why not backward merging?** Beyond added code complexity, backward merging would frequently fail anyway. By the time any chunk gets finalized, it's in one of two states: either it grew to meet or exceed `CHUNK_MIN_SIZE_TARGET` through absorption (so it's already "satisfied" with limited headroom), or it couldn't absorb the next chunk because that would exceed `CHUNK_SIZE` (so it's already bumping against the ceiling). Either way, a backward merge attempt would often fail the size check, meaning you'd add branching logic and state tracking for something that rarely succeeds.
+
+**No cross-document merging**: Chunks from different source files are never combined, even if both are small. This preserves clear document boundaries for citation, source attribution, and retrieval context.
+
+**Respects maximum size**: If merging two chunks would exceed `CHUNK_SIZE`, both are kept separate. Content is never discarded to force a merge.
+
+**Metadata inheritance**: Merged chunks inherit metadata from the *first* chunk in the merge sequence. This is consistent with forward-merge semantics—source and header information reflects where the merged section "started," which is typically the right choice for retrieval and citation purposes.
+
+**The `\n\n` separator**: When chunks merge, they're joined with double newlines rather than concatenated directly. This preserves visual and structural separation in the combined text, which can matter for both embedding quality and human readability if you inspect your chunks.
+
+### Edge cases
+
+**Consecutive tiny chunks**: Handled naturally. They keep accumulating into a single chunk until the threshold is met or max size would be exceeded.
+
+**Small chunk followed by large chunk**: If a small chunk is followed by a chunk large enough that merging would exceed `CHUNK_SIZE`, the small chunk gets finalized as-is, still undersized. This is unavoidable without backward merging or content splitting, but it's also rare in practice. It typically occurs at natural semantic boundaries (a brief transition before a dense section), and the small chunk being standalone at that boundary is arguably correct anyway.
+
+**Last chunk in document**: If the final chunk is undersized, it stays undersized since there's nothing to merge forward into. Again, unavoidable and usually fine—document endings are natural boundaries.
+
+### Performance characteristics
+
+The algorithm is O(n) in the number of chunks—a single pass with no lookahead or backtracking. This makes it fast even for large document collections.
+
+The efficiency gains from merging scale non-linearly in some ways. Retrieval over 45 vectors versus 588 isn't just ~13x faster in raw compute—you're also getting much cleaner top-k results because you've eliminated the noise of near-empty chunks that might score well on partial keyword matches but contribute nothing useful to the LLM. The quality improvement often matters more than the speed improvement.
+
+Testing has shown that a well-configured threshold (e.g., 1000 for a chunk size of 2000) can reduce chunk counts by over 90% while improving retrieval accuracy, because each remaining chunk carries meaningful semantic context rather than being a fragment that confuses both the embedding model and the retrieval ranking. As positive side effects, it also uses less storage space in the vector database and requires fewer embedding operations, which can be a significant cost saving if outsourcing to an embedding service.
+
+</details>
+
## RAG Embedding Support
Change the RAG embedding model directly in the `Admin Panel` > `Settings` > `Documents` menu. This feature supports Ollama and OpenAI models, enabling you to enhance document processing according to your requirements.
@@ -47,10 +142,83 @@ Change the RAG embedding model directly in the `Admin Panel` > `Settings` > `Doc
The RAG feature allows users to easily track the context of documents fed to LLMs with added citations for reference points. This ensures transparency and accountability in the use of external sources within your chats.
+## File Context vs Builtin Tools
+
+Open WebUI provides two separate capabilities that control how files are handled. Understanding the difference is important for configuring models correctly.
+
+### File Context Capability
+
+The **File Context** capability controls whether Open WebUI performs RAG (Retrieval-Augmented Generation) on attached files:
+
+| File Context | Behavior |
+|--------------|----------|
+| ✅ **Enabled** (default) | Attached files are processed via RAG. Content is retrieved and injected into the conversation context. |
+| ❌ **Disabled** | File processing is **completely skipped**. No content extraction, no injection. The model receives no file content. |
+
+**When to disable File Context:**
+- **Bypassing RAG entirely**: When you don't want Open WebUI to process attached files at all.
+- **Using Builtin Tools only**: If you prefer the model to retrieve file content on-demand via tools like `query_knowledge_bases` rather than having content pre-injected.
+- **Debugging/testing**: To isolate whether issues are related to RAG processing.
+
+:::warning File Context Disabled = No Pre-Injected Content
+When File Context is disabled, file content is **not automatically extracted or injected**. Open WebUI does not forward files to the model's native API. If you disable this, the only way the model can access file content is through builtin tools (if enabled) that query knowledge bases or retrieve attached files on-demand (agentic file processing).
+:::
+
+:::info
+The File Context toggle only appears when **File Upload** is enabled for the model.
+:::
+
+### Builtin Tools Capability
+
+The **Builtin Tools** capability controls whether the model receives native function-calling tools for autonomous retrieval:
+
+| Builtin Tools | Behavior |
+|---------------|----------|
+| ✅ **Enabled** (default) | In Native Function Calling mode, the model receives tools like `query_knowledge_bases`, `view_knowledge_file`, `search_chats`, etc. |
+| ❌ **Disabled** | No builtin tools are injected. The model works only with pre-injected context. |
+
+**When to disable Builtin Tools:**
+- **Model doesn't support function calling**: Smaller or older models may not handle the `tools` parameter.
+- **Predictable behavior needed**: You want the model to work only with what's provided upfront.
+
+### Combining the Two Capabilities
+
+These capabilities work independently, giving you fine-grained control:
+
+| File Context | Builtin Tools | Result |
+|--------------|---------------|--------|
+| ✅ Enabled | ✅ Enabled | **Full Agentic Mode**: RAG content injected + model can autonomously query knowledge bases |
+| ✅ Enabled | ❌ Disabled | **Traditional RAG**: Content injected upfront, no autonomous retrieval tools |
+| ❌ Disabled | ✅ Enabled | **Tools-Only Mode**: No pre-injected content, but model can use tools to query knowledge bases or retrieve attached files on-demand |
+| ❌ Disabled | ❌ Disabled | **No File Processing**: Attached files are ignored, no content reaches the model |
+
+:::tip Choosing the Right Configuration
+- **Most models**: Keep both enabled (defaults) for full functionality.
+- **Small/local models**: Disable Builtin Tools if they don't support function calling.
+- **On-demand retrieval only**: Disable File Context, enable Builtin Tools if you want the model to decide what to retrieve rather than pre-injecting everything.
+:::
+
## Enhanced RAG Pipeline
The togglable hybrid search sub-feature for our RAG embedding feature enhances RAG functionality via `BM25`, with re-ranking powered by `CrossEncoder`, and configurable relevance score thresholds. This provides a more precise and tailored RAG experience for your specific use case.
+## KV Cache Optimization (Performance Tip) 🚀
+
+For professional and high-performance use cases—especially when dealing with long documents or frequent follow-up questions—you can significantly improve response times by enabling **KV Cache Optimization**.
+
+### The Problem: Cache Invalidation
+By default, Open WebUI injects retrieved RAG context into the **user message**. As the conversation progresses, follow-up messages shift the position of this context in the chat history. For many LLM engines—including local engines (like Ollama, llama.cpp, and vLLM) and cloud providers / Model-as-a-Service providers (like OpenAI and Vertex AI)—this shifting position invalidates the **KV (Key-Value) prefix cache** or **Prompt Cache**, forcing the model to re-process the entire context for every single response. This leads to increased latency and potentially higher costs as the conversation grows.
+
+### The Solution: `RAG_SYSTEM_CONTEXT`
+You can fix this behavior by enabling the `RAG_SYSTEM_CONTEXT` environment variable.
+
+- **How it works**: When `RAG_SYSTEM_CONTEXT=True`, Open WebUI injects the RAG context into the **system message** instead of the user message.
+- **The Result**: Since the system message stays at the absolute beginning of the prompt and its position never changes, the provider can effectively cache the processed context. Follow-up questions then benefit from **instant responses** and **cost savings** because the "heavy lifting" (processing the large RAG context) is only done once.
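+
+As a simplified illustration (real payloads contain more fields, and the exact wrapping of the retrieved chunks depends on your RAG template), the system-message strategy produces a stable prompt prefix like this:
+
+```json
+[
+  {"role": "system", "content": "You are a helpful assistant.\n\nRetrieved context:\n[...document chunks...]"},
+  {"role": "user", "content": "Summarize the termination clause."},
+  {"role": "assistant", "content": "..."},
+  {"role": "user", "content": "And what about renewal terms?"}
+]
+```
+
+With the default behavior, the same chunks ride inside the latest user message instead, so the cached prefix breaks on every follow-up.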
+
+:::tip Recommended Configuration
+If you are using **Ollama**, **llama.cpp**, **OpenAI**, or **Vertex AI** and frequently "chat with your documents," set `RAG_SYSTEM_CONTEXT=True` in your environment to experience drastically faster follow-up responses!
+:::
+
## YouTube RAG Pipeline
The dedicated RAG pipeline for summarizing YouTube videos via video URLs enables smooth interaction with video transcriptions directly. This innovative feature allows you to incorporate video content into your chats, further enriching your conversation experience.
diff --git a/docs/features/rbac/groups.md b/docs/features/rbac/groups.md
index 09e3ee1e72..7b9d70c4a8 100644
--- a/docs/features/rbac/groups.md
+++ b/docs/features/rbac/groups.md
@@ -22,21 +22,22 @@ Groups can be managed in the **Admin Panel > Groups** section.
### Group Configuration
When creating or editing a group, you can configure its visibility in the system:
-* **Allow Group Sharing**: (Default: **On**)
- * **Enabled**: The group will appear in the "Access Control" dropdowns when users share Chat items, Models, or Knowledge lists. Use this for teams or project groups that need to collaborate on content.
- * **Disabled**: The group is **hidden** from sharing menus. This is designed for groups used solely for **RBAC Permission assignment** (e.g., granting "Image Generation" rights). Hiding these prevents the Sharing UI from becoming cluttered with technical/administrative groups.
+* **Who can share to this group**: (Access Control setting)
+ * **Anyone**: (Default) Any user on the platform can see this group in the "Access Control" menus for Sharing and share Chat items, Models, Prompts, or Knowledge Bases to it.
+ * **Members**: Only users who are **already members** of this group will see it as an option in the "Access Control" menus for Sharing. This is the ideal setting for private team collaboration (e.g., a "Marketing" team), ensuring only teammates can share resources (Models, Prompts, Knowledge) with each other.
+ * **No one**: The group is completely **hidden** from sharing menus for non-administrators. This is perfect for technical groups used exclusively for **RBAC Permission assignment** (e.g., a "High-Tier Users" group) where content sharing is not required.
:::tip Strategy: Permission Groups vs. Sharing Groups
To maintain a clean and manageable system, consider separating your groups into two distinct categories using a naming scheme:
1. **Permission Groups** (e.g., prefix `[Perms]`, `Role-`, or `P-`)
* **Purpose**: Exclusively for granting features (e.g., `[Perms] Image Gen`, `[Perms] Web Search`).
- * **Config**: Disable **Allow Group Sharing**.
+ * **Config**: Set "Who can share" to **No one**.
* **Result**: Users get the features they need, but these technical groups don't clutter the "Share" menu.
2. **Sharing Groups** (e.g., prefix `Team-`, `Project-`, or normal names)
* **Purpose**: Exclusively for organizing people (e.g., `Marketing`, `Engineering`, `Team Alpha`) to share resources.
- * **Config**: Enable **Allow Group Sharing**.
+ * **Config**: Set "Who can share" to **Members** or **Anyone**.
* **Best Practice**: **Disable all permissions** in these groups.
* Rely on *Global Default Permissions* (or separate *Permission Groups*) for feature rights.
* *Why?* This ensures painless **Permission Revocation**. If you decide to disable a feature (e.g., "Web Search") globally, it will instantly take effect for everyone. If your Sharing Groups had "Web Search" enabled, you would have to manually update every single group to remove the right, as the Group's `True` status would override the Global `False`. Keep functional groups clean to maintain Global control.
diff --git a/docs/features/rbac/roles.md b/docs/features/rbac/roles.md
index bc6817196a..d2f646c3f9 100644
--- a/docs/features/rbac/roles.md
+++ b/docs/features/rbac/roles.md
@@ -47,6 +47,7 @@ The `admin` role effectively has `check_permission() == True` for everything. Gr
### Initial Setup
* **First User:** The very first account created on a fresh installation is automatically assigned the **Admin** role.
+* **Headless Admin Creation:** For automated/containerized deployments, you can create the admin account automatically using environment variables (see below).
* **Subsequent Users:** New sign-ups are assigned the **Default User Role**.
### Configuration
@@ -65,3 +66,97 @@ Administrators can change a user's role at any time via **Admin Panel > Users**.
* Promoting a user to `admin` grants them full control.
* Demoting an admin to `user` subjects them to the permission system again.
+## Headless Admin Account Creation
+
+For **automated deployments** (Docker, Kubernetes, cloud platforms) where manual interaction is impractical, Open WebUI supports creating an admin account automatically on first startup using environment variables.
+
+### How It Works
+
+When the following environment variables are configured:
+- `WEBUI_ADMIN_EMAIL`: The admin account email address
+- `WEBUI_ADMIN_PASSWORD`: The admin account password
+- `WEBUI_ADMIN_NAME`: (Optional) The admin display name (defaults to "Admin")
+
+Open WebUI will automatically:
+1. Check if any users exist in the database on startup
+2. If the database is empty (fresh installation), create an admin account using the provided credentials
+3. Securely hash and store the password
+4. Automatically disable sign-up (`ENABLE_SIGNUP=False`) to prevent unauthorized account creation
+
+### Use Cases
+
+This feature is particularly useful for:
+- **CI/CD Pipelines**: Automatically provision Open WebUI instances with admin credentials from secrets management
+- **Docker/Kubernetes Deployments**: Eliminate the time gap between container startup and manual admin creation
+- **Automated Testing**: Create reproducible test environments with pre-configured admin accounts
+- **Headless Servers**: Deploy instances where accessing the UI to manually create an account is impractical
+
+### Example Configuration
+
+#### Docker Compose
+```yaml
+services:
+ open-webui:
+ image: ghcr.io/open-webui/open-webui:main
+ environment:
+ - WEBUI_ADMIN_EMAIL=admin@example.com
+ - WEBUI_ADMIN_PASSWORD=${ADMIN_PASSWORD} # Use secrets management
+ - WEBUI_ADMIN_NAME=System Administrator
+ # ... other configuration
+```
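+
+#### Docker Run
+
+If you deploy with a plain `docker run` command instead, the same variables can be passed as `-e` flags (shown here with a placeholder password variable; inject the real value from your secrets management):
+
+```bash
+docker run -d -p 3000:8080 \
+  -e WEBUI_ADMIN_EMAIL=admin@example.com \
+  -e WEBUI_ADMIN_PASSWORD="$ADMIN_PASSWORD" \
+  -e WEBUI_ADMIN_NAME="System Administrator" \
+  -v open-webui:/app/backend/data \
+  --name open-webui --restart always \
+  ghcr.io/open-webui/open-webui:main
+```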
+
+#### Kubernetes Secret
+```yaml
+apiVersion: v1
+kind: Secret
+metadata:
+ name: openwebui-admin
+type: Opaque
+stringData:
+ admin-email: admin@example.com
+ admin-password: your-secure-password
+ admin-name: System Administrator
+---
+apiVersion: v1
+kind: Pod
+metadata:
+ name: open-webui
+spec:
+ containers:
+ - name: open-webui
+ image: ghcr.io/open-webui/open-webui:main
+ env:
+ - name: WEBUI_ADMIN_EMAIL
+ valueFrom:
+ secretKeyRef:
+ name: openwebui-admin
+ key: admin-email
+ - name: WEBUI_ADMIN_PASSWORD
+ valueFrom:
+ secretKeyRef:
+ name: openwebui-admin
+ key: admin-password
+ - name: WEBUI_ADMIN_NAME
+ valueFrom:
+ secretKeyRef:
+ name: openwebui-admin
+ key: admin-name
+```
+
+### Important Notes
+
+:::warning Security Considerations
+- **Use Secrets Management**: Never hardcode `WEBUI_ADMIN_PASSWORD` in Docker Compose files or Dockerfiles. Use Docker secrets, Kubernetes secrets, or environment variable injection.
+- **Strong Passwords**: Use a strong, unique password for production deployments.
+- **Change After Setup**: Consider changing the admin password through the UI after initial deployment for enhanced security.
+- **Automatic Signup Disable**: After admin creation, sign-up is automatically disabled. You can re-enable it later via **Admin Panel > Settings > General** if needed.
+:::
+
+:::info Behavior Details
+- **Only on Fresh Install**: The admin account is created **only** if no users exist in the database. If users already exist, these environment variables are ignored.
+- **Password Hashing**: The password is securely hashed using the same mechanism as manual account creation.
+- **One-Time Operation**: This is a one-time operation on first startup. Subsequent restarts with the same environment variables will not modify the existing admin account.
+:::
+
+For complete documentation on these environment variables, see the [Environment Configuration Guide](../../getting-started/env-configuration.mdx#webui_admin_email).
+
diff --git a/docs/features/web-search/agentic-search.mdx b/docs/features/web-search/agentic-search.mdx
new file mode 100644
index 0000000000..a4fdbdeeb2
--- /dev/null
+++ b/docs/features/web-search/agentic-search.mdx
@@ -0,0 +1,147 @@
+---
+sidebar_position: 1
+title: "Agentic Search & URL Fetching"
+---
+
+# Agentic Web Search & URL Fetching 🌐
+
+Open WebUI's web search has evolved from simple result injection to a fully **agentic research system**. By enabling **Native Function Calling (Agentic Mode)**, you allow quality models to independently explore the web, verify facts, and follow links autonomously.
+
+:::tip Quality Models Required
+Agentic web search works best with frontier models like **GPT-5**, **Claude 4.5+**, **Gemini 3+**, or **MiniMax M2.1** that can reason about search results and decide when to dig deeper. Small local models may struggle with the multi-step reasoning required.
+:::
+
+:::info Central Tool Documentation
+For comprehensive information about all built-in agentic tools (including web search, knowledge bases, memory, and more), see the [**Native/Agentic Mode Tools Guide**](/features/plugin/tools#built-in-system-tools-nativeagentic-mode).
+:::
+
+## Native Mode vs. Traditional RAG
+
+| Feature | Traditional RAG (Default) | Agentic Search (Native Mode) |
+|---------|---------------------------|------------------------------|
+| **Search Decision** | Open WebUI decides based on prompt analysis. | The **Model** decides if and when it needs to search. |
+| **Data Processing** | Fetches ALL results, chunks them, and performs **RAG**. | Returns **Snippets** directly; no chunking or Vector DB. |
+| **Link Following** | Snippets from top results are injected. | Model uses `fetch_url` to read a **Full Page** directly. |
+| **Model Context** | Only gets relevant fragments (Top-K chunks). | Gets the **whole text** (up to ~50k chars) via `fetch_url`. |
+| **Reasoning** | Model processes data *after* system injection. | Model can search, read, check, and search again. |
+
+## How to Enable Agentic Behavior
+
+To unlock these features, your model must support native tool calling and have strong reasoning capabilities (e.g., GPT-5, Claude 4.5 Sonnet, Gemini 3 Flash, MiniMax M2.1). Administrator-level configuration for these built-in system tools is handled via the [**Central Tool Calling Guide**](/features/plugin/tools#tool-calling-modes-default-vs-native).
+
+1. **Enable Web Search**: Ensure a search engine is configured in **Admin Panel > Settings > Web Search**.
+2. **Enable Native Mode (Agentic Mode)**:
+ * Go to **Admin Panel > Settings > Models**.
+ * Navigate to **Model Specific Settings** for your target model.
+ * Under **Advanced Parameters**, set **Function Calling** to `Native`.
+3. **Use a Quality Model**: Ensure you're using a frontier model with strong reasoning capabilities for best results.
+4. **Chat Features**: Ensure the **Web Search** feature is toggled **ON** for your chat session.
+
+## How Native Tools Handle Data (Agentic Mode)
+🔗 It is important to understand that Native Mode (Agentic Mode) works fundamentally differently from the global "Web Search" toggle found in standard models.
+
+### `search_web` (Snippets only)
+When the model invokes `search_web`:
+* **Action**: It queries your search engine and receives a list of titles, links, and snippets.
+* **No RAG**: Unlike traditional search, **no data is stored in a Vector DB**. No chunking or embedding occurs.
+* **Result**: The model sees exactly what a human sees on a search results page. If the snippet contains the answer, the model responds. If not, the model must decide to "deep dive" into a link.
+
+### `fetch_url` (Full Page Context)
+If the model determines that a search snippet is insufficient, it will call `fetch_url`:
+* **Direct Access**: The tool visits the specific URL and extracts the main text using your configured **Web Loader**.
+* **Raw Context**: The extracted text is injected **directly into the model's context window** (hard-coded truncation at exactly 50,000 characters to prevent context overflow).
+* **Agentic Advantage**: Because it doesn't use RAG, the model has the "full picture" of the page rather than isolated fragments. This allows it to follow complex instructions on specific pages (e.g., "Summarize the technical specifications table from this documentation link").
+
+:::tip
+By keeping `search_web` and `fetch_url` separate and RAG-free, the model acts as its own **Information Retrieval** agent, choosing exactly which sources are worth reading in full.
+:::
+
+## Deep Research & Interleaved Thinking 🧠
+
+Because the model can call `search_web` multiple times and decide autonomously when to dive deeper, it can perform genuine "Deep Research" using **Interleaved Thinking**. This creates a powerful research loop where the model acts as its own research assistant.
+
+### How Interleaved Thinking Works
+
+Interleaved Thinking is the ability for models to alternate between **reasoning** and **action** in a continuous cycle. Instead of searching once and answering, the model can search, evaluate the results, and search again until it has enough information. The example below walks through a typical research cycle.
+
+<details>
+<summary>📖 Detailed Research Cycle Example</summary>
+
+**User asks:** "What are the latest security vulnerabilities in the React ecosystem?"
+
+**Step 1: Initial Analysis 🤔**
+- Model thinks: "I need current information about React security. Let me start with a broad search."
+
+**Step 2: First Search 🔍**
+```
+Model calls: search_web("React security vulnerabilities 2026")
+```
+- Receives snippets from 5-10 search results
+- Sees references to npm packages and CVE databases
+
+**Step 3: Gap Analysis 🤔**
+- Model thinks: "The snippets mention CVEs but don't have enough detail. I should fetch a specific security advisory page."
+
+**Step 4: Targeted Fetch 📄**
+```
+Model calls: fetch_url("https://example.com/react-security-advisory-2026")
+```
+- Reads the full page (up to the 50k character limit) of detailed security documentation
+- Identifies specific vulnerability types (XSS, injection, etc.)
+
+**Step 5: Refinement 🤔**
+- Model thinks: "I found server-side vulnerabilities. Let me search specifically for client-side issues."
+
+**Step 6: Refined Search 🔍**
+```
+Model calls: search_web("React client-side XSS CVE 2026")
+```
+- Gets more specific results about cross-site scripting issues
+- Identifies mitigation strategies
+
+**Step 7: Synthesis 🤔**
+- Model thinks: "I now have comprehensive information from multiple authoritative sources. Time to synthesize."
+
+**Step 8: Final Answer ✅**
+- Provides detailed, grounded response citing multiple sources
+- Includes specific CVE numbers, affected versions, and mitigation steps
+
+</details>
+
+### The Agentic Research Loop
+
+The model continuously cycles through these phases until it has sufficient information:
+
+1. **🤔 THINK**: Analyze current knowledge gaps and determine what information is missing
+2. **🔍 ACT**: Search the web or fetch specific URLs to gather relevant content
+3. **📊 EVALUATE**: Assess the quality and completeness of the information retrieved
+4. **❓ DECIDE**: Determine if more research is needed or if enough context has been gathered
+5. **🔄 ITERATE**: If gaps remain, return to step 1 with refined focus and more specific queries
+6. **✅ SYNTHESIZE**: Once sufficient information is gathered, compile and present the final answer
+
+This cycle repeats autonomously until the model has comprehensive, verified information to answer your question with high confidence.
+
+### Key Advantages
+
+**🎯 Adaptive Precision**: The model doesn't just search once and accept whatever results appear. Instead, it continuously refines its search strategy based on what it discovers. If initial broad searches return surface-level information, the model automatically pivots to more specific technical terms, product names, version numbers, or specialized terminology. Each iteration becomes progressively more targeted, drilling down from general concepts to specific details, ensuring the final answer is both comprehensive and precise.
+
+**🔗 Deep Link Following & Discovery**: Unlike traditional RAG systems that only use search result snippets, the model can read full pages when snippets aren't sufficient. Even more powerfully, **when the model uses `fetch_url` to read a page, it can discover and follow new URLs mentioned within that content**. For example, if a fetched page references a technical specification document, an official changelog, or a related research paper, the model can autonomously call `fetch_url` again on those discovered URLs to dive even deeper. This creates a natural "web browsing" behavior where the model follows citation chains, explores linked resources, and builds a comprehensive understanding by reading multiple interconnected sources—just like a human researcher would.
+
+**✅ Fact Verification & Cross-Referencing**: The model can autonomously verify information by cross-referencing multiple independent sources. If one source makes a claim, the model can search for corroborating evidence from authoritative sources, compare version numbers across official documentation, or validate facts against primary sources. This multi-source verification significantly reduces hallucination and increases answer reliability, as the model builds confidence by finding consistent information across diverse, credible sources.
+
+**🧩 Intelligent Gap Filling**: If initial search results miss key information or only partially address the question, the model identifies these gaps and automatically conducts follow-up searches with different terms, alternative phrasings, or more specific queries. For example, if searching for "React performance issues" doesn't yield information about a specific optimization technique, the model might refine its search to "React useMemo optimization" or "React.memo vs useMemo comparison" to fill the knowledge gap. This ensures comprehensive coverage of complex topics that might require multiple search angles.
+
+**🌐 Multi-Source Synthesis**: The model doesn't just return information from a single source—it synthesizes insights from multiple web pages, documentation sites, forums, and articles into a coherent, well-rounded answer. This synthesis provides broader context, acknowledges different perspectives, and presents a more complete picture than any single source could provide.
+
+**📚 Context-Aware Source Selection**: The model intelligently decides whether to rely on search snippets (when they contain sufficient information) or to fetch full pages (when deeper detail is needed). It can also determine when to stop researching—avoiding unnecessary tool calls while ensuring thoroughness. This balance between efficiency and comprehensiveness makes agentic search both fast and reliable.
+
+This iterative loop of **Thought → Action → Thought** continues until the model has sufficient information to answer your request with maximum accuracy.
+
+:::info Learn More About Interleaved Thinking
+For more details on how Interleaved Thinking works across all agentic tools (not just web search), see the [**Interleaved Thinking Guide**](/features/plugin/tools#interleaved-thinking).
+:::
+
+## Next Steps
+
+- **Save your findings**: Learn how to [save web search results directly to your Knowledge Base](./save-to-knowledge).
+- **Troubleshoot**: If you encounter issues, check the [Web Search Troubleshooting Guide](../../troubleshooting/web-search).
diff --git a/docs/features/web-search/bing.md b/docs/features/web-search/bing.md
index 39ee125dd8..76f99d302d 100644
--- a/docs/features/web-search/bing.md
+++ b/docs/features/web-search/bing.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 1
+sidebar_position: 2
title: "Bing"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
:::warning
Bing Search APIs will be retired on 11th August 2025. New deployments are not supported.
diff --git a/docs/features/web-search/brave.md b/docs/features/web-search/brave.md
index e4a9deaa01..40c6f96990 100644
--- a/docs/features/web-search/brave.md
+++ b/docs/features/web-search/brave.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
title: "Brave"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Brave API
### Docker Compose Setup
@@ -31,3 +37,35 @@ services:
RAG_WEB_SEARCH_RESULT_COUNT: 3
RAG_WEB_SEARCH_CONCURRENT_REQUESTS: 1
```
+
+### Rate Limiting (Free Tier)
+
+Brave's free tier API enforces a strict limit of **1 request per second**. If your LLM generates multiple search queries (which is common), you may encounter HTTP 429 "Too Many Requests" errors.
+
+**Recommended configuration for free tier users:**
+
+- Set `RAG_WEB_SEARCH_CONCURRENT_REQUESTS: 1` to ensure requests are processed sequentially rather than in parallel.
+
+**Automatic retry behavior:**
+
+Open WebUI automatically handles 429 rate limit responses from the Brave API. When a rate limit error is received, the system will:
+
+1. Wait 1 second (respecting Brave's rate limit)
+2. Retry the request once
+3. Only fail if the retry also returns an error
+
+This means that even if your connection is fast enough to send multiple sequential requests within a second, the automatic retry mechanism should recover gracefully without user intervention.
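+
+Conceptually, the retry behavior is equivalent to the following sketch (illustrative only, not the actual Open WebUI source):
+
+```python
+import time
+import requests
+
+def brave_search(url, headers, params):
+    """Illustrative only: one retry after a one-second pause on HTTP 429."""
+    response = requests.get(url, headers=headers, params=params)
+    if response.status_code == 429:
+        time.sleep(1)                     # respect Brave's 1 request/second limit
+        response = requests.get(url, headers=headers, params=params)
+    response.raise_for_status()           # fail only if the retry also errors
+    return response.json()
+```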
+
+:::tip
+If you are on Brave's paid tier with higher rate limits, you can increase `RAG_WEB_SEARCH_CONCURRENT_REQUESTS` for faster parallel searches.
+:::
+
+:::info Understanding Concurrency & Rate Limits
+
+The `RAG_WEB_SEARCH_CONCURRENT_REQUESTS` setting controls concurrency **per individual search request**, not globally across the entire application.
+
+- **When this is NOT an issue**: For single-user instances or low-traffic setups where users rarely hit "Enter" at the exact same second, setting concurrency to `1` is usually sufficient to stay within the Free Tier limits (1 req/sec).
+- **When this IS an issue**: If multiple users trigger web searches at the exact same moment (e.g., 3 users searching in the same second), Open WebUI will process these requests in parallel. Each user's request creates its own connection pool, meaning 3 requests will be sent to the API simultaneously, triggering a rate limit error on the Free Tier.
+
+**Note:** If you are running an environment with multiple concurrent users actively using web search, it is highly recommended to upgrade to a paid API tier. The Free Tier is not designed to support the throughput of a multi-user deployment.
+:::
diff --git a/docs/features/web-search/ddgs.mdx b/docs/features/web-search/ddgs.mdx
index 9b58c1db70..8618d052fe 100644
--- a/docs/features/web-search/ddgs.mdx
+++ b/docs/features/web-search/ddgs.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 3
+sidebar_position: 4
title: "DDGS"
---
@@ -15,16 +15,37 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
-## DDGS (Dux Distributed Global Search - previously DuckDuckGo)
+:::tip Troubleshooting
-### Setup
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
-:::note
+:::
-DDGS (Dux Distributed Global Search) was previously DuckDuckGo.
-It is now a metasearch engine.
-This part of the docs needs an update.
+## DDGS (Dux Distributed Global Search)
-Know how to set it up? Submit a PR on [GitHub](https://github.com/open-webui/docs) to edit this page!
+DDGS is a metasearch engine that allows you to search multiple providers through a single interface.
-:::
+### Setup
+
+1. Enable **Web Search** in the Admin Settings.
+2. Select **DDGS** as the **Web Search Engine**.
+3. Choose your desired **DDGS Backend** from the dropdown menu.
+
+#### DDGS Backend Options
+- **Auto (Random)**: Randomly selects from available providers.
+- **Bing**
+- **Brave**
+- **DuckDuckGo**
+- **Google**
+- **Grokipedia**
+- **Mojeek**
+- **Wikipedia**
+- **Yahoo**
+- **Yandex**
+
+#### Environment Variable
+You can also configure the backend using the `DDGS_BACKEND` environment variable.
+
+```bash
+DDGS_BACKEND="google"
+```
diff --git a/docs/features/web-search/exa.md b/docs/features/web-search/exa.md
index b91ddb1916..9281ed9309 100644
--- a/docs/features/web-search/exa.md
+++ b/docs/features/web-search/exa.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 5
title: "Exa AI"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
# Exa AI Web Search Integration
This guide provides instructions on how to integrate [Exa AI](https://exa.ai/), a modern AI-powered search engine, with Open WebUI for web search capabilities.
diff --git a/docs/features/web-search/external.md b/docs/features/web-search/external.md
index 23a0776a3a..18c4b6a1af 100644
--- a/docs/features/web-search/external.md
+++ b/docs/features/web-search/external.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 17
+sidebar_position: 19
title: "External"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## External Web Search API
This option allows you to connect Open WebUI to your own self-hosted web search API endpoint. This is useful if you want to:
diff --git a/docs/features/web-search/google-pse.md b/docs/features/web-search/google-pse.md
index 656128e9c7..4c3f2e4d94 100644
--- a/docs/features/web-search/google-pse.md
+++ b/docs/features/web-search/google-pse.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 5
+sidebar_position: 6
title: "Google PSE"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Google PSE API
### Setup
diff --git a/docs/features/web-search/jina.md b/docs/features/web-search/jina.md
index a36d4545b2..73b22552f3 100644
--- a/docs/features/web-search/jina.md
+++ b/docs/features/web-search/jina.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 6
+sidebar_position: 7
title: "Jina"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
# Jina Web Search Integration
This guide provides instructions on how to integrate [Jina AI](https://jina.ai/), a powerful AI-driven search foundation, with Open WebUI. The integration uses Jina's `DeepSearch` API to provide web search capabilities.
@@ -51,7 +57,8 @@ To enable the Jina web search integration, follow these steps in the Open WebUI
2. **Navigate to Web Search Settings:** Go to the **Admin Panel**, then click on **Settings** > **Web Search**.
3. **Select Jina as the Search Engine:** In the "Web Search Engine" dropdown menu, select **Jina**.
4. **Enter Your API Key:** Paste your Jina API key into the **Jina API Key** input field.
-5. **Save Changes:** Scroll down and click the **Save** button to apply the changes.
+5. **(Optional) Enter Jina API Base URL:** If you need to use a specific endpoint (e.g., for EU data processing), enter it in the **Jina API Base URL** field. Default is `https://s.jina.ai/`.
+6. **Save Changes:** Scroll down and click the **Save** button to apply the changes.
### 3. Environment Variable Configuration
@@ -60,6 +67,7 @@ For Docker-based deployments, you can configure the Jina integration using an en
Set the following environment variable for your Open WebUI instance:
- `JINA_API_KEY`: Your Jina API key.
+- `JINA_API_BASE_URL`: (Optional) Custom Jina API endpoint.
**Example Docker `run` command:**
diff --git a/docs/features/web-search/kagi.md b/docs/features/web-search/kagi.md
index 1829ee61c4..1f5c423828 100644
--- a/docs/features/web-search/kagi.md
+++ b/docs/features/web-search/kagi.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 7
+sidebar_position: 8
title: "Kagi"
---
@@ -14,3 +14,9 @@ This tutorial is a community contribution and is not supported by the Open WebUI
For a comprehensive list of all environment variables related to Web Search (including concurrency settings, result counts, and more), please refer to the [Environment Configuration documentation](../../getting-started/env-configuration#web-search).
:::
+
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
diff --git a/docs/features/web-search/mojeek.md b/docs/features/web-search/mojeek.md
index 79b332f929..3c9a0f5c04 100644
--- a/docs/features/web-search/mojeek.md
+++ b/docs/features/web-search/mojeek.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 8
+sidebar_position: 9
title: "Mojeek"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Mojeek Search API
### Setup
diff --git a/docs/features/web-search/ollama-cloud.mdx b/docs/features/web-search/ollama-cloud.mdx
index 0874daa816..c67a84237a 100644
--- a/docs/features/web-search/ollama-cloud.mdx
+++ b/docs/features/web-search/ollama-cloud.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 4
+sidebar_position: 10
title: "Ollama Cloud Web Search"
---
@@ -14,3 +14,9 @@ This tutorial is a community contribution and is not supported by the Open WebUI
For a comprehensive list of all environment variables related to Web Search (including concurrency settings, result counts, and more), please refer to the [Environment Configuration documentation](../../getting-started/env-configuration#web-search).
:::
+
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
diff --git a/docs/features/web-search/perplexity.mdx b/docs/features/web-search/perplexity.mdx
index 5ab89c7a17..f0a542b29e 100644
--- a/docs/features/web-search/perplexity.mdx
+++ b/docs/features/web-search/perplexity.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 30
+sidebar_position: 20
title: "Perplexity"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Perplexity API
:::info
diff --git a/docs/features/web-search/perplexity_search.mdx b/docs/features/web-search/perplexity_search.mdx
index b6cec0c0e3..bd778ce07a 100644
--- a/docs/features/web-search/perplexity_search.mdx
+++ b/docs/features/web-search/perplexity_search.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 31
+sidebar_position: 21
title: "Perplexity Search"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Perplexity Search API
:::info
diff --git a/docs/features/web-search/save-to-knowledge.mdx b/docs/features/web-search/save-to-knowledge.mdx
new file mode 100644
index 0000000000..9319bdaee6
--- /dev/null
+++ b/docs/features/web-search/save-to-knowledge.mdx
@@ -0,0 +1,64 @@
+---
+sidebar_position: 10
+title: "Save Search Results to Knowledge"
+---
+
+# Save Search Results to Knowledge 📚
+
+The **Add Web Sources to Knowledge Action** allows you to save web search result URLs directly to your Knowledge Base with a single click. This feature streamlines the process of building a research bank by automating the fetching, sanitizing, and uploading of web content.
+
+## Features
+
+- **One-Click Saving**: Quickly add selected sources from a chat message to any Knowledge Base.
+- **URL Selection**: Choose specific URLs to save from a numbered list.
+- **Batch Processing**: Handle multiple URLs in a single action.
+- **Duplicate Detection**: Automatically skip URLs that already exist in the target Knowledge Base.
+- **Configurable Defaults**: Set a default Knowledge Base and skip confirmation dialogs for a faster workflow.
+
+## Setup
+
+The "Add Web Sources to Knowledge" feature is implemented as a **Function Action**. To use it:
+
+1. **Download the Action**: Visit the Open WebUI Community Hub and download the [Add Web Sources to Knowledge Action](https://openwebui.com/posts/65d97417-d079-4720-b2cc-a63dd59b7e3e).
+2. **Enable the Action**:
+ * Navigate to **Workspace > Functions**.
+ * Import or create a new function with the provided code.
+ * Enable the action globally or for specific models.
+
+## How to Use
+
+1. **Trigger Web Search**: Ask a question that triggers web search (e.g., using DDGS, Google PSE, etc.).
+2. **Click the Action Button**: Once the model returns citations, click the **folder+** icon in the message toolbar.
+3. **Select Sources**: A dialog will appear. Enter the numbers of the sources you wish to save (e.g., `1,3,5` or `1-3` or `all`).
+4. **Choose Knowledge Base**: Select the target Knowledge Base where the content should be saved.
+5. **Done**: The system will fetch the content using your configured **Web Loader** and add it to the Knowledge Base.
+
+## Configuration (Valves)
+
+You can customize the action's behavior through **Valves** in the function settings.
+
+### Admin Settings (Global Defaults)
+
+| Setting | Default | Description |
+|---------|---------|-------------|
+| `max_urls_per_action` | `10` | Maximum number of URLs to process in a single action. |
+| `enable_duplicate_check` | `True` | Check if URL already exists in the Knowledge Base before adding. |
+| `default_knowledge_base` | `""` | System-wide default Knowledge Base name or ID. |
+| `skip_confirmation` | `False` | Skip confirmation dialogs and use the default Knowledge Base. |
+| `file_name_prefix` | `""` | Prefix for generated file names (e.g., `web_`). |
+
+### User Settings (Personal Overrides)
+
+Users can override global defaults in their own settings:
+- **Default Knowledge Base**: Set a preferred KB to avoid manual selection.
+- **Skip Confirmation**: Enable for instant one-click saving (requires a default KB).
+- **File Name Prefix**: Customize the prefix for your saved sources.
+
+:::tip Power User Tip
+Set your **Default Knowledge Base** and enable **Skip Confirmation** in your User Valves to achieve instant, one-click saving of web sources!
+:::
+
+## Troubleshooting
+
+- **Content Quality**: The quality of the saved content depends on your **Web Loader Engine** settings (Admin > Settings > Documents). For JavaScript-heavy sites, consider using **Firecrawl** or **Playwright**.
+- **No URLs Found**: This action works with web search results that return structured citations. If no URLs are detected, ensure web search is properly enabled and returning results.
diff --git a/docs/features/web-search/searchapi.md b/docs/features/web-search/searchapi.md
index 79b5a91f1e..82fa88a78d 100644
--- a/docs/features/web-search/searchapi.md
+++ b/docs/features/web-search/searchapi.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 9
+sidebar_position: 11
title: "SearchApi"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## SearchApi API
[SearchApi](https://searchapi.io) is a collection of real-time SERP APIs. Any existing or upcoming SERP engine that returns `organic_results` is supported. The default web search engine is `google`, but it can be changed to `bing`, `baidu`, `google_news`, `bing_news`, `google_scholar`, `google_patents`, and others.
diff --git a/docs/features/web-search/searxng.md b/docs/features/web-search/searxng.md
index 1b49c7e531..3d90c12997 100644
--- a/docs/features/web-search/searxng.md
+++ b/docs/features/web-search/searxng.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 10
+sidebar_position: 12
title: "SearXNG"
---
@@ -345,6 +345,12 @@ docker exec -it open-webui curl http://host.docker.internal:8080/search?q=this+i

+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems, including proxy configuration, connection timeouts, and empty content issues.
+
+:::
+
## 5. Using Web Search in a Chat
To access Web Search, Click the Integrations button next to the + icon.
diff --git a/docs/features/web-search/serpapi.md b/docs/features/web-search/serpapi.md
index 46d06a7cc6..7d4c19a55f 100644
--- a/docs/features/web-search/serpapi.md
+++ b/docs/features/web-search/serpapi.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 15
+sidebar_position: 13
title: "SerpApi"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## SerpApi API
[SerpApi](https://serpapi.com/) Scrape Google and other search engines from our fast, easy, and complete API. Any existing or upcoming SERP engine that returns `organic_results` is supported. The default web search engine is `google`, but it can be changed to `bing`, `baidu`, `google_news`, `google_scholar`, `google_patents`, and others.
diff --git a/docs/features/web-search/serper.md b/docs/features/web-search/serper.md
index 6b2dd8c8e1..38e71f0c6d 100644
--- a/docs/features/web-search/serper.md
+++ b/docs/features/web-search/serper.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 11
+sidebar_position: 14
title: "Serper"
---
@@ -14,3 +14,9 @@ This tutorial is a community contribution and is not supported by the Open WebUI
For a comprehensive list of all environment variables related to Web Search (including concurrency settings, result counts, and more), please refer to the [Environment Configuration documentation](../../getting-started/env-configuration#web-search).
:::
+
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
diff --git a/docs/features/web-search/serply.md b/docs/features/web-search/serply.md
index 9b11d3507f..e15ef66968 100644
--- a/docs/features/web-search/serply.md
+++ b/docs/features/web-search/serply.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 12
+sidebar_position: 15
title: "Serply"
---
@@ -14,3 +14,9 @@ This tutorial is a community contribution and is not supported by the Open WebUI
For a comprehensive list of all environment variables related to Web Search (including concurrency settings, result counts, and more), please refer to the [Environment Configuration documentation](../../getting-started/env-configuration#web-search).
:::
+
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
diff --git a/docs/features/web-search/serpstack.md b/docs/features/web-search/serpstack.md
index 85cebe7b32..bc56e6a5b0 100644
--- a/docs/features/web-search/serpstack.md
+++ b/docs/features/web-search/serpstack.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 13
+sidebar_position: 16
title: "Serpstack"
---
@@ -14,3 +14,9 @@ This tutorial is a community contribution and is not supported by the Open WebUI
For a comprehensive list of all environment variables related to Web Search (including concurrency settings, result counts, and more), please refer to the [Environment Configuration documentation](../../getting-started/env-configuration#web-search).
:::
+
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
diff --git a/docs/features/web-search/tavily.md b/docs/features/web-search/tavily.md
index 9838c93c06..8853c1f2de 100644
--- a/docs/features/web-search/tavily.md
+++ b/docs/features/web-search/tavily.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 14
+sidebar_position: 17
title: "Tavily"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Overview
Integrating Tavily with Open WebUI allows your language model to perform real-time web searches, providing up-to-date and relevant information. This tutorial guides you through configuring Tavily as a web search provider in Open WebUI.
diff --git a/docs/features/web-search/yacy.md b/docs/features/web-search/yacy.md
index 1def81ca51..a3bcfd8f3f 100644
--- a/docs/features/web-search/yacy.md
+++ b/docs/features/web-search/yacy.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 16
+sidebar_position: 18
title: "Yacy"
---
@@ -15,6 +15,12 @@ For a comprehensive list of all environment variables related to Web Search (inc
:::
+:::tip Troubleshooting
+
+Having issues with web search? Check out the [Web Search Troubleshooting Guide](../../troubleshooting/web-search) for solutions to common problems like proxy configuration, connection timeouts, and empty content.
+
+:::
+
## Yacy API
### Setup
diff --git a/docs/features/workspace/index.mdx b/docs/features/workspace/index.mdx
index c6ded82273..6a08a8e303 100644
--- a/docs/features/workspace/index.mdx
+++ b/docs/features/workspace/index.mdx
@@ -1,5 +1,5 @@
---
-sidebar_position: 700
+sidebar_position: 1
title: "Workspace"
---
diff --git a/docs/features/workspace/knowledge.md b/docs/features/workspace/knowledge.md
index 9de1e3a857..e7ec3a3ea8 100644
--- a/docs/features/workspace/knowledge.md
+++ b/docs/features/workspace/knowledge.md
@@ -1,17 +1,17 @@
---
-sidebar_position: 1
+sidebar_position: 4
title: "Knowledge"
---
- Knowledge part of Open WebUI is like a memory bank that makes your interactions even more powerful and context-aware. Let's break down what "Knowledge" really means in Open WebUI, how it works, and why it’s incredibly helpful for enhancing your experience.
+The Knowledge part of Open WebUI is like a memory bank that makes your interactions even more powerful and context-aware. Let's break down what "Knowledge" really means in Open WebUI, how it works, and why it's incredibly helpful for enhancing your experience.
## TL;DR
- **Knowledge** is a section in Open WebUI where you can store structured information that the system can refer to during your interactions.
-- It’s like a memory system for Open WebUI that allows it to pull from saved data, making responses more personalized and contextually aware.
+- It's like a memory system for Open WebUI that allows it to pull from saved data, making responses more personalized and contextually aware.
- You can use Knowledge directly in your chats with Open WebUI to access the stored data whenever you need it.
-Setting up Knowledge is straightforward! Simply head to the Knowledge section inside work space and start adding details or data. You don’t need coding expertise or technical setup; it’s built into the core system!
+Setting up Knowledge is straightforward! Simply head to the Knowledge section inside the Workspace and start adding details or data. You don't need coding expertise or technical setup; it's built into the core system!
## What is the "Knowledge" Section?
@@ -23,33 +23,64 @@ Imagine you're working on a long-term project and want the system to remember ce
Some examples of what you might store in Knowledge:
-- Important project parameters or specific data points you’ll frequently reference.
+- Important project parameters or specific data points you'll frequently reference.
- Custom commands, workflows, or settings you want to apply.
- Personal preferences, guidelines, or rules that Open WebUI can follow in every chat.
### How to Use Knowledge in Chats
-Accessing stored Knowledge in your chats is easy! By simply referencing what’s saved (using '#' before the name), Open WebUI can pull in data or follow specific guidelines that you’ve set up in the Knowledge section.
+Accessing stored Knowledge in your chats is easy! By simply referencing what's saved (using '#' before the name), Open WebUI can pull in data or follow specific guidelines that you've set up in the Knowledge section.
For example:
- When discussing a project, Open WebUI can automatically recall your specified project details.
- It can apply custom preferences to responses, like formality levels or preferred phrasing.
-To reference Knowledge in your chats, just ensure it’s saved in the Knowledge section, and Open WebUI will know when and where to bring in the relevant information!
+To reference Knowledge in your chats, just ensure it's saved in the Knowledge section, and Open WebUI will know when and where to bring in the relevant information!
Admins can add knowledge to the workspace, which users can access and use; however, users do not have direct access to the workspace itself.
+### Native Mode (Agentic Mode) Knowledge Tools
+
+When using **Native Function Calling (Agentic Mode)**, quality models can interact with your Knowledge Bases autonomously using built-in tools:
+
+:::tip Quality Models for Knowledge Exploration
+Autonomous knowledge base exploration works best with frontier models (GPT-5, Claude 4.5+, Gemini 3+) that can intelligently search, browse, and synthesize information from multiple documents. Small local models may struggle with multi-step knowledge retrieval.
+:::
+
+- **`query_knowledge_bases`**: Search across knowledge bases using semantic/vector search. This should be the model's first choice for finding information before searching the web.
+- **`list_knowledge_bases`**: Browse available knowledge bases with file counts.
+- **`search_knowledge_bases`**: Find specific knowledge bases by name or description.
+- **`search_knowledge_files`**: Locate files within knowledge bases by filename.
+- **`view_knowledge_file`**: Read the full content of a specific file from a knowledge base.
+
+These tools enable models to autonomously explore and retrieve information from your knowledge bases, making conversations more contextually aware and grounded in your stored documents.
+
+:::info Central Tool Documentation
+For complete details on all built-in agentic tools and how to configure them, see the [**Native/Agentic Mode Tools Guide**](/features/plugin/tools#built-in-system-tools-nativeagentic-mode).
+:::
+
### Setting Up Your Knowledge Base
1. **Navigate to the Knowledge Section**: This area is designed to be user-friendly and intuitive.
2. **Add Entries**: Input information you want Open WebUI to remember. It can be as specific or broad as you like.
3. **Save and Apply**: Once saved, the Knowledge is accessible and ready to enhance your chat interactions.
+### Exporting a Knowledge Base
+
+Admins can export an entire knowledge base as a downloadable zip file. This is useful for backing up your knowledge, migrating data between instances, or sharing curated knowledge collections with others.
+
+To export a knowledge base, open the item menu (three dots) on any knowledge base entry and select **Export**. The system will generate a zip archive containing all files from that knowledge base, converted to `.txt` format for universal compatibility. The zip file will be named after the knowledge base itself.
+
+:::note Admin Only
+The export feature is restricted to admin users. Regular users will not see the Export option in the menu.
+:::
+
## Summary
- The Knowledge section is like Open WebUI's "memory bank," where you can store data that you want it to remember and use.
- **Use Knowledge to keep the system aware** of important details, ensuring a personalized chat experience.
- You can **directly reference Knowledge in chats** to bring in stored data whenever you need it using '#' + name of the knowledge.
+- **Admins can export knowledge bases** as zip files for backup, migration, or sharing purposes.
-🌟 Remember, there’s always more to discover, so dive in and make Open WebUI truly your own!
+🌟 Remember, there's always more to discover, so dive in and make Open WebUI truly your own!
diff --git a/docs/features/workspace/models.md b/docs/features/workspace/models.md
index d8ba3a4f23..62f0361fad 100644
--- a/docs/features/workspace/models.md
+++ b/docs/features/workspace/models.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 0
+sidebar_position: 2
title: "Models"
sidebar_label: "Models"
---
@@ -22,7 +22,7 @@ Both actions lead to the same Model Builder interface, where you can configure t
### Core Configuration
-- **Avatar Photo**: Upload a custom image to represent your model in the chat interface.
+- **Avatar Photo**: Upload a custom image to represent your model in the chat interface. **Animated formats** (GIF and WebP) are supported and their animations will be preserved during the upload process.
- **Model Name & ID**: The display name and unique identifier for your custom preset (e.g., "Python Tutor" or "Meeting Summarizer").
- **Base Model**: The actual model beneath the hood that powers the agent. You can choose *any* model connected to Open WebUI. You can create a custom preset for `gpt-4o` just as easily as `llama3`.
- **Fallback Behavior**: If the configured Base Model is unavailable and the `ENABLE_CUSTOM_MODEL_FALLBACK` environment variable is set to `True`, the system will automatically fall back to the first configured default model (set in Admin Panel > Settings > Models > Default Models). This ensures mission-critical custom models remain functional even if their specific base model is removed or temporarily unavailable.
@@ -80,10 +80,13 @@ You can transform a generic model into a specialized agent by toggling specific
- **Vision**: Toggle to enable image analysis capabilities (requires a vision-capable Base Model).
- **Web Search**: Enable the model to access the configured search provider (e.g., Google, SearxNG) for real-time information.
- **File Upload**: Allow users to upload files to this model.
+ - **File Context**: When enabled (default), attached files are processed via RAG and their content is injected into the conversation. When disabled, file content is **not** extracted or injected—the model receives no file content unless it retrieves it via builtin tools. Only visible when File Upload is enabled. See [File Context vs Builtin Tools](../rag/index.md#file-context-vs-builtin-tools) for details.
- **Code Interpreter**: Enable Python code execution.
- **Image Generation**: Enable image generation integration.
- **Usage / Citations**: Toggle usage tracking or source citations.
- **Status Updates**: Show visible progress steps in the chat UI (e.g., "Searching web...", "Reading file...") during generation. Useful for slower, complex tasks.
+ - **Builtin Tools**: When enabled (default), automatically injects system tools (timestamps, memory, chat history, knowledge base queries, notes, etc.) in [Native Function Calling mode](../plugin/tools/index.mdx#disabling-builtin-tools-per-model). Disable this if the model doesn't support function calling or you need to save context window tokens. Note: This is separate from **File Context**—see [File Context vs Builtin Tools](../rag/index.md#file-context-vs-builtin-tools) for the difference.
+- **TTS Voice**: Set a specific Text-to-Speech voice for this model. When users read responses aloud, this voice will be used instead of the global default. Useful for giving different personas distinct voices. Leave empty to use the user's settings or system default. See [Per-Model TTS Voice](../audio/text-to-speech/openai-tts-integration#per-model-tts-voice) for details.
- **Default Features**: Force specific toggles (like Web Search) to be "On" immediately when a user starts a chat with this model.
## Model Management
diff --git a/docs/features/workspace/prompts.md b/docs/features/workspace/prompts.md
index 38c14dc2da..f01ae02009 100644
--- a/docs/features/workspace/prompts.md
+++ b/docs/features/workspace/prompts.md
@@ -1,5 +1,5 @@
---
-sidebar_position: 2
+sidebar_position: 3
title: "Prompts"
---
diff --git a/docs/getting-started/advanced-topics/index.mdx b/docs/getting-started/advanced-topics/index.mdx
index 99164250ea..1895ef3558 100644
--- a/docs/getting-started/advanced-topics/index.mdx
+++ b/docs/getting-started/advanced-topics/index.mdx
@@ -27,9 +27,9 @@ Ensure secure communication by implementing HTTPS encryption in your deployment.
---
-## 📊 Monitoring
-Learn how to monitor your Open WebUI instance, including health checks, model connectivity, and response testing.
-[Monitoring Guide](/getting-started/advanced-topics/monitoring)
+## 🔑 API Keys & Monitoring
+Learn how to set up API keys for programmatic access, and monitor your Open WebUI instance with health checks and response testing.
+[API Keys & Monitoring Guide](/getting-started/advanced-topics/monitoring)
---
diff --git a/docs/getting-started/advanced-topics/monitoring/index.md b/docs/getting-started/advanced-topics/monitoring/index.md
index 45956b03cf..5e0a097f39 100644
--- a/docs/getting-started/advanced-topics/monitoring/index.md
+++ b/docs/getting-started/advanced-topics/monitoring/index.md
@@ -1,11 +1,11 @@
---
sidebar_position: 6
-title: "Monitoring Your Open WebUI"
+title: "API Keys & Monitoring"
---
-# Keep Your Open WebUI Healthy with Monitoring 🩺
+# API Keys & Monitoring Your Open WebUI 🔑🩺
-Monitoring your Open WebUI instance is crucial for ensuring it runs reliably, performs well, and allows you to quickly identify and resolve any issues. This guide outlines three levels of monitoring, from basic availability checks to in-depth model response testing.
+This guide covers two essential topics: setting up API keys for programmatic access to Open WebUI, and monitoring your instance to ensure reliability and performance.
**Why Monitor?**
@@ -86,24 +86,102 @@ You'll need an API key to access this endpoint. See the "Authentication Setup" s
### Authentication Setup for API Key 🔑
-Before you can monitor the `/api/models` endpoint, you need to enable API keys in Open WebUI and generate one:
+Before you can monitor the `/api/models` endpoint, you need to configure API keys in Open WebUI. **API key access now requires a two-level permission structure**: first, the global API keys feature must be enabled, and second, individual users or groups must be granted API key creation permissions.
-1. **Enable API Keys (Admin Required):**
- - Log in to Open WebUI as an administrator.
- - Go to **Admin Settings** (usually in the top right menu) > **General**.
- - Find the "Enable API Key" setting and **turn it ON**.
- - Click **Save Changes**.
+#### Step 1: Enable API Keys Globally (Admin Required)
-2. **Generate an API Key (User Settings):**
- - Go to your **User Settings** (usually by clicking on your profile icon in the top right).
- - Navigate to the **Account** section.
- - Click **Generate New API Key**.
- - Give the API key a descriptive name (e.g., "Monitoring API Key").
- - **Copy the generated API key** and store it securely. You'll need this for your monitoring setup.
+1. Log in to Open WebUI as an **administrator**.
+2. Click on your **profile icon** in the bottom-left corner of the sidebar, then select **Admin Panel**.
+3. Navigate to **Settings** > **General**.
+4. Scroll down to the **Authentication** section.
+5. Find the **"Enable API Keys"** toggle and **turn it ON**.
+6. *(Optional)* Configure additional API key restrictions:
+ - **API Key Endpoint Restrictions**: Enable this to limit which endpoints can be accessed via API keys.
+ - **Allowed Endpoints**: Specify a comma-separated list of allowed endpoints (e.g., `/api/v1/models,/api/v1/chat/completions`).
+7. Click **Save** at the bottom of the page.
- *(Optional but Recommended):* For security best practices, consider creating a **non-administrator user account** specifically for monitoring and generate an API key for that user. This limits the potential impact if the monitoring API key is compromised.
+:::info
- *If you don't see the API key generation option in your settings, contact your Open WebUI administrator to ensure API keys are enabled.*
+This enables the API key feature globally but does not automatically grant users permission to create API keys. You must also configure user or group permissions in Step 2.
+
+:::
+
+#### Step 2: Grant API Key Permission (Admin Required)
+
+**API key creation is disabled by default for all users, including administrators.** Admins are **not** exempt from this permission requirement—to use API keys, they must also grant themselves the permission. Administrators can grant API key permissions using one of the following methods:
+
+##### Option A: Grant Permission via Default Permissions
+
+This grants the API Keys permission to **all users with the "user" role**:
+
+1. In the **Admin Panel**, navigate to **Users** > **Groups**.
+2. At the bottom of the Groups page, click on **"Default permissions"** (this applies to all users with the "user" role).
+3. In the modal that opens, scroll to the **Features Permissions** section.
+4. Find **"API Keys"** and **toggle it ON**.
+5. Click **Save**.
+
+:::info
+
+**Note for Administrators:** "Default permissions" only applies to accounts with the "user" role. If you are an admin and need API key access, you must use **Option B** (User Groups)—create or select a group with API Keys enabled and add yourself to that group.
+
+:::
+
+:::warning
+
+Enabling API Keys for all users means any user can generate API keys that provide programmatic access to Open WebUI with their account's permissions. Consider using User Groups (Option B) for more restrictive access control.
+
+:::
+
+##### Option B: Grant Permission via User Groups
+
+For more granular control, you can grant API key permissions to specific user groups only:
+
+1. In the **Admin Panel**, navigate to **Users** > **Groups**.
+2. Select the group you want to grant API key permissions to (or click the **+ button** to create a new group).
+3. In the group edit modal, click on the **Permissions** tab.
+4. Scroll to **Features Permissions**.
+5. Find **"API Keys"** and **toggle it ON**.
+6. Click **Save**.
+
+:::tip
+
+Create a dedicated monitoring group (e.g., "Monitoring Users") and add only the accounts that need API key access for monitoring purposes. This follows the principle of least privilege.
+
+:::
+
+#### Step 3: Generate an API Key (User Action)
+
+Once global API keys are enabled and the user has been granted permission:
+
+1. Log in to Open WebUI with a user account that has API key permissions.
+2. Click on your **profile icon** in the bottom-left corner of the sidebar.
+3. Select **Settings** > **Account**.
+4. In the **API Keys** section, click **Generate New API Key**.
+5. Give the API key a descriptive name (e.g., "Monitoring API Key").
+6. **Copy the generated API key** immediately and store it securely—you won't be able to view it again.
+
+:::warning
+
+Treat your API key like a password! Store it securely and never share it publicly. If you suspect an API key has been compromised, delete it immediately and generate a new one.
+
+:::
+
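+To quickly confirm the key works, you can call the `/api/models` endpoint described above. A minimal sketch, assuming Open WebUI is reachable at `http://localhost:3000` and a standard Bearer token header:
+
+```bash
+curl -H "Authorization: Bearer YOUR_API_KEY" http://localhost:3000/api/models
+```
+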
+#### Recommended: Create a Dedicated Monitoring Account
+
+For production monitoring, we recommend:
+
+1. Create a **non-administrator user account** specifically for monitoring (e.g., "monitoring-bot").
+2. Add this account to a group with API key permissions (or ensure default permissions allow API key creation).
+3. Generate an API key from this account.
+
+This approach limits the potential impact if the monitoring API key is compromised—the attacker would only have access to the permissions granted to that specific monitoring account, not administrator privileges.
+
+#### Troubleshooting
+
+If you don't see the API key generation option in your account settings:
+
+- **Check global setting**: Verify that an administrator has enabled API keys globally under **Admin Panel** > **Settings** > **General** > **Enable API Keys**. See [`ENABLE_API_KEYS`](/getting-started/env-configuration#enable_api_keys).
+- **Check your permissions**: Verify that your user account or group has been granted the "API Keys" feature permission under **Features Permissions**. See [`USER_PERMISSIONS_FEATURES_API_KEYS`](/getting-started/env-configuration#user_permissions_features_api_keys).
### Using Uptime Kuma for Model Connectivity Monitoring 🐻
diff --git a/docs/getting-started/advanced-topics/monitoring/otel.md b/docs/getting-started/advanced-topics/monitoring/otel.md
index 52d93cdbe8..01c51f2366 100644
--- a/docs/getting-started/advanced-topics/monitoring/otel.md
+++ b/docs/getting-started/advanced-topics/monitoring/otel.md
@@ -5,6 +5,16 @@ title: "OpenTelemetry"
Open WebUI supports **distributed tracing and metrics** export via the OpenTelemetry (OTel) protocol (OTLP). This enables integration with modern observability stacks such as **Grafana LGTM (Loki, Grafana, Tempo, Mimir)**, as well as **Jaeger**, **Tempo**, and **Prometheus** to monitor requests, database/Redis queries, response times, and more in real-time.
+:::warning Additional Dependencies
+
+If you are running Open WebUI from source or via `pip` (outside of the official Docker images), OpenTelemetry dependencies **may not be installed by default**. You may need to install them manually:
+
+```bash
+pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
+```
+
+:::
+
## 🚀 Quick Start with Docker Compose
The fastest way to get started with observability is with the pre-configured Docker Compose:
diff --git a/docs/getting-started/env-configuration.mdx b/docs/getting-started/env-configuration.mdx
index f92c36756d..ca09ad3a88 100644
--- a/docs/getting-started/env-configuration.mdx
+++ b/docs/getting-started/env-configuration.mdx
@@ -12,7 +12,7 @@ As new variables are introduced, this page will be updated to reflect the growin
:::info
-This page is up-to-date with Open WebUI release version [v0.6.42](https://github.com/open-webui/open-webui/releases/tag/v0.6.42), but is still a work in progress to later include more accurate descriptions, listing out options available for environment variables, defaults, and improving descriptions.
+This page is up-to-date with Open WebUI release version [v0.7.0](https://github.com/open-webui/open-webui/releases/tag/v0.7.0), but is still a work in progress to later include more accurate descriptions, listing out options available for environment variables, defaults, and improving descriptions.
:::
@@ -36,6 +36,34 @@ To disable this behavior and force Open WebUI to always use your environment var
:::
+### Troubleshooting Ignored Environment Variables 🛠️
+
+If you change an environment variable (like `ENABLE_SIGNUP=True`) but don't see the change reflected in the UI (e.g., the "Sign Up" button is still missing), it's likely because a value has already been persisted in the database from a previous run or a persistent Docker volume.
+
+#### Option 1: Using `ENABLE_PERSISTENT_CONFIG` (Temporary Fix)
+Set `ENABLE_PERSISTENT_CONFIG=False` in your environment. This forces Open WebUI to read your variables directly. Note that UI-based settings changes will not persist across restarts in this mode.
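+
+A minimal sketch of starting the container this way (keep your existing port, volume, and image flags; `ENABLE_SIGNUP` is just an example variable):
+
+```bash
+docker run -d -p 3000:8080 \
+  -e ENABLE_PERSISTENT_CONFIG=False \
+  -e ENABLE_SIGNUP=True \
+  -v open-webui:/app/backend/data \
+  --name open-webui \
+  ghcr.io/open-webui/open-webui:main
+```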
+
+#### Option 2: Update via Admin UI (Recommended)
+The simplest and safest way to change `PersistentConfig` settings is directly through the **Admin Panel** within Open WebUI. Even if an environment variable is set, changes made in the UI will take precedence and be saved to the database.
+
+#### Option 3: Manual Database Update (Last Resort / Lock-out Recovery)
+If you are locked out or cannot access the UI, you can manually update the SQLite database via Docker:
+
+```bash
+docker exec -it open-webui sqlite3 /app/backend/data/webui.db "UPDATE config SET data = json_set(data, '$.ENABLE_SIGNUP', json('true'));"
+```
+*(Replace `ENABLE_SIGNUP` and `true` with the specific setting and value needed.)*
+
+#### Option 4: Resetting for a Fresh Install
+If you are performing a clean installation and want to ensure all environment variables are fresh:
+1. Stop the container.
+2. Remove the persistent volume: `docker volume rm open-webui`.
+3. Restart the container.
+
+:::danger
+**Warning:** Removing the volume will delete all user data, including chats and accounts.
+:::
+
## App/Backend
The following environment variables are used by `backend/open_webui/config.py` to provide Open WebUI startup
@@ -77,6 +105,44 @@ Failure to set WEBUI_URL before using OAuth/SSO will result in failure to log in
- Default: `False`
- Description: If set to True, a "Confirm Password" field is added to the sign-up page to help users avoid typos when creating their password.
+#### `WEBUI_ADMIN_EMAIL`
+
+- Type: `str`
+- Default: Empty string (`''`)
+- Description: Specifies the email address for an admin account to be created automatically on first startup when no users exist. This enables headless/automated deployments without manual account creation. When combined with `WEBUI_ADMIN_PASSWORD`, the admin account is created during application startup, and `ENABLE_SIGNUP` is automatically disabled to prevent unauthorized account creation.
+
+:::info
+
+This variable is designed for automated/containerized deployments where manual admin account creation is impractical. The admin account is only created if:
+- No users exist in the database (fresh installation)
+- Both `WEBUI_ADMIN_EMAIL` and `WEBUI_ADMIN_PASSWORD` are configured
+
+After the admin account is created, sign-up is automatically disabled for security. You can re-enable it later via the Admin Panel if needed.
+
+:::
+
+#### `WEBUI_ADMIN_PASSWORD`
+
+- Type: `str`
+- Default: Empty string (`''`)
+- Description: Specifies the password for the admin account to be created automatically on first startup when no users exist. Must be used in conjunction with `WEBUI_ADMIN_EMAIL`. The password is securely hashed before storage using the same mechanism as manual account creation.
+
+:::danger
+
+**Security Considerations**
+- Use a strong, unique password for production deployments
+- Consider using secrets management (Docker secrets, Kubernetes secrets, environment variable injection) rather than storing the password in plain text configuration files
+- After initial setup, change the admin password through the UI for enhanced security
+- Never commit this value to version control
+
+:::
+
+#### `WEBUI_ADMIN_NAME`
+
+- Type: `str`
+- Default: `Admin`
+- Description: Specifies the display name for the automatically created admin account. This is used when `WEBUI_ADMIN_EMAIL` and `WEBUI_ADMIN_PASSWORD` are configured for headless admin creation.
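+
+A headless first-start sketch combining the three variables (the email, password, and name are placeholders; prefer Docker or Kubernetes secrets for the password in production):
+
+```bash
+docker run -d -p 3000:8080 \
+  -e WEBUI_ADMIN_EMAIL="admin@example.com" \
+  -e WEBUI_ADMIN_PASSWORD="change-me-to-a-strong-password" \
+  -e WEBUI_ADMIN_NAME="Admin" \
+  -v open-webui:/app/backend/data \
+  --name open-webui \
+  ghcr.io/open-webui/open-webui:main
+```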
+
#### `ENABLE_LOGIN_FORM`
- Type: `bool`
@@ -158,6 +224,34 @@ is also being used and set to `True`. **Never disable this if OAUTH/SSO is not b
- Description: Enables or disables channel support.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `ENABLE_FOLDERS`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables or disables the folders feature, allowing users to organize their chats into folders in the sidebar.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+#### `FOLDER_MAX_FILE_COUNT`
+
+- Type: `int`
+- Default: `("") empty string`
+- Description: Sets the maximum number of files processing allowed per folder.
+- Persistence: This environment variable is a `PersistentConfig` variable. It can be configured in the **Admin Panel > Settings > General > Folder Max File Count**. Default is none (empty string) which is unlimited.
+
+#### `ENABLE_NOTES`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables or disables the notes feature, allowing users to create and manage personal notes within Open WebUI.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+#### `ENABLE_MEMORIES`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables or disables the [memory feature](/features/memory), allowing models to store and retrieve long-term information about users.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
#### `WEBHOOK_URL`
- Type: `str`
@@ -176,6 +270,18 @@ is also being used and set to `True`. **Never disable this if OAUTH/SSO is not b
- Default: `True`
- Description: Enables admin users to directly access the chats of other users. When disabled, admins can no longer access users' chats in the admin panel. If you disable this, consider disabling `ENABLE_ADMIN_EXPORT` too, if you are using SQLite, as the exports also contain user chats.
+#### `ENABLE_ADMIN_WORKSPACE_CONTENT_ACCESS`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables admin users to access all workspace content (models, knowledge bases, prompts, and tools) regardless of access control settings. When set to `False`, admins will only see workspace items they have been explicitly granted access to.
+
+:::warning **Deprecated**
+
+**This environment variable is deprecated and may be removed in a future release.** Use [`BYPASS_ADMIN_ACCESS_CONTROL`](#bypass_admin_access_control) instead, which provides the same functionality with a clearer name.
+
+:::
+
#### `BYPASS_ADMIN_ACCESS_CONTROL`
- Type: `bool`
@@ -214,18 +320,6 @@ If you are running larger instances, you WILL NEED to set this to a higher value
- Default: `False`
- Description: Controls whether custom models should fall back to a default model if their assigned base model is missing. When set to `True`, if a custom model's base model is not found, the system will use the first model from the configured `DEFAULT_MODELS` list instead of returning an error.
-#### `MODELS_CACHE_TTL`
-
-- Type: `int`
-- Default: `1`
-- Description: Sets the cache time-to-live in seconds for model list responses from OpenAI and Ollama endpoints. This reduces API calls by caching the available models list for the specified duration. Set to empty string to disable caching entirely.
-
-:::info
-
-This caches the external model lists retrieved from configured OpenAI-compatible and Ollama API endpoints (not Open WebUI's internal model configurations). Higher values improve performance by reducing redundant API requests to external providers but may delay visibility of newly added or removed models on those endpoints. A value of 0 disables caching and forces fresh API calls each time. In high-traffic scenarios, increasing this value (e.g., to 300 seconds) can significantly reduce load on external API endpoints while still providing reasonably fresh model data.
-
-:::
-
#### `SHOW_ADMIN_DETAILS`
- Type: `bool`
@@ -240,6 +334,13 @@ This caches the external model lists retrieved from configured OpenAI-compatible
- Description: Controls whether the active user count is visible to all users or restricted to administrators only. When set to `False`, only admin users can see how many users are currently active, reducing backend load and addressing privacy concerns in large deployments.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `ENABLE_USER_STATUS`
+
+- Type: `bool`
+- Default: `True`
+- Description: Globally enables or disables user status functionality. When disabled, the status UI (including blinking active/away indicators and status messages) is hidden across the application, and user status API endpoints are restricted.
+- Persistence: This environment variable is a `PersistentConfig` variable. It can be toggled in the **Admin Panel > Settings > General > User Status**.
+
#### `ADMIN_EMAIL`
- Type: `str`
@@ -455,11 +556,29 @@ allowing the client to wait indefinitely.
- Type: `int`
- Default: `10`
-- Description: Sets the timeout in seconds for fetching the model list. This can be useful in cases where network latency requires a longer timeout duration to successfully retrieve the model list.
+- Description: Sets the timeout in seconds for fetching the model list from Ollama and OpenAI endpoints. This affects how long Open WebUI waits for each configured endpoint when loading available models.
-:::note
+:::note When to Adjust This Value
+
+**Lower the timeout** (e.g., `3`) if:
+- You have multiple endpoints configured and want faster failover when one is unreachable
+- You prefer the UI to load quickly even if some slow endpoints are skipped
+
+**Increase the timeout** (e.g., `30`) if:
+- Your model servers are slow to respond (e.g., cold starts, large model loading)
+- You're connecting over high-latency networks
+- You're using providers like OpenRouter that may have variable response times
+
+:::
+
+:::warning Database Persistence
+
+Connection URLs configured via the Admin Settings UI are **persisted in the database** and take precedence over environment variables. If you save an unreachable URL and the UI becomes unresponsive, you may need to use one of these recovery options:
+
+- `RESET_CONFIG_ON_START=true` - Resets database config to environment variable values on next startup
+- `ENABLE_PERSISTENT_CONFIG=false` - Always use environment variables (UI changes won't persist)
-The AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST is set to 10 seconds by default to help ensure that all necessary connections are available when opening the web UI. This duration allows enough time for retrieving the model list even in cases of higher network latency. You can lower this value if quicker timeouts are preferred, but keep in mind that doing so may lead to some connections being dropped, depending on your network conditions.
+See the [Model List Loading Issues](/troubleshooting/connection-error#️-model-list-loading-issues-slow-ui--unreachable-endpoints) troubleshooting guide for detailed recovery steps.
:::
@@ -472,7 +591,7 @@ The AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST is set to 10 seconds by default to help en
- Type: `bool`
- Default: `True`
-- Description: Controls SSL/TLS verification for AIOHTTP client sessions when connecting to external APIs.
+- Description: Controls SSL/TLS verification for AIOHTTP client sessions when connecting to external APIs (e.g., Ollama Embeddings).
#### `AIOHTTP_CLIENT_TIMEOUT_TOOL_SERVER_DATA`
@@ -486,6 +605,12 @@ The AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST is set to 10 seconds by default to help en
- Default: `True`
- Description: Controls SSL/TLS verification specifically for tool server connections via AIOHTTP client.
+#### `REQUESTS_VERIFY`
+
+- Type: `bool`
+- Default: `True`
+- Description: Controls SSL/TLS verification for synchronous `requests` (e.g., Tika, External Reranker). Set to `False` to bypass certificate verification for self-signed certificates.
+
### Directories
#### `DATA_DIR`
@@ -519,6 +644,18 @@ The AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST is set to 10 seconds by default to help en
- Default: `INFO`
- Description: Sets the global logging level for all Open WebUI components. Valid values: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
+#### `ENABLE_AUDIT_STDOUT`
+
+- Type: `bool`
+- Default: `False`
+- Description: Controls whether audit logs are output to stdout (console). Useful for containerized environments where logs are collected from stdout.
+
+#### `ENABLE_AUDIT_LOGS_FILE`
+
+- Type: `bool`
+- Default: `True`
+- Description: Controls whether audit logs are written to a file. When enabled, logs are written to the location specified by `AUDIT_LOGS_FILE_PATH`.
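+
+For example, a container-focused sketch that sends audit logs to stdout only:
+
+```bash
+ENABLE_AUDIT_STDOUT=True
+ENABLE_AUDIT_LOGS_FILE=False
+```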
+
#### `AUDIT_LOGS_FILE_PATH`
- Type: `str`
@@ -1018,9 +1155,22 @@ Output:
- Type: `bool`
- Default: `True`
-- Description: Controls whether users are shown the share to community button.
+- Description: Controls whether users can share content with the Open WebUI Community and access community resources. When enabled, this setting shows the following UI elements across the application:
+ - **Prompts Workspace**: "Made by Open WebUI Community" section with a link to discover community prompts, and a "Share" button in the prompt menu dropdown
+ - **Tools Workspace**: "Made by Open WebUI Community" section with a link to discover community tools, and a "Share" button in the tool menu dropdown
+ - **Models Workspace**: "Made by Open WebUI Community" section with a link to discover community model presets, and a "Share" button in the model menu dropdown
+ - **Functions Admin**: "Made by Open WebUI Community" section with a link to discover community functions
+ - **Share Chat Modal**: "Share to Open WebUI Community" button when sharing a chat conversation
+ - **Evaluation Feedbacks**: "Share to Open WebUI Community" button for contributing feedback history to the community leaderboard
+ - **Stats Sync Modal**: Enables syncing usage statistics with the community
- Persistence: This environment variable is a `PersistentConfig` variable.
+:::info
+
+When `ENABLE_COMMUNITY_SHARING` is set to `False`, all community sharing buttons and community resource discovery sections will be hidden from the UI. Users will still be able to export content locally, but the option to share directly to the Open WebUI Community will not be available.
+
+:::
+
### Tags Generation
#### `ENABLE_TAGS_GENERATION`
@@ -1078,7 +1228,11 @@ This variable replaces the deprecated `ENABLE_API_KEY` environment variable.
:::info
-For API Key creation (and the API keys themselves) to work, you not only need to enable it globally, but also give specific user groups the permission for it
+For API Key creation (and the API keys themselves) to work, you need **both**:
+1. Enable API keys globally using this setting (`ENABLE_API_KEYS`)
+2. Grant the "API Keys" permission to users via Default Permissions or User Groups
+
+**Note:** Administrators are not exempt—they must also be granted the permission via a User Group to use API keys. See the [Authentication Setup for API Key](/getting-started/advanced-topics/monitoring#authentication-setup-for-api-key-) guide for detailed setup instructions.
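+
+A sketch of satisfying both requirements via environment variables (the permission can also be granted per group in the UI, as described in the linked guide):
+
+```bash
+ENABLE_API_KEYS=True
+USER_PERMISSIONS_FEATURES_API_KEYS=True
+```
+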
:::
@@ -1114,6 +1268,58 @@ This variable replaces the deprecated `API_KEY_ALLOWED_ENDPOINTS` environment va
:::
+### Model Caching
+
+#### `ENABLE_BASE_MODELS_CACHE`
+
+- Type: `bool`
+- Default: `False`
+- Description: When enabled, caches the list of base models from connected Ollama and OpenAI-compatible endpoints in memory. This reduces the number of API calls made to external model providers when loading the model selector, improving performance particularly for deployments with many users or slow connections to model endpoints.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+**How the cache works:**
+
+- **Initialization**: When enabled, base models are fetched and cached during application startup.
+- **Storage**: The cache is stored in application memory (`app.state.BASE_MODELS`).
+- **Cache Hit**: Subsequent requests for models return the cached list without contacting external endpoints.
+- **Cache Refresh**: The cache is refreshed when:
+ - The application restarts
+ - The connection settings are saved in the **Admin Panel > Settings > Connections** (clicking the **Save** button on the bottom right will trigger a refresh and update the cache with the newly fetched models)
+- **No TTL**: There is no automatic time-based expiration.
+
+:::tip Performance Consideration
+
+Enable this setting in production environments where model lists are relatively stable. For development environments or when frequently adding/removing models from Ollama, you may prefer to leave it disabled for real-time model discovery.
+
+:::
+
+#### `MODELS_CACHE_TTL`
+
+- Type: `int`
+- Default: `1`
+- Description: Sets the cache time-to-live in seconds for model list responses from OpenAI and Ollama endpoints. This reduces API calls by caching the available models list for the specified duration. Set to empty string to disable caching entirely.
+
+This caches the external model lists retrieved from configured OpenAI-compatible and Ollama API endpoints (not Open WebUI's internal model configurations). Higher values improve performance by reducing redundant API requests to external providers but may delay visibility of newly added or removed models on those endpoints. A value of 0 disables caching and forces fresh API calls each time.
+
+:::tip High-Traffic Recommendation
+
+In high-traffic scenarios, increasing this value (e.g., to 300 seconds) can significantly reduce load on external API endpoints while still providing reasonably fresh model data.
+
+:::
+
+:::info Two Caching Mechanisms
+
+Open WebUI has **two model caching mechanisms** that work independently:
+
+| Setting | Type | Default | Refresh Trigger |
+|---------|------|---------|-----------------|
+| `ENABLE_BASE_MODELS_CACHE` | In-memory | `False` | App restart OR Admin Save |
+| `MODELS_CACHE_TTL` | TTL-based | `1` second | Automatic after TTL expires |
+
+For maximum performance, enable both: `ENABLE_BASE_MODELS_CACHE=True` with `MODELS_CACHE_TTL=300`.
+
+:::
+
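+A minimal sketch of that combined setup as environment variables (the TTL value is illustrative):
+
+```bash
+# Cache base models in memory and refresh external model lists every 5 minutes
+ENABLE_BASE_MODELS_CACHE=True
+MODELS_CACHE_TTL=300
+```
+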
#### `JWT_EXPIRES_IN`
- Type: `str`
@@ -2439,17 +2645,38 @@ Provide a clear and direct response to the user's query, including inline citati
- Description: Specifies how much overlap there should be between chunks.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `CHUNK_MIN_SIZE_TARGET`
+
+- Type: `int`
+- Default: `0`
+- Description: Chunks smaller than this threshold will be intelligently merged with neighboring chunks when possible. This helps prevent tiny, low-quality fragments that can hurt retrieval performance and waste embedding resources. This feature only works when `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER` is enabled. Set to `0` to disable merging. For more information on the benefits and configuration, see the [RAG guide](/features/rag#chunking-configuration).
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
#### `RAG_TEXT_SPLITTER`
- Type: `str`
- Options:
- `character`
- `token`
- - `markdown_header`
- Default: `character`
-- Description: Sets the text splitter for RAG models.
+- Description: Sets the text splitter for RAG models. Use `character` for RecursiveCharacterTextSplitter or `token` for TokenTextSplitter (Tiktoken-based).
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables markdown header text splitting as a preprocessing step before character or token splitting. When enabled, documents are first split by markdown headers (h1-h6), then the resulting chunks are further processed by the configured text splitter (`RAG_TEXT_SPLITTER`). This helps preserve document structure and context across chunks.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+:::info
+
+**Migration from `markdown_header` TEXT_SPLITTER**
+
+The `markdown_header` option has been removed from `RAG_TEXT_SPLITTER`. Markdown header splitting is now a preprocessing step controlled by `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER`. If you were using `RAG_TEXT_SPLITTER=markdown_header`, switch to `character` or `token` and ensure `ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER` is enabled (it is enabled by default).
+
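+A sketch of an equivalent post-migration configuration (the `CHUNK_MIN_SIZE_TARGET` value is illustrative):
+
+```bash
+# Before: RAG_TEXT_SPLITTER=markdown_header
+RAG_TEXT_SPLITTER=character
+ENABLE_MARKDOWN_HEADER_TEXT_SPLITTER=True
+# Optional: merge tiny fragments produced by header splitting
+CHUNK_MIN_SIZE_TARGET=200
+```
+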
+:::
+
#### `TIKTOKEN_CACHE_DIR`
- Type: `str`
@@ -2626,6 +2853,12 @@ If you are embedding externally via API, ensure your rate limits are high enough
- Description: Sets a model for reranking results. Locally, a Sentence-Transformer model is used.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `SENTENCE_TRANSFORMERS_CROSS_ENCODER_SIGMOID_ACTIVATION_FUNCTION`
+
+- Type: `bool`
+- Default: `True`
+- Description: When enabled (default), applies sigmoid normalization to local CrossEncoder reranking scores to ensure they fall within the 0-1 range. This allows the relevance threshold setting to work correctly with models like MS MARCO that output raw logits.
+
#### `RAG_EXTERNAL_RERANKER_TIMEOUT`
- Type: `str`
@@ -2743,6 +2976,12 @@ Strictly return in JSON format:
- Description: Specifies whether to use the full context for RAG.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `RAG_SYSTEM_CONTEXT`
+
+- Type: `bool`
+- Default: `False`
+- Description: When enabled, injects RAG context into the **system message** instead of the user message. This is highly recommended for optimizing performance when using models that support **KV prefix caching** or **Prompt Caching**. This includes local engines (like Ollama, llama.cpp, or vLLM) and cloud providers / Model-as-a-Service providers (like OpenAI and Vertex AI). By placing the context in the system message, it remains at a stable position at the start of the conversation, allowing the cache to persist across multiple turns. When disabled (default), context is injected into the user message, which shifts position each turn and invalidates the cache.
+
#### `ENABLE_RAG_LOCAL_WEB_FETCH`
- Type: `bool`
@@ -2963,6 +3202,14 @@ Allow only specific domains: WEB_FETCH_FILTER_LIST="example.com,trusted-site.org
- `yacy`
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `DDGS_BACKEND`
+
+- Type: `str`
+- Default: `auto`
+- Options: `auto` (Random), `bing`, `brave`, `duckduckgo`, `google`, `grokipedia`, `mojeek`, `wikipedia`, `yahoo`, `yandex`.
+- Description: Specifies the backend to be used by the DDGS engine.
+- Persistence: This environment variable is a `PersistentConfig` variable. It can be configured in the **Admin Panel > Settings > Web Search > DDGS Backend** when DDGS is selected as the search engine.
+
#### `BYPASS_WEB_SEARCH_EMBEDDING_AND_RETRIEVAL`
- Type: `bool`
@@ -2970,6 +3217,13 @@ Allow only specific domains: WEB_FETCH_FILTER_LIST="example.com,trusted-site.org
- Description: Bypasses the web search embedding and retrieval process.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `BYPASS_WEB_SEARCH_WEB_LOADER`
+
+- Type: `bool`
+- Default: `False`
+- Description: Bypasses the web loader when performing web search. When enabled, only snippets from the search engine are used, and the full page content is not fetched.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
#### `SEARXNG_QUERY_URL`
- Type: `str`
@@ -3002,6 +3256,12 @@ the search query. Example: `http://searxng.local/search?q=`
- Description: Sets the API key for the Brave Search API.
- Persistence: This environment variable is a `PersistentConfig` variable.
+:::info
+
+Brave's free tier enforces a rate limit of 1 request per second. Open WebUI automatically retries requests that receive HTTP 429 rate limit errors after a 1-second delay. For free tier users, set `WEB_SEARCH_CONCURRENT_REQUESTS` to `1` to ensure sequential request processing. See the [Brave web search documentation](/features/web-search/brave) for more details.
+
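+A minimal sketch for free-tier usage:
+
+```bash
+# Process web search requests sequentially to respect Brave's 1 request/second limit
+WEB_SEARCH_CONCURRENT_REQUESTS=1
+```
+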
+:::
+
#### `KAGI_SEARCH_API_KEY`
- Type: `str`
@@ -3063,6 +3323,13 @@ the search query. Example: `http://searxng.local/search?q=`
- Description: Sets the API key for Jina.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `JINA_API_BASE_URL`
+
+- Type: `str`
+- Default: `https://s.jina.ai/`
+- Description: Sets the Base URL for Jina Search API. Useful for specifying custom or regional endpoints (e.g., `https://eu-s-beta.jina.ai/`).
+- Persistence: This environment variable is a `PersistentConfig` variable. It can be configured in the **Admin Panel > Settings > Web Search > Jina API Base URL**.
+
#### `BING_SEARCH_V7_ENDPOINT`
- Type: `str`
@@ -3295,6 +3562,13 @@ Using a remote Playwright browser via `PLAYWRIGHT_WS_URL` can be beneficial for:
- Description: Sets the API key for Firecrawl API.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `FIRECRAWL_TIMEOUT`
+
+- Type: `int`
+- Default: `None`
+- Description: Specifies the timeout in milliseconds for Firecrawl requests. If not set, the default Firecrawl timeout is used.
+- Persistence: This environment variable is a `PersistentConfig` variable. It can be configured in the **Admin Panel > Settings > Web Search > Firecrawl Timeout**.
+
#### `PLAYWRIGHT_TIMEOUT`
- Type: `int`
@@ -3357,12 +3631,17 @@ Note: If none of the specified languages are available and `en` was not in your
- Default: `${DATA_DIR}/cache/whisper/models`
- Description: Specifies the directory to store Whisper model files.
+#### `WHISPER_COMPUTE_TYPE`
+
+- Type: `str`
+- Default: `int8` (CPU), `float16` (CUDA)
+- Description: Sets the compute type for Whisper model inference. Defaults to `int8` for CPU and `float16` for CUDA (with fallback to `int8/int8_float16`).
+
#### `WHISPER_VAD_FILTER`
- Type: `bool`
- Default: `False`
- Description: Specifies whether to apply a Voice Activity Detection (VAD) filter to Whisper Speech-to-Text.
-- Persistence: This environment variable is a `PersistentConfig` variable.
#### `WHISPER_MODEL_AUTO_UPDATE`
@@ -3376,6 +3655,12 @@ Note: If none of the specified languages are available and `en` was not in your
- Default: `None`
- Description: Specifies the ISO 639-1 language Whisper uses for STT (ISO 639-2 for Hawaiian and Cantonese). Whisper predicts the language by default.
+#### `WHISPER_MULTILINGUAL`
+
+- Type: `bool`
+- Default: `False`
+- Description: Toggles whether to use the multilingual Whisper model. When set to `False`, the system will use the English-only model for better performance in English-centric tasks. When `True`, it supports multiple languages.
+
### Speech-to-Text (OpenAI)
#### `AUDIO_STT_ENGINE`
@@ -3433,6 +3718,20 @@ Note: If none of the specified languages are available and `en` was not in your
- Description: Specifies the locales to use for Azure Speech-to-Text.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `AUDIO_STT_AZURE_BASE_URL`
+
+- Type: `str`
+- Default: `None`
+- Description: Specifies a custom Azure base URL for Speech-to-Text. Use this if you have a custom Azure endpoint.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+#### `AUDIO_STT_AZURE_MAX_SPEAKERS`
+
+- Type: `int`
+- Default: `3`
+- Description: Sets the maximum number of speakers for Azure Speech-to-Text diarization.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
### Speech-to-Text (Deepgram)
#### `DEEPGRAM_API_KEY`
@@ -3442,6 +3741,38 @@ Note: If none of the specified languages are available and `en` was not in your
- Description: Specifies the Deepgram API key to use for Speech-to-Text.
- Persistence: This environment variable is a `PersistentConfig` variable.
+### Speech-to-Text (Mistral)
+
+#### `AUDIO_STT_MISTRAL_API_KEY`
+
+- Type: `str`
+- Default: `None`
+- Description: Specifies the Mistral API key to use for Speech-to-Text.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+#### `AUDIO_STT_MISTRAL_API_BASE_URL`
+
+- Type: `str`
+- Default: `https://api.mistral.ai/v1`
+- Description: Specifies the Mistral API base URL to use for Speech-to-Text.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+#### `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS`
+
+- Type: `bool`
+- Default: `False`
+- Description: When enabled, uses the chat completions endpoint for Mistral Speech-to-Text instead of the dedicated transcription endpoint.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
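+A minimal sketch of a Mistral STT setup (assuming `mistral` is the engine value expected by `AUDIO_STT_ENGINE`; the API key is a placeholder):
+
+```bash
+AUDIO_STT_ENGINE=mistral
+AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
+AUDIO_STT_MISTRAL_API_BASE_URL=https://api.mistral.ai/v1
+```
+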
+### Speech-to-Text (General)
+
+#### `AUDIO_STT_SUPPORTED_CONTENT_TYPES`
+
+- Type: `str`
+- Default: `None`
+- Description: Comma-separated list of supported audio MIME types for Speech-to-Text (e.g., `audio/wav,audio/mpeg,video/*`). Leave empty to use defaults.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
### Text-to-Speech
#### `AUDIO_TTS_API_KEY`
@@ -3494,9 +3825,17 @@ Note: If none of the specified languages are available and `en` was not in your
#### `AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT`
- Type: `str`
+- Default: `audio-24khz-160kbitrate-mono-mp3`
- Description: Sets the output format for Azure Text to Speech.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `AUDIO_TTS_AZURE_SPEECH_BASE_URL`
+
+- Type: `str`
+- Default: `None`
+- Description: Specifies a custom Azure Speech base URL for Text-to-Speech. Use this if you have a custom Azure endpoint.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
### Voice Mode
#### `VOICE_MODE_PROMPT_TEMPLATE`
@@ -3521,6 +3860,14 @@ Note: If none of the specified languages are available and `en` was not in your
- Description: Sets the API key to use for text-to-speech.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `AUDIO_TTS_OPENAI_PARAMS`
+
+- Type: `str` (JSON)
+- Default: `{}`
+- Description: Additional parameters for OpenAI-compatible TTS API in JSON format. Allows customization of API-specific settings.
+- Example: `{"speed": 1.0}`
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
### Elevenlabs Text-to-Speech
#### `ELEVENLABS_API_BASE_URL`
@@ -3715,22 +4062,7 @@ Strictly return in JSON format:
:::tip
-One minimalistic working setup for Gemini can look like this:
-
-- Create Image
- - Model: `gemini-2.5-flash-image`
- - Image Size: `2816x1536`
- - Image Prompt Generation: on
- - Image Generation Engine: `Gemini`
- - Gemini Base URL: `https://generativelanguage.googleapis.com/v1beta`
- - Gemini API Key: Enter your API Key
- - Gemini Endpoint Method: `generateContent`
-- Edit Image
- - Image Edit Engine: `Gemini`
- - Model: `gemini-2.5-flash-image`
- - Image Size: (can be left empty)
- - Gemini Base URL: `https://generativelanguage.googleapis.com/v1beta`
- - Gemini API Key: Enter your API Key
+For a detailed setup guide and example configuration, please refer to the [Gemini Image Generation Guide](/features/image-generation-and-editing/gemini).
:::
@@ -4469,13 +4801,13 @@ This is useful when you need a JWT access token for downstream validation or whe
- Type: `str`
- Default: `None`
-- Description: Sets a single filter to use for LDAP search. Alternative to `LDAP_SEARCH_FILTERS`.
+- Description: Sets additional filter conditions for LDAP user search. This filter is **appended** to the automatically-generated username filter. Open WebUI automatically constructs the username portion of the filter using `LDAP_ATTRIBUTE_FOR_USERNAME`, so you should **not** include user placeholders like `%(user)s` or `%s` — these are not supported. Use this for additional conditions such as group membership restrictions (e.g., `(memberOf=cn=allowed-users,ou=groups,dc=example,dc=com)`). Alternative to `LDAP_SEARCH_FILTERS`.
- Persistence: This environment variable is a `PersistentConfig` variable.
#### `LDAP_SEARCH_FILTERS`
- Type: `str`
-- Description: Sets the filter to use for LDAP search.
+- Description: Sets additional filter conditions for LDAP user search. This is an alias for `LDAP_SEARCH_FILTER`. The filter is appended to the automatically-generated username filter — do **not** include user placeholders like `%(user)s` or `%s`.
- Persistence: This environment variable is a `PersistentConfig` variable.
#### `LDAP_USE_TLS`
@@ -4687,6 +5019,13 @@ This is useful when you need a JWT access token for downstream validation or whe
- Description: Enables or disables user permission to use code interpreter feature.
- Persistence: This environment variable is a `PersistentConfig` variable.
+#### `USER_PERMISSIONS_FEATURES_MEMORIES`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables or disables user permission to use the [memory feature](/features/memory).
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
#### `USER_PERMISSIONS_FEATURES_FOLDERS`
- Type: `str`
@@ -4717,7 +5056,11 @@ This is useful when you need a JWT access token for downstream validation or whe
:::info
-For API Key creation (and the API keys themselves) to work, you not only need to give specific user groups the permission for it, but also enable it globally using `ENABLE_API_KEYS`
+For API Key creation (and the API keys themselves) to work, you need **both**:
+1. Grant the "API Keys" permission to users via this setting or User Groups
+2. Enable API keys globally using `ENABLE_API_KEYS`
+
+**Note:** Administrators are not exempt—they must also be granted the permission via a User Group to use API keys. See the [Authentication Setup for API Key](/getting-started/advanced-topics/monitoring#authentication-setup-for-api-key-) guide for detailed setup instructions.
:::
@@ -4785,6 +5128,16 @@ For API Key creation (and the API keys themselves) to work, you not only need to
- Default: `True`
- Description: Enables or disables public sharing of notes.
+### Settings Permissions
+
+#### `USER_PERMISSIONS_SETTINGS_INTERFACE`
+
+- Type: `bool`
+- Default: `True`
+- Description: Enables or disables user and group access to the Interface section of the Settings modal.
+- Persistence: This environment variable is a `PersistentConfig` variable.
+
+
## Misc Environment Variables
These variables are not specific to Open WebUI but can still be valuable in certain contexts.
@@ -4891,6 +5244,18 @@ If the endpoint is an S3-compatible provider like MinIO that uses a TLS certific
### OpenTelemetry Configuration
+:::warning Additional Dependencies May Be Required
+
+OpenTelemetry support requires additional Python dependencies that **may not be included by default** depending on your installation method (e.g., standard `pip install open-webui` versus Docker images).
+
+If you encounter `ImportError` or missing module errors related to OpenTelemetry, you may need to install them manually:
+
+```bash
+pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp
+```
+
+:::
+
#### `ENABLE_OTEL`
- Type: `bool`
@@ -5049,6 +5414,12 @@ For configuration using individual parameters or encrypted SQLite, see the relev
:::
+#### `ENABLE_DB_MIGRATIONS`
+
+- Type: `bool`
+- Default: `True`
+- Description: Controls whether database migrations are automatically run on startup. In multi-pod or multi-worker deployments, set this to `False` on all pods except one to designate a "master" pod responsible for migrations, preventing race conditions or schema corruption.
+
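+A minimal Docker Compose sketch of a multi-replica setup where only one instance applies migrations (service names and shared settings are illustrative):
+
+```yaml
+services:
+  # Both services must point at the same external database (DATABASE_URL omitted for brevity)
+  open-webui-primary:
+    image: ghcr.io/open-webui/open-webui:main
+    environment:
+      - ENABLE_DB_MIGRATIONS=True   # this instance runs schema migrations on startup
+  open-webui-replica:
+    image: ghcr.io/open-webui/open-webui:main
+    environment:
+      - ENABLE_DB_MIGRATIONS=False  # replicas skip migrations
+```
+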
:::warning
**Required for Multi-Replica Setups**
@@ -5141,6 +5512,14 @@ Ensure the database password is kept secure, as it is needed to decrypt and acce
:::
+:::warning Migrating Existing Data to SQLCipher
+
+**Open WebUI does not support automatic migration from an unencrypted SQLite database to an encrypted SQLCipher database.** If you enable SQLCipher on an existing installation, the application will fail to read your existing unencrypted data.
+
+To use SQLCipher with existing data, you must take one of these paths:
+
+- Start fresh (users export and re-import their chats)
+- Manually migrate the database with external SQLite/SQLCipher tools (see the sketch below)
+- Use filesystem-level encryption (LUKS/BitLocker) instead
+- Switch to PostgreSQL
+
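+A minimal sketch of the manual path using the `sqlcipher` CLI (stop Open WebUI and back up `webui.db` first; the key shown is a placeholder and must match the password you configure for Open WebUI's SQLCipher setup):
+
+```bash
+sqlcipher webui.db <<'SQL'
+ATTACH DATABASE 'webui_encrypted.db' AS encrypted KEY 'your-database-password';
+SELECT sqlcipher_export('encrypted');
+DETACH DATABASE encrypted;
+SQL
+# Afterwards, replace webui.db with webui_encrypted.db (keep the original as a backup).
+```
+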
+:::
+
#### `DATABASE_SCHEMA`
- Type: `str`
@@ -5469,16 +5848,19 @@ If you use UVICORN_WORKERS, you also need to ensure that related environment var
:::
-:::warning Database Migrations with Multiple Workers
-When `UVICORN_WORKERS > 1`, starting the application can trigger concurrent database migrations from multiple worker processes, potentially causing database schema corruption or inconsistent states.
+:::warning Database Migrations with Multiple Workers / Multi-Pod Deployments
+When `UVICORN_WORKERS > 1` or when running multiple replicas, starting the application can trigger concurrent database migrations from multiple processes, potentially causing database schema corruption or inconsistent states.
**Recommendation:**
-After pulling a new image or installing an update, **always run Open WebUI with a single worker (`UVICORN_WORKERS=1`) first**. This ensures the database migration completes successfully in a single process. Once the migration is finished and the application has started, you can then restart it with your desired number of workers.
+To handle migrations safely in multi-process/multi-pod environments, you can:
+1. **Designate a Master (Recommended):** Set `ENABLE_DB_MIGRATIONS=False` on all but one instance/worker. The instance with `ENABLE_DB_MIGRATIONS=True` (default) will handle the migration, while others will wait or skip it.
+2. **Scale Down:** Temporarily scale down to a single instance/worker to let migrations finish before scaling back up.
-**For Kubernetes, Helm, Minikube, and other orchestrated setups:**
-Ensure that your deployment strategy allows for a single-replica or single-worker init container/job to handle migrations before scaling up to multiple replicas or workers. This is critical to prevent race conditions during schema updates.
+**For Kubernetes, Helm, and Orchestrated Setups:**
+It is recommended to use the `ENABLE_DB_MIGRATIONS` variable to designate a specific pod for migrations, or use an init container/job to handle migrations before scaling up the main application pods. This ensures schema updates are applied exactly once.
:::
+
### Cache Settings
#### `CACHE_CONTROL`
diff --git a/docs/getting-started/quick-start/starting-with-llama-cpp.mdx b/docs/getting-started/quick-start/starting-with-llama-cpp.mdx
index aa5284a425..148a0fb2f2 100644
--- a/docs/getting-started/quick-start/starting-with-llama-cpp.mdx
+++ b/docs/getting-started/quick-start/starting-with-llama-cpp.mdx
@@ -104,6 +104,19 @@ To control and query your locally running model directly from Open WebUI:
💡 Once saved, Open WebUI will begin using your local Llama.cpp server as a backend!
+:::tip Connection Timeout Configuration
+
+If your Llama.cpp server is slow to initialize or you see timeout errors, you can increase the model list fetch timeout:
+
+```bash
+# Increase timeout for slower model loading (default is 10 seconds)
+AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=30
+```
+
+If you've saved an unreachable URL and the UI becomes unresponsive, see the [Model List Loading Issues](/troubleshooting/connection-error#️-model-list-loading-issues-slow-ui--unreachable-endpoints) troubleshooting guide.
+
+:::
+

---
diff --git a/docs/getting-started/quick-start/starting-with-ollama.mdx b/docs/getting-started/quick-start/starting-with-ollama.mdx
index 63d280c5d4..debccd319f 100644
--- a/docs/getting-started/quick-start/starting-with-ollama.mdx
+++ b/docs/getting-started/quick-start/starting-with-ollama.mdx
@@ -9,6 +9,16 @@ Open WebUI makes it easy to connect and manage your **Ollama** instance. This gu
---
+## Protocol-Oriented Design
+
+Open WebUI is designed to be **Protocol-Oriented**. This means that when we refer to "Ollama", we are specifically referring to the **Ollama API Protocol** (typically running on port `11434`).
+
+While some tools may offer basic compatibility, this connection type is optimized for the unique features of the Ollama service, such as native model management and pulling directly through the Admin UI.
+
+If your backend is primarily based on the OpenAI standard (like LocalAI or Docker Model Runner), we recommend using the [OpenAI-Compatible Server Guide](/getting-started/quick-start/starting-with-openai-compatible) for the best experience.
+
+---
+
## Step 1: Setting Up the Ollama Connection
Once Open WebUI is installed and running, it will automatically attempt to connect to your Ollama instance. If everything goes smoothly, you’ll be ready to manage and use models right away.
@@ -36,6 +46,19 @@ To manage your Ollama instance in Open WebUI, follow these steps:
* **Prefix ID**: If you have multiple Ollama instances serving the same model names, use a prefix (e.g., `remote/`) to distinguish them.
* **Model IDs (Filter)**: Make specific models visible by listing them here (whitelist). Leave empty to show all.
+:::tip Connection Timeout Configuration
+
+When using multiple Ollama instances (especially across networks), connection delays can occur if an endpoint is unreachable. You can adjust the timeout using:
+
+```bash
+# Lower the timeout (default is 10 seconds) for faster failover
+AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=3
+```
+
+If you've saved an unreachable URL and can't access Settings to fix it, see the [Model List Loading Issues](/troubleshooting/connection-error#️-model-list-loading-issues-slow-ui--unreachable-endpoints) troubleshooting guide.
+
+:::
+
Here’s what the management screen looks like:

@@ -54,6 +77,26 @@ This method is perfect if you want to skip navigating through the Admin Settings
---
+## Using Reasoning / Thinking Models
+
+If you're using reasoning models like **DeepSeek-R1** or **Qwen3** that output thinking/reasoning content in `<think>...</think>` tags, you'll need to configure Ollama with a **reasoning parser** for proper display.
+
+### Configure the Reasoning Parser
+
+Start Ollama with the `--reasoning-parser` flag:
+
+```bash
+ollama serve --reasoning-parser deepseek_r1
+```
+
+This ensures that thinking blocks are properly separated from the final answer and displayed in a collapsible section in Open WebUI.
+
+:::tip
+The `deepseek_r1` parser works for most reasoning models, including Qwen3. If you encounter issues, see our [Reasoning & Thinking Models Guide](/features/chat-features/reasoning-models) for alternative parsers and detailed troubleshooting steps.
+:::
+
+---
+
## All Set!
That’s it! Once your connection is configured and your models are downloaded, you’re ready to start using Ollama with Open WebUI. Whether you’re exploring new models or running your existing ones, Open WebUI makes everything simple and efficient.
diff --git a/docs/getting-started/quick-start/starting-with-openai-compatible.mdx b/docs/getting-started/quick-start/starting-with-openai-compatible.mdx
index 911ea36bc3..a4b4de7ca9 100644
--- a/docs/getting-started/quick-start/starting-with-openai-compatible.mdx
+++ b/docs/getting-started/quick-start/starting-with-openai-compatible.mdx
@@ -7,25 +7,36 @@ title: "Starting with OpenAI-Compatible Servers"
## Overview
-Open WebUI isn't just for OpenAI/Ollama/Llama.cpp—you can connect **any server that implements the OpenAI-compatible API**, running locally or remotely. This is perfect if you want to run different language models, or if you already have a favorite backend or ecosystem. This guide will show you how to:
+Open WebUI isn't just for OpenAI/Ollama/Llama.cpp—you can connect **any server that implements the OpenAI-compatible API**, running locally or remotely. This is perfect if you want to run different language models, or if you already have a favorite backend or ecosystem.
-- Set up an OpenAI-compatible server (with a few popular options)
-- Connect it to Open WebUI
-- Start chatting right away
+---
+
+## Protocol-Oriented Design
+
+Open WebUI is built around **Standard Protocols**. Instead of building specific modules for every individual AI provider (which leads to inconsistent behavior and configuration bloat), Open WebUI focuses on the **OpenAI Chat Completions Protocol**.
-## Step 1: Choose an OpenAI-Compatible Server
+This means that while Open WebUI handles the **interface and tools**, it expects your backend to follow the universal Chat Completions standard.
-There are many servers and tools that expose an OpenAI-compatible API. Here are some of the most popular:
+- **We Support Protocols**: Any provider that follows the OpenAI Chat Completions standard (like Groq, OpenRouter, or LiteLLM) is natively supported.
+- **We Avoid Proprietary APIs**: We do not implement provider-specific, non-standard APIs (such as OpenAI's stateful Responses API or Anthropic's native Messages API) to maintain a universal, maintainable codebase.
-- [Llama.cpp](https://github.com/ggml-org/llama.cpp): Extremely efficient, runs on CPU and GPU
-- [Ollama](https://ollama.com/): Super user-friendly and cross-platform
-- [LM Studio](https://lmstudio.ai/): Rich desktop app for Windows/Mac/Linux
-- [Lemonade](https://lemonade-server.ai/): Fast ONNX-based backend with NPU/iGPU acceleration
+If you are using a provider that requires a proprietary API, we recommend using a proxy tool like **LiteLLM** or **OpenRouter** to bridge them to the standard OpenAI protocol supported by Open WebUI.
-Pick whichever suits your workflow!
+### Popular Compatible Servers and Providers
+
+There are many servers and tools that expose an OpenAI-compatible API. Pick whichever suits your workflow:
+
+- **Local Runners**: [Ollama](https://ollama.com/), [Llama.cpp](https://github.com/ggml-org/llama.cpp), [LM Studio](https://lmstudio.ai/), [vLLM](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html), [LocalAI](https://localai.io/), [Lemonade](https://lemonade-server.ai/), [Docker Model Runner](https://docs.docker.com/ai/model-runner/).
+- **Cloud Providers**: [Groq](https://groq.com/), [Mistral AI](https://mistral.ai/), [Perplexity](https://www.perplexity.ai/), [MiniMax](https://platform.minimax.io/), [DeepSeek](https://platform.deepseek.com/), [OpenRouter](https://openrouter.ai/), [LiteLLM](https://docs.litellm.ai/).
+- **Major Model Ecosystems**:
+ - **Google Gemini**: [OpenAI Endpoint](https://generativelanguage.googleapis.com/v1beta/openai/) (requires a Gemini API key).
+ - **Anthropic**: While they primarily use a proprietary API, they offer a [Chat Completions compatible endpoint](https://platform.claude.com/docs/en/api/openai-sdk) for easier integration.
+ - **Azure OpenAI**: Enterprise-grade OpenAI hosting via Microsoft Azure.
---
+## Step 1: Connect Your Server to Open WebUI
+
#### 🍋 Get Started with Lemonade
Lemonade is a plug-and-play ONNX-based OpenAI-compatible server. Here’s how to try it on Windows:
@@ -57,6 +68,19 @@ See [their docs](https://lemonade-server.ai/) for details.
- **API Key**: Leave blank unless the server requires one.
6. Click **Save**.
+:::tip Connection Timeout Configuration
+
+If your local server is slow to start or you're connecting over a high-latency network, you can adjust the model list fetch timeout:
+
+```bash
+# Adjust timeout for slower connections (default is 10 seconds)
+AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=5
+```
+
+If you've saved an unreachable URL and the UI becomes unresponsive, see the [Model List Loading Issues](/troubleshooting/connection-error#️-model-list-loading-issues-slow-ui--unreachable-endpoints) troubleshooting guide for recovery options.
+
+:::
+
:::tip
If running Open WebUI in Docker and your model server on your host machine, use `http://host.docker.internal:/v1`.
diff --git a/docs/getting-started/quick-start/starting-with-openai.mdx b/docs/getting-started/quick-start/starting-with-openai.mdx
index dada9c5943..7317380e1d 100644
--- a/docs/getting-started/quick-start/starting-with-openai.mdx
+++ b/docs/getting-started/quick-start/starting-with-openai.mdx
@@ -11,22 +11,28 @@ Open WebUI makes it easy to connect and use OpenAI and other OpenAI-compatible A
---
-## Step 1: Get Your OpenAI API Key
+## Important: Protocols, Not Providers
-To use OpenAI models (such as GPT-4 or o3-mini), you need an API key from a supported provider.
+Open WebUI is a **protocol-centric** platform. While we provide first-class support for OpenAI models, we do so exclusively through the **OpenAI Chat Completions API protocol**.
-You can use:
+We do **not** support proprietary, non-standard APIs such as OpenAI’s new stateful **Responses API**. Instead, Open WebUI focuses on universal standards that are shared across dozens of providers. This approach keeps Open WebUI fast, stable, and truly open source.
+
+---
-- OpenAI directly (https://platform.openai.com/account/api-keys)
-- Azure OpenAI
-- Any OpenAI-compatible service (e.g., LocalAI, FastChat, Helicone, LiteLLM, OpenRouter etc.)
+## Step 1: Get Your OpenAI API Key
-👉 Once you have the key, copy it and keep it handy.
+To use OpenAI models (such as GPT-4 or o3-mini), you need an API key from a supported provider.
-For most OpenAI usage, the default API base URL is:
-https://api.openai.com/v1
+You can use:
-Other providers use different URLs — check your provider’s documentation.
+- **OpenAI** directly (https://platform.openai.com/account/api-keys)
+- **Azure OpenAI**
+- **Anthropic** (via their [OpenAI-compatible endpoint](https://platform.claude.com/docs/en/api/openai-sdk))
+- **Google Gemini** (via their [OpenAI-compatible endpoint](https://generativelanguage.googleapis.com/v1beta/openai/))
+- **DeepSeek** (https://platform.deepseek.com/)
+- **MiniMax** (https://platform.minimax.io/)
+- **Proxies & Aggregators**: OpenRouter, LiteLLM, Helicone.
+- **Local Servers**: Ollama, Llama.cpp, LM Studio, vLLM, LocalAI.
---
@@ -44,7 +50,7 @@ Once Open WebUI is running:
- Use this for **OpenAI**, **DeepSeek**, **OpenRouter**, **LocalAI**, **FastChat**, **Helicone**, **LiteLLM**, etc.
+ Use this for **OpenAI**, **DeepSeek**, **MiniMax**, **OpenRouter**, **LocalAI**, **FastChat**, **Helicone**, **LiteLLM**, etc.
* **Connection Type**: External
* **URL**: `https://api.openai.com/v1` (or your provider's endpoint)
@@ -71,7 +77,13 @@ Once Open WebUI is running:
* *Set*: Acts as an **Allowlist**. Only the specific model IDs you enter here will be visible to users. Use this to hide older or expensive models.
:::tip OpenRouter Recommendation
- When using **OpenRouter**, we **highly recommend** using this allowlist (adding specific Model IDs). OpenRouter exposes thousands of models, which can clutter your model selector and slow down the admin panel if not filtered.
+ When using **OpenRouter**, we **highly recommend**:
+ 1. **Use an allowlist** (add specific Model IDs). OpenRouter exposes thousands of models, which can clutter your model selector and slow down the admin panel if not filtered.
+ 2. **Enable Model Caching** (`Settings > Connections > Cache Base Model List` or `ENABLE_BASE_MODELS_CACHE=True`). Without caching, page loads can take 10-15+ seconds on first visit due to querying a large number of models. See the [Performance Guide](/tutorials/tips/performance) for more details.
+ :::
+
+ :::caution MiniMax Whitelisting
+ Some providers, like **MiniMax**, do not expose their models via a `/models` endpoint. For these providers, you **must** manually add the Model ID (e.g., `MiniMax-M2.1`) to the **Model IDs (Filter)** list for them to appear in the UI.
:::
* **Prefix ID**:
@@ -81,6 +93,19 @@ Once Open WebUI is running:
This securely stores your credentials.
+:::tip Connection Timeout Configuration
+
+If your API provider is slow to respond or you're experiencing timeout issues, you can adjust the model list fetch timeout:
+
+```bash
+# Increase timeout for slow networks (default is 10 seconds)
+AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=15
+```
+
+If you've saved an unreachable URL and the UI becomes unresponsive, see the [Model List Loading Issues](/troubleshooting/connection-error#️-model-list-loading-issues-slow-ui--unreachable-endpoints) troubleshooting guide for recovery options.
+
+:::
+

---
diff --git a/docs/getting-started/quick-start/starting-with-vllm.mdx b/docs/getting-started/quick-start/starting-with-vllm.mdx
index 53aa96946a..90a4f1b141 100644
--- a/docs/getting-started/quick-start/starting-with-vllm.mdx
+++ b/docs/getting-started/quick-start/starting-with-vllm.mdx
@@ -38,3 +38,16 @@ For remote servers, use the appropriate hostname or IP address.
## Step 3: Start Using Models
Select any model that's available on your vLLM server from the Model Selector and start chatting.
+
+:::tip Connection Timeout Configuration
+
+If your vLLM server is slow to respond (especially during model loading), you can adjust the timeout:
+
+```bash
+# Increase timeout for slower model initialization (default is 10 seconds)
+AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=30
+```
+
+If you've saved an unreachable URL and the UI becomes unresponsive, see the [Model List Loading Issues](/troubleshooting/connection-error#️-model-list-loading-issues-slow-ui--unreachable-endpoints) troubleshooting guide.
+
+:::
diff --git a/docs/getting-started/quick-start/tab-docker/ManualDocker.md b/docs/getting-started/quick-start/tab-docker/ManualDocker.md
index 26370a85cf..6d5c8159a9 100644
--- a/docs/getting-started/quick-start/tab-docker/ManualDocker.md
+++ b/docs/getting-started/quick-start/tab-docker/ManualDocker.md
@@ -23,7 +23,7 @@ docker pull ghcr.io/open-webui/open-webui:main-slim
You can also pull a specific Open WebUI release version directly by using a versioned image tag. This is recommended for production environments to ensure stable and reproducible deployments.
```bash
-docker pull ghcr.io/open-webui/open-webui:v0.6.42
+docker pull ghcr.io/open-webui/open-webui:v0.7.0
```
## Step 2: Run the Container
diff --git a/docs/getting-started/updating.mdx b/docs/getting-started/updating.mdx
index 8e375be1eb..c87a24b085 100644
--- a/docs/getting-started/updating.mdx
+++ b/docs/getting-started/updating.mdx
@@ -14,7 +14,10 @@ Keeping Open WebUI updated ensures you have the latest features, security patche
- **Backup your data** before major version updates
- **Check release notes** at https://github.com/open-webui/open-webui/releases for breaking changes
- **Clear browser cache** after updating to ensure the latest web interface loads
-- **Running Multiple Workers?** If you use `UVICORN_WORKERS > 1`, you **MUST** run the updated container with `UVICORN_WORKERS=1` first to perform database migrations safely. Once started successfully, you can restart with multiple workers.
+- **Running Multiple Workers?** If you use `UVICORN_WORKERS > 1`, you **MUST** ensure migrations run safely by either:
+ 1. Running the updated container with `UVICORN_WORKERS=1` first.
+ 2. Designating a master worker using `ENABLE_DB_MIGRATIONS=True` (default) on one instance and `False` on others.
+ Once migrations complete, you can run with multiple workers normally.
:::
:::
diff --git a/docs/intro.mdx b/docs/intro.mdx
index e480508dfc..c9b7023245 100644
--- a/docs/intro.mdx
+++ b/docs/intro.mdx
@@ -11,7 +11,7 @@ import { SponsorList } from "@site/src/components/SponsorList";
# Open WebUI
-**Open WebUI is an [extensible](https://docs.openwebui.com/features/plugin/), feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline.** It supports various LLM runners like **Ollama** and **OpenAI-compatible APIs**, with **built-in inference engine** for RAG, making it a **powerful AI deployment solution**.
+**Open WebUI is an [extensible](https://docs.openwebui.com/features/plugin/), feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline.** It is built around universal standards, supporting **Ollama** and **OpenAI-compatible Protocols** (specifically Chat Completions). This protocol-first approach makes it a powerful, provider-agnostic AI deployment solution for both local and cloud-based models.
[](https://openwebui.com)
@@ -99,9 +99,9 @@ ghcr.io/open-webui/open-webui:-
Examples (pinned versions for illustration purposes only):
```
-ghcr.io/open-webui/open-webui:v0.6.42
-ghcr.io/open-webui/open-webui:v0.6.42-ollama
-ghcr.io/open-webui/open-webui:v0.6.42-cuda
+ghcr.io/open-webui/open-webui:v0.7.0
+ghcr.io/open-webui/open-webui:v0.7.0-ollama
+ghcr.io/open-webui/open-webui:v0.7.0-cuda
```
### Using the Dev Branch 🌙
diff --git a/docs/troubleshooting/audio.mdx b/docs/troubleshooting/audio.mdx
new file mode 100644
index 0000000000..c5aa07565c
--- /dev/null
+++ b/docs/troubleshooting/audio.mdx
@@ -0,0 +1,441 @@
+---
+sidebar_position: 3
+title: "Audio Troubleshooting"
+---
+
+import { TopBanners } from "@site/src/components/TopBanners";
+
+<TopBanners />
+
+# Audio Troubleshooting Guide
+
+This page covers common issues with Speech-to-Text (STT) and Text-to-Speech (TTS) functionality in Open WebUI, along with their solutions.
+
+## Where to Find Audio Settings
+
+### Admin Settings (Server-Wide)
+
+Admins can configure server-wide audio defaults:
+
+1. Click your **profile icon** (bottom-left corner)
+2. Select **Admin Panel**
+3. Click **Settings** in the top navigation
+4. Select the **Audio** tab
+
+Here you can configure:
+- **Speech-to-Text Engine** — Choose between local Whisper, OpenAI, Azure, Deepgram, or Mistral
+- **Whisper Model** — Select model size for local STT (tiny, base, small, medium, large)
+- **Text-to-Speech Engine** — Choose between OpenAI-compatible, ElevenLabs, Azure, or local Transformers
+- **TTS Voice** — Select the default voice
+- **API Keys and Base URLs** — Configure external service connections
+
+### User Settings (Per-User)
+
+Individual users can customize their audio experience:
+
+1. Click your **profile icon** (bottom-left corner)
+2. Select **Settings**
+3. Click the **Audio** tab
+
+User-level options include:
+- **STT Engine Override** — Use "Web API" for browser-based speech recognition
+- **STT Language** — Set preferred language for transcription
+- **TTS Engine** — Choose "Browser Kokoro" for local in-browser TTS
+- **TTS Voice** — Select from available voices
+- **Auto-playback** — Automatically play AI responses
+- **Playback Speed** — Adjust audio speed
+- **Conversation Mode** — Enable hands-free voice interaction
+
+:::tip
+User settings override admin defaults. If you're having issues, check both locations to ensure settings aren't conflicting.
+:::
+
+## Quick Setup Guide
+
+### Fastest Setup: OpenAI (Paid)
+
+If you have an OpenAI API key, this is the simplest setup:
+
+**In Admin Panel → Settings → Audio:**
+- **STT Engine:** `OpenAI` | **Model:** `whisper-1`
+- **TTS Engine:** `OpenAI` | **Model:** `tts-1` | **Voice:** `alloy`
+- Enter your OpenAI API key in both sections
+
+Or via environment variables:
+```yaml
+environment:
+ - AUDIO_STT_ENGINE=openai
+ - AUDIO_STT_OPENAI_API_KEY=sk-...
+ - AUDIO_TTS_ENGINE=openai
+ - AUDIO_TTS_OPENAI_API_KEY=sk-...
+ - AUDIO_TTS_MODEL=tts-1
+ - AUDIO_TTS_VOICE=alloy
+```
+
+→ See full guides: [Speech-to-Text](/category/speech-to-text) | [Text-to-Speech](/category/text-to-speech)
+
+### Free Setup: Local Whisper + Edge TTS
+
+For a completely free setup:
+
+**STT:** Leave engine empty (uses built-in Whisper)
+```yaml
+environment:
+ - WHISPER_MODEL=base # Options: tiny, base, small, medium, large
+```
+
+**TTS:** Use OpenAI Edge TTS (free Microsoft voices)
+```yaml
+services:
+ openai-edge-tts:
+ image: travisvn/openai-edge-tts:latest
+ ports:
+ - "5050:5050"
+
+ open-webui:
+ environment:
+ - AUDIO_TTS_ENGINE=openai
+ - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
+ - AUDIO_TTS_OPENAI_API_KEY=not-needed
+```
+
+→ See full guide: [OpenAI Edge TTS](/features/audio/text-to-speech/openai-edge-tts-integration)
+
+### Browser-Only Setup (No Config Needed)
+
+For basic functionality without any server-side setup:
+
+**In User Settings → Audio:**
+- **STT Engine:** `Web API` (uses browser's built-in speech recognition)
+- **TTS Engine:** `Web API` (uses browser's built-in text-to-speech)
+
+:::note
+Browser-based audio has limited accuracy and voice options compared to server-side solutions.
+:::
+
+## Microphone Access Issues
+
+### Understanding Secure Contexts 🔒
+
+For security reasons, accessing the microphone is restricted to pages served over HTTPS or locally from `localhost`. This requirement is meant to safeguard your data by ensuring it is transmitted over secure channels.
+
+### Common Permission Issues 🚫
+
+Browsers like Chrome, Brave, Microsoft Edge, Opera, and Vivaldi, as well as Firefox, restrict microphone access on non-HTTPS URLs. This typically becomes an issue when accessing a site from another device within the same network (e.g., using a mobile phone to access a desktop server).
+
+### Solutions for Non-HTTPS Connections
+
+1. **Set Up HTTPS (Recommended):**
+ - Configure your server to support HTTPS. This not only resolves permission issues but also enhances the security of your data transmissions.
+ - You can use a reverse proxy like Nginx or Caddy with Let's Encrypt certificates.
+
+2. **Temporary Browser Flags (Use with caution):**
+ - These settings force your browser to treat certain insecure URLs as secure. This is useful for development purposes but poses significant security risks.
+
+ **Chromium-based Browsers (e.g., Chrome, Brave):**
+ - Open `chrome://flags/#unsafely-treat-insecure-origin-as-secure`
+ - Enter your non-HTTPS address (e.g., `http://192.168.1.35:3000`)
+ - Restart the browser to apply the changes
+
+ **Firefox-based Browsers:**
+ - Open `about:config`
+ - Search and modify (or create) the string value `dom.securecontext.allowlist`
+ - Add your IP addresses separated by commas (e.g., `http://127.0.0.1:8080`)
+
+:::warning
+While browser flags offer a quick fix, they bypass important security checks which can expose your device and data to vulnerabilities. Always prioritize proper security measures, especially when planning for a production environment.
+:::
+
+### Microphone Not Working
+
+If the microphone icon doesn't respond even on HTTPS:
+
+1. **Check browser permissions:** Ensure your browser has microphone access for the site
+2. **Check system permissions:** On Windows/Mac, ensure the browser has microphone access in system settings
+3. **Check browser compatibility:** Some browsers have limited STT support
+4. **Try a different browser:** Chrome typically has the best support for web audio APIs
+
+---
+
+## Text-to-Speech (TTS) Issues
+
+### TTS Loading Forever / Not Working
+
+If clicking the play button on chat responses causes endless loading, try the following solutions:
+
+#### 1. Hugging Face Dataset Library Conflict (Local Transformers TTS)
+
+**Symptoms:**
+- TTS keeps loading forever
+- Container logs show: `RuntimeError: Dataset scripts are no longer supported, but found cmu-arctic-xvectors.py`
+
+**Cause:** This occurs when using local Transformers TTS (`AUDIO_TTS_ENGINE=transformers`). The `datasets` library is pulled in as an indirect dependency of the `transformers` package and isn't pinned to a specific version in Open WebUI's requirements. Newer versions of `datasets` removed support for dataset loading scripts, causing this error when loading speaker embeddings.
+
+**Solutions:**
+
+**Temporary fix** (re-applies after container restart):
+```bash
+docker exec open-webui bash -lc "pip install datasets==3.6.0" && docker restart open-webui
+```
+
+**Permanent fix using environment variable:**
+Add this to your `docker-compose.yml`:
+```yaml
+environment:
+ - EXTRA_PIP_PACKAGES=datasets==3.6.0
+```
+
+**Verify the installed version:**
+```bash
+docker exec open-webui bash -lc "pip show datasets"
+```
+
+:::tip
+Consider using an external TTS service like [OpenAI Edge TTS](/features/audio/text-to-speech/openai-edge-tts-integration) or [Kokoro](/features/audio/text-to-speech/Kokoro-FastAPI-integration) instead of local Transformers TTS to avoid these dependency conflicts.
+:::
+
+#### 2. Using External TTS Instead of Local
+
+If you continue to have issues with local TTS, configuring an external TTS service is often more reliable. See the example Docker Compose configuration below that uses `openai-edge-tts`:
+
+```yaml
+services:
+ open-webui:
+ image: ghcr.io/open-webui/open-webui:main
+ environment:
+ - AUDIO_TTS_ENGINE=openai
+ - AUDIO_TTS_OPENAI_API_KEY=your-api-key-here
+ - AUDIO_TTS_OPENAI_API_BASE_URL=http://openai-edge-tts:5050/v1
+ depends_on:
+ - openai-edge-tts
+ # ... other configuration
+
+ openai-edge-tts:
+ image: travisvn/openai-edge-tts:latest
+ ports:
+ - "5050:5050"
+ environment:
+ - API_KEY=your-api-key-here
+ restart: unless-stopped
+```
+
+### TTS Voice Not Found / No Audio Output
+
+**Checklist:**
+1. Verify the TTS engine is correctly configured in **Admin Panel → Settings → Audio**
+2. Check that the voice name matches an available voice for your chosen engine
+3. For external TTS services, verify the API Base URL is accessible from the Open WebUI container
+4. Check container logs for any error messages
+
+### Docker Networking Issues with TTS
+
+If Open WebUI can't reach your TTS service:
+
+**Problem:** Using `localhost` in the API Base URL doesn't work from within Docker.
+
+**Solutions:**
+- Use `host.docker.internal` instead of `localhost` (works on Docker Desktop for Windows/Mac)
+- Use the container name if both services are on the same Docker network (e.g., `http://openai-edge-tts:5050/v1`)
+- Use the host machine's IP address
+
+---
+
+## Speech-to-Text (STT) Issues
+
+### Whisper STT Not Working / Compute Type Error
+
+**Symptoms:**
+- Error message: `Error transcribing chunk: Requested int8 compute type, but the target device or backend do not support efficient int8 computation`
+- STT fails to process audio, often showing a persistent loading spinner or a red error toast.
+
+**Cause:** This typically occurs when using the `:cuda` Docker image with an NVIDIA GPU that doesn't support the required `int8` compute operations (common on older Maxwell or Pascal architecture GPUs). In version **v0.6.43**, a regression caused the compute type to be incorrectly defaulted or hardcoded to `int8` in some scenarios.
+
+**Solutions:**
+
+#### 1. Upgrade to the Latest Version (Recommended)
+The most reliable fix is to upgrade to the latest version of Open WebUI. Recent updates ensure that `WHISPER_COMPUTE_TYPE` is correctly respected and provides optimized defaults for CUDA environments.
+
+#### 2. Manually Set Compute Type
+If you are on an affected version or still experiencing issues on GPU, explicitly set the compute type to `float16`:
+
+```yaml
+environment:
+ - WHISPER_COMPUTE_TYPE=float16
+```
+
+#### 3. Switch to the Standard Image
+If your GPU is very old or compatibility issues persist, switch to the standard (CPU-based) image:
+
+```bash
+# Instead of:
+# ghcr.io/open-webui/open-webui:cuda
+
+# Use:
+ghcr.io/open-webui/open-webui:main
+```
+
+:::info
+The CUDA image primarily accelerates RAG embedding/reranking models and Whisper STT. For smaller models like Whisper, CPU mode often provides comparable performance without the compatibility issues.
+:::
+
+#### Available Compute Types
+
+If you want to keep GPU acceleration, pick the compute type (from faster-whisper) that best matches your hardware. `float16` is the recommended choice for most GPUs:
+
+| Compute Type | Best For | Notes |
+|--------------|----------|-------|
+| `int8` | **CPU (default)** | Fastest, but doesn't work on older GPUs |
+| `float16` | **CUDA/GPU (recommended)** | Best balance of speed and compatibility for GPUs |
+| `int8_float16` | GPU with hybrid precision | Uses int8 for weights, float16 for computation |
+| `float32` | Maximum compatibility | Slowest, but works on all hardware |
+
+:::info Default Behavior
+- **CPU mode:** Defaults to `int8` for best performance
+- **CUDA mode:** The `:cuda` image may default to `int8`, which can cause errors on older GPUs. Set `float16` explicitly for GPUs.
+:::
+
+### STT Not Recognizing Speech Correctly
+
+**Tips for better recognition:**
+
+1. **Set the correct language:**
+ ```yaml
+ environment:
+ - WHISPER_LANGUAGE=en # Use ISO 639-1 language code
+ ```
+
+2. **Try a larger Whisper model** for better accuracy (at the cost of speed):
+ ```yaml
+ environment:
+ - WHISPER_MODEL=medium # Options: tiny, base, small, medium, large
+ ```
+
+3. **Check microphone permissions** in your browser (see above)
+
+4. **Use the Web API engine** as an alternative:
+ - Go to user settings (not admin panel)
+ - Under STT Settings, try switching Speech-to-Text Engine to "Web API"
+ - This uses the browser's built-in speech recognition
+
+---
+
+## ElevenLabs Integration
+
+ElevenLabs is natively supported in Open WebUI. To configure:
+
+1. Go to **Admin Panel → Settings → Audio**
+2. Select **ElevenLabs** as the TTS engine
+3. Enter your ElevenLabs API key
+4. Select the voice and model
+5. Save settings
+
+**Using environment variables:**
+
+```yaml
+environment:
+ - AUDIO_TTS_ENGINE=elevenlabs
+ - AUDIO_TTS_API_KEY=sk_... # Your ElevenLabs API key
+ - AUDIO_TTS_VOICE=EXAVITQu4vr4xnSDxMaL # Voice ID from ElevenLabs dashboard
+ - AUDIO_TTS_MODEL=eleven_multilingual_v2
+```
+
+:::note
+You can find your Voice ID in the ElevenLabs dashboard under the voice settings. Common model options are `eleven_multilingual_v2` or `eleven_monolingual_v1`.
+:::
+
+---
+
+## General Debugging Tips
+
+### Check Container Logs
+
+```bash
+# View Open WebUI logs
+docker logs open-webui -f
+
+# View logs for external TTS service (if applicable)
+docker logs openai-edge-tts -f
+```
+
+### Check Browser Console
+
+1. Open browser developer tools (F12 or right-click → Inspect)
+2. Go to the Console tab
+3. Look for error messages when attempting to use audio features
+
+### Verify Service Health
+
+For external TTS services, test directly:
+
+```bash
+# Test OpenAI Edge TTS
+curl -X POST http://localhost:5050/v1/audio/speech \
+ -H "Content-Type: application/json" \
+ -H "Authorization: Bearer your_api_key_here" \
+ -d '{"input": "Hello, this is a test.", "voice": "alloy"}' \
+ --output test.mp3
+```
+
+### Network Connectivity
+
+Verify the Open WebUI container can reach external services:
+
+```bash
+# Enter the container
+docker exec -it open-webui bash
+
+# Test connectivity (if curl is available)
+curl http://your-tts-service:port/health
+```
+
+---
+
+## Quick Reference: Environment Variables
+
+### TTS Environment Variables
+
+| Variable | Description |
+|----------|-------------|
+| `AUDIO_TTS_ENGINE` | TTS engine: empty (disabled), `openai`, `elevenlabs`, `azure`, `transformers` |
+| `AUDIO_TTS_MODEL` | TTS model to use (default: `tts-1`) |
+| `AUDIO_TTS_VOICE` | Default voice for TTS (default: `alloy`) |
+| `AUDIO_TTS_API_KEY` | API key for ElevenLabs or Azure TTS |
+| `AUDIO_TTS_OPENAI_API_BASE_URL` | Base URL for OpenAI-compatible TTS |
+| `AUDIO_TTS_OPENAI_API_KEY` | API key for OpenAI-compatible TTS |
+
+### STT Environment Variables
+
+| Variable | Description |
+|----------|-------------|
+| `WHISPER_MODEL` | Whisper model: `tiny`, `base`, `small`, `medium`, `large` (default: `base`) |
+| `WHISPER_COMPUTE_TYPE` | Compute type: `int8`, `float16`, `int8_float16`, `float32` (default: `int8`) |
+| `WHISPER_LANGUAGE` | ISO 639-1 language code (empty = auto-detect) |
+| `WHISPER_VAD_FILTER` | Enable Voice Activity Detection filter (default: `False`) |
+| `AUDIO_STT_ENGINE` | STT engine: empty (local Whisper), `openai`, `azure`, `deepgram`, `mistral` |
+| `AUDIO_STT_OPENAI_API_BASE_URL` | Base URL for OpenAI-compatible STT |
+| `AUDIO_STT_OPENAI_API_KEY` | API key for OpenAI-compatible STT |
+| `DEEPGRAM_API_KEY` | Deepgram API key |
+
+For a complete list of audio environment variables, see [Environment Variable Configuration](/getting-started/env-configuration#audio).
+
+---
+
+## Still Having Issues?
+
+If you've tried the above solutions and still experience problems:
+
+1. **Search existing issues** on GitHub for similar problems
+2. **Check the discussions** for community solutions
+3. **Create a new issue** with:
+ - Open WebUI version
+ - Docker image being used
+ - Complete error logs
+ - Very detailed steps to reproduce
+ - Your environment details (OS, GPU if applicable)
diff --git a/docs/troubleshooting/connection-error.mdx b/docs/troubleshooting/connection-error.mdx
index 4676f30b48..3d7928fbcb 100644
--- a/docs/troubleshooting/connection-error.mdx
+++ b/docs/troubleshooting/connection-error.mdx
@@ -77,7 +77,10 @@ WebSocket support is required for Open WebUI v0.5.0 and later. If WebSockets are
1. **Check your reverse proxy configuration** - Ensure `Upgrade` and `Connection` headers are properly set
2. **Verify CORS settings** - WebSocket connections respect CORS policies
3. **Check browser console** - Look for WebSocket connection errors
-4. **Test direct connection** - Try connecting directly to Open WebUI without the proxy to isolate the issue
+4. **Test direct connection** - Try connecting directly to Open WebUI without the proxy to isolate the issue.
+5. **Check for HTTP/2 WebSocket Issues** - Some proxies (like HAProxy 3.x) enable HTTP/2 by default. If your proxy handles client connections via HTTP/2 but the backend/application doesn't support RFC 8441 (WebSockets over H2) properly, the instance may "freeze" or stop responding.
+ - **Fix for HAProxy**: Add `option h2-workaround-bogus-websocket-clients` to your configuration or force the backend connection to use HTTP/1.1.
+ - **Fix for Nginx**: Ensure you are using `proxy_http_version 1.1;` in your location block (which is the default in many Open WebUI examples).
For multi-instance deployments, configure Redis for WebSocket management:
```bash
@@ -141,6 +144,81 @@ docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=
```
🔗 After running the above, your WebUI should be available at `http://localhost:8080`.
+## ⏱️ Model List Loading Issues (Slow UI / Unreachable Endpoints)
+
+If your Open WebUI takes a long time to load models, or the model selector spins indefinitely, it may be due to an unreachable or slow API endpoint configured in your connections.
+
+### Common Symptoms
+
+- Model selector shows a loading spinner for extended periods
+- `500 Internal Server Error` on `/api/models` endpoint
+- UI becomes unresponsive when opening Settings
+- Docker/server logs show: `Connection error: Cannot connect to host...`
+
+### Cause: Unreachable Endpoints
+
+When you configure multiple Ollama or OpenAI base URLs (for load balancing or redundancy), Open WebUI attempts to fetch models from **all** configured endpoints. If any endpoint is unreachable, the system waits for the full connection timeout before returning results.
+
+By default, Open WebUI waits **10 seconds** per unreachable endpoint when fetching the model list. With multiple bad endpoints, this delay compounds.
+
+### Solution 1: Adjust the Timeout
+
+Lower the timeout for model list fetching using the `AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST` environment variable:
+
+```bash
+# Set a shorter timeout (in seconds) for faster failure on unreachable endpoints
+AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=3
+```
+
+This reduces how long Open WebUI waits for each endpoint before giving up and continuing.
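+
+For example, with a Docker deployment you can pass the variable at startup (a minimal sketch; keep your usual flags and adjust the timeout to taste):
+
+```bash
+docker run -d -p 3000:8080 \
+  -e AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=3 \
+  -v open-webui:/app/backend/data \
+  --name open-webui --restart always \
+  ghcr.io/open-webui/open-webui:main
+```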
+
+### Solution 2: Fix or Remove Unreachable Endpoints
+
+1. Go to **Admin Settings → Connections**
+2. Review your Ollama and OpenAI base URLs
+3. Remove or correct any unreachable IP addresses or hostnames
+4. Save the configuration
+
+### Solution 3: Recover from Database-Persisted Bad Configuration
+
+If you saved an unreachable URL and now can't access the Settings UI to fix it, the bad configuration is persisted in the database and takes precedence over environment variables. Use one of these recovery methods:
+
+**Option A: Reset configuration on startup**
+```bash
+# Forces environment variables to override database values on next startup
+RESET_CONFIG_ON_START=true
+```
+
+**Option B: Always use environment variables**
+```bash
+# Prevents database values from taking precedence (changes in UI won't persist across restarts)
+ENABLE_PERSISTENT_CONFIG=false
+```
+
+**Option C: Manual database cleanup (advanced)**
+
+If using SQLite, stop the container and run:
+```bash
+sqlite3 webui.db "DELETE FROM config WHERE id LIKE '%urls%';"
+```
+
+:::warning
+
+Manual database manipulation should be a last resort. Always back up your database first.
+
+:::
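+
+For example, assuming your container is named `open-webui` and uses the default data path, you can take a quick backup of the SQLite database before editing it (a sketch, not a full backup strategy):
+
+```bash
+# Stop the container so the database is not being written to, then copy it out
+docker stop open-webui
+docker cp open-webui:/app/backend/data/webui.db ./webui.db.bak
+```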
+
+### Related Environment Variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST` | `10` | Timeout (seconds) for fetching model lists |
+| `AIOHTTP_CLIENT_TIMEOUT` | `300` | General API request timeout |
+| `RESET_CONFIG_ON_START` | `false` | Reset database config to env var values on startup |
+| `ENABLE_PERSISTENT_CONFIG` | `true` | Whether database config takes precedence over env vars |
+
+See the [Environment Configuration](/getting-started/env-configuration#aiohttp_client_timeout_model_list) documentation for more details.
+
## 🔒 SSL Connection Issue with Hugging Face
Encountered an SSL error? It could be an issue with the Hugging Face server. Here's what to do:
@@ -156,6 +234,35 @@ Encountered an SSL error? It could be an issue with the Hugging Face server. Her
docker run -d -p 3000:8080 -e HF_ENDPOINT=https://hf-mirror.com/ --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
+## 🔐 SSL Certificate Issues with Internal Tools
+
+If you are using external tools like Tika, Ollama (for embeddings), or an external reranker with self-signed certificates, you might encounter SSL verification errors.
+
+### Common Symptoms
+
+- Logs show `[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self-signed certificate`
+- Tika document ingestion fails
+- Embedding generation fails with SSL errors
+- Reranking fails with SSL errors
+
+### Solution
+
+You can disable SSL verification for these internal tool connections using the following environment variables:
+
+1. **For synchronous requests (Tika, External Reranker):**
+ ```bash
+ REQUESTS_VERIFY=false
+ ```
+
+2. **For asynchronous requests (Ollama Embeddings):**
+ ```bash
+ AIOHTTP_CLIENT_SESSION_SSL=false
+ ```
+
+:::warning
+Disabling SSL verification reduces security. Only do this if you trust the network and the services you are connecting to (e.g., functioning within a secure internal network).
+:::
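+
+In a Docker deployment, both variables can be passed at startup (a sketch; keep only the ones you actually need along with your usual flags):
+
+```bash
+docker run -d -p 3000:8080 \
+  -e REQUESTS_VERIFY=false \
+  -e AIOHTTP_CLIENT_SESSION_SSL=false \
+  -v open-webui:/app/backend/data \
+  --name open-webui --restart always \
+  ghcr.io/open-webui/open-webui:main
+```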
+
## 🍏 Podman on MacOS
Running on MacOS with Podman? Here’s how to ensure connectivity:
@@ -171,3 +278,59 @@ Running on MacOS with Podman? Here’s how to ensure connectivity:
podman run -d -p 3000:8080 -e OLLAMA_BASE_URL=http://host.containers.internal:11434 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```
+
+## 🛠️ MCP Tool Connections
+
+If you are having trouble connecting to MCP tools (e.g. "Failed to connect to MCP server"):
+* **Authentication**: Ensure you aren't using "Bearer" without a token.
+* **Filters**: Try adding a comma to the Function Name Filter List.
+
+See the [MCP Feature Documentation](/features/mcp#troubleshooting) for detailed troubleshooting steps.
+
+## 🔐 SSL/TLS Errors with Web Search
+
+If you are encountering SSL errors while using the Web Search feature, they usually fall into two categories: Proxy configuration issues or Certificate verification issues.
+
+### Certificate Verification Issues
+
+If you are seeing SSL verification errors when Open WebUI tries to fetch content from websites (Web Loader):
+
+- **Symptom**: `[SSL: CERTIFICATE_VERIFY_FAILED]` when loading search results.
+- **Solution**: You can disable SSL verification for the Web Loader (scraper) specifically.
+ ```bash
+ ENABLE_WEB_LOADER_SSL_VERIFICATION=false
+ ```
+ > **Note**: This setting applies to the *fetching* of web pages. If you are having SSL issues with the Search Engine itself (e.g., local SearXNG) or subsequent steps (Embedding/Reranking), see the sections below.
+
+### Proxy Configuration Issues
+
+If you're seeing SSL errors like `UNEXPECTED_EOF_WHILE_READING` or `Max retries exceeded` when using web search providers (Bocha, Tavily, etc.):
+
+#### Common Symptoms
+
+- `SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING]'))`
+- `Max retries exceeded with url: /v1/web-search`
+- Web search works in standalone Python scripts but fails in Open WebUI
+
+#### Cause: HTTP Proxy Configured for HTTPS Traffic
+
+This typically happens when you have an **HTTP proxy** configured for **HTTPS traffic**. The HTTP proxy cannot properly handle TLS connections, causing SSL handshake failures.
+
+Check your environment for these variables:
+- `HTTP_PROXY` / `http_proxy`
+- `HTTPS_PROXY` / `https_proxy`
+
+If your `https_proxy` points to `http://...` (HTTP) instead of `https://...` (HTTPS), SSL handshakes will fail because the proxy terminates the connection unexpectedly.
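+
+To quickly check which proxy variables the Open WebUI process actually sees (a sketch, assuming a Docker deployment with the container named `open-webui`):
+
+```bash
+# List any proxy-related environment variables inside the running container
+docker exec open-webui env | grep -i proxy
+```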
+
+#### Solutions
+
+1. **Fix proxy configuration**: Use an HTTPS-capable proxy for HTTPS traffic, or configure your HTTP proxy to properly support CONNECT tunneling for SSL
+2. **Bypass proxy for specific hosts**: Set `NO_PROXY` environment variable:
+ ```bash
+ NO_PROXY=api.bochaai.com,api.tavily.com,api.search.brave.com
+ ```
+3. **Disable proxy if not needed**: Unset the proxy environment variables entirely
+
+#### Why Standalone Scripts Work
+
+When you run a Python script directly, it may not inherit the same proxy environment variables that your Open WebUI service is using. The service typically inherits environment variables from systemd, Docker, or your shell profile, which may have different proxy settings.
diff --git a/docs/troubleshooting/image-generation.md b/docs/troubleshooting/image-generation.md
new file mode 100644
index 0000000000..40796d26ec
--- /dev/null
+++ b/docs/troubleshooting/image-generation.md
@@ -0,0 +1,71 @@
+---
+sidebar_position: 100
+title: "Image Generation"
+---
+
+## 🎨 Image Generation Troubleshooting
+
+### General Issues
+
+- **Image Not Generating**:
+ - Check the **Images** settings in the **Admin Panel** > **Settings** > **Images**. Ensure "Image Generation" is toggled **ON**.
+ - Verify your **API Key** and **Base URL** (for OpenAI, ComfyUI, Automatic1111) are correct.
+ - Ensure the selected model is available and loaded in your backend service (e.g., check the ComfyUI or Automatic1111 console for activity).
+ - **Azure OpenAI**: If you see `[ERROR: azure-openai error: Unknown parameter: 'response_format'.]`, ensure you are using API version `2025-04-01-preview` or later.
+
+### ComfyUI Issues
+
+- **Incompatible Workflow / JSON Errors**:
+ - **API Format Required**: Open WebUI requires workflows to be in the **API Format**.
+ - In ComfyUI:
+ 1. Click the "Settings" (gear icon).
+ 2. Enable "Enable Dev mode Options".
+ 3. Click "Save (API Format)" in the menu.
+ - **Do not** use the standard "Save" button or standard JSON export.
+
+- **Image Editing / Image Variation Fails**:
+  - If you are using Image Editing or Image+Image generation, your custom workflow **must** include nodes configured to accept an input image (typically a `LoadImage` node wired into the workflow).
+ - Check the default "Image Editing" workflow in the Open WebUI settings for the required node structure to ensure compatibility.
+
+### Automatic1111 Issues
+
+- **Connection Refused / "Api Not Found"**:
+ - Ensure you are running Automatic1111 with the `--api` flag enabled in your command line arguments.
+
+- **Docker Connectivity**:
+ - If Open WebUI is running in Docker and Automatic1111 is on your host machine:
+ - Use `http://host.docker.internal:7860` as the Base URL.
+ - Ensure `host.docker.internal` is resolvable (added via `--add-host=host.docker.internal:host-gateway` in your Docker run command).
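+
+A minimal `docker run` sketch for this setup (assuming Automatic1111 listens on port 7860 on the host; adjust the flags to your deployment):
+
+```bash
+docker run -d -p 3000:8080 \
+  --add-host=host.docker.internal:host-gateway \
+  -e ENABLE_IMAGE_GENERATION=true \
+  -e IMAGE_GENERATION_ENGINE=automatic1111 \
+  -e AUTOMATIC1111_BASE_URL=http://host.docker.internal:7860 \
+  -v open-webui:/app/backend/data \
+  --name open-webui \
+  ghcr.io/open-webui/open-webui:main
+```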
+
+### Environment Variables & Configuration
+
+For advanced configuration, you can set the following environment variables.
+
+#### General Image Generation
+- `ENABLE_IMAGE_GENERATION`: Set to `true` to enable image generation.
+- `IMAGE_GENERATION_ENGINE`: The engine to use (e.g., `openai`, `comfyui`, `automatic1111`, `gemini`).
+- `IMAGE_GENERATION_MODEL`: The model ID to use for generation.
+- `IMAGE_SIZE`: Default image size (e.g., `512x512`).
+
+#### Engine Specifics
+
+**OpenAI / Compatible**
+- `IMAGES_OPENAI_API_BASE_URL`: Base URL for OpenAI-compatible image generation API.
+- `IMAGES_OPENAI_API_KEY`: API Key for the image generation service.
+
+**ComfyUI**
+- `COMFYUI_BASE_URL`: Base URL for your ComfyUI instance.
+- `COMFYUI_API_KEY`: API Key (if authentication is enabled).
+- `COMFYUI_WORKFLOW`: Custom workflow JSON (must be API format).
+
+**Automatic1111**
+- `AUTOMATIC1111_BASE_URL`: Base URL for your Automatic1111 instance.
+- `AUTOMATIC1111_API_AUTH`: Authentication credentials (username:password).
+
+**Gemini**
+- `IMAGES_GEMINI_API_KEY`: API Key for Gemini.
+- [View Gemini Configuration Guide](/features/image-generation-and-editing/gemini)
+
+:::tip
+For a complete list of environment variables and detailed configuration options, please refer to the [Environment Configuration Guide](/getting-started/env-configuration).
+:::
diff --git a/docs/troubleshooting/index.mdx b/docs/troubleshooting/index.mdx
index 0e0f218d30..543d302b32 100644
--- a/docs/troubleshooting/index.mdx
+++ b/docs/troubleshooting/index.mdx
@@ -11,6 +11,7 @@ import { TopBanners } from "@site/src/components/TopBanners";
Encountering issues? Don't worry, we're here to help! 😊 Start with this important step:
- 🔄 Make sure you're using the **latest version** of the software.
+- 💾 **Check for Configuration Persistence:** For settings marked as `PersistentConfig`, Open WebUI prioritizes values stored in its internal database over environment variables. If changes to your environment variables (like `ENABLE_SIGNUP=True`) seem to be ignored, see the **[Environment Variable Configuration](/getting-started/env-configuration#important-note-on-persistentconfig-environment-variables)** page for how to force updates or manually edit the database.
With this project constantly evolving, updates and fixes are regularly added. Keeping your software up-to-date is crucial to take advantage of all the enhancements and fixes, ensuring the best possible experience. 🚀
diff --git a/docs/troubleshooting/microphone-error.mdx b/docs/troubleshooting/microphone-error.mdx
deleted file mode 100644
index 59446d57f5..0000000000
--- a/docs/troubleshooting/microphone-error.mdx
+++ /dev/null
@@ -1,38 +0,0 @@
----
-sidebar_position: 2
-title: "Troubleshooting Microphone Access"
----
-
-Ensuring your application has the proper microphone access is crucial for functionality that depends on audio input. This guide covers how to manage and troubleshoot microphone permissions, particularly under secure contexts.
-
-## Understanding Secure Contexts 🔒
-
-For security reasons, accessing the microphone is restricted to pages served over HTTPS or locally from `localhost`. This requirement is meant to safeguard your data by ensuring it is transmitted over secure channels.
-
-## Common Permission Issues 🚫
-
-Browsers like Chrome, Brave, Microsoft Edge, Opera, and Vivaldi, as well as Firefox, restrict microphone access on non-HTTPS URLs. This typically becomes an issue when accessing a site from another device within the same network (e.g., using a mobile phone to access a desktop server). Here's how you can manage these issues:
-
-### Solutions for Non-HTTPS Connections
-
-1. **Set Up HTTPS:**
- - It is highly recommended to configure your server to support HTTPS. This not only resolves permission issues but also enhances the security of your data transmissions.
-
-2. **Temporary Browser Flags (Use with caution):**
- - These settings force your browser to treat certain insecure URLs as secure. This is useful for development purposes but poses significant security risks. Here's how to adjust these settings for major browsers:
-
- #### Chromium-based Browsers (e.g., Chrome, Brave)
- - Open `chrome://flags/#unsafely-treat-insecure-origin-as-secure`.
- - Enter your non-HTTPS address (e.g., `http://192.168.1.35:3000`).
- - Restart the browser to apply the changes.
-
- #### Firefox-based Browsers
- - Open `about:config`.
- - Search and modify (or create) the string value `dom.securecontext.allowlist`.
- - Add your IP addresses separated by commas (e.g., `http://127.0.0.1:8080`).
-
-### Considerations and Risks 🚨
-
-While browser flags offer a quick fix, they bypass important security checks which can expose your device and data to vulnerabilities. Always prioritize proper security measures, especially when planning for a production environment.
-
-By following these best practices, you can ensure that your application properly accesses the microphone while maintaining the security and integrity of your data.
\ No newline at end of file
diff --git a/docs/troubleshooting/multi-replica.mdx b/docs/troubleshooting/multi-replica.mdx
index 052e34c434..9d7dc25dda 100644
--- a/docs/troubleshooting/multi-replica.mdx
+++ b/docs/troubleshooting/multi-replica.mdx
@@ -12,7 +12,7 @@ This guide addresses common issues encountered when deploying Open WebUI in **mu
Before troubleshooting specific errors, ensure your deployment meets these **absolute requirements** for a multi-replica setup. Missing any of these will cause instability, login loops, or data loss.
1. **Shared Secret Key:** [`WEBUI_SECRET_KEY`](/getting-started/env-configuration#webui_secret_key) **MUST** be identical on all replicas.
-2. **External Database:** You **MUST** use an external PostgreSQL database (see [`DATABASE_URL`](/getting-started/env-configuration#database_server)). SQLite is **NOT** supported for multiple instances.
+2. **External Database:** You **MUST** use an external PostgreSQL database (see [`DATABASE_URL`](/getting-started/env-configuration#database_url)). SQLite is **NOT** supported for multiple instances.
3. **Redis for WebSockets:** [`ENABLE_WEBSOCKET_SUPPORT=True`](/getting-started/env-configuration#enable_websocket_support) and [`WEBSOCKET_MANAGER=redis`](/getting-started/env-configuration#websocket_manager) with a valid [`WEBSOCKET_REDIS_URL`](/getting-started/env-configuration#websocket_redis_url) are required.
4. **Shared Storage:** A persistent volume (RWX / ReadWriteMany if possible, or ensuring all replicas map to the same underlying storage for `data/`) is critical for RAG (uploads/vectors) and generated images.
5. **External Vector Database (Recommended):** While embedded Chroma works with shared storage, using a dedicated external Vector DB (e.g., [PGVector](/getting-started/env-configuration#pgvector_db_url), Milvus, Qdrant) is **highly recommended** to avoid file locking issues and improve performance.
@@ -116,6 +116,18 @@ The `/app/backend/data` directory is not shared or is not consistent across repl
- **Kubernetes:** Use a `PersistentVolumeClaim` with `ReadWriteMany` (RWX) access mode if your storage provider supports it (e.g., NFS, CephFS, AWS EFS).
- **Docker Swarm/Compose:** Mount a shared volume (e.g., NFS mount) to `/app/backend/data` on all containers.
+### 6. Slow Performance in Cloud vs. Local Kubernetes
+
+**Symptoms:**
+- Open WebUI performs well locally but experiences significant degradation or timeouts when deployed to cloud providers (AKS, EKS, GKE).
+- Performance drops sharply under concurrent load despite adequate resource allocation.
+
+**Cause:**
+This is typically caused by infrastructure latency (Network Latency to the database or Disk I/O latency for SQLite) that is inherently higher in cloud environments compared to local NVMe/SSD storage and local networks.
+
+**Solution:**
+Refer to the **[Cloud Infrastructure Latency](/tutorials/tips/performance#%EF%B8%8F-cloud-infrastructure-latency)** section in the Performance Guide for a detailed breakdown of diagnosis and mitigation strategies.
+
---
## Deployment Best Practices
@@ -123,16 +135,27 @@ The `/app/backend/data` directory is not shared or is not consistent across repl
### Updates and Migrations
:::danger Critical: Avoid Concurrent Migrations
-**Always scale down to 1 replica (and 1 worker) before upgrading Open WebUI versions.**
+**Always ensure only one process is running database migrations when upgrading Open WebUI versions.**
:::
-Database migrations run automatically on startup. If multiple replicas (or multiple workers within a single container) start simultaneously with a new version, they may try to run migrations concurrently, leading to race conditions or database schema corruption.
+Database migrations run automatically on startup. If multiple replicas (or multiple workers within a single container) start simultaneously with a new version, they may try to run migrations concurrently, potentially leading to race conditions or database schema corruption.
**Safe Update Procedure:**
-1. **Scale Down:** Set replicas to `1` (and ensure `UVICORN_WORKERS=1` if you customized it).
-2. **Update Image:** Application restarts with the new version.
-3. **Wait for Health Check:** Ensure the single instance starts up fully and completes DB migrations.
-4. **Scale Up:** Increase replicas (or `UVICORN_WORKERS`) back to your desired count.
+
+There are two ways to safely handle migrations in a multi-replica environment:
+
+#### Option 1: Designate a Master Migration Pod (Recommended)
+1. Identify one pod/replica as the "master" for migrations.
+2. Set `ENABLE_DB_MIGRATIONS=True` (default) on the master pod.
+3. Set `ENABLE_DB_MIGRATIONS=False` on all other pods.
+4. When updating, the master pod will handle the database schema update while other pods skip the migration step.
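+
+A minimal sketch of the split (how you set these depends on your orchestrator, e.g. per-Deployment environment variables in Kubernetes):
+
+```bash
+# Designated "master" replica: runs migrations on startup (default behaviour)
+ENABLE_DB_MIGRATIONS=True
+
+# All other replicas: skip migrations and start once the schema is ready
+ENABLE_DB_MIGRATIONS=False
+```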
+
+#### Option 2: Scale Down During Update
+1. **Scale Down:** Set replicas to `1` (and ensure `UVICORN_WORKERS=1`).
+2. **Update Image:** Update the image or version.
+3. **Wait for Health Check:** Wait for the single instance to start fully and complete migrations.
+4. **Scale Up:** Increase replicas back to your desired count.
+
### Session Affinity (Sticky Sessions)
While Open WebUI is designed to be stateless with proper Redis configuration, enabling **Session Affinity** (Sticky Sessions) at your Load Balancer / Ingress level can improve performance and reduce occasional jitter in WebSocket connections.
@@ -145,6 +168,7 @@ While Open WebUI is designed to be stateless with proper Redis configuration, en
## Related Documentation
- [Environment Variable Configuration](/getting-started/env-configuration)
+- [Optimization, Performance & RAM Usage](/tutorials/tips/performance)
- [Troubleshooting Connection Errors](/troubleshooting/connection-error)
- [Logging Configuration](/getting-started/advanced-topics/logging)
diff --git a/docs/troubleshooting/rag.mdx b/docs/troubleshooting/rag.mdx
index 09d17888e0..105d2541f8 100644
--- a/docs/troubleshooting/rag.mdx
+++ b/docs/troubleshooting/rag.mdx
@@ -133,9 +133,52 @@ If you're not sure whether the issue is with retrieval, token limits, or embeddi
- GPT-4o handles larger inputs (128k tokens!)
- Provides a great benchmark to evaluate your system's RAG reliability
+
---
-## Summary Checklist ✅
+### 6. Upload Limits and Restrictions 🛑
+
+Open WebUI implements various limits to ensure system stability and prevent abuse. It is important to understand how these limits apply to different upload methods:
+
+* **Chat Uploads:** Subject to global file size and count limits.
+ * **Max File Size:** Controlled by `RAG_FILE_MAX_SIZE` (default: Unlimited). Configurable in **Admin Panel > Settings > Documents > General > Max Upload Size**.
+ * **Max File Count:** Controlled by `RAG_FILE_MAX_COUNT` (default: Unlimited). Configurable in **Admin Panel > Settings > Documents > General > Max Upload Count**.
+ * **Allowed File Extensions:** Controlled by `RAG_ALLOWED_FILE_EXTENSIONS` (default: All). Configurable in **Admin Panel > Settings > Documents > General > Allowed File Extensions**.
+* **Folder Uploads:** Subject to the `FOLDER_MAX_FILE_COUNT` [environment variable](/getting-started/env-configuration/#folder_max_file_count) (defaults to 100). This limit applies to the number of files directly associated with a folder.
+* **Knowledge Base Uploads:**
+ * **File Limit:** Subject to the same `RAG_FILE_MAX_SIZE` limit as chats, but **not** subject to the `RAG_FILE_MAX_COUNT` limit, allowing for unlimited file uploads.
+ * **RAG Enforcement:** All files uploaded to a Knowledge Base are automatically indexed. However, similar to chat uploads, Knowledge Bases can also be used in **Full Context Mode** (accessible in chat settings), which feeds the full document content to the model instead of using vector search retrieval.
+
+:::info
+By separating these limits, administrators can better manage resource usage across different features. For example, you might want to allow larger uploads in a curated Knowledge Base while restricting the number of files in ad-hoc Folder uploads.
+:::
+
+---
+
+### 7. Fragmented or Tiny Chunks 🧩
+
+When using the **Markdown Header Splitter**, documents can sometimes be split into very small fragments (e.g., just a table of contents entry or a short sub-header). These tiny chunks often lack enough semantic context for the embedding model to represent them accurately, leading to poor RAG results and unnecessary overhead.
+
+✅ Solution:
+
+- Go to **Admin Settings > Documents**.
+- Increase the **Chunk Min Size Target**.
+- Setting this to a value like `1000` (or ~50-60% of your `CHUNK_SIZE`) will force the system to merge small fragments with neighboring chunks when possible, resulting in better semantic coherence and fewer total chunks.
+
+---
+
+### 8. Slow Follow-up Responses (KV Cache Invalidation) 🐌
+
+If your initial response is fast but follow-up questions become increasingly slow, you are likely experiencing **KV Cache invalidation**.
+
+**The Problem**: By default, Open WebUI injects RAG context into the **user message**. As the chat progresses, new messages shift the position of this context, forcing models (like Ollama, llama.cpp, or vLLM) and cloud providers (like OpenAI or Vertex AI) to re-process the entire context for every turn.
+
+✅ Solution:
+- Set the environment variable `RAG_SYSTEM_CONTEXT=True`.
+- This injects the RAG context into the **system message**, which stays at a fixed position at the start of the conversation.
+- This allows providers to effectively use **KV prefix caching** or **Prompt Caching**, resulting in nearly instant follow-up responses even with large documents.
+
+---
+
+## Summary Checklist ✅
+
| Problem | Fix |
|--------|------|
@@ -143,6 +186,9 @@ If you're not sure whether the issue is with retrieval, token limits, or embeddi
| 🧹 Only part of content used | Enable Full Context Mode or Bypass Embedding |
| ⏱ Limited by 2048 token cap | Increase model context length (Admin Panel > Models > Settings > Advanced Parameters for Ollama) or use large-context LLM |
| 📉 Inaccurate retrieval | Switch to a better embedding model, then reindex |
+| 🛑 Upload limits blocking files | Review chat limits (`RAG_FILE_MAX_SIZE`, `RAG_FILE_MAX_COUNT`) and folder limits (`FOLDER_MAX_FILE_COUNT`); Knowledge Base uploads have separate limits |
+| 🧩 Fragmented/Tiny Chunks | Increase **Chunk Min Size Target** to merge small sections |
+| 🐌 Slow follow-up responses | Enable `RAG_SYSTEM_CONTEXT=True` to fix KV cache invalidation |
| Still confused? | Test with GPT-4o and compare outputs |
---
diff --git a/docs/troubleshooting/web-search.mdx b/docs/troubleshooting/web-search.mdx
new file mode 100644
index 0000000000..2e95e6e0f1
--- /dev/null
+++ b/docs/troubleshooting/web-search.mdx
@@ -0,0 +1,114 @@
+---
+sidebar_position: 4
+title: "Troubleshooting Web Search"
+---
+
+Web Search in Open WebUI allows language models to access real-time information from the internet. When things don't work as expected, this guide will help you diagnose and fix common issues.
+
+## Common Web Search Issues and How to Fix Them 🛠️
+
+### 1. Web Search Fails Behind HTTP Proxy 🌐🔒
+
+If you're running Open WebUI behind an HTTP proxy, you might notice that web search queries succeed (e.g., SearXNG returns results), but the subsequent content fetching fails with errors like:
+
+- `[Errno -3] Temporary failure in name resolution`
+- `Connection timeout to host`
+- `The content provided is empty`
+
+This happens because the web content fetcher doesn't use your `http_proxy`/`https_proxy` environment variables by default.
+
+✅ **Solution:**
+
+1. Navigate to: **Admin Panel > Settings > Web Search**
+2. Enable **Trust Proxy Environment**
+3. Save changes
+
+Alternatively, set the environment variable [`WEB_SEARCH_TRUST_ENV`](../getting-started/env-configuration#web_search_trust_env):
+
+```bash
+WEB_SEARCH_TRUST_ENV=True
+```
+
+:::info
+
+This is a **PersistentConfig** variable, meaning it can be set via environment variable on startup OR configured through the Admin Panel UI. Once set in the UI, the database value takes precedence over the environment variable.
+
+This setting tells Open WebUI's web content loader to respect the proxy settings from your environment variables (`http_proxy`, `https_proxy`). Without this, even if your search engine works through the proxy, fetching content from the returned URLs will fail.
+
+:::
+
+---
+
+### 2. 403 Forbidden Errors from SearXNG
+
+If you're using SearXNG and seeing `403 Client Error: Forbidden` in your logs, the JSON format is not enabled.
+
+✅ **Solution:**
+
+Edit your SearXNG `settings.yml` and add `json` to the formats list:
+
+```yaml
+search:
+ formats:
+ - html
+ - json
+```
+
+Restart SearXNG after making this change.
+
+---
+
+### 3. Empty Content or Poor Results
+
+If web search returns empty content or poor quality results, the issue is often related to context window size or content extraction.
+
+✅ **Solutions:**
+
+- **Increase context length**: Web pages often contain 4,000-8,000+ tokens. If your model has a 2048-token limit, you're missing most of the content. Increase to 16384+ tokens in **Admin Panel > Models > Settings > Advanced Parameters** (anything below will be subpar for web content).
+
+- **Check result count**: Adjust `WEB_SEARCH_RESULT_COUNT` to control how many results are fetched.
+
+- **Try different loaders**: Configure `WEB_LOADER_ENGINE` to use `playwright` for JavaScript-heavy sites or `firecrawl`/`tavily` for better extraction.
+
+For more details on context window issues, see the [RAG Troubleshooting Guide](./rag).
+
+---
+
+### 4. Connection Timeouts
+
+If web searches are timing out:
+
+✅ **Solutions:**
+
+- **Reduce concurrent requests**: Set `WEB_SEARCH_CONCURRENT_REQUESTS=1` for sequential execution (required for Brave free tier).
+
+- **Adjust loader concurrency**: Lower `WEB_LOADER_CONCURRENT_REQUESTS` if fetching many pages simultaneously.
+
+- **Check network connectivity**: Ensure Open WebUI can reach both the search engine and the result URLs.
+
+---
+
+## Environment Variables Reference
+
+For a comprehensive list of all web search environment variables, see the [Environment Configuration documentation](../getting-started/env-configuration#web-search).
+
+Key variables:
+
+| Variable | Description |
+|----------|-------------|
+| `WEB_SEARCH_TRUST_ENV` | Enable proxy support for content fetching |
+| `WEB_SEARCH_RESULT_COUNT` | Number of search results to fetch |
+| `WEB_SEARCH_CONCURRENT_REQUESTS` | Concurrent requests to search engine |
+| `WEB_LOADER_CONCURRENT_REQUESTS` | Concurrent page fetches |
+| `WEB_LOADER_ENGINE` | Content extraction engine |
+
+---
+
+## Still Having Issues?
+
+If you're still experiencing problems:
+
+1. Check the Open WebUI logs for detailed error messages
+2. Verify your search engine configuration is correct
+3. Test connectivity from the Open WebUI container to your search engine
+4. Review all [Web Search environment variables](../getting-started/env-configuration#web-search) for additional configuration options
diff --git a/docs/tutorials/https/haproxy.md b/docs/tutorials/https/haproxy.md
index 8463374d59..7446965a3f 100644
--- a/docs/tutorials/https/haproxy.md
+++ b/docs/tutorials/https/haproxy.md
@@ -117,6 +117,20 @@ backend owui_chat
http-request add-header X-CLIENT-IP %[src]
http-request set-header X-Forwarded-Proto https if { ssl_fc }
server chat :3000
+```
+
+## WebSocket and HTTP/2 Compatibility
+
+Starting with recent versions (including HAProxy 3.x), HAProxy may enable HTTP/2 by default. While HTTP/2 supports WebSockets (RFC 8441), some clients or backend configurations may experience "freezes" or unresponsiveness when icons or data start loading via WebSockets over an H2 tunnel.
+
+If you experience these issues:
+1. **Force HTTP/1.1 for WebSockets**: Add `option h2-workaround-bogus-websocket-clients` to your `frontend` or `defaults` section. This prevents HAProxy from advertising RFC 8441 support to the client, forcing a fallback to the more stable HTTP/1.1 Upgrade mechanism.
+2. **Backend Version**: Ensure your backend connection is using HTTP/1.1 (the default for `mode http`).
+
+Example addition to your `defaults` or `frontend`:
+```shell
+defaults
+ # ... other settings
+ option h2-workaround-bogus-websocket-clients
```
You will see that we have ACL records (routers) for both Open WebUI and Let's Encrypt. To use WebSocket with OWUI, you need to have an SSL configured, and the easiest way to do that is to use Let's Encrypt.
diff --git a/docs/tutorials/https/nginx.md b/docs/tutorials/https/nginx.md
index 3de6d8eeab..8989ea5068 100644
--- a/docs/tutorials/https/nginx.md
+++ b/docs/tutorials/https/nginx.md
@@ -25,6 +25,17 @@ A very common and difficult-to-debug issue with WebSocket connections is a misco
Failure to do so will cause WebSocket connections to fail, even if you have enabled "Websockets support" in Nginx Proxy Manager.
+### HTTP/2 and WebSockets
+
+If you enable **HTTP/2** on your Nginx server, ensure that your proxy configuration still uses **HTTP/1.1** for the connection to the Open WebUI backend. This is crucial because most WebUI features (like streaming and real-time updates) rely on WebSockets, and in many proxy environments these are more stable over the HTTP/1.1 `Upgrade` mechanism than over the newer RFC 8441 (WebSockets over HTTP/2).
+
+In your Nginx location block, always include:
+```nginx
+proxy_http_version 1.1;
+proxy_set_header Upgrade $http_upgrade;
+proxy_set_header Connection "upgrade";
+```
+
:::
Choose the method that best fits your deployment needs.
diff --git a/docs/tutorials/integrations/mcp-notion.mdx b/docs/tutorials/integrations/mcp-notion.mdx
index ce9ae0e723..b36d1b4fa2 100644
--- a/docs/tutorials/integrations/mcp-notion.mdx
+++ b/docs/tutorials/integrations/mcp-notion.mdx
@@ -166,8 +166,7 @@ Once MCPO is running and configured with Notion:
"spec_type": "url",
"spec": "",
"path": "openapi.json",
- "auth_type": "bearer",
- "key": "",
+ "auth_type": "none",
"info": {
"id": "notion-local",
"name": "Notion (Local)",
diff --git a/docs/tutorials/tips/improve-performance-local.md b/docs/tutorials/tips/improve-performance-local.md
deleted file mode 100644
index a52944bd13..0000000000
--- a/docs/tutorials/tips/improve-performance-local.md
+++ /dev/null
@@ -1,205 +0,0 @@
----
-sidebar_position: 12
-title: "Improve Local LLM Performance with Dedicated Task Models"
----
-
-## Improve Performance with Dedicated Task Models
-
-Open-WebUI provides several automated features—such as title generation, tag creation, autocomplete, and search query generation—to enhance the user experience. However, these features can generate multiple simultaneous requests to your local model, which may impact performance on systems with limited resources.
-
-This guide explains how to optimize your setup by configuring a dedicated, lightweight task model or by selectively disabling automation features, ensuring that your primary chat functionality remains responsive and efficient.
-
----
-
-:::tip
-
-## Why Does Open-WebUI Feel Slow?
-
-By default, Open-WebUI has several background tasks that can make it feel like magic but can also place a heavy load on local resources:
-
-- **Title Generation**
-- **Tag Generation**
-- **Autocomplete Generation** (this function triggers on every keystroke)
-- **Search Query Generation**
-
-Each of these features makes asynchronous requests to your model. For example, continuous calls from the autocomplete feature can significantly delay responses on devices with limited memory or processing power, such as a Mac with 32GB of RAM running a 32B quantized model.
-
-Optimizing the task model can help isolate these background tasks from your main chat application, improving overall responsiveness.
-
-:::
-
----
-
-## ⚡ How to Optimize Task Model Performance
-
-Follow these steps to configure an efficient task model:
-
-### Step 1: Access the Admin Panel
-
-1. Open Open-WebUI in your browser.
-2. Navigate to the **Admin Panel**.
-3. Click on **Settings** in the sidebar.
-
-### Step 2: Configure the Task Model
-
-1. Go to **Interface > Set Task Model**.
-2. Choose one of the following options based on your needs:
-
- - **Lightweight Local Model (Recommended)**
- - Select a compact model such as **Llama 3.2 3B** or **Qwen2.5 3B**.
- - These models offer rapid responses while consuming minimal system resources.
-
- - **Hosted API Endpoint (For Maximum Speed)**
- - Connect to a hosted API service to handle task processing.
- - This can be very cheap. For example, OpenRouter offers Llama and Qwen models at less than **1.5 cents per million input tokens**.
- :::tip OpenRouter Recommendation
- When using **OpenRouter**, we highly recommend configuring the **Model IDs (Allowlist)** in the connection settings. Importing thousands of models can clutter your selector and degrade admin panel performance.
- :::
-
- - **Disable Unnecessary Automation**
- - If certain automated features are not required, disable them to reduce extraneous background calls—especially features like autocomplete.
-
-
-
-### Step 3: Save Your Changes and Test
-
-1. Save the new configuration.
-2. Interact with your chat interface and observe the responsiveness.
-3. If necessary, adjust by further disabling unused automation features or experimenting with different task models.
-
----
-
-## 🚀 Recommended Setup for Local Models
-
-| Optimization Strategy | Benefit | Recommended For |
-|---------------------------------|------------------------------------------|----------------------------------------|
-| **Lightweight Local Model** | Minimizes resource usage | Systems with limited hardware |
-| **Hosted API Endpoint** | Offers the fastest response times | Users with reliable internet/API access|
-| **Disable Automation Features** | Maximizes performance by reducing load | Those focused on core chat functionality|
-
-Implementing these recommendations can greatly improve the responsiveness of Open-WebUI while allowing your local models to efficiently handle chat interactions.
-
----
-
-## ⚙️ Environment Variables for Performance
-
-You can also configure performance-related settings via environment variables. Add these to your Docker Compose file or `.env` file.
-
-:::tip
-
-Many of these settings can also be configured directly in the **Admin Panel > Settings** interface. Environment variables are useful for initial deployment configuration or when managing settings across multiple instances.
-
-:::
-
-### Task Model Configuration
-
-Set a dedicated lightweight model for background tasks:
-
-```bash
-# For Ollama models
-TASK_MODEL=llama3.2:3b
-
-# For OpenAI-compatible endpoints
-TASK_MODEL_EXTERNAL=gpt-4o-mini
-```
-
-### Disable Unnecessary Features
-
-```bash
-# Disable automatic title generation
-ENABLE_TITLE_GENERATION=False
-
-# Disable follow-up question suggestions
-ENABLE_FOLLOW_UP_GENERATION=False
-
-# Disable autocomplete suggestions (triggers on every keystroke - high impact!)
-ENABLE_AUTOCOMPLETE_GENERATION=False
-
-# Disable automatic tag generation
-ENABLE_TAGS_GENERATION=False
-
-# Disable search query generation for RAG (if not using web search)
-ENABLE_SEARCH_QUERY_GENERATION=False
-
-# Disable retrieval query generation
-ENABLE_RETRIEVAL_QUERY_GENERATION=False
-```
-
-### Enable Caching and Optimization
-
-```bash
-# Cache model list responses (seconds) - reduces API calls
-MODELS_CACHE_TTL=300
-
-# Cache LLM-generated search queries - eliminates duplicate LLM calls when both web search and RAG are active
-ENABLE_QUERIES_CACHE=True
-
-# Convert base64 images to file URLs - reduces response size and database strain
-ENABLE_CHAT_RESPONSE_BASE64_IMAGE_URL_CONVERSION=True
-
-# Batch streaming tokens to reduce CPU load (recommended: 5-10 for high concurrency)
-CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=5
-
-# Enable gzip compression for HTTP responses (enabled by default)
-ENABLE_COMPRESSION_MIDDLEWARE=True
-```
-
-### Database and Persistence
-
-```bash
-# Disable real-time chat saving for better performance (trades off data persistence)
-ENABLE_REALTIME_CHAT_SAVE=False
-```
-
-### Network Timeouts
-
-```bash
-# Increase timeout for slow models (default: 300 seconds)
-AIOHTTP_CLIENT_TIMEOUT=300
-
-# Faster timeout for model list fetching (default: 10 seconds)
-AIOHTTP_CLIENT_TIMEOUT_MODEL_LIST=5
-```
-
-### RAG Performance
-
-```bash
-# Enable parallel embedding for faster document processing (requires sufficient resources)
-RAG_EMBEDDING_BATCH_SIZE=100
-```
-
-### High Concurrency Settings
-
-For larger instances with many concurrent users:
-
-```bash
-# Increase thread pool size (default is 40)
-THREAD_POOL_SIZE=500
-```
-
-:::info
-
-For a complete list of environment variables, see the [Environment Variable Configuration](/getting-started/env-configuration) documentation.
-
-:::
-
----
-
-## 💡 Additional Tips
-
-- **Monitor System Resources:** Use your operating system’s tools (such as Activity Monitor on macOS or Task Manager on Windows) to keep an eye on resource usage.
-- **Reduce Parallel Model Calls:** Limiting background automation prevents simultaneous requests from overwhelming your LLM.
-- **Experiment with Configurations:** Test different lightweight models or hosted endpoints to find the optimal balance between speed and functionality.
-- **Stay Updated:** Regular updates to Open-WebUI often include performance improvements and bug fixes, so keep your software current.
-
----
-
-## Related Guides
-
-- [Reduce RAM Usage](/tutorials/tips/reduce-ram-usage) - For memory-constrained environments like Raspberry Pi
-- [SQLite Database Overview](/tutorials/tips/sqlite-database) - Database schema, encryption, and advanced configuration
-- [Environment Variable Configuration](/getting-started/env-configuration) - Complete list of all configuration options
-
----
-
-By applying these configuration changes, you'll support a more responsive and efficient Open-WebUI experience, allowing your local LLM to focus on delivering high-quality chat interactions without unnecessary delays.
diff --git a/docs/tutorials/tips/performance.md b/docs/tutorials/tips/performance.md
new file mode 100644
index 0000000000..007e11219e
--- /dev/null
+++ b/docs/tutorials/tips/performance.md
@@ -0,0 +1,316 @@
+---
+sidebar_position: 10
+title: "Optimization, Performance & RAM Usage"
+---
+
+# Optimization, Performance & RAM Usage
+
+This guide provides a comprehensive overview of strategies to optimize Open WebUI. Your ideal configuration depends heavily on your specific deployment goals. Consider which of these scenarios describes you best:
+
+1. **Maximum Privacy on Weak Hardware (e.g., Raspberry Pi)**:
+ * *Goal*: keep everything local; minimize resource usage.
+ * *Trade-off*: You must use lightweight local models (SentenceTransformers) and disable heavy features to prevent crashes.
+
+2. **Maximum Quality for Single User (e.g., Desktop)**:
+ * *Goal*: Best possible experience with high speed and quality.
+ * *Strategy*: Leverage external APIs (OpenAI/Anthropic) for embeddings and task models to offload compute from your local machine.
+
+3. **High Scale for Many Users (e.g., Enterprise/Production)**:
+ * *Goal*: Stability and concurrency.
+ * *Strategy*: Requires dedicated Vector DBs (Milvus/Qdrant), increased thread pools, caching to handle load, and **PostgreSQL** instead of SQLite.
+
+---
+
+## ⚡ Performance Tuning (Speed & Responsiveness)
+
+If Open WebUI feels slow or unresponsive, especially during chat generation or high concurrency, specialized optimizations can significantly improve the user experience.
+
+### 1. Dedicated Task Models
+
+By default, Open WebUI automates background tasks like title generation, tagging, and autocomplete. These run in the background and can slow down your main chat model if they share the same resources.
+
+**Recommendation**: Use a **very fast, small, and cheap NON-REASONING model** for these tasks. Avoid using large reasoning models (like o1, r1, or Claude) as they are too slow and expensive for simple background tasks.
+
+**Configuration:**
+There are two separate settings in **Admin Panel > Settings > Interface**. The system intelligently selects which one to use based on the model you are currently chatting with:
+* **Task Model (External)**: Used when you are chatting with an external model (e.g., OpenAI).
+* **Task Model (Local)**: Used when you are chatting with a locally hosted model (e.g., Ollama).
+
+**Best Options (2025):**
+* **External/Cloud**: `gpt-5-nano`, `gemini-2.5-flash-lite`, `llama-3.1-8b-instant` (OpenAI/Google/Groq/OpenRouter).
+* **Local**: `qwen3:1b`, `gemma3:1b`, `llama3.2:3b`.
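+
+For example, both can be set via environment variables (the model names here are just the suggestions above; substitute your own):
+
+```bash
+# Used for background tasks when chatting with local (Ollama) models
+TASK_MODEL=llama3.2:3b
+
+# Used for background tasks when chatting with external (OpenAI-compatible) models
+TASK_MODEL_EXTERNAL=gpt-5-nano
+```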
+
+### 2. Caching & Latency Optimization
+
+Configure these settings to reduce latency and external API usage.
+
+#### Model Caching
+Drastically reduces startup time and API calls to external providers.
+
+:::warning Important for OpenRouter and Multi-Model Providers
+If you are using **OpenRouter** or any provider with hundreds/thousands of models, enabling model caching is **highly recommended**. Without caching, initial page loads can take **10-15+ seconds** as the application queries all available models. Enabling the cache reduces this to near-instant.
+:::
+
+- **Admin Panel**: `Settings > Connections > Cache Base Model List`
+- **Env Var**: `ENABLE_BASE_MODELS_CACHE=True`
+ * *Note*: Caches the list of models in memory. Only refreshes on App Restart or when clicking **Save** in Connections settings.
+
+- **Env Var**: `MODELS_CACHE_TTL=300`
+ * *Note*: Sets a 5-minute cache for external API responses.
+
+#### Search Query Caching
+Reuses the LLM-generated web search query for RAG retrieval within the same chat turn. This prevents redundant LLM calls when multiple retrieval features act on the same user prompt.
+
+- **Env Var**: `ENABLE_QUERIES_CACHE=True`
+  * *Note*: When enabled, the query generated for web search is reused for RAG retrieval instead of being regenerated, saving inference cost and API calls.
+
+For example, if the LLM generates "US News 2025" as a web search query, that same query is reused for RAG retrieval in that turn rather than being generated a second time.
+
+#### KV Cache Optimization (RAG Performance)
+Drastically improves the speed of follow-up questions when chatting with large documents or knowledge bases.
+
+- **Env Var**: `RAG_SYSTEM_CONTEXT=True`
+- **Effect**: Injects RAG context into the **system message** instead of the user message.
+- **Why**: Many LLM engines (like Ollama, llama.cpp, vLLM) and cloud providers (OpenAI, Vertex AI) support **KV prefix caching** or **Prompt Caching**. System messages stay at the start of the conversation, while user messages shift position each turn. Moving RAG context to the system message ensures the cache remains valid, leading to **near-instant follow-up responses** instead of re-processing large contexts every turn.
+
+---
+
+## 📦 Database Optimization
+
+For high-scale deployments, your database configuration is the single most critical factor for stability.
+
+### PostgreSQL (Mandatory for Scale)
+For any multi-user or high-concurrency setup, **PostgreSQL is mandatory**. SQLite (the default) is not designed for high concurrency and will become a bottleneck (database locking errors).
+
+- **Variable**: `DATABASE_URL`
+- **Example**: `postgres://user:password@localhost:5432/webui`
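+
+A minimal sketch of wiring this into a Docker deployment (host, credentials, and database name are placeholders):
+
+```bash
+docker run -d -p 3000:8080 \
+  -e DATABASE_URL="postgres://openwebui:change-me@postgres-host:5432/openwebui" \
+  -v open-webui:/app/backend/data \
+  --name open-webui --restart always \
+  ghcr.io/open-webui/open-webui:main
+```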
+
+### Chat Saving Strategy
+By default, Open WebUI saves chats in **real-time**. This ensures no data loss but creates massive database write pressure because *every single chunk* of text received from the LLM triggers a database update.
+
+- **Env Var**: `ENABLE_REALTIME_CHAT_SAVE=False`
+- **Effect**: Chats are saved only when the generation is complete (or periodically).
+- **Recommendation**: **Highly Recommended** for any high-user setup to reduce DB load substantially.
+
+### Vector Database (RAG)
+For multi-user setups, the choice of Vector DB matters.
+
+- **ChromaDB**: **NOT RECOMMENDED** for multi-user environments due to performance limitations and locking issues.
+- **Recommendations**:
+ * **Milvus** or **Qdrant**: Best for improved scale and performance.
+ * **PGVector**: Excellent choice if you are already using PostgreSQL.
+- **Multitenancy**: If using Milvus or Qdrant, enabling multitenancy offers better resource sharing.
+ * `ENABLE_MILVUS_MULTITENANCY_MODE=True`
+ * `ENABLE_QDRANT_MULTITENANCY_MODE=True`
+
+### Optimizing Document Chunking
+
+The way your documents are chunked directly impacts both storage efficiency and retrieval quality.
+
+- **Use Markdown Header Splitting**: This preserves the semantic structure of your documents.
+- **Set a Chunk Min Size Target**: When using the markdown header splitter, tiny chunks (e.g., just a single sub-header) can be created. These are inefficient to store and poor for retrieval.
+ - **Env Var**: `CHUNK_MIN_SIZE_TARGET=1000` (Example value)
+ - **Benefit**: Intelligently merges small chunks with neighbors, significantly reducing the total vector count and improving RAG performance.
+
+---
+
+## 📈 Scaling Infrastructure (Multi-Tenancy & Kubernetes)
+
+If you are deploying for **enterprise scale** (hundreds of users), simple Docker Compose setups may not suffice. You will need to move to a clustered environment.
+
+* **Kubernetes / Helm**: For deploying on K8s with multiple replicas, see the **[Multi-Replica & High Availability Guide](/troubleshooting/multi-replica)**.
+* **Redis (Mandatory)**: When running multiple workers (`UVICORN_WORKERS > 1`) or multiple replicas, **Redis is required** to handle WebSocket connections and session syncing. See **[Redis Integration](/tutorials/integrations/redis)**.
+* **Load Balancing**: Ensure your Ingress controller supports **Session Affinity** (Sticky Sessions) for best performance.
+* **Reverse Proxy Caching**: Configure your reverse proxy (e.g., Nginx, Caddy, Cloudflare) to **cache static assets** (JS, CSS, Images). This significantly reduces load on the application server. See **[Nginx Config](/tutorials/https/nginx)** or **[Caddy Config](/tutorials/https/caddy)**.
+
+---
+
+## ⚡ High-Concurrency & Network Optimization
+
+For setups with many simultaneous users, these settings are crucial to prevent bottlenecks.
+
+#### Batch Streaming Tokens
+By default, Open WebUI streams *every single token* arriving from the LLM. High-frequency streaming increases network I/O and CPU usage on the server, and if real-time saving is enabled, it also puts heavy write pressure on the database (you can disable it with `ENABLE_REALTIME_CHAT_SAVE=False`).
+
+Increasing the chunk size buffers these updates and sends them to the client in larger groups. The only downside is a slightly choppier streaming experience in the UI, but it can make a big difference in performance.
+
+- **Env Var**: `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=7`
+ * *Recommendation*: Set to **5-10** for high-concurrency instances.
+
+#### Thread Pool Size
+Defines the number of worker threads available for handling requests.
+* **Default**: 40
+* **High-Traffic Recommendation**: **2000+**
+* **Warning**: **NEVER decrease this value.** Even on low-spec hardware, an idle thread pool does not consume significant resources. Setting this too low (e.g., 10) **WILL cause application freezes** and request timeouts.
+
+- **Env Var**: `THREAD_POOL_SIZE=2000`
+
+---
+
+## ☁️ Cloud Infrastructure Latency
+
+When deploying Open WebUI in cloud Kubernetes environments (AKS, EKS, GKE), you may notice significant performance degradation compared to local Kubernetes (Rancher Desktop, kind, Minikube) or bare-metal deployments—even with identical resource allocations. This is almost always caused by **latency** in the underlying infrastructure.
+
+### Network Latency (Database & Services)
+
+The most common cause of cloud performance issues is **network latency between Open WebUI and its database**.
+
+Many cloud deployments place the database on a separate node, availability zone, or even a managed database service. While this is architecturally sound, it introduces latency to *every single database query*. Open WebUI makes multiple database calls per request, so even 10-20ms of network latency per query can compound into multi-second response times under concurrent load.
+
+**Symptoms:**
+- Health check endpoints show high response times instead of being near-instant.
+- Simple API calls or normal chat completions become sluggish under concurrent load, even when CPU and Memory usage appear low.
+- Significant performance gap between local development/testing and cloud production environments.
+
+**Diagnosis:**
+- Check network latency between your Open WebUI pod and your database. From within the pod:
+ ```bash
+  # For PostgreSQL (replace the placeholders with your connection details)
+  psql -h <db-host> -U <db-user> -d <db-name> -c "SELECT 1"
+
+  # Or use nc to check raw TCP latency to the database port
+  nc -zv <db-host> 5432
+ ```
+- **Ideal Latency Target:** Aim for **1-2ms or lower** for database queries. If network latency to your database exceeds **5ms**, it is strongly discouraged for production deployments and will likely be your primary performance bottleneck.
+
+**Solutions:**
+1. **Co-locate services:** Deploy Open WebUI and PostgreSQL in the same availability zone, or even on the same node pool if possible, to minimize network hops.
+2. **Managed DB Consideration:** Note that "one-click" managed database solutions in the cloud, while scalable, often introduce significant network latency compared to a self-hosted DB on the same node. This tradeoff must be carefully considered.
+3. **Enable caching:** Use `ENABLE_BASE_MODELS_CACHE=True` and other caching options to reduce the frequency of database queries.
+4. **Reduce database writes:** Set `ENABLE_REALTIME_CHAT_SAVE=False` to batch database updates and reduce IOPS pressure.
+
+### Disk I/O Latency (SQLite & Storage)
+
+If you're using **SQLite** (the default) in a cloud environment, you may be trading network latency for **disk latency**.
+
+Cloud storage (Azure Disks, AWS EBS, GCP Persistent Disks) often has significantly higher latency and lower IOPS than local NVMe/SSD storage—especially on lower-tier storage classes.
+
+:::warning Warning: Performance Risk with Network File Systems
+Using Network-attached File Systems like **NFS, SMB, or Azure Files** for your database storage (especially for SQLite) **may** introduce severe latency into the file locking and synchronous write operations that SQLite relies on.
+:::
+
+SQLite is particularly sensitive to disk performance because it performs synchronous writes. Moving from local SSDs to a network share can increase latency by 10x or more per operation.
+
+**Symptoms:**
+- Performance is acceptable with a single user but degrades rapidly as concurrency increases.
+- High "I/O Wait" on the server despite low CPU usage.
+
+**Solutions:**
+1. **Use high-performance Block Storage:**
+ - Ensure you are using SSD-backed **Block Storage** classes (e.g., `Premium_LRS` on Azure Disks, `gp3` on AWS EBS, `pd-ssd` on GCP). Avoid "File" based storage classes (like `azurefile-csi`) for database workloads.
+2. **Use PostgreSQL instead:** For any medium to large production deployment, **Postgres is mandatory**. SQLite is generally not recommended at scale in cloud environments due to the inherent latency of network-attached storage and the compounding effect of file locking over the network.
+
+### Other Cloud-Specific Considerations
+
+| Factor | Impact | Mitigation |
+|--------|--------|------------|
+| **Burstable VMs** (e.g., Azure B-series, AWS T-series) | CPU throttling under sustained load, even at low reported usage | Use standard/compute-optimized node pools |
+| **DNS Resolution** | CoreDNS overhead on every external request | Ensure CoreDNS is properly scaled; consider node-local DNS cache |
+| **Service Mesh Sidecars** | Istio/Linkerd proxies add latency to every request | Check for unexpected sidecar containers in your pods |
+| **Network Policies** | CNI processing overhead | Audit and simplify network policies if possible |
+| **Cross-Zone Traffic** | Latency + egress costs when services span zones | Pin services to the same availability zone |
+
+---
+
+## 📉 Resource Efficiency (Reducing RAM)
+
+If deploying on memory-constrained devices (Raspberry Pi, small VPS), use these strategies to prevent the application from crashing due to OOM (Out of Memory) errors.
+
+### 1. Offload Auxiliary Models (Local Deployments Only)
+
+Open WebUI loads local ML models for features like RAG and STT. **This section is only relevant if you are running models LOCALLY.**
+
+#### RAG Embeddings
+- **Low-Spec Recommendation**:
+ * **Option A (Easiest)**: Keep the default **SentenceTransformers** (all-MiniLM-L6-v2). It is lightweight, runs on CPU, and is significantly more efficient than running a full Ollama instance on the same Raspberry Pi.
+ * **Option B (Best Performance)**: Use an **External API** (OpenAI/Cloud).
+
+- **Configuration**:
+ * **Admin Panel**: `Settings > Documents > Embedding Model Engine`
+ * **Env Var**: `RAG_EMBEDDING_ENGINE=openai` (to offload completely)
+
+#### Speech-to-Text (STT)
+Local Whisper models are heavy (~500MB+ RAM).
+
+- **Recommendation**: Use **WebAPI** (Browser-based). It runs on the user's device, so it consumes no server RAM.
+- **Configuration**:
+ * **Admin Panel**: `Settings > Audio > STT Engine`
+ * **Env Var**: `AUDIO_STT_ENGINE=webapi`
+
+### 2. Disable Unused Features
+
+Prevent the application from loading **local** models you don't use.
+
+- **Image Generation**: `ENABLE_IMAGE_GENERATION=False` (Admin: `Settings > Images`)
+- **Code Interpreter**: `ENABLE_CODE_INTERPRETER=False` (Admin: `Settings > Tools`)
+
+### 3. Disable Background Tasks
+
+If resource usage is critical, disable automated features that constantly trigger model inference.
+
+**Recommendation order (Highest Impact first):**
+
+1. **Autocomplete**: `ENABLE_AUTOCOMPLETE_GENERATION=False` (**High Impact**: Triggers on every keystroke!)
+ * Admin: `Settings > Interface > Autocomplete`
+2. **Follow-up Questions**: `ENABLE_FOLLOW_UP_GENERATION=False`
+ * Admin: `Settings > Interface > Follow-up`
+3. **Title Generation**: `ENABLE_TITLE_GENERATION=False`
+ * Admin: `Settings > Interface > Chat Title`
+4. **Tag Generation**: `ENABLE_TAGS_GENERATION=False`
+
+---
+
+## 🚀 Recommended Configuration Profiles
+
+### Profile 1: Maximum Privacy (Weak Hardware/RPi)
+*Target: 100% Local, Raspberry Pi / <4GB RAM.*
+
+1. **Embeddings**: Default (SentenceTransformers) - *Runs on CPU, lightweight.*
+2. **Audio**: `AUDIO_STT_ENGINE=webapi` - *Zero server load.*
+3. **Task Model**: Disable or use tiny model (`llama3.2:1b`).
+4. **Scaling**: Keep default `THREAD_POOL_SIZE` (40).
+5. **Disable**: Image Gen, Code Interpreter, Autocomplete, Follow-ups.
+6. **Database**: SQLite is fine.
+
+### Profile 2: Single User Enthusiast
+*Target: Max Quality & Speed, Local + External APIs.*
+
+1. **Embeddings**: `RAG_EMBEDDING_ENGINE=openai` (or `ollama` with `nomic-embed-text` on a fast server).
+2. **Task Model**: `gpt-5-nano` or `llama-3.1-8b-instant`.
+3. **Caching**: `MODELS_CACHE_TTL=300`.
+4. **Database**: `ENABLE_REALTIME_CHAT_SAVE=True` (Persistence is usually preferred over raw write speed here).
+5. **Vector DB**: PGVector (recommended) or ChromaDB (either is fine unless you are dealing with massive data). A sketch of these settings follows below.
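+
+A sketch of this profile, assuming an OpenAI-compatible endpoint is already configured (the task model name is only an example):
+
+```bash
+# Profile 2: single-user enthusiast
+RAG_EMBEDDING_ENGINE=openai           # or: ollama with nomic-embed-text on a fast server
+TASK_MODEL_EXTERNAL=gpt-5-nano        # example external task model
+MODELS_CACHE_TTL=300
+ENABLE_REALTIME_CHAT_SAVE=True
+# Vector DB: PGVector or ChromaDB both work fine at this scale
+```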
+
+### Profile 3: High Scale / Enterprise
+*Target: Many concurrent users, Stability > Persistence.*
+
+1. **Database**: **PostgreSQL** (Mandatory).
+2. **Workers**: `THREAD_POOL_SIZE=2000` (prevents timeouts under heavy concurrency).
+3. **Streaming**: `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=7` (Reduce CPU/Net/DB writes).
+4. **Chat Saving**: `ENABLE_REALTIME_CHAT_SAVE=False`.
+5. **Vector DB**: **Milvus**, **Qdrant**, or **PGVector**. **Avoid ChromaDB.**
+6. **Task Model**: External/Hosted (Offload compute).
+7. **Caching**: `ENABLE_BASE_MODELS_CACHE=True`, `MODELS_CACHE_TTL=300`, `ENABLE_QUERIES_CACHE=True`. (An illustrative sketch of these settings follows below.)
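+
+An illustrative sketch of this profile (hostname, credentials, and the task model are placeholders; pick your vector database per the environment configuration guide):
+
+```bash
+# Profile 3: high scale / enterprise
+DATABASE_URL=postgresql://user:password@db-host:5432/openwebui   # placeholder credentials
+THREAD_POOL_SIZE=2000
+CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE=7
+ENABLE_REALTIME_CHAT_SAVE=False
+ENABLE_BASE_MODELS_CACHE=True
+MODELS_CACHE_TTL=300
+ENABLE_QUERIES_CACHE=True
+TASK_MODEL_EXTERNAL=gpt-5-nano        # example: offload task-model compute
+```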
+
+---
+
+## 🔗 Environment Variable References
+
+For detailed information on all available variables, see the [Environment Configuration](/getting-started/env-configuration) guide.
+
+| Variable | Description & Link |
+| :--- | :--- |
+| `TASK_MODEL` | [Task Model (Local)](/getting-started/env-configuration#task_model) |
+| `TASK_MODEL_EXTERNAL` | [Task Model (External)](/getting-started/env-configuration#task_model_external) |
+| `ENABLE_BASE_MODELS_CACHE` | [Cache Model List](/getting-started/env-configuration#enable_base_models_cache) |
+| `MODELS_CACHE_TTL` | [Model Cache TTL](/getting-started/env-configuration#models_cache_ttl) |
+| `ENABLE_QUERIES_CACHE` | [Queries Cache](/getting-started/env-configuration#enable_queries_cache) |
+| `DATABASE_URL` | [Database URL](/getting-started/env-configuration#database_url) |
+| `ENABLE_REALTIME_CHAT_SAVE` | [Realtime Chat Save](/getting-started/env-configuration#enable_realtime_chat_save) |
+| `CHAT_RESPONSE_STREAM_DELTA_CHUNK_SIZE` | [Streaming Chunk Size](/getting-started/env-configuration#chat_response_stream_delta_chunk_size) |
+| `THREAD_POOL_SIZE` | [Thread Pool Size](/getting-started/env-configuration#thread_pool_size) |
+| `RAG_EMBEDDING_ENGINE` | [Embedding Engine](/getting-started/env-configuration#rag_embedding_engine) |
+| `AUDIO_STT_ENGINE` | [STT Engine](/getting-started/env-configuration#audio_stt_engine) |
+| `ENABLE_IMAGE_GENERATION` | [Image Generation](/getting-started/env-configuration#enable_image_generation) |
+| `ENABLE_AUTOCOMPLETE_GENERATION` | [Autocomplete](/getting-started/env-configuration#enable_autocomplete_generation) |
+| `RAG_SYSTEM_CONTEXT` | [RAG System Context](/getting-started/env-configuration#rag_system_context) |
diff --git a/docs/tutorials/tips/reduce-ram-usage.md b/docs/tutorials/tips/reduce-ram-usage.md
deleted file mode 100644
index 769007b234..0000000000
--- a/docs/tutorials/tips/reduce-ram-usage.md
+++ /dev/null
@@ -1,168 +0,0 @@
----
-sidebar_position: 10
-title: "Reduce RAM Usage"
----
-
-## Reduce RAM Usage
-
-If you are deploying Open WebUI in a RAM-constrained environment (such as a Raspberry Pi, small VPS, or shared hosting), there are several strategies to significantly reduce memory consumption.
-
-On a Raspberry Pi 4 (arm64) with version v0.3.10, these optimizations reduced idle memory consumption from >1GB to ~200MB (as observed with `docker container stats`).
-
----
-
-## Quick Start
-
-Set the following environment variables for immediate RAM savings:
-
-```bash
-# Use external embedding instead of local SentenceTransformers
-RAG_EMBEDDING_ENGINE=ollama
-
-# Use external Speech-to-Text instead of local Whisper
-AUDIO_STT_ENGINE=openai
-```
-
-:::tip
-
-These settings can also be configured in the **Admin Panel > Settings** interface - set RAG embedding to Ollama or OpenAI, and Speech-to-Text to OpenAI or WebAPI.
-
-:::
-
----
-
-## Why Does Open WebUI Use So Much RAM?
-
-Much of the memory consumption comes from locally loaded ML models. Even when using an external LLM (OpenAI or separate Ollama instance), Open WebUI may load additional models for:
-
-| Feature | Default | RAM Impact | Solution |
-|---------|---------|------------|----------|
-| **RAG Embedding** | Local SentenceTransformers | ~500-800MB | Use Ollama or OpenAI embeddings |
-| **Speech-to-Text** | Local Whisper | ~300-500MB | Use OpenAI or WebAPI |
-| **Reranking** | Disabled | ~200-400MB when enabled | Keep disabled or use external |
-| **Image Generation** | Disabled | Variable | Keep disabled if not needed |
-
----
-
-## ⚙️ Environment Variables for RAM Reduction
-
-### Offload Embedding to External Service
-
-The biggest RAM saver is using an external embedding engine:
-
-```bash
-# Option 1: Use Ollama for embeddings (if you have Ollama running separately)
-RAG_EMBEDDING_ENGINE=ollama
-
-# Option 2: Use OpenAI for embeddings
-RAG_EMBEDDING_ENGINE=openai
-OPENAI_API_KEY=your-api-key
-```
-
-### Offload Speech-to-Text
-
-Local Whisper models consume significant RAM:
-
-```bash
-# Use OpenAI's Whisper API
-AUDIO_STT_ENGINE=openai
-
-# Or use browser-based WebAPI (no external service needed)
-AUDIO_STT_ENGINE=webapi
-```
-
-### Disable Unused Features
-
-Disable features you don't need to prevent model loading:
-
-```bash
-# Disable image generation (prevents loading image models)
-ENABLE_IMAGE_GENERATION=False
-
-# Disable code execution (reduces overhead)
-ENABLE_CODE_EXECUTION=False
-
-# Disable code interpreter
-ENABLE_CODE_INTERPRETER=False
-```
-
-### Reduce Background Task Overhead
-
-These settings reduce memory usage from background operations:
-
-```bash
-# Disable autocomplete (high resource usage)
-ENABLE_AUTOCOMPLETE_GENERATION=False
-
-# Disable automatic title generation
-ENABLE_TITLE_GENERATION=False
-
-# Disable tag generation
-ENABLE_TAGS_GENERATION=False
-
-# Disable follow-up suggestions
-ENABLE_FOLLOW_UP_GENERATION=False
-```
-
-### Database and Cache Optimization
-
-```bash
-# Disable real-time chat saving (reduces database overhead)
-ENABLE_REALTIME_CHAT_SAVE=False
-
-# Reduce thread pool size for low-resource systems
-THREAD_POOL_SIZE=10
-```
-
-### Vector Database Multitenancy
-
-If using Milvus or Qdrant, enable multitenancy mode to reduce RAM:
-
-```bash
-# For Milvus
-ENABLE_MILVUS_MULTITENANCY_MODE=True
-
-# For Qdrant
-ENABLE_QDRANT_MULTITENANCY_MODE=True
-```
-
----
-
-## 🚀 Recommended Minimal Configuration
-
-For extremely RAM-constrained environments, use this combined configuration:
-
-```bash
-# Offload ML models to external services
-RAG_EMBEDDING_ENGINE=ollama
-AUDIO_STT_ENGINE=openai
-
-# Disable all non-essential features
-ENABLE_IMAGE_GENERATION=False
-ENABLE_CODE_EXECUTION=False
-ENABLE_CODE_INTERPRETER=False
-ENABLE_AUTOCOMPLETE_GENERATION=False
-ENABLE_TITLE_GENERATION=False
-ENABLE_TAGS_GENERATION=False
-ENABLE_FOLLOW_UP_GENERATION=False
-
-# Reduce worker overhead
-THREAD_POOL_SIZE=10
-```
-
----
-
-## 💡 Additional Tips
-
-- **Monitor Memory Usage**: Use `docker container stats` or `htop` to monitor RAM consumption
-- **Restart After Changes**: Environment variable changes require a container restart
-- **Fresh Deployments**: Some environment variables only take effect on fresh deployments without an existing `config.json`
-- **Consider Alternatives**: For very constrained systems, consider running Open WebUI on a more capable machine and accessing it remotely
-
----
-
-## Related Guides
-
-- [Improve Local LLM Performance](/tutorials/tips/improve-performance-local) - For optimizing performance without reducing features
-- [Environment Variable Configuration](/getting-started/env-configuration) - Complete list of all configuration options
-
diff --git a/docs/tutorials/tips/sqlite-database.md b/docs/tutorials/tips/sqlite-database.md
index 9a4e8a8750..1310959582 100644
--- a/docs/tutorials/tips/sqlite-database.md
+++ b/docs/tutorials/tips/sqlite-database.md
@@ -10,7 +10,7 @@ This tutorial is a community contribution and is not supported by the Open WebUI
:::
> [!WARNING]
-> This documentation was created/updated based on version 0.6.42 and updated for recent migrations.
+> This documentation was created/updated based on version 0.7.0 and updated for recent migrations.
## Open-WebUI Internal SQLite Database
@@ -764,6 +764,19 @@ When these are set and a full `DATABASE_URL` is **not** explicitly defined, Open
:::
+:::warning Migrating Existing Data to SQLCipher
+
+**Open WebUI does not support automatic migration from an unencrypted SQLite database to an encrypted SQLCipher database.** If you enable SQLCipher on an existing installation, the application will fail to read your existing unencrypted data.
+
+To use SQLCipher with existing data, you must either:
+
+1. **Start fresh** - Enable SQLCipher on a new installation and have users export/re-import their chats manually
+2. **Manual database migration** - Use external SQLite/SQLCipher tools to export data from the unencrypted database and import it into a new encrypted database (advanced users only; a rough sketch follows this warning)
+3. **Use filesystem-level encryption** - Consider alternatives like LUKS (Linux) or BitLocker (Windows) for at-rest encryption without database-level changes
+4. **Switch to PostgreSQL** - For multi-user deployments, PostgreSQL with TLS provides encryption in transit and can be combined with encrypted storage
+
+:::
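+
+For option 2, the general approach with the `sqlcipher` command-line shell looks roughly like this. Treat it as a sketch: paths and the passphrase are placeholders, and you should back up `webui.db` and verify the encrypted copy before switching over.
+
+```bash
+# Back up the unencrypted database first (paths assume the default data directory)
+cp /app/backend/data/webui.db /app/backend/data/webui.db.bak
+
+# Copy all data into a new SQLCipher-encrypted database
+sqlcipher /app/backend/data/webui.db <<'SQL'
+ATTACH DATABASE '/app/backend/data/webui-encrypted.db' AS encrypted KEY 'your-passphrase';
+SELECT sqlcipher_export('encrypted');
+DETACH DATABASE encrypted;
+SQL
+
+# Afterwards, swap the encrypted copy into place and configure your SQLCipher key
+# as described earlier in this guide before restarting Open WebUI.
+```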
+
### Related Database Environment Variables
| Variable | Default | Description |
diff --git a/static/images/folder-demo.gif b/static/images/folder-demo.gif
deleted file mode 100644
index 9623d466ed..0000000000
Binary files a/static/images/folder-demo.gif and /dev/null differ
diff --git a/static/images/tag-demo.gif b/static/images/tag-demo.gif
deleted file mode 100644
index 56b4a092dc..0000000000
Binary files a/static/images/tag-demo.gif and /dev/null differ