
Commit c39281d

Merge pull request #923 from open-webui/dev

2 parents 970887c + 82ab559
File tree

114 files changed: +4964 -752 lines


docs/faq.mdx

Lines changed: 60 additions & 0 deletions
@@ -128,6 +128,10 @@ Everything you need to run Open WebUI, including your data, remains within your

```
docker run -d -p 3000:8080 -e HF_ENDPOINT=https://hf-mirror.com/ --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

### Q: Why are my reasoning model's thinking blocks showing as raw text instead of being hidden?

**A:** This happens if the model's thinking tags are not recognized by Open WebUI. You can customize the tags in the model's Advanced Parameters. For more details, see the **[Reasoning & Thinking Models](/features/chat-features/reasoning-models)** guide.
### Q: RAG with Open WebUI is very bad or not working at all. Why?

**A:** If you're using **Ollama**, be aware that Ollama sets the context length to **2048 tokens by default**. This means that none of the retrieved data might be used because it doesn't fit within the available context window.

@@ -136,10 +140,66 @@ To improve the performance of Retrieval-Augmented Generation (**RAG**) with Open

To do this, configure your **Ollama model params** to allow a larger context window. You can check and modify this setting directly in your chat or from the model editor page to significantly enhance the RAG experience.
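For example, one way to raise the context window in Ollama itself is to build a model variant with a larger `num_ctx` (a sketch; the model name and the 8192-token value are illustrative):

```
# Modelfile: derive a variant of llama3.1 with a larger context window
FROM llama3.1
PARAMETER num_ctx 8192
```

Create and use it with `ollama create llama3.1-8k -f Modelfile`, then select `llama3.1-8k` in Open WebUI.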

### Q: I asked the model what it is and it gave the wrong answer. Is Open WebUI routing to the wrong model?

**A:** No—**LLMs do not reliably know their own identity.** When you ask a model "What model are you?" or "Are you GPT-4?", the response is not a system diagnostic. It's simply the model generating text based on patterns in its training data.

Models frequently:

- Claim to be a different model (e.g., a Llama model claiming to be ChatGPT)
- Give outdated information about themselves
- Hallucinate version numbers or capabilities
- Change their answer depending on how you phrase the question

**To verify which model you're actually using:**

1. Check the model selector in the Open WebUI interface
2. Look at **Admin Panel > Settings > Connections** to confirm your API endpoints
3. Check your provider's dashboard/logs for the actual API calls being made

Asking the model itself is **not** a valid way to diagnose routing issues. If you suspect a configuration problem, check your connection settings and API keys instead.
### Q: But why can models on official chat interfaces (like ChatGPT or Claude.ai) correctly identify themselves?

**A:** Because the provider **injects a system prompt** that explicitly tells the model what it is. When you use ChatGPT, OpenAI's interface includes a hidden system message like "You are ChatGPT, a large language model trained by OpenAI..." before your conversation begins.

The model isn't "aware" of itself—it's simply been instructed to claim a specific identity. You can do the same thing in Open WebUI by adding a system prompt to your model configuration (e.g., "You are Llama 3.3 70B..."). The model will then confidently repeat whatever identity you've told it to claim.

This is also why the same model accessed through different interfaces might give different answers about its identity—it depends entirely on what system prompt (if any) was provided.
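Under the hood, that hidden instruction is just a `system` message at the start of the request. A minimal sketch against an OpenAI-compatible Chat Completions endpoint (the URL assumes Open WebUI's API at its default port; the key and model name are placeholders):

```bash
# The "identity" comes entirely from the system message in the payload
curl http://localhost:3000/api/chat/completions \
  -H "Authorization: Bearer $OPEN_WEBUI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are Llama 3.3 70B, served through Open WebUI."},
      {"role": "user", "content": "What model are you?"}
    ]
  }'
```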
### Q: Why am I seeing multiple API requests when I only send one message? Why is my token usage higher than expected?

**A:** Open WebUI uses **Task Models** to power background features that enhance your chat experience. When you send a single message, additional API calls may be made for:

- **Title Generation**: Automatically generating a title for new chats
- **Tag Generation**: Auto-tagging chats for organization
- **Query Generation**: Creating optimized search queries for RAG (when you attach files or knowledge)
- **Web Search Queries**: Generating search terms when web search is enabled
- **Autocomplete Suggestions**: If enabled

By default, these tasks use the **same model** you're chatting with. If you're using an expensive API model (like GPT-4 or Claude), this can significantly increase your costs.

**To reduce API costs:**

1. Go to **Admin Panel > Settings > Interface** (for title/tag generation settings)
2. Configure a **Task Model** under **Admin Panel > Settings > Models** to use a smaller, cheaper model (like GPT-4o-mini) or a local model for background tasks
3. Disable features you don't need (auto-title, auto-tags, etc.)

:::tip Cost-Saving Recommendation
Set your Task Model to a fast, inexpensive model (or a local model via Ollama) while keeping your primary chat model as a more capable one. This gives you the best of both worlds: smart responses for your conversations and cheap or free processing for background tasks.
:::

For more optimization tips, see the **[Performance Tips Guide](tutorials/tips/performance)**.
### Q: Is MCP (Model Context Protocol) supported in Open WebUI?

**A:** Yes, Open WebUI now includes **native support for MCP Streamable HTTP**, enabling direct, first-class integration with MCP tools that communicate over the standard HTTP transport. For any **other MCP transports or non-HTTP implementations**, you should use our official proxy adapter, **MCPO**, available at 👉 [https://github.com/open-webui/mcpo](https://github.com/open-webui/mcpo). MCPO provides a unified OpenAPI-compatible layer that bridges alternative MCP transports into Open WebUI safely and consistently. This architecture ensures maximum compatibility, strict security boundaries, and predictable tool behavior across different environments while keeping Open WebUI backend-agnostic and maintainable.
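As a concrete sketch, MCPO wraps a local stdio MCP server and exposes it as an OpenAPI-compatible HTTP endpoint (the server and flags below are illustrative; check the MCPO README for current usage):

```bash
# Bridge a stdio-based MCP server to an OpenAPI endpoint on port 8000
uvx mcpo --port 8000 -- uvx mcp-server-time --local-timezone=America/New_York
```

The resulting endpoint can then be added to Open WebUI as a tool server.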

### Q: Why doesn't Open WebUI support [Specific Provider]'s latest API (e.g., OpenAI Responses API)?

**A:** Open WebUI is built around **universal protocols**, not specific providers. Our core philosophy is to support standard, widely-adopted APIs like the **OpenAI Chat Completions protocol**.

This protocol-centric design ensures that Open WebUI remains backend-agnostic and compatible with dozens of providers (like OpenRouter, LiteLLM, vLLM, and Groq) simultaneously. We avoid implementing proprietary, provider-specific APIs (such as OpenAI's stateful Responses API or Anthropic's Messages API) to prevent unsustainable architectural bloat and to maintain a truly open ecosystem.

If you need functionality exclusive to a proprietary API (like OpenAI's hidden reasoning traces), we recommend using a proxy like **LiteLLM** or **OpenRouter**, which translates those proprietary features into the standard Chat Completions protocol that Open WebUI supports.
### Q: Why is the frontend integrated into the same Docker image? Isn't this unscalable or problematic?

**A:** The assumption that bundling the frontend with the backend is unscalable comes from a misunderstanding of how modern Single-Page Applications work. Open WebUI's frontend is a static SPA, meaning it consists only of HTML, CSS, and JavaScript files with no runtime coupling to the backend. Because these files are static, lightweight, and require no separate server, including them in the same image has no impact on scalability. This approach simplifies deployment, ensures every replica serves the exact same assets, and eliminates unnecessary moving parts. If you prefer, you can still host the SPA on any CDN or static hosting service and point it to a remote backend, but packaging both together is the standard and most practical method for containerized SPAs.

docs/features/audio/speech-to-text/env-variables.md

Lines changed: 117 additions & 13 deletions
@@ -11,16 +11,120 @@ For a complete list of all Open WebUI environment variables, see the [Environmen
:::
The following is a summary of the environment variables for speech to text (STT) and text to speech (TTS).

:::tip UI Configuration
Most of these settings can also be configured in the **Admin Panel → Settings → Audio** tab. Environment variables take precedence on startup but can be overridden in the UI.
:::

## Speech To Text (STT) Environment Variables

### Local Whisper

| Variable | Description | Default |
|----------|-------------|---------|
| `WHISPER_MODEL` | Whisper model size | `base` |
| `WHISPER_MODEL_DIR` | Directory to store Whisper model files | `{CACHE_DIR}/whisper/models` |
| `WHISPER_COMPUTE_TYPE` | Compute type for inference (see note below) | `int8` |
| `WHISPER_LANGUAGE` | ISO 639-1 language code (empty = auto-detect) | empty |
| `WHISPER_MULTILINGUAL` | Use the multilingual Whisper model | `false` |
| `WHISPER_MODEL_AUTO_UPDATE` | Auto-download model updates | `false` |
| `WHISPER_VAD_FILTER` | Enable Voice Activity Detection filter | `false` |

:::info WHISPER_COMPUTE_TYPE Options
- `int8` — CPU default, fastest but may not work on older GPUs
- `float16` — **Recommended for CUDA/GPU**
- `int8_float16` — Hybrid mode (int8 weights, float16 computation)
- `float32` — Maximum compatibility, slowest

If using the `:cuda` Docker image with an older GPU, set `WHISPER_COMPUTE_TYPE=float16` to avoid errors.
:::
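As an illustration, a docker-compose `environment` block combining several of these variables might look like this (values are examples, not recommendations):

```yaml
environment:
  - WHISPER_MODEL=base            # model size from the table above
  - WHISPER_COMPUTE_TYPE=int8     # switch to float16 on CUDA GPUs
  - WHISPER_LANGUAGE=en           # leave unset to auto-detect
  - WHISPER_VAD_FILTER=true       # filter out silence before transcription
```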
### OpenAI-Compatible STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | STT engine: empty (local Whisper), `openai`, `azure`, `deepgram`, `mistral` | empty |
| `AUDIO_STT_MODEL` | STT model for external providers | empty |
| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI-compatible API base URL | `https://api.openai.com/v1` |
| `AUDIO_STT_OPENAI_API_KEY` | OpenAI API key | empty |
| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Comma-separated list of supported audio MIME types | empty |

### Azure STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_AZURE_API_KEY` | Azure Cognitive Services API key | empty |
| `AUDIO_STT_AZURE_REGION` | Azure region | `eastus` |
| `AUDIO_STT_AZURE_LOCALES` | Comma-separated locales (e.g., `en-US,de-DE`) | auto |
| `AUDIO_STT_AZURE_BASE_URL` | Custom Azure base URL (optional) | empty |
| `AUDIO_STT_AZURE_MAX_SPEAKERS` | Max speakers for diarization | `3` |

### Deepgram STT

| Variable | Description | Default |
|----------|-------------|---------|
| `DEEPGRAM_API_KEY` | Deepgram API key | empty |

### Mistral STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_MISTRAL_API_KEY` | Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
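Switching to a hosted provider follows the same pattern: pick the engine and supply its credentials. An illustrative OpenAI STT block (the key is a placeholder):

```yaml
environment:
  - AUDIO_STT_ENGINE=openai
  - AUDIO_STT_MODEL=whisper-1
  - AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
  - AUDIO_STT_OPENAI_API_KEY=sk-your-key
```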
## Text To Speech (TTS) Environment Variables

### General TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_ENGINE` | TTS engine: empty (disabled), `openai`, `elevenlabs`, `azure`, `transformers` | empty |
| `AUDIO_TTS_MODEL` | TTS model | `tts-1` |
| `AUDIO_TTS_VOICE` | Default voice | `alloy` |
| `AUDIO_TTS_SPLIT_ON` | Split text on: `punctuation` or `none` | `punctuation` |
| `AUDIO_TTS_API_KEY` | API key for ElevenLabs or Azure TTS | empty |

### OpenAI-Compatible TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_OPENAI_API_BASE_URL` | OpenAI-compatible TTS API base URL | `https://api.openai.com/v1` |
| `AUDIO_TTS_OPENAI_API_KEY` | OpenAI TTS API key | empty |
| `AUDIO_TTS_OPENAI_PARAMS` | Additional JSON params for OpenAI TTS | empty |

### Azure TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_AZURE_SPEECH_REGION` | Azure Speech region | `eastus` |
| `AUDIO_TTS_AZURE_SPEECH_BASE_URL` | Custom Azure Speech base URL (optional) | empty |
| `AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT` | Audio output format | `audio-24khz-160kbitrate-mono-mp3` |
## Tips for Configuring Audio

### Using Local Whisper STT

For GPU acceleration issues or older GPUs, try setting:

```yaml
environment:
  - WHISPER_COMPUTE_TYPE=float16
```

### Using External TTS Services

When running Open WebUI in Docker with an external TTS service:

```yaml
environment:
  - AUDIO_TTS_ENGINE=openai
  - AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:5050/v1
  - AUDIO_TTS_OPENAI_API_KEY=your-api-key
```

:::tip
Use `host.docker.internal` on Docker Desktop (Windows/Mac) to access services on the host. On Linux, use the host IP or container networking.
:::

For troubleshooting audio issues, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
---
sidebar_position: 2
title: "Mistral Voxtral STT"
---

# Using Mistral Voxtral for Speech-to-Text

This guide covers how to use Voxtral, Mistral's speech-to-text model, for accurate transcription in Open WebUI.

## Requirements

- A Mistral API key
- Open WebUI installed and running
## Quick Setup (UI)

1. Click your **profile icon** (bottom-left corner)
2. Select **Admin Panel**
3. Click **Settings** → **Audio** tab
4. Configure the following:

| Setting | Value |
|---------|-------|
| **Speech-to-Text Engine** | `MistralAI` |
| **API Key** | Your Mistral API key |
| **STT Model** | `voxtral-mini-latest` (or leave empty for default) |

5. Click **Save**
## Available Models

| Model | Description |
|-------|-------------|
| `voxtral-mini-latest` | Default transcription model (recommended) |

## Environment Variables Setup

If you prefer to configure via environment variables:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_STT_ENGINE=mistral
      - AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
      - AUDIO_STT_MODEL=voxtral-mini-latest
      # ... other configuration
```

### All Mistral STT Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | Set to `mistral` | empty (uses local Whisper) |
| `AUDIO_STT_MISTRAL_API_KEY` | Your Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
| `AUDIO_STT_MODEL` | STT model | `voxtral-mini-latest` |
## Transcription Methods

Mistral supports two transcription methods:

### Standard Transcription (Default)

Uses the dedicated transcription endpoint. This is the recommended method.

### Chat Completions Method

Set `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true` to use Mistral's chat completions API for transcription. This method:

- Requires audio in mp3 or wav format (automatic conversion is attempted)
- May provide different results than the standard endpoint
## Using STT

1. Click the **microphone icon** in the chat input
2. Speak your message
3. Click the microphone again or wait for silence detection
4. Your speech will be transcribed and appear in the input box

## Supported Audio Formats

Voxtral accepts common audio formats. The system defaults to accepting `audio/*` and `video/webm`.

If using the chat completions method, audio is automatically converted to mp3.
## Troubleshooting

### API Key Errors

If you see "Mistral API key is required":

1. Verify your API key is entered correctly
2. Check the API key hasn't expired
3. Ensure your Mistral account has API access
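To check whether the key works at all, independent of Open WebUI, you can call Mistral's transcription API directly. This is a sketch — the endpoint path, model name, and file name are assumptions based on Mistral's OpenAI-style transcription API, so check Mistral's docs for current usage:

```bash
# Assumed OpenAI-style transcription endpoint; a JSON transcript back means the key is valid
curl -s https://api.mistral.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F file=@sample.wav \
  -F model=voxtral-mini-latest
```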
### Transcription Not Working

1. Check container logs: `docker logs open-webui -f`
2. Verify the STT Engine is set to `MistralAI`
3. Try the standard transcription method (disable chat completions)

### Audio Format Issues

If using the chat completions method and audio conversion fails:

- Ensure FFmpeg is available in the container
- Try recording in a different format (wav or mp3)
- Switch to the standard transcription method

For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
## Comparison with Other STT Options

| Feature | Mistral Voxtral | OpenAI Whisper | Local Whisper |
|---------|-----------------|----------------|---------------|
| **Cost** | Per-minute pricing | Per-minute pricing | Free |
| **Privacy** | Audio sent to Mistral | Audio sent to OpenAI | Audio stays local |
| **Model Options** | voxtral-mini-latest | whisper-1 | tiny → large |
| **GPU Required** | No | No | Recommended |

## Cost Considerations

Mistral charges per minute of audio for STT. Check [Mistral's pricing page](https://mistral.ai/products/la-plateforme#pricing) for current rates.

:::tip
For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
:::
