
Commit c39281d

Merge pull request #923 from open-webui/dev

2 parents 970887c + 82ab559
File tree

114 files changed: +4964 -752 lines


docs/faq.mdx

Lines changed: 60 additions & 0 deletions
@@ -128,6 +128,10 @@ Everything you need to run Open WebUI, including your data, remains within your

```
docker run -d -p 3000:8080 -e HF_ENDPOINT=https://hf-mirror.com/ --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

### Q: Why are my reasoning model's thinking blocks showing as raw text instead of being hidden?

**A:** This happens if the model's thinking tags are not recognized by Open WebUI. You can customize the tags in the model's Advanced Parameters. For more details, see the **[Reasoning & Thinking Models](/features/chat-features/reasoning-models)** guide.
### Q: RAG with Open WebUI is very bad or not working at all. Why?

**A:** If you're using **Ollama**, be aware that Ollama sets the context length to **2048 tokens by default**. This means that none of the retrieved data might be used because it doesn't fit within the available context window.

@@ -136,10 +140,66 @@ To improve the performance of Retrieval-Augmented Generation (**RAG**) with Open

To do this, configure your **Ollama model params** to allow a larger context window. You can check and modify this setting directly in your chat or from the model editor page to significantly enhance the RAG experience.
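For example, one way to raise the context window in Ollama itself is to build a model variant with a larger `num_ctx` (a sketch; the model name and the 8192-token value are illustrative):

```
# Modelfile: derive a variant of llama3.1 with a larger context window
FROM llama3.1
PARAMETER num_ctx 8192
```

Create and use it with `ollama create llama3.1-8k -f Modelfile`, then select `llama3.1-8k` in Open WebUI.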

### Q: I asked the model what it is and it gave the wrong answer. Is Open WebUI routing to the wrong model?

**A:** No—**LLMs do not reliably know their own identity.** When you ask a model "What model are you?" or "Are you GPT-4?", the response is not a system diagnostic. It's simply the model generating text based on patterns in its training data.

Models frequently:

- Claim to be a different model (e.g., a Llama model claiming to be ChatGPT)
- Give outdated information about themselves
- Hallucinate version numbers or capabilities
- Change their answer depending on how you phrase the question

**To verify which model you're actually using:**

1. Check the model selector in the Open WebUI interface
2. Look at **Admin Panel > Settings > Connections** to confirm your API endpoints
3. Check your provider's dashboard/logs for the actual API calls being made

Asking the model itself is **not** a valid way to diagnose routing issues. If you suspect a configuration problem, check your connection settings and API keys instead.
### Q: But why can models on official chat interfaces (like ChatGPT or Claude.ai) correctly identify themselves?

**A:** Because the provider **injects a system prompt** that explicitly tells the model what it is. When you use ChatGPT, OpenAI's interface includes a hidden system message like "You are ChatGPT, a large language model trained by OpenAI..." before your conversation begins.

The model isn't "aware" of itself—it's simply been instructed to claim a specific identity. You can do the same thing in Open WebUI by adding a system prompt to your model configuration (e.g., "You are Llama 3.3 70B..."). The model will then confidently repeat whatever identity you've told it to claim.

This is also why the same model accessed through different interfaces might give different answers about its identity—it depends entirely on what system prompt (if any) was provided.
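Under the hood, that hidden instruction is just a `system` message at the start of the request. A minimal sketch against an OpenAI-compatible Chat Completions endpoint (the URL assumes Open WebUI's API at its default port; the key and model name are placeholders):

```bash
# The "identity" comes entirely from the system message in the payload
curl http://localhost:3000/api/chat/completions \
  -H "Authorization: Bearer $OPEN_WEBUI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b",
    "messages": [
      {"role": "system", "content": "You are Llama 3.3 70B, served through Open WebUI."},
      {"role": "user", "content": "What model are you?"}
    ]
  }'
```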
### Q: Why am I seeing multiple API requests when I only send one message? Why is my token usage higher than expected?

**A:** Open WebUI uses **Task Models** to power background features that enhance your chat experience. When you send a single message, additional API calls may be made for:

- **Title Generation**: Automatically generating a title for new chats
- **Tag Generation**: Auto-tagging chats for organization
- **Query Generation**: Creating optimized search queries for RAG (when you attach files or knowledge)
- **Web Search Queries**: Generating search terms when web search is enabled
- **Autocomplete Suggestions**: If enabled

By default, these tasks use the **same model** you're chatting with. If you're using an expensive API model (like GPT-4 or Claude), this can significantly increase your costs.

**To reduce API costs:**

1. Go to **Admin Panel > Settings > Interface** (for title/tag generation settings)
2. Configure a **Task Model** under **Admin Panel > Settings > Models** to use a smaller, cheaper model (like GPT-4o-mini) or a local model for background tasks
3. Disable features you don't need (auto-title, auto-tags, etc.)

:::tip Cost-Saving Recommendation
Set your Task Model to a fast, inexpensive model (or a local model via Ollama) while keeping your primary chat model as a more capable one. This gives you the best of both worlds: smart responses for your conversations and cheap or free processing for background tasks.
:::

For more optimization tips, see the **[Performance Tips Guide](tutorials/tips/performance)**.
### Q: Is MCP (Model Context Protocol) supported in Open WebUI?

**A:** Yes, Open WebUI now includes **native support for MCP Streamable HTTP**, enabling direct, first-class integration with MCP tools that communicate over the standard HTTP transport. For any **other MCP transports or non-HTTP implementations**, you should use our official proxy adapter, **MCPO**, available at 👉 [https://github.com/open-webui/mcpo](https://github.com/open-webui/mcpo). MCPO provides a unified OpenAPI-compatible layer that bridges alternative MCP transports into Open WebUI safely and consistently. This architecture ensures maximum compatibility, strict security boundaries, and predictable tool behavior across different environments while keeping Open WebUI backend-agnostic and maintainable.
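As a concrete sketch, MCPO wraps a local stdio MCP server and exposes it as an OpenAPI-compatible HTTP endpoint (the server and flags below are illustrative; check the MCPO README for current usage):

```bash
# Bridge a stdio-based MCP server to an OpenAPI endpoint on port 8000
uvx mcpo --port 8000 -- uvx mcp-server-time --local-timezone=America/New_York
```

The resulting endpoint can then be added to Open WebUI as a tool server.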

### Q: Why doesn't Open WebUI support [Specific Provider]'s latest API (e.g., OpenAI Responses API)?

**A:** Open WebUI is built around **universal protocols**, not specific providers. Our core philosophy is to support standard, widely-adopted APIs like the **OpenAI Chat Completions protocol**.

This protocol-centric design ensures that Open WebUI remains backend-agnostic and compatible with dozens of providers (like OpenRouter, LiteLLM, vLLM, and Groq) simultaneously. We avoid implementing proprietary, provider-specific APIs (such as OpenAI's stateful Responses API or Anthropic's Messages API) to prevent unsustainable architectural bloat and to maintain a truly open ecosystem.

If you need functionality exclusive to a proprietary API (like OpenAI's hidden reasoning traces), we recommend using a proxy like **LiteLLM** or **OpenRouter**, which translates those proprietary features into the standard Chat Completions protocol that Open WebUI supports.
### Q: Why is the frontend integrated into the same Docker image? Isn't this unscalable or problematic?

**A:** The assumption that bundling the frontend with the backend is unscalable comes from a misunderstanding of how modern Single-Page Applications work. Open WebUI's frontend is a static SPA, meaning it consists only of HTML, CSS, and JavaScript files with no runtime coupling to the backend. Because these files are static, lightweight, and require no separate server, including them in the same image has no impact on scalability. This approach simplifies deployment, ensures every replica serves the exact same assets, and eliminates unnecessary moving parts. If you prefer, you can still host the SPA on any CDN or static hosting service and point it to a remote backend, but packaging both together is the standard and most practical method for containerized SPAs.

docs/features/audio/speech-to-text/env-variables.md

Lines changed: 117 additions & 13 deletions
@@ -11,16 +11,120 @@ For a complete list of all Open WebUI environment variables, see the [Environmen
:::
The following is a summary of the environment variables for speech to text (STT) and text to speech (TTS).

:::tip UI Configuration
Most of these settings can also be configured in the **Admin Panel → Settings → Audio** tab. Environment variables take precedence on startup but can be overridden in the UI.
:::

## Speech To Text (STT) Environment Variables

### Local Whisper

| Variable | Description | Default |
|----------|-------------|---------|
| `WHISPER_MODEL` | Whisper model size | `base` |
| `WHISPER_MODEL_DIR` | Directory to store Whisper model files | `{CACHE_DIR}/whisper/models` |
| `WHISPER_COMPUTE_TYPE` | Compute type for inference (see note below) | `int8` |
| `WHISPER_LANGUAGE` | ISO 639-1 language code (empty = auto-detect) | empty |
| `WHISPER_MULTILINGUAL` | Use the multilingual Whisper model | `false` |
| `WHISPER_MODEL_AUTO_UPDATE` | Auto-download model updates | `false` |
| `WHISPER_VAD_FILTER` | Enable Voice Activity Detection filter | `false` |

:::info WHISPER_COMPUTE_TYPE Options
- `int8` — CPU default, fastest but may not work on older GPUs
- `float16` — **Recommended for CUDA/GPU**
- `int8_float16` — Hybrid mode (int8 weights, float16 computation)
- `float32` — Maximum compatibility, slowest

If using the `:cuda` Docker image with an older GPU, set `WHISPER_COMPUTE_TYPE=float16` to avoid errors.
:::
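As an illustration, a docker-compose `environment` block combining several of these variables might look like this (values are examples, not recommendations):

```yaml
environment:
  - WHISPER_MODEL=base            # model size from the table above
  - WHISPER_COMPUTE_TYPE=int8     # switch to float16 on CUDA GPUs
  - WHISPER_LANGUAGE=en           # leave unset to auto-detect
  - WHISPER_VAD_FILTER=true       # filter out silence before transcription
```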
### OpenAI-Compatible STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | STT engine: empty (local Whisper), `openai`, `azure`, `deepgram`, `mistral` | empty |
| `AUDIO_STT_MODEL` | STT model for external providers | empty |
| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI-compatible API base URL | `https://api.openai.com/v1` |
| `AUDIO_STT_OPENAI_API_KEY` | OpenAI API key | empty |
| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Comma-separated list of supported audio MIME types | empty |

### Azure STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_AZURE_API_KEY` | Azure Cognitive Services API key | empty |
| `AUDIO_STT_AZURE_REGION` | Azure region | `eastus` |
| `AUDIO_STT_AZURE_LOCALES` | Comma-separated locales (e.g., `en-US,de-DE`) | auto |
| `AUDIO_STT_AZURE_BASE_URL` | Custom Azure base URL (optional) | empty |
| `AUDIO_STT_AZURE_MAX_SPEAKERS` | Max speakers for diarization | `3` |

### Deepgram STT

| Variable | Description | Default |
|----------|-------------|---------|
| `DEEPGRAM_API_KEY` | Deepgram API key | empty |

### Mistral STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_MISTRAL_API_KEY` | Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
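Switching to a hosted provider follows the same pattern: pick the engine and supply its credentials. An illustrative OpenAI STT block (the key is a placeholder):

```yaml
environment:
  - AUDIO_STT_ENGINE=openai
  - AUDIO_STT_MODEL=whisper-1
  - AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
  - AUDIO_STT_OPENAI_API_KEY=sk-your-key
```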
## Text To Speech (TTS) Environment Variables

### General TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_ENGINE` | TTS engine: empty (disabled), `openai`, `elevenlabs`, `azure`, `transformers` | empty |
| `AUDIO_TTS_MODEL` | TTS model | `tts-1` |
| `AUDIO_TTS_VOICE` | Default voice | `alloy` |
| `AUDIO_TTS_SPLIT_ON` | Split text on: `punctuation` or `none` | `punctuation` |
| `AUDIO_TTS_API_KEY` | API key for ElevenLabs or Azure TTS | empty |

### OpenAI-Compatible TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_OPENAI_API_BASE_URL` | OpenAI-compatible TTS API base URL | `https://api.openai.com/v1` |
| `AUDIO_TTS_OPENAI_API_KEY` | OpenAI TTS API key | empty |
| `AUDIO_TTS_OPENAI_PARAMS` | Additional JSON params for OpenAI TTS | empty |

### Azure TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_AZURE_SPEECH_REGION` | Azure Speech region | `eastus` |
| `AUDIO_TTS_AZURE_SPEECH_BASE_URL` | Custom Azure Speech base URL (optional) | empty |
| `AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT` | Audio output format | `audio-24khz-160kbitrate-mono-mp3` |
## Tips for Configuring Audio

### Using Local Whisper STT

For GPU acceleration issues or older GPUs, try setting:

```yaml
environment:
  - WHISPER_COMPUTE_TYPE=float16
```

### Using External TTS Services

When running Open WebUI in Docker with an external TTS service:

```yaml
environment:
  - AUDIO_TTS_ENGINE=openai
  - AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:5050/v1
  - AUDIO_TTS_OPENAI_API_KEY=your-api-key
```

:::tip
Use `host.docker.internal` on Docker Desktop (Windows/Mac) to access services on the host. On Linux, use the host IP or container networking.
:::

For troubleshooting audio issues, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
Lines changed: 125 additions & 0 deletions
@@ -0,0 +1,125 @@
---
sidebar_position: 2
title: "Mistral Voxtral STT"
---

# Using Mistral Voxtral for Speech-to-Text

This guide covers how to use Voxtral, Mistral's speech-to-text model, for accurate transcription in Open WebUI.

## Requirements

- A Mistral API key
- Open WebUI installed and running
## Quick Setup (UI)

1. Click your **profile icon** (bottom-left corner)
2. Select **Admin Panel**
3. Click **Settings** → **Audio** tab
4. Configure the following:

| Setting | Value |
|---------|-------|
| **Speech-to-Text Engine** | `MistralAI` |
| **API Key** | Your Mistral API key |
| **STT Model** | `voxtral-mini-latest` (or leave empty for default) |

5. Click **Save**
## Available Models

| Model | Description |
|-------|-------------|
| `voxtral-mini-latest` | Default transcription model (recommended) |

## Environment Variables Setup

If you prefer to configure via environment variables:
```yaml
services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    environment:
      - AUDIO_STT_ENGINE=mistral
      - AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
      - AUDIO_STT_MODEL=voxtral-mini-latest
      # ... other configuration
```

### All Mistral STT Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | Set to `mistral` | empty (uses local Whisper) |
| `AUDIO_STT_MISTRAL_API_KEY` | Your Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
| `AUDIO_STT_MODEL` | STT model | `voxtral-mini-latest` |
## Transcription Methods

Mistral supports two transcription methods:

### Standard Transcription (Default)

Uses the dedicated transcription endpoint. This is the recommended method.

### Chat Completions Method

Set `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true` to use Mistral's chat completions API for transcription. This method:

- Requires audio in mp3 or wav format (automatic conversion is attempted)
- May provide different results than the standard endpoint
## Using STT

1. Click the **microphone icon** in the chat input
2. Speak your message
3. Click the microphone again or wait for silence detection
4. Your speech will be transcribed and appear in the input box

## Supported Audio Formats

Voxtral accepts common audio formats. The system defaults to accepting `audio/*` and `video/webm`.

If using the chat completions method, audio is automatically converted to mp3.
## Troubleshooting

### API Key Errors

If you see "Mistral API key is required":

1. Verify your API key is entered correctly
2. Check the API key hasn't expired
3. Ensure your Mistral account has API access
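To check whether the key works at all, independent of Open WebUI, you can call Mistral's transcription API directly. This is a sketch — the endpoint path, model name, and file name are assumptions based on Mistral's OpenAI-style transcription API, so check Mistral's docs for current usage:

```bash
# Assumed OpenAI-style transcription endpoint; a JSON transcript back means the key is valid
curl -s https://api.mistral.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $MISTRAL_API_KEY" \
  -F file=@sample.wav \
  -F model=voxtral-mini-latest
```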
### Transcription Not Working

1. Check container logs: `docker logs open-webui -f`
2. Verify the STT Engine is set to `MistralAI`
3. Try the standard transcription method (disable chat completions)

### Audio Format Issues

If using the chat completions method and audio conversion fails:

- Ensure FFmpeg is available in the container
- Try recording in a different format (wav or mp3)
- Switch to the standard transcription method

For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
## Comparison with Other STT Options

| Feature | Mistral Voxtral | OpenAI Whisper | Local Whisper |
|---------|-----------------|----------------|---------------|
| **Cost** | Per-minute pricing | Per-minute pricing | Free |
| **Privacy** | Audio sent to Mistral | Audio sent to OpenAI | Audio stays local |
| **Model Options** | voxtral-mini-latest | whisper-1 | tiny → large |
| **GPU Required** | No | No | Recommended |

## Cost Considerations

Mistral charges per minute of audio for STT. Check [Mistral's pricing page](https://mistral.ai/products/la-plateforme#pricing) for current rates.

:::tip
For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
:::
