Merged

0.7 #923

153 commits
386696c
Merge pull request #917 from open-webui/main
Classic298 Dec 21, 2025
4fa260b
Merge pull request #918 from open-webui/main
Classic298 Dec 21, 2025
39717dc
new env var
Classic298 Dec 23, 2025
fcf09f5
update
Classic298 Dec 23, 2025
c74d73f
Merge pull request #1 from open-webui/dev
Classic298 Dec 26, 2025
7144aba
Merge pull request #921 from Classic298/dev
Classic298 Dec 26, 2025
747fcc0
Merge pull request #922 from open-webui/main
Classic298 Dec 27, 2025
10e68f0
Update env-configuration.mdx
Classic298 Dec 27, 2025
36a89f8
Update env-configuration.mdx
Classic298 Dec 27, 2025
9a2126e
Merge pull request #924 from Classic298/dev
Classic298 Dec 27, 2025
ec898db
api keys
Classic298 Dec 28, 2025
b2e8fea
Merge branch 'open-webui:dev' into dev
Classic298 Dec 28, 2025
4301e8a
Update multi-replica.mdx
Classic298 Dec 28, 2025
0169de9
Merge pull request #925 from Classic298/dev
Classic298 Dec 28, 2025
c0601b5
folders / projects and missing env vars
Classic298 Dec 28, 2025
f391e5f
Update env-configuration.mdx
Classic298 Dec 28, 2025
c982403
Update performance.md
Classic298 Dec 28, 2025
9906cf1
Update performance.md
Classic298 Dec 28, 2025
727f114
Update performance.md
Classic298 Dec 28, 2025
9c216c5
Merge pull request #926 from Classic298/dev
Classic298 Dec 28, 2025
bf51b74
mcp
Classic298 Dec 28, 2025
0a6c2a2
otel
Classic298 Dec 28, 2025
f15ee1b
WHISPER_COMPUTE_TYPE
Classic298 Dec 28, 2025
ee239fb
audio
Classic298 Dec 28, 2025
565bedd
Update index.mdx
Classic298 Dec 28, 2025
a3ccd35
Merge branch 'open-webui:dev' into dev
Classic298 Dec 28, 2025
f38fc68
Update index.mdx
Classic298 Dec 28, 2025
f9c89b1
Update index.mdx
Classic298 Dec 28, 2025
ef65b2b
Update index.mdx
Classic298 Dec 28, 2025
73b378f
fix
Classic298 Dec 28, 2025
77a3feb
Merge pull request #927 from Classic298/dev
Classic298 Dec 28, 2025
45224d7
audio
Classic298 Dec 29, 2025
2a02b7f
Merge pull request #929 from Classic298/dev
Classic298 Dec 29, 2025
312a599
brave
Classic298 Dec 30, 2025
689a25d
mcp
Classic298 Dec 30, 2025
8d89fba
Merge pull request #931 from Classic298/dev
Classic298 Dec 30, 2025
93f11ba
Merge pull request #930 from Classic298/brave
Classic298 Dec 30, 2025
e0b71f1
OR
Classic298 Dec 30, 2025
40933d6
Merge pull request #932 from Classic298/dev
Classic298 Dec 30, 2025
c2975c8
ldap
Classic298 Dec 30, 2025
2bdd76f
Merge pull request #933 from Classic298/dev
Classic298 Dec 30, 2025
431dfb2
Update connection-error.mdx
Classic298 Dec 30, 2025
4c0bba8
Merge pull request #934 from Classic298/dev
Classic298 Dec 30, 2025
ddd3b49
Merge pull request #936 from open-webui/main
Classic298 Dec 30, 2025
3ecd3ac
md header splitting
Classic298 Dec 30, 2025
5a6b025
Merge branch 'open-webui:dev' into dev
Classic298 Dec 30, 2025
2bda93a
Merge pull request #937 from Classic298/dev
Classic298 Dec 30, 2025
0733a99
web search troubleshooting
Classic298 Dec 31, 2025
87a7cd0
Merge pull request #938 from Classic298/dev
Classic298 Dec 31, 2025
b1579f2
model timeouts
Classic298 Dec 31, 2025
3f69b28
Merge pull request #939 from Classic298/dev
Classic298 Dec 31, 2025
5f84f77
env vars
Classic298 Jan 1, 2026
f1376ef
Update brave.md
Classic298 Jan 1, 2026
88165d8
image gen
Classic298 Jan 1, 2026
5736f54
image gen
Classic298 Jan 1, 2026
7f898d8
Update gemini.mdx
Classic298 Jan 1, 2026
2d1de0c
Merge pull request #940 from Classic298/dev
Classic298 Jan 1, 2026
8d5a257
Merge pull request #941 from open-webui/main
Classic298 Jan 1, 2026
e28b702
cloud
Classic298 Jan 2, 2026
e61b7c9
Merge pull request #943 from Classic298/dev
Classic298 Jan 2, 2026
9900549
Update performance.md
Classic298 Jan 2, 2026
cc80564
Merge pull request #944 from Classic298/dev
Classic298 Jan 2, 2026
79bbadf
Update usage.md
Classic298 Jan 2, 2026
448a3f1
Merge pull request #945 from Classic298/dev
Classic298 Jan 2, 2026
d1eb87d
filters
Classic298 Jan 2, 2026
3705b19
sqlite
Classic298 Jan 2, 2026
7fed0f9
Merge pull request #947 from Classic298/dev
Classic298 Jan 2, 2026
31eec80
Update events.mdx
Classic298 Jan 3, 2026
3cb2555
Update filter.mdx
Classic298 Jan 3, 2026
d5e6d6d
Update filter.mdx
Classic298 Jan 3, 2026
408fc1c
Merge pull request #946 from Classic298/filters
Classic298 Jan 3, 2026
514d7a8
md header splitting min size merging min size target
Classic298 Jan 3, 2026
0462e65
Merge pull request #948 from Classic298/dev
Classic298 Jan 3, 2026
888802a
Update index.md
Classic298 Jan 3, 2026
160cbe9
Merge pull request #949 from Classic298/dev
Classic298 Jan 3, 2026
ab845c4
update
Classic298 Jan 3, 2026
da31096
Merge pull request #950 from Classic298/dev
Classic298 Jan 3, 2026
63fc7ba
Update starting-with-openai-compatible.mdx
Classic298 Jan 3, 2026
127bb6e
Merge pull request #951 from Classic298/dev
Classic298 Jan 3, 2026
5705555
reasoning
Classic298 Jan 4, 2026
5dec2d2
Merge pull request #953 from Classic298/dev
Classic298 Jan 4, 2026
ffb0d4e
Merge pull request #954 from open-webui/main
Classic298 Jan 4, 2026
8a2d3a3
Merge pull request #956 from open-webui/main
Classic298 Jan 4, 2026
122162e
Merge pull request #957 from open-webui/main
Classic298 Jan 4, 2026
e90a6c2
update
Classic298 Jan 4, 2026
0eb72de
Merge pull request #958 from Classic298/dev
Classic298 Jan 4, 2026
bf0f356
DDGS
Classic298 Jan 4, 2026
4e68d64
NATIVE TOOL CALLING
Classic298 Jan 5, 2026
8a3d0bc
group sharing update
Classic298 Jan 5, 2026
fed6720
fix links
Classic298 Jan 5, 2026
b94ea45
Update index.mdx
Classic298 Jan 5, 2026
694dab2
Update index.mdx
Classic298 Jan 5, 2026
00dc66f
Merge pull request #959 from Classic298/dev
Classic298 Jan 5, 2026
be9f47b
new chat stuff
Classic298 Jan 5, 2026
b04a730
Merge pull request #960 from Classic298/dev
Classic298 Jan 5, 2026
c05e6a8
fav messages
Classic298 Jan 5, 2026
b1e1cf0
Merge branch 'open-webui:dev' into dev
Classic298 Jan 5, 2026
f06891d
Merge pull request #961 from Classic298/dev
Classic298 Jan 5, 2026
5e9c9e4
Update agentic-search.mdx
Classic298 Jan 5, 2026
a3dd1d8
Merge pull request #962 from Classic298/dev
Classic298 Jan 5, 2026
8130254
web search stuff
Classic298 Jan 5, 2026
0274f25
Merge pull request #963 from Classic298/dev
Classic298 Jan 5, 2026
318dba9
new env var
Classic298 Jan 5, 2026
353a924
http2
Classic298 Jan 5, 2026
15e44e5
Merge pull request #964 from Classic298/dev
Classic298 Jan 5, 2026
5499abe
memory
Classic298 Jan 5, 2026
c4b41a4
Merge pull request #965 from Classic298/dev
Classic298 Jan 5, 2026
4d4cb7f
update
Classic298 Jan 6, 2026
a3405cc
fixes
Classic298 Jan 6, 2026
88a5c02
Update env-configuration.mdx
Classic298 Jan 6, 2026
9c77d30
Merge pull request #966 from Classic298/dev
Classic298 Jan 6, 2026
ff8fb2a
Merge pull request #968 from open-webui/main
Classic298 Jan 6, 2026
92a837f
Update agentic-search.mdx
Classic298 Jan 7, 2026
344b1d8
Update development.mdx
Classic298 Jan 7, 2026
8079e02
Update index.mdx
Classic298 Jan 7, 2026
c5514ed
Update knowledge.md
Classic298 Jan 7, 2026
f25b491
Update memory.mdx
Classic298 Jan 7, 2026
906f139
Update index.md
Classic298 Jan 7, 2026
72c81f3
Update rag-tutorial.md
Classic298 Jan 7, 2026
375c090
Update rag.mdx
Classic298 Jan 7, 2026
4c9060a
refac
Classic298 Jan 7, 2026
047771c
updates
Classic298 Jan 7, 2026
780a0a5
Merge pull request #969 from Classic298/dev
Classic298 Jan 7, 2026
3680e99
0.7
Classic298 Jan 7, 2026
6640927
Merge pull request #970 from Classic298/dev
Classic298 Jan 7, 2026
1de9c12
fix
Classic298 Jan 7, 2026
16239a4
Merge pull request #971 from Classic298/dev
Classic298 Jan 7, 2026
51f5a84
fix2
Classic298 Jan 7, 2026
625fdfb
Merge pull request #972 from Classic298/dev
Classic298 Jan 7, 2026
e38581b
thinking
Classic298 Jan 7, 2026
926f9a1
Merge pull request #973 from Classic298/dev
Classic298 Jan 7, 2026
333c682
Update reasoning-models.mdx
Classic298 Jan 7, 2026
e7abf93
Merge pull request #974 from Classic298/dev
Classic298 Jan 7, 2026
16a89e7
Merge pull request #976 from open-webui/main
Classic298 Jan 8, 2026
c463f3d
Update knowledge.md
Classic298 Jan 8, 2026
7ac7644
status
Classic298 Jan 8, 2026
b980962
Merge pull request #977 from Classic298/dev
Classic298 Jan 8, 2026
d4d5df5
Update env-configuration.mdx
Classic298 Jan 8, 2026
46b7c2e
Update webhooks.md
Classic298 Jan 9, 2026
3c04c21
Update roles.md
Classic298 Jan 9, 2026
aaca2f1
Update memory.mdx
Classic298 Jan 9, 2026
ae43522
Update index.mdx
Classic298 Jan 9, 2026
2c789ee
Update env-configuration.mdx
Classic298 Jan 9, 2026
18ec6ea
file context + new faq
Classic298 Jan 9, 2026
6372dfe
per model tts
Classic298 Jan 9, 2026
1fbc8bb
Merge pull request #979 from Classic298/dev
Classic298 Jan 9, 2026
347c91a
timezone
Classic298 Jan 9, 2026
7765dab
How TTS Splits Text
Classic298 Jan 9, 2026
c00c26f
Merge pull request #980 from Classic298/dev
Classic298 Jan 9, 2026
b3ff327
docs: update versioning to v0.7.0 and document new features
Classic298 Jan 9, 2026
d1055ef
Merge pull request #981 from Classic298/dev
Classic298 Jan 9, 2026
92b7383
docs: remove broken image link from evaluation page
Classic298 Jan 9, 2026
82ab559
Merge pull request #982 from Classic298/dev
Classic298 Jan 9, 2026
60 changes: 60 additions & 0 deletions docs/faq.mdx
@@ -128,6 +128,10 @@ Everything you need to run Open WebUI, including your data, remains within your
docker run -d -p 3000:8080 -e HF_ENDPOINT=https://hf-mirror.com/ --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main
```

### Q: Why are my reasoning model's thinking blocks showing as raw text instead of being hidden?

**A:** This happens if the model's thinking tags are not recognized by Open WebUI. You can customize the tags in the model's Advanced Parameters. For more details, see the **[Reasoning & Thinking Models](/features/chat-features/reasoning-models)** guide.

### Q: RAG with Open WebUI is very bad or not working at all. Why?

**A:** If you're using **Ollama**, be aware that Ollama sets the context length to **2048 tokens by default**. As a result, the retrieved data may not be used at all because it doesn't fit within the available context window.
Expand All @@ -136,10 +140,66 @@ To improve the performance of Retrieval-Augmented Generation (**RAG**) with Open

To do this, configure your **Ollama model params** to allow a larger context window. You can check and modify this setting directly in your chat or from the model editor page to significantly enhance the RAG experience.
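
As one way to do this outside the UI, you can bake a larger context window into an Ollama model with a Modelfile (the base model name and the `8192` value below are illustrative; choose what fits your hardware):

```
FROM llama3.1
PARAMETER num_ctx 8192
```

Create the variant with `ollama create llama3.1-8k -f Modelfile`, then select it in Open WebUI.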

### Q: I asked the model what it is and it gave the wrong answer. Is Open WebUI routing to the wrong model?

**A:** No—**LLMs do not reliably know their own identity.** When you ask a model "What model are you?" or "Are you GPT-4?", the response is not a system diagnostic. It's simply the model generating text based on patterns in its training data.

Models frequently:
- Claim to be a different model (e.g., a Llama model claiming to be ChatGPT)
- Give outdated information about themselves
- Hallucinate version numbers or capabilities
- Change their answer depending on how you phrase the question

**To verify which model you're actually using:**
1. Check the model selector in the Open WebUI interface
2. Look at the **Admin Panel > Settings > Connections** to confirm your API endpoints
3. Check your provider's dashboard/logs for the actual API calls being made

Asking the model itself is **not** a valid way to diagnose routing issues. If you suspect a configuration problem, check your connection settings and API keys instead.

### Q: But why can models on official chat interfaces (like ChatGPT or Claude.ai) correctly identify themselves?

**A:** Because the provider **injects a system prompt** that explicitly tells the model what it is. When you use ChatGPT, OpenAI's interface includes a hidden system message like "You are ChatGPT, a large language model trained by OpenAI..." before your conversation begins.

The model isn't "aware" of itself—it's simply been instructed to claim a specific identity. You can do the same thing in Open WebUI by adding a system prompt to your model configuration (e.g., "You are Llama 3.3 70B..."). The model will then confidently repeat whatever identity you've told it to claim.

This is also why the same model accessed through different interfaces might give different answers about its identity—it depends entirely on what system prompt (if any) was provided.

### Q: Why am I seeing multiple API requests when I only send one message? Why is my token usage higher than expected?

**A:** Open WebUI uses **Task Models** to power background features that enhance your chat experience. When you send a single message, additional API calls may be made for:

- **Title Generation**: Automatically generating a title for new chats
- **Tag Generation**: Auto-tagging chats for organization
- **Query Generation**: Creating optimized search queries for RAG (when you attach files or knowledge)
- **Web Search Queries**: Generating search terms when web search is enabled
- **Autocomplete Suggestions**: If enabled

By default, these tasks use the **same model** you're chatting with. If you're using an expensive API model (like GPT-4 or Claude), this can significantly increase your costs.

**To reduce API costs:**
1. Go to **Admin Panel > Settings > Interface** (for title/tag generation settings)
2. Configure a **Task Model** under **Admin Panel > Settings > Models** to use a smaller, cheaper model (like GPT-4o-mini) or a local model for background tasks
3. Disable features you don't need (auto-title, auto-tags, etc.)

:::tip Cost-Saving Recommendation
Set your Task Model to a fast, inexpensive model (or a local model via Ollama) while keeping your primary chat model as a more capable one. This gives you the best of both worlds: smart responses for your conversations, cheap/free processing for background tasks.
:::
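
If you deploy with Docker Compose, the task model can also be pinned via environment variables. The variable names below (`TASK_MODEL` for local/Ollama models, `TASK_MODEL_EXTERNAL` for OpenAI-compatible endpoints) and the model names are a sketch; verify them against the Environment Configuration reference for your version:

```yaml
environment:
  - TASK_MODEL=llama3.2:3b          # local model for background tasks
  - TASK_MODEL_EXTERNAL=gpt-4o-mini # inexpensive external model for background tasks
```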

For more optimization tips, see the **[Performance Tips Guide](tutorials/tips/performance)**.

### Q: Is MCP (Model Context Protocol) supported in Open WebUI?

**A:** Yes, Open WebUI now includes **native support for MCP Streamable HTTP**, enabling direct, first-class integration with MCP tools that communicate over the standard HTTP transport. For any **other MCP transports or non-HTTP implementations**, you should use our official proxy adapter, **MCPO**, available at 👉 [https://github.com/open-webui/mcpo](https://github.com/open-webui/mcpo). MCPO provides a unified OpenAPI-compatible layer that bridges alternative MCP transports into Open WebUI safely and consistently. This architecture ensures maximum compatibility, strict security boundaries, and predictable tool behavior across different environments while keeping Open WebUI backend-agnostic and maintainable.
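
As a sketch of how MCPO bridges a stdio MCP server, it accepts a Claude Desktop-style config file (the `mcp-server-time` server below is just an example):

```json
{
  "mcpServers": {
    "time": {
      "command": "uvx",
      "args": ["mcp-server-time"]
    }
  }
}
```

You would then run something like `mcpo --config config.json --port 8000` and register the resulting OpenAPI endpoint as a tool server in Open WebUI; check the MCPO README for the exact flags.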

### Q: Why doesn't Open WebUI support [Specific Provider]'s latest API (e.g. OpenAI Responses API)?

**A:** Open WebUI is built around **universal protocols**, not specific providers. Our core philosophy is to support standard, widely-adopted APIs like the **OpenAI Chat Completions protocol**.

This protocol-centric design ensures that Open WebUI remains backend-agnostic and compatible with dozens of providers (like OpenRouter, LiteLLM, vLLM, and Groq) simultaneously. We avoid implementing proprietary, provider-specific APIs (such as OpenAI's stateful Responses API or Anthropic's Messages API) to prevent unsustainable architectural bloat and to maintain a truly open ecosystem.

If you need functionality exclusive to a proprietary API (like OpenAI's hidden reasoning traces), we recommend using a proxy like **LiteLLM** or **OpenRouter**, which translate those proprietary features into the standard Chat Completions protocol that Open WebUI supports.

### Q: Why is the frontend integrated into the same Docker image? Isn't this unscalable or problematic?

**A:** The assumption that bundling the frontend with the backend is unscalable comes from a misunderstanding of how modern Single-Page Applications work. Open WebUI’s frontend is a static SPA, meaning it consists only of HTML, CSS, and JavaScript files with no runtime coupling to the backend. Because these files are static, lightweight, and require no separate server, including them in the same image has no impact on scalability. This approach simplifies deployment, ensures every replica serves the exact same assets, and eliminates unnecessary moving parts. If you prefer, you can still host the SPA on any CDN or static hosting service and point it to a remote backend, but packaging both together is the standard and most practical method for containerized SPAs.
130 changes: 117 additions & 13 deletions docs/features/audio/speech-to-text/env-variables.md
@@ -11,16 +11,120 @@ For a complete list of all Open WebUI environment variables, see the [Environmen

:::

The following is a summary of the environment variables for speech to text (STT).

# Environment Variables For Speech To Text (STT)

| Variable | Description |
|----------|-------------|
| `WHISPER_MODEL` | Sets the Whisper model to use for local Speech-to-Text |
| `WHISPER_MODEL_DIR` | Specifies the directory to store Whisper model files |
| `WHISPER_LANGUAGE` | Specifies the ISO 639-1 (ISO 639-2 for Hawaiian and Cantonese) Speech-to-Text language to use for Whisper (language is predicted unless set) |
| `AUDIO_STT_ENGINE` | Specifies the Speech-to-Text engine to use (empty for local Whisper, or `openai`) |
| `AUDIO_STT_MODEL` | Specifies the Speech-to-Text model for OpenAI-compatible endpoints |
| `AUDIO_STT_OPENAI_API_BASE_URL` | Sets the OpenAI-compatible base URL for Speech-to-Text |
| `AUDIO_STT_OPENAI_API_KEY` | Sets the OpenAI API key for Speech-to-Text |
The following is a summary of the environment variables for speech to text (STT) and text to speech (TTS).

:::tip UI Configuration
Most of these settings can also be configured in the **Admin Panel → Settings → Audio** tab. Environment variables take precedence on startup but can be overridden in the UI.
:::

## Speech To Text (STT) Environment Variables

### Local Whisper

| Variable | Description | Default |
|----------|-------------|---------|
| `WHISPER_MODEL` | Whisper model size | `base` |
| `WHISPER_MODEL_DIR` | Directory to store Whisper model files | `{CACHE_DIR}/whisper/models` |
| `WHISPER_COMPUTE_TYPE` | Compute type for inference (see note below) | `int8` |
| `WHISPER_LANGUAGE` | ISO 639-1 language code (empty = auto-detect) | empty |
| `WHISPER_MULTILINGUAL` | Use the multilingual Whisper model | `false` |
| `WHISPER_MODEL_AUTO_UPDATE` | Auto-download model updates | `false` |
| `WHISPER_VAD_FILTER` | Enable Voice Activity Detection filter | `false` |

:::info WHISPER_COMPUTE_TYPE Options
- `int8` — CPU default, fastest but may not work on older GPUs
- `float16` — **Recommended for CUDA/GPU**
- `int8_float16` — Hybrid mode (int8 weights, float16 computation)
- `float32` — Maximum compatibility, slowest

If using the `:cuda` Docker image with an older GPU, set `WHISPER_COMPUTE_TYPE=float16` to avoid errors.
:::

### OpenAI-Compatible STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | STT engine: empty (local Whisper), `openai`, `azure`, `deepgram`, `mistral` | empty |
| `AUDIO_STT_MODEL` | STT model for external providers | empty |
| `AUDIO_STT_OPENAI_API_BASE_URL` | OpenAI-compatible API base URL | `https://api.openai.com/v1` |
| `AUDIO_STT_OPENAI_API_KEY` | OpenAI API key | empty |
| `AUDIO_STT_SUPPORTED_CONTENT_TYPES` | Comma-separated list of supported audio MIME types | empty |
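
Putting the table above together, a minimal Docker Compose fragment for OpenAI-hosted transcription might look like this (the key is a placeholder):

```yaml
environment:
  - AUDIO_STT_ENGINE=openai
  - AUDIO_STT_MODEL=whisper-1
  - AUDIO_STT_OPENAI_API_BASE_URL=https://api.openai.com/v1
  - AUDIO_STT_OPENAI_API_KEY=your-api-key
```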

### Azure STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_AZURE_API_KEY` | Azure Cognitive Services API key | empty |
| `AUDIO_STT_AZURE_REGION` | Azure region | `eastus` |
| `AUDIO_STT_AZURE_LOCALES` | Comma-separated locales (e.g., `en-US,de-DE`) | auto |
| `AUDIO_STT_AZURE_BASE_URL` | Custom Azure base URL (optional) | empty |
| `AUDIO_STT_AZURE_MAX_SPEAKERS` | Max speakers for diarization | `3` |

### Deepgram STT

| Variable | Description | Default |
|----------|-------------|---------|
| `DEEPGRAM_API_KEY` | Deepgram API key | empty |

### Mistral STT

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_MISTRAL_API_KEY` | Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |

## Text To Speech (TTS) Environment Variables

### General TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_ENGINE` | TTS engine: empty (disabled), `openai`, `elevenlabs`, `azure`, `transformers` | empty |
| `AUDIO_TTS_MODEL` | TTS model | `tts-1` |
| `AUDIO_TTS_VOICE` | Default voice | `alloy` |
| `AUDIO_TTS_SPLIT_ON` | Split text on: `punctuation` or `none` | `punctuation` |
| `AUDIO_TTS_API_KEY` | API key for ElevenLabs or Azure TTS | empty |
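
To illustrate what `AUDIO_TTS_SPLIT_ON=punctuation` is for: long responses are chunked at sentence boundaries so each TTS request stays short. The sketch below is a naive illustration of the idea, not Open WebUI's actual implementation:

```python
import re

def split_on_punctuation(text, max_len=200):
    # Naive sketch: break text into sentence-sized chunks at punctuation
    # boundaries so each TTS request stays short. Illustration only,
    # NOT Open WebUI's actual splitting algorithm.
    parts = re.split(r"(?<=[.!?;:])\s+", text.strip())
    chunks, current = [], ""
    for part in parts:
        if current and len(current) + len(part) + 1 > max_len:
            chunks.append(current)
            current = part
        else:
            current = f"{current} {part}".strip()
    if current:
        chunks.append(current)
    return chunks

chunks = split_on_punctuation("First sentence. Second one! Third?", max_len=30)
# -> ["First sentence. Second one!", "Third?"]
```

With `AUDIO_TTS_SPLIT_ON=none`, the text is sent to the TTS engine unsplit instead.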

### OpenAI-Compatible TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_OPENAI_API_BASE_URL` | OpenAI-compatible TTS API base URL | `https://api.openai.com/v1` |
| `AUDIO_TTS_OPENAI_API_KEY` | OpenAI TTS API key | empty |
| `AUDIO_TTS_OPENAI_PARAMS` | Additional JSON params for OpenAI TTS | empty |

### Azure TTS

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_TTS_AZURE_SPEECH_REGION` | Azure Speech region | `eastus` |
| `AUDIO_TTS_AZURE_SPEECH_BASE_URL` | Custom Azure Speech base URL (optional) | empty |
| `AUDIO_TTS_AZURE_SPEECH_OUTPUT_FORMAT` | Audio output format | `audio-24khz-160kbitrate-mono-mp3` |

## Tips for Configuring Audio

### Using Local Whisper STT

For GPU acceleration issues or older GPUs, try setting:
```yaml
environment:
- WHISPER_COMPUTE_TYPE=float16
```

### Using External TTS Services

When running Open WebUI in Docker with an external TTS service:

```yaml
environment:
- AUDIO_TTS_ENGINE=openai
- AUDIO_TTS_OPENAI_API_BASE_URL=http://host.docker.internal:5050/v1
- AUDIO_TTS_OPENAI_API_KEY=your-api-key
```

:::tip
Use `host.docker.internal` on Docker Desktop (Windows/Mac) to access services on the host. On Linux, use the host IP or container networking.
:::

For troubleshooting audio issues, see the [Audio Troubleshooting Guide](/troubleshooting/audio).
125 changes: 125 additions & 0 deletions docs/features/audio/speech-to-text/mistral-voxtral-integration.md
@@ -0,0 +1,125 @@
---
sidebar_position: 2
title: "Mistral Voxtral STT"
---

# Using Mistral Voxtral for Speech-to-Text

This guide covers how to use Mistral's Voxtral model for Speech-to-Text with Open WebUI. Voxtral is Mistral's dedicated speech-to-text model for accurate cloud transcription.

## Requirements

- A Mistral API key
- Open WebUI installed and running

## Quick Setup (UI)

1. Click your **profile icon** (bottom-left corner)
2. Select **Admin Panel**
3. Click **Settings** → **Audio** tab
4. Configure the following:

| Setting | Value |
|---------|-------|
| **Speech-to-Text Engine** | `MistralAI` |
| **API Key** | Your Mistral API key |
| **STT Model** | `voxtral-mini-latest` (or leave empty for default) |

5. Click **Save**

## Available Models

| Model | Description |
|-------|-------------|
| `voxtral-mini-latest` | Default transcription model (recommended) |

## Environment Variables Setup

If you prefer to configure via environment variables:

```yaml
services:
open-webui:
image: ghcr.io/open-webui/open-webui:main
environment:
- AUDIO_STT_ENGINE=mistral
- AUDIO_STT_MISTRAL_API_KEY=your-mistral-api-key
- AUDIO_STT_MODEL=voxtral-mini-latest
# ... other configuration
```

### All Mistral STT Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `AUDIO_STT_ENGINE` | Set to `mistral` | empty (uses local Whisper) |
| `AUDIO_STT_MISTRAL_API_KEY` | Your Mistral API key | empty |
| `AUDIO_STT_MISTRAL_API_BASE_URL` | Mistral API base URL | `https://api.mistral.ai/v1` |
| `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS` | Use chat completions endpoint | `false` |
| `AUDIO_STT_MODEL` | STT model | `voxtral-mini-latest` |

## Transcription Methods

Mistral supports two transcription methods:

### Standard Transcription (Default)
Uses the dedicated transcription endpoint. This is the recommended method.

### Chat Completions Method
Set `AUDIO_STT_MISTRAL_USE_CHAT_COMPLETIONS=true` to use Mistral's chat completions API for transcription. This method:
- Requires audio in mp3 or wav format (automatic conversion is attempted)
- May provide different results than the standard endpoint

## Using STT

1. Click the **microphone icon** in the chat input
2. Speak your message
3. Click the microphone again or wait for silence detection
4. Your speech will be transcribed and appear in the input box

## Supported Audio Formats

Voxtral accepts common audio formats. The system defaults to accepting `audio/*` and `video/webm`.

If using the chat completions method, audio is automatically converted to mp3.

## Troubleshooting

### API Key Errors

If you see "Mistral API key is required":
1. Verify your API key is entered correctly
2. Check the API key hasn't expired
3. Ensure your Mistral account has API access

### Transcription Not Working

1. Check container logs: `docker logs open-webui -f`
2. Verify the STT Engine is set to `MistralAI`
3. Try the standard transcription method (disable chat completions)

### Audio Format Issues

If using chat completions method and audio conversion fails:
- Ensure FFmpeg is available in the container
- Try recording in a different format (wav or mp3)
- Switch to the standard transcription method

For more troubleshooting, see the [Audio Troubleshooting Guide](/troubleshooting/audio).

## Comparison with Other STT Options

| Feature | Mistral Voxtral | OpenAI Whisper | Local Whisper |
|---------|-----------------|----------------|---------------|
| **Cost** | Per-minute pricing | Per-minute pricing | Free |
| **Privacy** | Audio sent to Mistral | Audio sent to OpenAI | Audio stays local |
| **Model Options** | voxtral-mini-latest | whisper-1 | tiny → large |
| **GPU Required** | No | No | Recommended |

## Cost Considerations

Mistral charges per minute of audio for STT. Check [Mistral's pricing page](https://mistral.ai/products/la-plateforme#pricing) for current rates.

:::tip
For free STT, use **Local Whisper** (the default) or the browser's **Web API** for basic transcription.
:::