Merged

Dev #979
48 changes: 48 additions & 0 deletions docs/faq.mdx
@@ -140,6 +140,54 @@ To improve the performance of Retrieval-Augmented Generation (**RAG**) with Open

To do this, configure your **Ollama model params** to allow a larger context window. You can check and modify this setting directly in your chat or from the model editor page to significantly enhance the RAG experience.
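If you prefer to set this programmatically, here is a minimal sketch against Ollama's REST API. The endpoint and model name assume a default local install, and `8192` is an illustrative value:

```python
import requests

# Ask a local Ollama instance to use a larger context window for one request.
# "num_ctx" is Ollama's context-length option; the built-in default is 2048.
response = requests.post(
    "http://localhost:11434/api/chat",  # default Ollama endpoint (assumption)
    json={
        "model": "llama3.1",            # substitute your model name
        "messages": [{"role": "user", "content": "Summarize the attached notes."}],
        "options": {"num_ctx": 8192},   # raise the context window for RAG
        "stream": False,
    },
)
print(response.json()["message"]["content"])
```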

### Q: I asked the model what it is and it gave the wrong answer. Is Open WebUI routing to the wrong model?

**A:** No—**LLMs do not reliably know their own identity.** When you ask a model "What model are you?" or "Are you GPT-4?", the response is not a system diagnostic. It's simply the model generating text based on patterns in its training data.

Models frequently:
- Claim to be a different model (e.g., a Llama model claiming to be ChatGPT)
- Give outdated information about themselves
- Hallucinate version numbers or capabilities
- Change their answer depending on how you phrase the question

**To verify which model you're actually using:**
1. Check the model selector in the Open WebUI interface
2. Look at the **Admin Panel > Settings > Connections** to confirm your API endpoints
3. Check your provider's dashboard/logs for the actual API calls being made

Asking the model itself is **not** a valid way to diagnose routing issues. If you suspect a configuration problem, check your connection settings and API keys instead.
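If you want to verify outside the UI as well, you can list the models Open WebUI has actually registered through its API. A minimal sketch, assuming a default local install on port 3000 and a valid API key:

```python
import requests

# Listing registered models is a far more reliable check than asking the
# model itself. URL and key below are placeholders for your deployment.
resp = requests.get(
    "http://localhost:3000/api/models",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
)
for model in resp.json().get("data", []):
    print(model["id"])
```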

### Q: But why can models on official chat interfaces (like ChatGPT or Claude.ai) correctly identify themselves?

**A:** Because the provider **injects a system prompt** that explicitly tells the model what it is. When you use ChatGPT, OpenAI's interface includes a hidden system message like "You are ChatGPT, a large language model trained by OpenAI..." before your conversation begins.

The model isn't "aware" of itself—it's simply been instructed to claim a specific identity. You can do the same thing in Open WebUI by adding a system prompt to your model configuration (e.g., "You are Llama 3.3 70B..."). The model will then confidently repeat whatever identity you've told it to claim.

This is also why the same model accessed through different interfaces might give different answers about its identity—it depends entirely on what system prompt (if any) was provided.
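You can demonstrate this yourself against any OpenAI-compatible endpoint: the model simply echoes whatever identity the system prompt assigns. A sketch, with placeholder URL, key, and model name:

```python
import requests

# The "identity" is whatever the system prompt claims; the model has no
# self-knowledge to contradict it. Endpoint, key, and model are placeholders.
resp = requests.post(
    "http://localhost:3000/api/chat/completions",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "llama3.1",
        "messages": [
            {"role": "system", "content": "You are Llama 3.3 70B, created by Meta."},
            {"role": "user", "content": "What model are you?"},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
# Typically answers "I am Llama 3.3 70B...", regardless of the actual weights.
```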

### Q: Why am I seeing multiple API requests when I only send one message? Why is my token usage higher than expected?

**A:** Open WebUI uses **Task Models** to power background features that enhance your chat experience. When you send a single message, additional API calls may be made for:

- **Title Generation**: Automatically generating a title for new chats
- **Tag Generation**: Auto-tagging chats for organization
- **Query Generation**: Creating optimized search queries for RAG (when you attach files or knowledge)
- **Web Search Queries**: Generating search terms when web search is enabled
- **Autocomplete Suggestions**: If enabled

By default, these tasks use the **same model** you're chatting with. If you're using an expensive API model (like GPT-4 or Claude), this can significantly increase your costs.
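To see why usage can be several times what a single reply suggests, here is a rough back-of-the-envelope estimate. All prices and token counts below are illustrative assumptions, not actual rates:

```python
# Illustrative arithmetic only: real token counts and prices vary by
# provider and conversation length.
PRICE_PER_1K_INPUT = {"gpt-4o": 0.005, "gpt-4o-mini": 0.0001}  # assumed prices

chat_tokens = 2000    # tokens in your actual conversation turn
title_tokens = 500    # title generation re-sends recent context
tag_tokens = 500      # tag generation does the same

# Task model = the same expensive chat model (the default):
same_model = (chat_tokens + title_tokens + tag_tokens) / 1000 * PRICE_PER_1K_INPUT["gpt-4o"]

# Task model = a cheap model for background tasks:
split_model = (chat_tokens / 1000 * PRICE_PER_1K_INPUT["gpt-4o"]
               + (title_tokens + tag_tokens) / 1000 * PRICE_PER_1K_INPUT["gpt-4o-mini"])

print(f"single model: ${same_model:.4f}  split task model: ${split_model:.4f}")
# single model: $0.0150  split task model: $0.0101
```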

**To reduce API costs:**
1. Go to **Admin Panel > Settings > Interface** (for title/tag generation settings)
2. Configure a **Task Model** under **Admin Panel > Settings > Models** to use a smaller, cheaper model (like GPT-4o-mini) or a local model for background tasks
3. Disable features you don't need (auto-title, auto-tags, etc.)

:::tip Cost-Saving Recommendation
Set your Task Model to a fast, inexpensive model (or a local model via Ollama) while keeping your primary chat model as a more capable one. This gives you the best of both worlds: smart responses for your conversations, cheap/free processing for background tasks.
:::

For more optimization tips, see the **[Performance Tips Guide](tutorials/tips/performance)**.

### Q: Is MCP (Model Context Protocol) supported in Open WebUI?

**A:** Yes, Open WebUI now includes **native support for MCP Streamable HTTP**, enabling direct, first-class integration with MCP tools that communicate over the standard HTTP transport. For any **other MCP transports or non-HTTP implementations**, you should use our official proxy adapter, **MCPO**, available at 👉 [https://github.com/open-webui/mcpo](https://github.com/open-webui/mcpo). MCPO provides a unified OpenAPI-compatible layer that bridges alternative MCP transports into Open WebUI safely and consistently. This architecture ensures maximum compatibility, strict security boundaries, and predictable tool behavior across different environments while keeping Open WebUI backend-agnostic and maintainable.
28 changes: 28 additions & 0 deletions docs/features/audio/text-to-speech/openai-tts-integration.md
@@ -57,6 +57,34 @@ OpenAI provides 6 built-in voices:
Try different voices to find the one that best suits your use case. You can preview voices in OpenAI's documentation.
:::

## Per-Model TTS Voice

You can assign a specific TTS voice to individual models, allowing different AI personas to have distinct voices. This is configured in the Model Editor.

### Setting a Model-Specific Voice

1. Go to **Workspace > Models**
2. Click the **Edit** (pencil) icon on the model you want to configure
3. Scroll down to find the **TTS Voice** field
4. Enter the voice name (e.g., `alloy`, `echo`, `shimmer`, `onyx`, `nova`, `fable`)
5. Click **Save**

### Voice Priority

When playing TTS audio, Open WebUI uses the following priority:

1. **Model-specific TTS voice** (if set in Model Editor)
2. **User's personal voice setting** (if configured in user settings)
3. **System default voice** (configured by admin)

This allows admins to give each AI persona a consistent voice while still letting users override with their personal preference when no model-specific voice is set.
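The resolution order amounts to a simple fallback chain. A minimal sketch of the logic (illustrative only, not Open WebUI's actual implementation):

```python
def resolve_tts_voice(model_voice: str | None,
                      user_voice: str | None,
                      system_default: str = "alloy") -> str:
    """Pick the TTS voice using the priority order above (sketch)."""
    # 1. model-specific voice, 2. user's personal setting, 3. system default
    return model_voice or user_voice or system_default

print(resolve_tts_voice(None, "nova"))     # -> "nova" (user setting wins)
print(resolve_tts_voice("fable", "nova"))  # -> "fable" (model voice wins)
print(resolve_tts_voice(None, None))       # -> "alloy" (system default)
```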

### Use Cases

- **Character personas**: Give a "British Butler" model the `fable` voice, while an "Energetic Assistant" uses `nova`
- **Language learning**: Assign appropriate voices for different language tutors
- **Accessibility**: Set clearer voices for models designed for accessibility use cases

## Environment Variables Setup

If you prefer to configure via environment variables:
28 changes: 28 additions & 0 deletions docs/features/plugin/tools/index.mdx
@@ -185,6 +185,34 @@ These models excel at multi-step reasoning, proper JSON formatting, and autonomo

**Why use these?** It allows for **Deep Research** (searching the web multiple times, or querying knowledge bases), **Contextual Awareness** (looking up previous chats or notes), **Dynamic Personalization** (saving facts), and **Precise Automation** (generating content based on existing notes or documents).

#### Disabling Builtin Tools (Per-Model)

The **Builtin Tools** capability can be toggled on or off for each model in the **Workspace > Models** editor under **Capabilities**. When enabled (the default), all the system tools listed above are automatically injected when using Native Mode.

**When to disable Builtin Tools:**

| Scenario | Reason to Disable |
|----------|-------------------|
| **Model doesn't support function calling** | Smaller or older models may not handle the `tools` parameter correctly |
| **Simpler/predictable behavior needed** | You want the model to work only with pre-injected context, no autonomous tool calls |
| **Security/control concerns** | Prevents the model from actively querying knowledge bases, searching chats, accessing memories, etc. |
| **Token efficiency** | Tool specifications consume tokens; disabling saves context window space |

**What happens when Builtin Tools is disabled:**

1. **No tool injection**: The model won't receive any of the built-in system tools, even in Native Mode.
2. **RAG still works** (if File Context is enabled): Attached files are still processed via RAG and injected as context.
3. **No autonomous retrieval**: The model cannot decide to search knowledge bases or fetch additional information—it works only with what's provided upfront.
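Conceptually, the difference is whether the upstream request carries a `tools` array. The sketch below shows the two request shapes; the field names follow the OpenAI function-calling convention and are not Open WebUI's literal payload:

```python
# With Builtin Tools enabled in Native Mode, the upstream request advertises
# tools the model may call autonomously (simplified OpenAI-style schema):
request_with_tools = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "What did my notes say about Q3?"}],
    "tools": [
        {"type": "function", "function": {"name": "query_knowledge_bases", "parameters": {}}},
        {"type": "function", "function": {"name": "search_chats", "parameters": {}}},
    ],
}

# With Builtin Tools disabled, there is no "tools" key at all; the model can
# only use context that is already present (e.g., RAG-injected file content):
request_without_tools = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "Context from attached files: ..."},
        {"role": "user", "content": "What did my notes say about Q3?"},
    ],
}
```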

:::warning Builtin Tools vs File Context
**Builtin Tools** controls whether the model gets *tools* for autonomous retrieval. It does **not** control whether file content is injected via RAG—that's controlled by the separate **File Context** capability.

- **File Context** = Whether Open WebUI extracts and injects file content (RAG processing)
- **Builtin Tools** = Whether the model gets tools to autonomously search/retrieve additional content

See [File Context vs Builtin Tools](../../rag/index.md#file-context-vs-builtin-tools) for a detailed comparison.
:::

### Interleaved Thinking {#interleaved-thinking}

🧠 When using **Native Mode (Agentic Mode)**, high-tier models can engage in **Interleaved Thinking**. This is a powerful "Thought → Action → Thought → Action → Thought → ..." loop where the model can reason about a task, execute one or more tools, evaluate the results, and then decide on its next move.
56 changes: 56 additions & 0 deletions docs/features/rag/index.md
@@ -142,6 +142,62 @@ Change the RAG embedding model directly in the `Admin Panel` > `Settings` > `Doc

The RAG feature allows users to easily track the context of documents fed to LLMs with added citations for reference points. This ensures transparency and accountability in the use of external sources within your chats.

## File Context vs Builtin Tools

Open WebUI provides two separate capabilities that control how files are handled. Understanding the difference is important for configuring models correctly.

### File Context Capability

The **File Context** capability controls whether Open WebUI performs RAG (Retrieval-Augmented Generation) on attached files:

| File Context | Behavior |
|--------------|----------|
| ✅ **Enabled** (default) | Attached files are processed via RAG. Content is retrieved and injected into the conversation context. |
| ❌ **Disabled** | File processing is **completely skipped**. No content extraction, no injection. The model receives no file content. |

**When to disable File Context:**
- **Bypassing RAG entirely**: When you don't want Open WebUI to process attached files at all.
- **Using Builtin Tools only**: If you prefer the model to retrieve file content on-demand via tools like `query_knowledge_bases` rather than having content pre-injected.
- **Debugging/testing**: To isolate whether issues are related to RAG processing.

:::warning File Context Disabled = No Pre-Injected Content
When File Context is disabled, file content is **not automatically extracted or injected**. Open WebUI does not forward files to the model's native API. If you disable this, the only way the model can access file content is through builtin tools (if enabled) that query knowledge bases or retrieve attached files on-demand (agentic file processing).
:::

:::info
The File Context toggle only appears when **File Upload** is enabled for the model.
:::

### Builtin Tools Capability

The **Builtin Tools** capability controls whether the model receives native function-calling tools for autonomous retrieval:

| Builtin Tools | Behavior |
|---------------|----------|
| ✅ **Enabled** (default) | In Native Function Calling mode, the model receives tools like `query_knowledge_bases`, `view_knowledge_file`, `search_chats`, etc. |
| ❌ **Disabled** | No builtin tools are injected. The model works only with pre-injected context. |

**When to disable Builtin Tools:**
- **Model doesn't support function calling**: Smaller or older models may not handle the `tools` parameter.
- **Predictable behavior needed**: You want the model to work only with what's provided upfront.

### Combining the Two Capabilities

These capabilities work independently, giving you fine-grained control:

| File Context | Builtin Tools | Result |
|--------------|---------------|--------|
| ✅ Enabled | ✅ Enabled | **Full Agentic Mode**: RAG content injected + model can autonomously query knowledge bases |
| ✅ Enabled | ❌ Disabled | **Traditional RAG**: Content injected upfront, no autonomous retrieval tools |
| ❌ Disabled | ✅ Enabled | **Tools-Only Mode**: No pre-injected content, but model can use tools to query knowledge bases or retrieve attached files on-demand |
| ❌ Disabled | ❌ Disabled | **No File Processing**: Attached files are ignored, no content reaches the model |
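As a rough sketch of how the two switches shape the request that is ultimately sent upstream (simplified pseudologic with assumed field names, not the actual implementation):

```python
def build_request(user_message: str, file_context: bool, builtin_tools: bool) -> dict:
    """Sketch of how the two capabilities shape the upstream request."""
    messages = []
    if file_context:
        # Traditional RAG: retrieved file content is injected up front.
        messages.append({"role": "system", "content": "Context from attached files: ..."})
    messages.append({"role": "user", "content": user_message})

    request = {"model": "gpt-4o", "messages": messages}
    if builtin_tools:
        # Agentic retrieval: the model may call tools on demand instead.
        request["tools"] = [{
            "type": "function",
            "function": {"name": "query_knowledge_bases", "parameters": {}},
        }]
    return request

# Both disabled: attached files never reach the model in any form.
print(build_request("Summarize my notes", file_context=False, builtin_tools=False))
```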

:::tip Choosing the Right Configuration
- **Most models**: Keep both enabled (defaults) for full functionality.
- **Small/local models**: Disable Builtin Tools if they don't support function calling.
- **On-demand retrieval only**: Disable File Context, enable Builtin Tools if you want the model to decide what to retrieve rather than pre-injecting everything.
:::

## Enhanced RAG Pipeline

The toggleable hybrid search sub-feature enhances RAG with `BM25` retrieval, re-ranking powered by `CrossEncoder`, and configurable relevance score thresholds. This provides a more precise and tailored RAG experience for your specific use case.
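Conceptually this follows the classic hybrid-retrieval pattern: lexical recall, neural re-ranking, then a score cutoff. Below is a standalone sketch using the open-source `rank_bm25` and `sentence-transformers` packages; the libraries, model name, and threshold are illustrative choices, not necessarily Open WebUI's exact internals:

```python
from rank_bm25 import BM25Okapi
from sentence_transformers import CrossEncoder

docs = [
    "BM25 scores documents by term overlap with the query.",
    "Cross-encoders jointly score a query and document pair.",
    "Hybrid search combines lexical and semantic signals.",
]
query = "how does hybrid search work"

# Stage 1: BM25 lexical retrieval over a tokenized corpus.
bm25 = BM25Okapi([d.lower().split() for d in docs])
lexical_scores = bm25.get_scores(query.lower().split())
candidates = sorted(range(len(docs)), key=lambda i: -lexical_scores[i])[:3]

# Stage 2: re-rank the BM25 candidates with a cross-encoder.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(query, docs[i]) for i in candidates])

# Stage 3: keep only results above a configurable relevance threshold.
THRESHOLD = 0.0  # illustrative cutoff; tune per use case
results = [(docs[i], float(s)) for i, s in zip(candidates, pair_scores) if s >= THRESHOLD]
print(results)
```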
3 changes: 3 additions & 0 deletions docs/features/workspace/models.md
@@ -80,10 +80,13 @@ You can transform a generic model into a specialized agent by toggling specific
- **Vision**: Toggle to enable image analysis capabilities (requires a vision-capable Base Model).
- **Web Search**: Enable the model to access the configured search provider (e.g., Google, SearxNG) for real-time information.
- **File Upload**: Allow users to upload files to this model.
- **File Context**: When enabled (default), attached files are processed via RAG and their content is injected into the conversation. When disabled, file content is **not** extracted or injected—the model receives no file content unless it retrieves it via builtin tools. Only visible when File Upload is enabled. See [File Context vs Builtin Tools](../rag/index.md#file-context-vs-builtin-tools) for details.
- **Code Interpreter**: Enable Python code execution.
- **Image Generation**: Enable image generation integration.
- **Usage / Citations**: Toggle usage tracking or source citations.
- **Status Updates**: Show visible progress steps in the chat UI (e.g., "Searching web...", "Reading file...") during generation. Useful for slower, complex tasks.
- **Builtin Tools**: When enabled (default), automatically injects system tools (timestamps, memory, chat history, knowledge base queries, notes, etc.) in [Native Function Calling mode](../plugin/tools/index.mdx#disabling-builtin-tools-per-model). Disable this if the model doesn't support function calling or you need to save context window tokens. Note: This is separate from **File Context**—see [File Context vs Builtin Tools](../rag/index.md#file-context-vs-builtin-tools) for the difference.
- **TTS Voice**: Set a specific Text-to-Speech voice for this model. When users read responses aloud, this voice will be used instead of the global default. Useful for giving different personas distinct voices. Leave empty to use the user's settings or system default. See [Per-Model TTS Voice](../audio/text-to-speech/openai-tts-integration#per-model-tts-voice) for details.
- **Default Features**: Force specific toggles (like Web Search) to be "On" immediately when a user starts a chat with this model.

## Model Management