v3.7.0 #6974

mudler · 2025-10-31T21:34:22Z

mudler
Oct 31, 2025
Maintainer

🚀 LocalAI 3.7.0

Welcome to LocalAI 3.7.0 👋

This release introduces Agentic MCP support with full WebUI integration, a brand-new neutts TTS backend, fuzzy model search, long-form TTS chunking for chatterbox, and a complete WebUI overhaul.

We’ve also fixed critical bugs, improved stability, and enhanced compatibility with OpenAI’s APIs.

📌 TL;DR – What’s New in LocalAI 3.7.0

Feature	Summary
🤖 Agentic MCP Support (WebUI-enabled)	Build AI agents that use real tools (web search, code exec). Fully-OpenAI compatible and integrated into the WebUI.
🎙️ neutts TTS Backend (Neuphonic-powered)	Generate natural, high-quality speech with low-latency audio — ideal for voice assistants.
🖼️ WebUI enhancements	Faster, cleaner UI with real-time updates and full YAML model control.
💬 Long-Text TTS Chunking (Chatterbox)	Generate natural-sounding long-form audio by intelligently splitting text and preserving context.
🧩 Advanced Agent Controls	Fine-tune agent behavior with new options for retries, reasoning, and re-evaluation.
📸 New Video Creation Endpoint	We now support the OpenAI-compatible `/v1/videos` endpoint for text-to-video generation.
🐍 Enhanced Whisper compatibility	Whisper.cpp is now supported on various CPU variants (AVX, AVX2, etc.) to prevent `illegal instruction` crashes.
🔍 Fuzzy Gallery Search	Find models in the gallery even with typos (e.g., `gema` finds `gemma`).
📦 Easier Model & Backend Management	Import, edit, and delete models directly via clean YAML in the WebUI.
▶️ Realtime Example	Check out the new realtime voice assistant example (multilingual).
⚠️ Security, Stability & API Compliance	Fixed critical crashes, deadlocks, session events, OpenAI compliance, and JSON schema panics.
🧠 Qwen 3 VL	Support for Qwen 3 VL with llama.cpp/gguf models

🔥 What’s New in Detail

🤖 Agentic MCP Support – Build Intelligent, Tool-Using AI Agents

We're proud to announce full Agentic MCP support a feature for building AI agents that can reason, plan, and execute actions using external tools like web search, code execution, and data retrieval. You can use standard chat/completions endpoint, but powered by an agent in the background.

Full documentation is available here

✅ Now in WebUI: A dedicated toggle appears in the chat interface when a model supports MCP. Just click to enable agent mode.

✨ Key Features:

New Endpoint: POST /mcp/v1/chat/completions (OpenAI-compatible).

Flexible Tool Configuration:

mcp:
  stdio: |
    {
      "mcpServers": {
        "searxng": {
          "command": "docker",
          "args": ["run", "-i", "--rm", "ghcr.io/mudler/mcps/duckduckgo:master"]
        }
      }
    }

Advanced Agent Control via agent config:
```
agent:
  max_attempts: 3
  max_iterations: 5
  enable_reasoning: true
  enable_re_evaluation: true
```
- max_attempts: Retry failed tool calls up to N times.
- max_iterations: Limit how many times the agent can loop through reasoning.
- enable_reasoning: Allow step-by-step thought processes (e.g., chain-of-thought).
- enable_re_evaluation: Re-analyze decisions when tool results are ambiguous.

You can find some plug-n-play MCPs here: https://github.com/mudler/MCPs
Under the hood, MCP functionality is powered by https://github.com/mudler/cogito

🖼️ WebUI enhancements

WebUI had a major overhaul:

The chat view now has an MCP toggle in the chat for models that have mcp settings enabled in the model config file.
The Editor mask of the model has now been simplified to show/edit the YAML settings of the model
More reactive, dropped HTMX in favor of Alpine.js and vanilla javascript
Various fixes including deletion of models

🎙️ Introducing neutts TTS Backend – Natural Speech, Low Latency

Say hello to neutts a new, lightweight TTS backend powered by Neuphonic, delivering high-quality, natural-sounding speech with minimal overhead.

🎛️ Setup Example

name: neutts-english
backend: neutts
parameters:
  model: neuphonic/neutts-air
tts:
  audio_path: "./output.wav"
  streaming: true
options:
  # text transcription of the provided audio file
  - ref_text: "So I'm live on radio..."
known_usecases:
  - tts

🐍 Whisper.cpp enhancements

whisper.cpp CPU variants are now available for:

avx
avx2
avx512
fallback (no optimized instructions available)

These variants are optimized for specific instruction sets and reduce crashes on older or non-AVX CPUs.

🔍 Smarter Gallery Search: Fuzzy & Case-Insensitive Matching

Searching for gemma now finds gemma-3, gemma2, etc. — even with typos like gemaa or gema.

🧩 Improved Tool & Schema Handling – No More Crashes

We’ve fixed multiple edge cases that caused crashes or silent failures in tool usage.

✅ Fixes:

Nullable JSON Schemas: "type": ["string", "null"] now works without panics.
Empty Parameters: Tools with missing or empty parameters now handled gracefully.
Strict Mode Enforcement: When strict_mode: true, the model must pick a tool — no more skipping.
Multi-Type Arrays: Safe handling of ["string", "null"] in function definitions.

🔄 Interaction with Grammar Triggers: strict_mode and grammar rules work together — if a tool is required and the function definition is invalid, the server returns a clear JSON error instead of crashing.

📸 New Video Creation Endpoint: OpenAI-Compatible

LocalAI now supports OpenAI’s /v1/videos endpoint for generating videos from text prompts.

📌 Usage Example:

curl http://localhost:8080/v1/videos \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-..." \
  -d '{
    "model": "sora",
    "prompt": "A cat walking through a forest at sunset",
    "size": "1024x576",
  }'

⚠️ Note: Video generation is resource-heavy. Use only on machines with GPU or high RAM (8GB+).

🧠 Qwen 3 VL in llama.cpp

Support has been added for Qwen 3 VL in llama.cpp. We have updated llama.cpp to latest! As a reminder, Qwen 3 VL and multimodal models are also compatible with our vLLM and MLX backends. Qwen 3 VL models are already available in the model gallery:

qwen3-vl-30b-a3b-instruct
qwen3-vl-30b-a3b-thinking
qwen3-vl-4b-instruct
qwen3-vl-32b-instruct
qwen3-vl-4b-thinking
qwen3-vl-2b-thinking
qwen3-vl-2b-instruct

Note: upgrading the llama.cpp backend is necessary if you already have a LocalAI installation.

🚀 (CI) Gallery Updater Agent: Auto-Detect & Suggest New Models

We’ve added an autonomous CI agent that scans Hugging Face daily for new models and opens PRs to update the gallery.

✨ How It Works:

Scans HF for new, trending models
Extracts base model, quantization, and metadata.
Uses cogito (our agentic framework) to assign the model to the correct family and to obtain the model informations.
Opens a PR with:
- Suggested name, family, and usecases
- Link to HF model
- YAML snippet for import

🔧 Critical Bug Fixes & Stability Improvements

Issue	Fix	Impact
📌 WebUI Crash on Model Load	Fixed `can't evaluate field Name in type string` error	Models now render even without config files
🔁 Deadlock in Model Load/Idle Checks	Guarded against race conditions during model loading	Improved performance under load
📞 Realtime API Compliance	Added `session.created` event; removed redundant `conversation.created`	Works with VoxInput, OpenAI clients, and more
📥 MCP Response Formatting	Output wrapped in `message` field	Matches OpenAI spec — better client compatibility
🛑 JSON Error Responses	Now return clean JSON instead of HTML	Scripts and libraries no longer break on auth failures
🔄 Session Registration	Fixed initial MCP calls failing due to cache issues	Reliable first-time use
🎧 `kokoro` TTS	Returns full audio, not partial	Better for long-form TTS

🚀 The Complete Local Stack for Privacy-First AI

LocalAI	The free, Open Source OpenAI alternative. Acts as a drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. Link: https://github.com/mudler/LocalAI
LocalAGI	A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. Link: https://github.com/mudler/LocalAGI
LocalRecall	A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI. Link: https://github.com/mudler/LocalRecall

❤️ Thank You!

A huge THANK YOU to our growing community! With over 35,000 stars, LocalAI is a true FOSS movement — built by people, for people, with no corporate backing.

If you love privacy-first AI and open source, please:

✅ Star the repo
💬 Contribute code, docs, or feedback
📣 Share with others

Your support keeps this stack alive and evolving!

✅ Full Changelog

📋 Click to expand full changelog

What's Changed

New Contributors

@robert-cronin made their first contribution in fix: handle multi-type arrays in JSON schema to prevent panic #6495
@gmaOCR made their first contribution in feat(api): OpenAI video create enpoint integration #6777
@lukasdotcom made their first contribution in feat: return complete audio for kokoro #6842

Full Changelog: v3.6.0...v3.7.0

This discussion was created from the release v3.7.0.

luxxio11 · 2025-11-06T14:59:09Z

luxxio11
Nov 6, 2025

Hi

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

v3.7.0 #6974

Uh oh!

{{title}}

Uh oh!

LocalAI

LocalAGI

LocalRecall

What's Changed

Bug fixes 🐛

Exciting New Features 🎉

🧠 Models

📖 Documentation and examples

👒 Dependencies

Other Changes

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

v3.7.0 #6974

Uh oh!

mudler Oct 31, 2025 Maintainer

🚀 LocalAI 3.7.0

📌 TL;DR – What’s New in LocalAI 3.7.0

🔥 What’s New in Detail

🤖 Agentic MCP Support – Build Intelligent, Tool-Using AI Agents

✨ Key Features:

🖼️ WebUI enhancements

🎙️ Introducing neutts TTS Backend – Natural Speech, Low Latency

🎛️ Setup Example

🐍 Whisper.cpp enhancements

🔍 Smarter Gallery Search: Fuzzy & Case-Insensitive Matching

🧩 Improved Tool & Schema Handling – No More Crashes

✅ Fixes:

📸 New Video Creation Endpoint: OpenAI-Compatible

📌 Usage Example:

🧠 Qwen 3 VL in llama.cpp

🚀 (CI) Gallery Updater Agent: Auto-Detect & Suggest New Models

✨ How It Works:

🔧 Critical Bug Fixes & Stability Improvements

🚀 The Complete Local Stack for Privacy-First AI

LocalAI

LocalAGI

LocalRecall

❤️ Thank You!

✅ Full Changelog

What's Changed

Bug fixes 🐛

Exciting New Features 🎉

🧠 Models

📖 Documentation and examples

👒 Dependencies

Other Changes

New Contributors

Replies: 1 comment

Uh oh!

luxxio11 Nov 6, 2025

mudler
Oct 31, 2025
Maintainer

luxxio11
Nov 6, 2025