# vLLM-MLX

**vLLM-like inference for Apple Silicon** - GPU-accelerated Text, Image, Video & Audio on Mac

[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](LICENSE)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
vllm-mlx brings native Apple Silicon GPU acceleration to vLLM by integrating:

- **[MLX](https://github.com/ml-explore/mlx)**: Apple's ML framework with unified memory and Metal kernels
- **[mlx-lm](https://github.com/ml-explore/mlx-lm)**: Optimized LLM inference with KV cache and quantization
- **[mlx-vlm](https://github.com/Blaizzy/mlx-vlm)**: Vision-language models for multimodal inference
- **[mlx-audio](https://github.com/Blaizzy/mlx-audio)**: Speech-to-Text and Text-to-Speech with native voices

## Features

- **Multimodal** - Text, Image, Video & Audio in one platform
- **Native GPU acceleration** on Apple Silicon (M1, M2, M3, M4)
- **Native TTS voices** - Spanish, French, Chinese, Japanese + 5 more languages
- **OpenAI API compatible** - drop-in replacement for OpenAI client
- **MCP Tool Calling** - integrate external tools via Model Context Protocol
- **Paged KV Cache** - memory-efficient caching with prefix sharing
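The paged KV cache with prefix sharing can be pictured with a toy sketch. This is an illustrative model only, not vllm-mlx's actual data structures: prompts are split into fixed-size token blocks, each block is keyed by the full prefix up to its end, and identical prefixes map to the same physical block, so two requests that start the same way share cache memory.

```python
# Toy illustration of a paged KV cache with prefix sharing.
# Not vllm-mlx's real implementation - just the core idea: blocks
# covering an identical prefix are stored once and referenced by
# every sequence that shares that prefix.

BLOCK_SIZE = 4

class PagedCache:
    def __init__(self):
        self.blocks = {}   # prefix key -> physical block id
        self.next_id = 0

    def allocate(self, tokens):
        """Return the physical block ids backing a token sequence."""
        ids = []
        for i in range(0, len(tokens), BLOCK_SIZE):
            # Key by the whole prefix, since KV entries depend on
            # every earlier token, not just this block's contents.
            key = tuple(tokens[:i + BLOCK_SIZE])
            if key not in self.blocks:         # new block: allocate
                self.blocks[key] = self.next_id
                self.next_id += 1
            ids.append(self.blocks[key])       # shared blocks reused
        return ids

cache = PagedCache()
a = cache.allocate([1, 2, 3, 4, 5, 6, 7, 8])     # two fresh blocks
b = cache.allocate([1, 2, 3, 4, 9, 10, 11, 12])  # shares the first block
print(a, b, len(cache.blocks))  # → [0, 1] [0, 2] 3
```

Only three physical blocks exist for the two eight-token prompts: the shared four-token prefix is stored once.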
```python
response = client.chat.completions.create(
    ...
)
```
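Because the server is OpenAI-compatible, the stock `openai` Python client can drive it. Below is a minimal sketch of the call pattern wrapped in a hypothetical `chat` helper; the base URL, port, and model id in the comment are illustrative assumptions, not values fixed by vllm-mlx.

```python
def chat(client, prompt, model="default"):
    """Send one user message through an OpenAI-style client and
    return the assistant's reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# With a vllm-mlx server running (URL and model id are examples):
#   from openai import OpenAI
#   client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
#   print(chat(client, "Hello!", model="mlx-community/Llama-3.2-3B-Instruct-4bit"))
```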
### Audio (TTS/STT)

```bash
# Install audio dependencies
pip install vllm-mlx[audio]
python -m spacy download en_core_web_sm
brew install espeak-ng  # macOS, for non-English languages
```

```bash
# Text-to-Speech (English)
python examples/tts_example.py "Hello, how are you?" --play

# Text-to-Speech (Spanish)
python examples/tts_multilingual.py "Hola mundo" --lang es --play

# List available models and languages
python examples/tts_multilingual.py --list-models
python examples/tts_multilingual.py --list-languages
```

**Supported TTS Models:**

| Model | Languages | Description |
|-------|-----------|-------------|
| Kokoro | EN, ES, FR, JA, ZH, IT, PT, HI | Fast, 82M params, 11 voices |
| Chatterbox | 15+ languages | Expressive, voice cloning |
| VibeVoice | EN | Realtime, low latency |
| VoxCPM | ZH, EN | High quality Chinese/English |
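As an illustrative helper (hypothetical, not part of vllm-mlx), the table above can be turned into a small language-to-model picker. The language sets below simply restate the table rows as lowercase codes; Chatterbox is omitted because its exact language list is not spelled out here, and the short names are not Hub model identifiers.

```python
# Language coverage restated from the TTS model table (short names,
# not model identifiers). Chatterbox is left out: "15+ languages"
# is not an explicit list.
TTS_LANGS = {
    "Kokoro":    {"en", "es", "fr", "ja", "zh", "it", "pt", "hi"},
    "VibeVoice": {"en"},
    "VoxCPM":    {"zh", "en"},
}

def pick_tts_model(lang, prefer=("Kokoro", "VoxCPM", "VibeVoice")):
    """Return the first model in `prefer` supporting `lang`, else None."""
    for model in prefer:
        if lang in TTS_LANGS.get(model, ()):
            return model
    return None

print(pick_tts_model("es"))  # → Kokoro
```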
## Documentation

For full documentation, see the [docs](docs/) directory:
- [OpenAI-Compatible Server](docs/guides/server.md)
- [Python API](docs/guides/python-api.md)
- [Multimodal (Images & Video)](docs/guides/multimodal.md)
- [Audio (STT/TTS)](docs/guides/audio.md)
- [MCP & Tool Calling](docs/guides/mcp-tools.md)
- [Continuous Batching](docs/guides/continuous-batching.md)