This fork removes a large amount of cruft (incompatibly licensed code and data that should not be included in the repo) from Blaizzy/mlx-audio. In addition to the models from that repo, it includes various improvements as well as the following new models ported to MLX in Python:
- TTS: CosyVoice 3, Chatterbox
- STT: Fun-ASR
Improvements to the upstream repo will continue to be merged here.
This repo also serves as the basis for Swift ports of models in mlx-swift-audio.
Install with pip:

```bash
pip install mlx-audio-plus
```

Command-line usage:

```bash
# CosyVoice 3: cross-lingual mode (reference audio only)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav

# CosyVoice 3: zero-shot mode (with transcription for better quality)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav \
    --ref_text "This is what I said in the reference audio."

# CosyVoice 3: instruct mode with style control
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "I have exciting news!" \
    --ref_audio reference.wav \
    --instruct_text "Speak with excitement and enthusiasm"

# CosyVoice 3: voice conversion (convert source audio to the target speaker's voice)
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --ref_audio target_speaker.wav \
    --source_audio source_speech.wav

# Play the audio directly instead of saving it to a file
mlx_audio.tts.generate --model mlx-community/Fun-CosyVoice3-0.5B-2512-4bit \
    --text "Hello world" \
    --ref_audio reference.wav \
    --play

# Chatterbox: generate speech from reference audio
mlx_audio.tts.generate --model mlx-community/Chatterbox-TTS-4bit \
    --text "The quick brown fox jumped over the lazy dog." \
    --ref_audio reference.wav
```

Python API:

```python
from mlx_audio.tts.generate import generate_audio

# CosyVoice 3: cross-lingual mode (reference audio only)
generate_audio(
    text="Hello, this is a test of text to speech.",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    file_prefix="output",  # Optional
    audio_format="wav",  # Optional
)
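
# Note: the optional file_prefix and audio_format arguments above control the
# name and format of the saved file (here something like output.wav; the
# exact naming is handled by generate_audio).
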
# CosyVoice 3: zero-shot mode (with transcription for better quality)
generate_audio(
    text="Bonjour, comment allez-vous aujourd'hui?",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    ref_text="This is what I said in the reference audio.",
)

# CosyVoice 3: instruct mode with style control
generate_audio(
    text="I have some exciting news to share with you!",
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    instruct_text="Speak with excitement and enthusiasm",
)

# CosyVoice 3: voice conversion (convert source audio to target speaker)
generate_audio(
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="target_speaker.wav",  # Target voice
    source_audio="source_speech.wav",
)

# Chatterbox: generate speech from reference audio
generate_audio(
    text="The quick brown fox jumped over the lazy dog.",
    model="mlx-community/Chatterbox-TTS-4bit",
    ref_audio="reference.wav",
)
```

Speech-to-text with Fun-ASR:

```python
from mlx_audio.stt.models.funasr import Model

# Fun-ASR

# Load the model
model = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")

# Basic transcription
result = model.generate("audio.wav")
print(result.text)
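
# The transcription in result.text can be written out like any other string:
with open("transcript.txt", "w") as f:
    f.write(result.text)
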
# Translation (speech to English text)
result = model.generate(
    "chinese_speech.wav",
    task="translate",
    target_language="en",
)

# Custom prompting for domain-specific content
result = model.generate(
    "medical_dictation.wav",
    initial_prompt="Medical consultation discussing cardiac symptoms.",
)

# Streaming output
for chunk in model.generate("audio.wav", stream=True):
    print(chunk, end="", flush=True)
```
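
As an end-to-end sketch, the STT and TTS APIs above compose directly: transcribe a clip with Fun-ASR, then re-synthesize the transcript in another voice with CosyVoice 3. This uses only the calls shown above; the file names are placeholders.

```python
from mlx_audio.stt.models.funasr import Model
from mlx_audio.tts.generate import generate_audio

# Transcribe the source clip with Fun-ASR
stt = Model.from_pretrained("mlx-community/Fun-ASR-Nano-2512-4bit")
result = stt.generate("input.wav")  # placeholder input path

# Re-synthesize the transcript in the voice of reference.wav with CosyVoice 3
generate_audio(
    text=result.text,
    model="mlx-community/Fun-CosyVoice3-0.5B-2512-4bit",
    ref_audio="reference.wav",
    file_prefix="respoken",  # placeholder output prefix
)
```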