Open
Labels
good first issue (Good for newcomers)
Description
Overview
This issue outlines our roadmap for integrating additional text-to-speech (TTS), speech-to-speech (STS), and speech-to-text (STT) models into the MLX-Audio library, expanding its coverage beyond the current Kokoro model.
Text-to-Speech (TTS) Models
Planned TTS Models
- Nari Labs Dia 1.6B
- OuteTTS v1
- Orpheus
- BARK
- SparkTTS 0.5B
- Sesame CSM-1B
- IndexTTS
- ChatterBox
- VibeVoice
- VyoTTS
- MegaTTS
- Zonos
- CosyVoice2
- StyleTTS2
- Parler TTS
- ibm-granite/granite-speech-3.2-8b
- LLMVoX
- MeloTTS
- bosonai/higgs-audio-v2
Speech-to-Speech (STS) Models
Planned STS Models
- Kyutai-Labs Moshi
- Kyutai-Labs Moshi-vis
Speech-to-Text (STT) Models
- Whisper
- Parakeet
- Wav2vec
- Voxtral
- Canary
Technical Considerations
- All models will need MLX-specific optimizations
- Quantization support should be implemented for each model
- Documentation and examples will be created for each new model
- Performance benchmarks will be established
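For the benchmarking item, one possible starting point is a real-time factor (RTF) measurement: wall-clock synthesis time divided by the duration of the audio produced, where values below 1.0 mean faster than real time. The sketch below is only an illustration of that idea. RTF as the metric is an assumption, and `synthesize` and `fake_synthesize` are hypothetical placeholders, not part of the MLX-Audio API.

```python
import time
from typing import Callable, List, Sequence


def real_time_factor(
    synthesize: Callable[[str], Sequence[float]],
    text: str,
    sample_rate: int,
    runs: int = 3,
) -> float:
    """Return the best-of-`runs` real-time factor (wall time / audio duration).

    `synthesize` is a hypothetical callable mapping text to audio samples;
    it stands in for whatever entry point a given model exposes.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        samples = synthesize(text)
        elapsed = time.perf_counter() - start
        duration = len(samples) / sample_rate  # seconds of audio produced
        if duration <= 0:
            raise ValueError("synthesize() returned no audio")
        best = min(best, elapsed / duration)
    return best


def fake_synthesize(text: str) -> List[float]:
    """Placeholder model (assumption): 0.1 s of silence per character at 24 kHz."""
    return [0.0] * (2400 * len(text))


if __name__ == "__main__":
    rtf = real_time_factor(fake_synthesize, "hello", sample_rate=24_000)
    print(f"RTF: {rtf:.6f}")
```

Taking the best of several runs reduces the influence of warm-up and one-off system noise. A real benchmark would swap `fake_synthesize` for the model under test and would likely also record peak memory and the quantization setting used.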
Instructions:
- Select a model and comment below with your selection
- Create a draft PR titled "Add support for X", where X is the model name
- Read the contribution guide
- Check the existing models
- Tag @Blaizzy for code reviews and questions
Community Input
We welcome community feedback on prioritization and additional model suggestions. Please comment on this issue with your thoughts.