Conversation
Great job @lucasnewman, I love the speed! 🚀 The mimi codec will be really useful for some models on our roadmap. FYI, there is an existing repo you can take inspiration from: https://github.com/senstella/csm-mlx. I will help out when I return from vacation this coming week.
Thanks for the reference! Basic audio gen is working now -- here's a sample.
@Blaizzy I don't know if you want to use the (unquantized) model I uploaded to HF or another repo -- it's up to you! This is what the output looks like: The voice, speed, and language aren't applicable here, but I was trying to be as surgical as possible with the model loading / generate changes. Feel free to change it up to whatever you'd like.
|
This is phenomenal @lucasnewman, you crushed it! 🔥 I will review and merge tomorrow, as well as handle the quantization.
Could you upload your copy to mlx-community with the name: and update the path?
I'm thinking about a general API design. For instance, in my view
The base model is fp32, not bf16, so I'll put it at
Yep, that makes sense. We'll need some kind of
Sure, that makes sense. I'm used to converting it to bf16. |
Got it, let me check a few things and come back with some suggestions.
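Since the exchange above is about converting the fp32 base model to bf16 for the mlx-community upload: as a minimal illustration of what that cast does numerically, bfloat16 keeps the sign, exponent, and top 7 mantissa bits of a float32, so it preserves fp32's dynamic range while dropping precision. This is only a bit-level sketch of the conversion, not mlx-audio's actual code path (which presumably just casts with MLX's own dtype machinery):

```python
import struct

def f32_to_bf16_bits(x: float) -> int:
    # Reinterpret the float32 as a uint32, then keep the top 16 bits.
    # (NaN/inf handling is omitted in this sketch.)
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Round-to-nearest-even: bias by 0x7FFF plus the LSB of the kept half.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def bf16_bits_to_f32(b: int) -> float:
    # Widen back to float32 by zero-filling the dropped mantissa bits.
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x
```

In MLX itself the equivalent would simply be something like `weights.astype(mx.bfloat16)`; the bit-level version is only meant to show why the cast is lossy (fewer mantissa bits) but keeps the full fp32 exponent range, which is why bf16 is a safe default for model weights.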
Merged! 🚀
Support for the Sesame TTS model, based on the official implementation and pre-trained model here.
Example usage:
```shell
python -m mlx_audio.tts.generate --model lucasnewman/csm-1b-mlx --play --text "Hello from Sesame."
```

TODO:
I'll save quantization support as a follow-up since it's not really my area of expertise.