Conversation
Blaizzy
left a comment
LGTM!
Just have one nit before we merge 👌🏽
Could you move this into the correct tests folder?
@mm65x Would you mind uploading the converted model to
moved to i'll also try to get the converted weights up on mlx-community later today.
@lucasnewman i don't have write access to the
You can just add yourself to the community to upload models, but I can do it for you in a bit if needed.
@mm65x I uploaded the model here: https://huggingface.co/mlx-community/SenseVoiceSmall Can you update the README to use
@lucasnewman oh nice, i see you uploaded it to
Context
SenseVoice Small (https://github.com/FunAudioLLM/SenseVoice) from Alibaba DAMO Academy does transcription for 50+ languages (strongest on zh/en/ja/ko/yue) plus language id, emotion, and audio event detection. 234M params, processes 10s of audio in about 70ms on apple silicon.
Description
adds sensevoice to the stt pipeline. the model uses SANM (self-attention with FSMN memory) layers, which are a bit unusual - each attention layer has a parallel depthwise conv branch that acts as a memory mechanism.
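to make the SANM idea concrete, here is a minimal numpy sketch of one such layer. it is not the code from this PR: the function names, the choice to run the depthwise conv over the value projection, the residual inside the memory branch, and the symmetric padding are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fsmn_memory(v, kernel):
    """Hypothetical FSMN memory branch: depthwise 1-D conv over time plus a residual.

    v:      (T, D) value projection
    kernel: (K, D) one length-K filter per channel (depthwise)
    """
    T = v.shape[0]
    K = kernel.shape[0]
    left = (K - 1) // 2
    right = K - 1 - left
    padded = np.pad(v, ((left, right), (0, 0)))  # zero-pad the time axis
    out = np.zeros_like(v)
    for k in range(K):
        # each channel is filtered independently (no mixing across channels)
        out += padded[k:k + T] * kernel[k]
    return out + v  # residual connection (assumed)


def sanm_attention(x, wq, wk, wv, wo, kernel, n_heads):
    """Multi-head self-attention with a parallel FSMN memory branch (sketch)."""
    T, D = x.shape
    head_dim = D // n_heads
    q, k, v = x @ wq, x @ wk, x @ wv

    # parallel branch: computed from the values, independent of attention weights
    memory = fsmn_memory(v, kernel)

    def split(t):
        # (T, D) -> (n_heads, T, head_dim)
        return t.reshape(T, n_heads, head_dim).transpose(1, 0, 2)

    qh, kh, vh = split(q), split(k), split(v)
    weights = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(head_dim))
    context = (weights @ vh).transpose(1, 0, 2).reshape(T, D)

    # the two branches are summed
    return context @ wo + memory
```

the point of the sketch is the last line: the attention output and the conv "memory" are computed side by side from the same input and then added, so the conv branch supplies local context that does not depend on the attention weights.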
the upstream repo only distributes pytorch .pt weights with a yaml config, so this includes a conversion script that produces safetensors + a standard config.json. once there's a converted model on mlx-community/huggingface this won't be needed for most users.
Changes in the codebase
mlx_audio/stt/models/sensevoice/sensevoice.py - model implementation (frontend, encoder, ctc head, generate)
mlx_audio/stt/models/sensevoice/config.py - config dataclasses
mlx_audio/stt/models/sensevoice/convert.py - pt → safetensors converter
mlx_audio/stt/models/sensevoice/README.md - setup + usage
mlx_audio/stt/models/sensevoice/tests/test_sensevoice.py - 16 tests
mlx_audio/stt/models/__init__.py, mlx_audio/stt/utils.py - registration
Changes outside the codebase
none.