add sensevoice stt model #556

Merged
lucasnewman merged 4 commits into Blaizzy:main from mm65x:add-sensevoice-stt
Mar 12, 2026

Conversation

@mm65x
Contributor

mm65x commented Mar 9, 2026

Context

SenseVoice Small (https://github.com/FunAudioLLM/SenseVoice) from Alibaba DAMO Academy. it transcribes 50+ languages (strongest on zh/en/ja/ko/yue) and also does language identification, emotion recognition, and audio event detection. 234M params; transcribes 10s of audio in about 70ms on Apple silicon.

Description

adds sensevoice to the stt pipeline. the model uses SANM (self-attention with FSMN memory) layers, which are a bit unusual - each attention layer has a parallel depthwise conv branch that acts as a memory mechanism.
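to illustrate the SANM idea described above, here is a minimal NumPy sketch, not the code in this PR. it assumes the memory branch is a depthwise 1-D convolution over time applied to the value projection, with a residual, and that its output is summed with the attention output (my reading of the FunASR reference). the names `fsmn_memory` and `sanm_layer`, the weight shapes, and the kernel size are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def fsmn_memory(v, kernel):
    """Depthwise 1-D conv over time: every channel has its own filter.

    v: (T, D) value projection, kernel: (K, D) with K odd.
    """
    K = kernel.shape[0]
    pad = (K - 1) // 2
    padded = np.pad(v, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros_like(v)
    for k in range(K):
        # shift the sequence by k and weight each channel independently
        out += padded[k:k + v.shape[0]] * kernel[k]
    return out + v  # residual around the memory branch (assumed)

def sanm_layer(x, wq, wk, wv, wo, kernel, n_heads):
    """Multi-head self-attention plus a parallel FSMN memory branch."""
    T, D = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv
    dh = D // n_heads
    split = lambda t: t.reshape(T, n_heads, dh).transpose(1, 0, 2)
    qh, kh, vh = split(q), split(k), split(v)
    scores = softmax(qh @ kh.transpose(0, 2, 1) / np.sqrt(dh))
    ctx = (scores @ vh).transpose(1, 0, 2).reshape(T, D)
    # attention path and memory path are computed in parallel, then summed
    return ctx @ wo + fsmn_memory(v, kernel)
```

the memory branch only looks at a fixed local window of frames, so it is cheap compared to attention and gives each layer an explicit local-context signal.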

the upstream repo only distributes pytorch .pt weights with a yaml config, so this includes a conversion script that produces safetensors + a standard config.json. once there's a converted model on mlx-community/huggingface this won't be needed for most users.

Changes in the codebase

  • mlx_audio/stt/models/sensevoice/sensevoice.py - model implementation (frontend, encoder, ctc head, generate)
  • mlx_audio/stt/models/sensevoice/config.py - config dataclasses
  • mlx_audio/stt/models/sensevoice/convert.py - pt → safetensors converter
  • mlx_audio/stt/models/sensevoice/README.md - setup + usage
  • mlx_audio/stt/models/sensevoice/tests/test_sensevoice.py - 16 tests
  • mlx_audio/stt/models/__init__.py, mlx_audio/stt/utils.py - registration
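
for readers unfamiliar with the "ctc head" mentioned in the first bullet, the sketch below shows greedy CTC decoding in general: take the argmax per frame, collapse consecutive repeats, then drop blanks. this is a generic illustration and not the `generate` implementation from this PR; the function name and the choice of `blank_id=0` are assumptions.

```python
import numpy as np

def ctc_greedy_decode(logits, blank_id=0):
    """Greedy CTC decode for one utterance.

    logits: (T, V) per-frame scores over the vocabulary.
    Returns the list of token ids after collapsing repeats and removing blanks.
    """
    best = logits.argmax(axis=-1)
    tokens = []
    prev = None
    for t in best.tolist():
        # a token is emitted only when it differs from the previous frame
        # and is not the blank symbol
        if t != prev and t != blank_id:
            tokens.append(t)
        prev = t
    return tokens
```

a blank between two identical ids keeps them as two separate tokens, which is how CTC represents genuinely repeated symbols.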

Changes outside the codebase

none.

Additional information

  • verified component by component against the FunASR and SenseVoice reference repos
  • tested end-to-end on en, zh, ja, ko, yue - transcriptions, language detection, emotion, and event output all correct
  • users need to run the conversion script for now (instructions in the README), could upload a pre-converted model to mlx-community later

Checklist

  • Tests added/updated
  • Documentation updated
  • Issue referenced - STT roadmap

Owner

Blaizzy left a comment

LGTM!

Just have one nit before we merge 👌🏽

Owner

Could you move this into the correct tests folder?

@lucasnewman
Collaborator

@mm65x Would you mind uploading the converted model to mlx-community to make it easy for folks to use?

@mm65x
Contributor Author

mm65x commented Mar 11, 2026

moved to mlx_audio/stt/tests/ in latest commit. let me know if that's the right spot!

i'll also try to get the converted weights up on mlx-community later today.

@mm65x
Contributor Author

mm65x commented Mar 11, 2026

@lucasnewman i don't have write access to the mlx-community org on huggingface, so i can't upload it there directly. i included the conversion script in this PR so anyone with access should be able to run it and upload the weights! let me know if you want me to upload it to my personal HF instead.

@lucasnewman
Collaborator

> @lucasnewman i don't have write access to the mlx-community org on huggingface, so i can't upload it there directly. i included the conversion script in this PR so anyone with access should be able to run it and upload the weights! let me know if you want me to upload it to my personal HF instead.

You can just add yourself to the community to upload models, but I can do it for you in a bit if needed.

@lucasnewman
Collaborator

@mm65x I uploaded the model here: https://huggingface.co/mlx-community/SenseVoiceSmall

Can you update the README to use mlx-community/SenseVoiceSmall and remove the conversion section? Then we can merge.

@mm65x
Contributor Author

mm65x commented Mar 12, 2026

@lucasnewman oh nice, i see you uploaded it to mlx-community/SenseVoiceSmall! i've just removed the convert.py script from this PR and updated the README to point directly to the huggingface repo. let me know if there's anything else needed here.

Collaborator

lucasnewman left a comment

🚀

lucasnewman merged commit 6bd0eea into Blaizzy:main Mar 12, 2026
19 of 20 checks passed

3 participants