add sensevoice stt model by mm65x · Pull Request #556 · Blaizzy/mlx-audio

mm65x · 2026-03-09T20:52:20Z

Context

SenseVoice Small (https://github.com/FunAudioLLM/SenseVoice) is from Alibaba DAMO Academy. it does transcription for 50+ languages (strongest on zh/en/ja/ko/yue), plus language id, emotion recognition, and audio event detection. 234M params; runs in about 70ms for 10s of audio on apple silicon.
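For scale, here is the quoted ~70ms per 10s figure taken at face value and expressed as a real-time factor (RTF, processing time divided by audio duration). Actual numbers will vary with the chip and the input length.

```math
\mathrm{RTF} = \frac{t_{\text{proc}}}{t_{\text{audio}}} = \frac{0.07\,\mathrm{s}}{10\,\mathrm{s}} = 0.007,
\qquad \frac{1}{\mathrm{RTF}} \approx 143\times\ \text{faster than real time}
```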

Description

adds sensevoice to the stt pipeline. the model uses SANM (self-attention with FSMN memory) layers, which are a bit unusual: each attention layer has a parallel depthwise conv branch (the FSMN block) that acts as a memory mechanism.
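A minimal single-head sketch of the SANM idea follows. It uses NumPy rather than MLX so it runs anywhere. It assumes the layout of FunASR's SANM attention: the FSMN branch is a depthwise 1D conv (one filter per channel) over the value projection, with a residual, and its output is added to the ordinary attention output. Masking, dropout, multiple heads, and the real layer's padding shift are left out. All names, shapes, and weights here are illustrative, not the mlx-audio implementation.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # subtract the row max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def fsmn_memory(v: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """FSMN-style memory: depthwise 1D conv over time plus a residual.

    v: (T, D) value sequence; kernel: (K, D) with K odd, one filter per channel.
    Zero padding on both sides keeps the output length equal to T.
    """
    T, _ = v.shape
    K = kernel.shape[0]
    pad = (K - 1) // 2
    padded = np.pad(v, ((pad, pad), (0, 0)))
    out = np.empty_like(v)
    for t in range(T):
        # channel d only ever sees its own filter kernel[:, d] (depthwise)
        out[t] = (padded[t:t + K] * kernel).sum(axis=0)
    return out + v


def sanm_attention(x, w_q, w_k, w_v, w_o, kernel):
    """Single-head SANM sketch: attention output + FSMN memory of the values."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    memory = fsmn_memory(v, kernel)                # parallel conv branch
    scores = (q @ k.T) / np.sqrt(q.shape[-1])      # (T, T) attention logits
    attended = softmax(scores) @ v @ w_o           # regular self-attention path
    return attended + memory


rng = np.random.default_rng(0)
T, D, K = 8, 16, 11
x = rng.standard_normal((T, D))
w_q, w_k, w_v, w_o = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
kernel = rng.standard_normal((K, D)) * 0.1
y = sanm_attention(x, w_q, w_k, w_v, w_o, kernel)
print(y.shape)  # (8, 16)
```

The conv branch gives every frame a fixed local window of context (K frames in this sketch), independent of the attention weights. That window is the "memory" the description refers to.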

the upstream repo only distributes pytorch .pt weights with a yaml config, so this includes a conversion script that produces safetensors + a standard config.json. once there's a converted model on mlx-community/huggingface this won't be needed for most users.

Changes in the codebase

mlx_audio/stt/models/sensevoice/sensevoice.py - model implementation (frontend, encoder, ctc head, generate)
mlx_audio/stt/models/sensevoice/config.py - config dataclasses
mlx_audio/stt/models/sensevoice/convert.py - pt → safetensors converter
mlx_audio/stt/models/sensevoice/README.md - setup + usage
mlx_audio/stt/models/sensevoice/tests/test_sensevoice.py - 16 tests
mlx_audio/stt/models/__init__.py, mlx_audio/stt/utils.py - registration
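The ctc head emits per-frame token scores. A common way to turn those into tokens is greedy CTC decoding: argmax per frame, merge consecutive repeats, then drop blanks. The plain-Python sketch below shows that standard procedure. It is not taken from the PR, the PR does not say whether `generate` decodes exactly this way, and `blank_id=0` is an assumption about the vocabulary.

```python
from itertools import groupby
from typing import List, Sequence


def argmax_frames(logits: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring token id for every frame."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids: Sequence[int], blank_id: int = 0) -> List[int]:
    """Standard CTC greedy decoding: merge consecutive repeats, then drop blanks."""
    merged = [token for token, _ in groupby(frame_ids)]
    return [token for token in merged if token != blank_id]


logits = [
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # token 1
    [0.1, 0.7, 0.2],    # token 1 again (repeat, merged)
    [0.6, 0.2, 0.2],    # blank separates the next token
    [0.1, 0.8, 0.1],    # token 1 (kept: a blank came before it)
    [0.2, 0.1, 0.7],    # token 2
]
print(ctc_greedy_decode(argmax_frames(logits)))  # [1, 1, 2]
```

The blank between the two runs of token 1 is what lets CTC emit a genuinely doubled token. Without it, the repeats would collapse into one.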

Changes outside the codebase

none.

Additional information

verified component by component against the FunASR and SenseVoice reference repos
tested end-to-end on en, zh, ja, ko, yue - transcriptions, language detection, emotion, and event output all correct
users need to run the conversion script for now (instructions in the README), could upload a pre-converted model to mlx-community later
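In the reference SenseVoice model, the language, emotion, and event outputs come out of the same decoded token stream as special tags in front of the text, e.g. `<|en|><|NEUTRAL|><|Speech|><|woitn|>hello world`. Below is a sketch of a hypothetical helper for splitting such a string. It assumes that tag-prefix format; the mlx-audio port may expose these fields differently.

```python
import re
from typing import Dict, Tuple

# Hypothetical format: leading "<|tag|>" tokens (language, emotion, event, itn flag), then text.
PREFIX_RE = re.compile(r"^((?:<\|[^|]*\|>)*)(.*)$", re.DOTALL)
TAG_RE = re.compile(r"<\|([^|]*)\|>")
FIELDS = ("language", "emotion", "event", "itn")


def split_rich_transcript(raw: str) -> Tuple[Dict[str, str], str]:
    """Split a tagged transcript into a metadata dict and the plain text."""
    prefix, text = PREFIX_RE.match(raw).groups()
    tags = TAG_RE.findall(prefix)
    # zip stops at the shorter sequence, so missing tags simply drop out
    return dict(zip(FIELDS, tags)), text.strip()


meta, text = split_rich_transcript("<|en|><|NEUTRAL|><|Speech|><|woitn|>hello world")
print(meta)  # {'language': 'en', 'emotion': 'NEUTRAL', 'event': 'Speech', 'itn': 'woitn'}
print(text)  # hello world
```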

Checklist

Tests added/updated
Documentation updated
Issue referenced - STT roadmap

Blaizzy

LGTM!

Just have one nit before we merge 👌🏽

Blaizzy · 2026-03-09T22:41:09Z

mlx_audio/stt/tests/test_sensevoice.py

Could you move this into the correct tests folder?

lucasnewman · 2026-03-10T01:18:16Z

@mm65x Would you mind uploading the converted model to mlx-community to make it easy for folks to use?

mm65x · 2026-03-11T12:03:37Z

moved to mlx_audio/stt/tests/ in latest commit. let me know if that's the right spot!

i'll also try to get the converted weights up on mlx-community later today.

mm65x · 2026-03-11T12:13:12Z

@lucasnewman i don't have write access to the mlx-community org on huggingface, so i can't upload it there directly. i included the conversion script in this PR so anyone with access should be able to run it and upload the weights! let me know if you want me to upload it to my personal HF instead.

lucasnewman · 2026-03-11T17:20:51Z

You can just add yourself to the community to upload models, but I can do it for you in a bit if needed.

lucasnewman · 2026-03-12T04:33:54Z

@mm65x I uploaded the model here: https://huggingface.co/mlx-community/SenseVoiceSmall

Can you update the README to use mlx-community/SenseVoiceSmall and remove the conversion section? Then we can merge.

mm65x · 2026-03-12T19:19:37Z

@lucasnewman oh nice, i see you uploaded it to mlx-community/SenseVoiceSmall! i've just removed the convert.py script from this PR and updated the README to point directly to the huggingface repo. let me know if there's anything else needed here.

lucasnewman

🚀

mm65x added 2 commits March 9, 2026 20:51

ebac3f1 · add sensevoice stt model
0a857a8 · add sensevoice unit tests

Blaizzy reviewed Mar 9, 2026

79c3c72 · move tests to correct directory
6d1cd64 · remove convert script and update README to use mlx-community model

lucasnewman approved these changes Mar 12, 2026

lucasnewman merged commit 6bd0eea into Blaizzy:main Mar 12, 2026 (4 commits from mm65x:add-sensevoice-stt)
19 of 20 checks passed

3 participants