add mms asr model by mm65x · Pull Request #553 · Blaizzy/mlx-audio

mm65x · 2026-03-08T20:34:33Z

Context

MMS (Massively Multilingual Speech) from Meta supports 1000+ languages through language-specific adapter layers on a shared wav2vec2 backbone. It covers languages that no other model in mlx-audio currently supports.

Description

This adds MMS ASR support by building on the existing wav2vec2 implementation. The main additions are a CTC decoding head, adapter layer support in the encoder, and automatic loading of language-specific adapter weights.

The model loads the base wav2vec2 weights from model.safetensors, then overlays the language-specific adapter layers and CTC head from adapter.{lang}.safetensors. Audio is normalized to zero mean and unit variance before processing, matching the behavior of the HF feature extractor.
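For readers unfamiliar with the normalization step, here is a minimal sketch in plain Python. It is not the code from this PR: the function name `zero_mean_unit_var` and the `eps` value are assumptions chosen for illustration, and the real implementation operates on arrays rather than lists.

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Normalize a mono waveform to zero mean and (approximately) unit variance.

    `eps` is an assumed small constant that guards against division by zero
    on silent input; the exact value used by the PR is not stated.
    """
    n = len(samples)
    if n == 0:
        return []
    mean = sum(samples) / n
    # Population variance (divide by n, not n - 1).
    var = sum((x - mean) ** 2 for x in samples) / n
    scale = math.sqrt(var + eps)
    return [(x - mean) / scale for x in samples]


normalized = zero_mean_unit_var([0.0, 0.5, -0.5, 1.0])
```

After this step the waveform has a mean of zero and a variance very close to one, regardless of the recording's original gain.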

Changes in the codebase

mlx_audio/stt/models/mms/mms.py: CTC model wrapping wav2vec2, greedy CTC decoding, adapter and vocab loading
mlx_audio/stt/models/mms/tests/test_mms.py: 14 unit tests
mlx_audio/stt/models/wav2vec/wav2vec.py: added Wav2Vec2AttnAdapterLayer and optional adapter support in stable layer norm encoder layers
mlx_audio/stt/utils.py: register "mms" in MODEL_REMAPPING
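Greedy CTC decoding, mentioned for mms.py above, can be sketched as follows. This is an illustrative plain-Python version, not the PR's code: the function name, the toy vocabulary, and the choice of `blank_id=0` are assumptions.

```python
def greedy_ctc_decode(logits, id_to_token, blank_id=0):
    """Greedy CTC decoding over per-frame scores.

    logits: list of frames, each a list of scores (one per vocabulary id).
    id_to_token: hypothetical mapping from vocabulary id to its character.
    blank_id: id of the CTC blank token (assumed to be 0 here).
    """
    # 1. Pick the highest-scoring id in every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]

    # 2. Collapse consecutive repeats, then 3. drop blanks.
    tokens = []
    prev = None
    for idx in best_ids:
        if idx != prev and idx != blank_id:
            tokens.append(id_to_token[idx])
        prev = idx
    return "".join(tokens)


# Toy example: frame-wise argmax is 1, 1, 0, 2, 2.
toy_vocab = {0: "<pad>", 1: "h", 2: "i"}
toy_logits = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
]
print(greedy_ctc_decode(toy_logits, toy_vocab))  # hi
```

Because repeats are collapsed before blanks are removed, a blank between two identical ids keeps both characters, which is how CTC represents doubled letters.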

Changes outside the codebase

None.

Additional information

Tested with facebook/mms-1b-fl102; output matches the HF transformers reference
Adapter layers loaded in post_load_hook from separate safetensors files
Supports multi-language vocab.json format (language-keyed)
Adapter only added to StableLayerNorm encoder variant, matching HF implementation
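The adapter overlay and the language-keyed vocab lookup described above could look roughly like the sketch below. It only models the idea with plain dictionaries: the helper names and the example weight keys are hypothetical, and the actual post_load_hook in this PR reads safetensors files rather than in-memory dicts.

```python
def adapter_filename(lang):
    """File name pattern from the PR description: adapter.{lang}.safetensors."""
    return f"adapter.{lang}.safetensors"


def overlay_weights(base, adapter):
    """Return a new mapping where adapter entries override or extend base ones."""
    merged = dict(base)  # copy so the base weights are left untouched
    merged.update(adapter)
    return merged


def select_vocab(vocab, lang):
    """Pick the token table for `lang` from a vocab.json-style mapping.

    Handles both a language-keyed layout ({"eng": {"a": 1, ...}, ...})
    and a flat layout ({"a": 1, ...}).
    """
    first = next(iter(vocab.values()), None)
    if isinstance(first, dict):
        if lang not in vocab:
            raise KeyError(f"language {lang!r} not found in vocab")
        return vocab[lang]
    return vocab


# Hypothetical weight keys, with strings standing in for tensors.
base = {"encoder.layers.0.weight": "base", "lm_head.weight": "base-head"}
adapter = {"lm_head.weight": "fra-head", "encoder.layers.0.adapter.weight": "fra"}
merged = overlay_weights(base, adapter)
vocab = select_vocab({"eng": {"a": 1}, "fra": {"a": 1, "é": 2}}, "fra")
```

Copying the base mapping before updating keeps the shared backbone weights reusable if a different language's adapter is loaded later.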

Checklist

Tests added/updated
Documentation updated
Issue referenced - STT expansion

lucasnewman · 2026-03-09T19:05:43Z

@mm65x Looks great! If you can resolve the conflicts we can merge.

mm65x · 2026-03-09T20:07:22Z

rebased on main, conflicts resolved. also removed the unnecessary from_pretrained deprecation wrapper from moonshine while at it

Blaizzy

LGTM, thanks!

mm65x added 7 commits March 9, 2026 20:04

ccb6fbf add mms asr model with ctc decoding
4d9f29f add mms asr model with ctc decoding (wip)
184186e add adapter layer support for mms
d50fd32 add audio normalization and readme
7a35111 fix adapter placement to match hf
ac569b4 run formatter
c4d7b07 remove unnecessary from_pretrained deprecation wrapper from moonshine

mm65x force-pushed the add-mms-asr branch from 06a54ec to c4d7b07 Compare March 9, 2026 20:06

Blaizzy approved these changes Mar 9, 2026

lucasnewman merged commit 3c874c6 into Blaizzy:main Mar 10, 2026
10 checks passed
