add mms asr model #553

Merged
lucasnewman merged 7 commits into Blaizzy:main from mm65x:add-mms-asr
Mar 10, 2026

@mm65x
Contributor

mm65x commented Mar 8, 2026

Context

MMS (Massively Multilingual Speech) from Meta supports 1000+ languages through language-specific adapter layers on a shared wav2vec2 backbone. It fills a gap in language coverage that no other model in mlx-audio currently provides.

Description

This adds MMS ASR support by building on the existing wav2vec2 implementation. The main additions are a CTC decoding head, adapter layer support in the encoder, and automatic loading of language-specific adapter weights.

The model loads the base wav2vec2 weights from model.safetensors, then overlays the language-specific adapter layers and CTC head from adapter.{lang}.safetensors. Audio is normalized to zero mean and unit variance before processing, matching the HF feature extractor behavior.
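
The normalization step described above can be sketched roughly as follows. This is a standard-library illustration only, not the code from this PR: the function name `normalize_waveform` and the `eps` value are assumptions, and a real implementation would more likely operate on an array type than on Python lists.

```python
import math
from typing import List, Sequence


def normalize_waveform(samples: Sequence[float], eps: float = 1e-7) -> List[float]:
    """Scale a mono waveform to zero mean and (approximately) unit variance.

    `eps` guards against division by zero on silent input; its value here
    is an assumption, not taken from the PR.
    """
    n = len(samples)
    if n == 0:
        return []
    mean = sum(samples) / n
    # Population variance (divide by n, not n - 1).
    variance = sum((x - mean) ** 2 for x in samples) / n
    scale = math.sqrt(variance + eps)
    return [(x - mean) / scale for x in samples]
```

Normalizing per utterance like this keeps the input statistics independent of recording gain, which is presumably why matching the reference feature extractor matters for reproducing its output.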

Changes in the codebase

  • mlx_audio/stt/models/mms/mms.py: CTC model wrapping wav2vec2, greedy CTC decoding, adapter and vocab loading
  • mlx_audio/stt/models/mms/tests/test_mms.py: 14 unit tests
  • mlx_audio/stt/models/wav2vec/wav2vec.py: added Wav2Vec2AttnAdapterLayer and optional adapter support in stable layer norm encoder layers
  • mlx_audio/stt/utils.py: register "mms" in MODEL_REMAPPING
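
For readers unfamiliar with greedy CTC decoding, a minimal sketch of the general technique is shown here. It is not the implementation in mms.py: the function name, the blank id of 0, and the `|` word delimiter are assumptions chosen for illustration.

```python
from itertools import groupby
from typing import Dict, Sequence


def ctc_greedy_decode(
    logits: Sequence[Sequence[float]],
    id_to_token: Dict[int, str],
    blank_id: int = 0,          # assumed blank index
    word_delimiter: str = "|",  # assumed word-boundary token
) -> str:
    """Greedy CTC decoding: argmax per frame, collapse repeats, drop blanks."""
    # 1. Pick the highest-scoring token id in every frame.
    best_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    # 2. Collapse runs of the same id into a single id.
    collapsed = [token_id for token_id, _ in groupby(best_ids)]
    # 3. Remove blanks, then map ids to tokens.
    tokens = [id_to_token[i] for i in collapsed if i != blank_id]
    return "".join(tokens).replace(word_delimiter, " ").strip()


# Toy example with a hypothetical 3-token vocabulary.
id_to_token = {0: "<pad>", 1: "h", 2: "i"}
frames = [
    [0.1, 0.8, 0.1],  # h
    [0.1, 0.8, 0.1],  # h (repeat, collapsed)
    [0.9, 0.0, 0.1],  # blank
    [0.1, 0.1, 0.8],  # i
]
print(ctc_greedy_decode(frames, id_to_token))  # prints "hi"
```

Repeats are collapsed before blanks are removed, so a blank between two identical ids is what allows a genuine double letter to survive decoding.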

Changes outside the codebase

None.

Additional information

  • Tested with facebook/mms-1b-fl102, output matches HF transformers reference
  • Adapter layers loaded in post_load_hook from separate safetensors files
  • Supports multi-language vocab.json format (language-keyed)
  • Adapter only added to StableLayerNorm encoder variant, matching HF implementation
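
The adapter overlay and language-keyed vocabulary handling noted above might look roughly like this. All function names are hypothetical, the weights are shown as plain dictionaries rather than real tensors, and the actual logic in post_load_hook may differ; only the adapter.{lang}.safetensors naming comes from the description.

```python
from typing import Any, Dict


def adapter_filename(lang: str) -> str:
    """File name of the per-language adapter, following adapter.{lang}.safetensors."""
    return f"adapter.{lang}.safetensors"


def select_vocab(vocab: Dict[str, Any], lang: str) -> Dict[str, int]:
    """Return the token-to-id map for `lang`.

    Assumes a language-keyed file looks like {"eng": {"a": 1, ...}, ...},
    while a flat file ({"a": 1, ...}) is returned unchanged.
    """
    if vocab and all(isinstance(v, dict) for v in vocab.values()):
        if lang not in vocab:
            raise KeyError(f"language {lang!r} not found in vocab")
        return vocab[lang]
    return vocab


def overlay_weights(base: Dict[str, Any], adapter: Dict[str, Any]) -> Dict[str, Any]:
    """Merge adapter weights over base weights; adapter entries win on key clashes."""
    merged = dict(base)
    merged.update(adapter)
    return merged
```

Keeping the overlay as a simple key-wise merge means switching language only requires loading a different adapter file and vocabulary entry, while the shared backbone weights stay in place.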

Checklist

  • Tests added/updated
  • Documentation updated
  • Issue referenced - STT expansion

@lucasnewman
Collaborator

@mm65x Looks great! If you can resolve the conflicts we can merge.

@mm65x
Contributor Author

mm65x commented Mar 9, 2026

rebased on main, conflicts resolved. also removed the unnecessary from_pretrained deprecation wrapper from moonshine while at it

Owner

@Blaizzy Blaizzy left a comment

LGTM, thanks!

@lucasnewman lucasnewman merged commit 3c874c6 into Blaizzy:main Mar 10, 2026
10 checks passed