add canary stt model (nvidia canary-1b-v2) by mm65x · Pull Request #550 · Blaizzy/mlx-audio

mm65x · 2026-03-07T19:26:13Z

Context

Canary is listed as a planned STT model in the roadmap (#1). NVIDIA's canary-1b-v2 is a top performer on the Open ASR Leaderboard (7.15% avg WER) with support for 25 EU languages plus Russian and Ukrainian, including cross-language translation.

Description

Adds a complete Canary model implementation for mlx-audio's STT pipeline. The model uses a FastConformer encoder (reusing the existing parakeet conformer) paired with a Transformer decoder with cross-attention for autoregressive text generation. Weights are loaded from safetensors converted from NVIDIA's .nemo format.
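For readers unfamiliar with the encoder-decoder flow described above, here is a minimal sketch of greedy autoregressive decoding. It is not the PR's generate() implementation: `greedy_decode`, `make_toy_step`, the token ids, and the toy decoder are all hypothetical stand-ins, and the real decoder attends to the encoder output through cross-attention rather than just counting frames.

```python
from typing import Callable, List, Sequence

# Hypothetical types for illustration only.
EncoderOut = List[List[float]]  # (frames, hidden) features from the encoder
StepFn = Callable[[Sequence[int], EncoderOut], List[float]]  # returns vocab logits


def greedy_decode(
    step: StepFn,
    enc: EncoderOut,
    prompt: Sequence[int],
    eos_id: int,
    max_new_tokens: int = 32,
) -> List[int]:
    """Repeatedly pick the highest-scoring next token until EOS or the budget."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = step(tokens, enc)  # decoder sees all tokens so far plus audio features
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[len(prompt):]  # return only the newly generated ids


def make_toy_step(prompt_len: int, vocab_size: int = 8, eos_id: int = 2) -> StepFn:
    """Toy decoder: emits one token per encoder frame (ids 3, 4, ...), then EOS."""

    def step(tokens: Sequence[int], enc: EncoderOut) -> List[float]:
        n_generated = len(tokens) - prompt_len
        target = eos_id if n_generated >= len(enc) else 3 + n_generated
        return [1.0 if i == target else 0.0 for i in range(vocab_size)]

    return step


if __name__ == "__main__":
    enc = [[0.0] * 4 for _ in range(3)]  # three fake encoder frames
    print(greedy_decode(make_toy_step(prompt_len=2), enc, prompt=[0, 1], eos_id=2))
```

A KV-cache, as mentioned for decoder.py, would avoid recomputing attention over earlier tokens on each step; the sketch omits it for brevity.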

Changes in the codebase

mlx_audio/stt/models/canary/canary.py: model class with generate(), sanitize() for NeMo weight mapping, and audio preprocessing
mlx_audio/stt/models/canary/decoder.py: transformer decoder with self-attention, cross-attention, fixed positional encoding, and KV-cache
mlx_audio/stt/models/canary/config.py: model configuration dataclasses
mlx_audio/stt/models/canary/tokenizer.py: sentencepiece tokenizer wrapper with canary prompt format
mlx_audio/stt/models/canary/__init__.py: module exports
mlx_audio/stt/utils.py: register "canary" in MODEL_REMAPPING
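To illustrate the kind of work a sanitize() step for checkpoint weight mapping typically does, the sketch below renames and drops keys in a flat name-to-tensor dictionary. The prefixes in `RENAME_RULES` and `DROP_PREFIXES` are made up for the example and are not the mapping used in canary.py; the values stand in for tensors.

```python
from typing import Dict, Mapping, Tuple, TypeVar

T = TypeVar("T")  # placeholder for an array/tensor type

# Hypothetical rules for illustration; NOT the actual NeMo-to-MLX mapping.
RENAME_RULES: Tuple[Tuple[str, str], ...] = (("src_decoder.", "decoder."),)
DROP_PREFIXES: Tuple[str, ...] = ("preprocessor.",)


def sanitize(weights: Mapping[str, T]) -> Dict[str, T]:
    """Return a new dict with unwanted keys dropped and prefixes renamed."""
    out: Dict[str, T] = {}
    for name, value in weights.items():
        if name.startswith(DROP_PREFIXES):  # str.startswith accepts a tuple
            continue
        for old, new in RENAME_RULES:
            if name.startswith(old):
                name = new + name[len(old):]
                break  # apply at most one rename per key
        out[name] = value
    return out


if __name__ == "__main__":
    raw = {"src_decoder.layers.0.w": 1, "preprocessor.window": 2, "encoder.x": 3}
    print(sanitize(raw))
```

Keeping the rules as data rather than inline string handling makes the mapping easy to audit against a checkpoint's key list.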

Changes outside the codebase

None.

Additional information

Encoder reuses parakeet/conformer.py directly, no duplication
Tested with canary-1b-v2: English/German transcription and bidirectional translation all produce correct output
The model has no encoder-decoder projection (Identity) since encoder and decoder dimensions match (1024)
Decoder uses 8 transformer layers with pre-layer-norm and a final layer norm
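The pre-layer-norm ordering mentioned above can be sketched in plain Python as follows. This is an assumption-laden illustration, not the code in decoder.py: `layer_norm` here has no learned scale or bias, vectors are plain lists, and the sublayers passed to `decoder_stack` are arbitrary callables standing in for self-attention, cross-attention, and feed-forward blocks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Sublayer = Callable[[Vector], Vector]


def layer_norm(x: Vector, eps: float = 1e-5) -> Vector:
    """Normalize to zero mean and unit variance (no learned gain/bias here)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x]


def pre_ln_residual(x: Vector, sublayer: Sublayer) -> Vector:
    """Pre-LN: normalize first, apply the sublayer, then add the residual."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def decoder_stack(x: Vector, layers: Sequence[Sequence[Sublayer]]) -> Vector:
    """Apply each layer's sublayers in pre-LN form, then one final layer norm."""
    for sublayers in layers:
        for sublayer in sublayers:
            x = pre_ln_residual(x, sublayer)
    return layer_norm(x)  # final norm, needed because residuals are never normalized


if __name__ == "__main__":
    halve: Sublayer = lambda h: [0.5 * v for v in h]
    layers = [(halve, halve, halve)] * 8  # 8 layers, 3 toy sublayers each
    print(decoder_stack([1.0, 2.0, 4.0], layers))
```

In a pre-LN stack the residual stream itself is never normalized inside the layers, which is why a final layer norm usually follows the last layer.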

Checklist

Tests added/updated
Documentation updated
Issue referenced (e.g., "Closes #...") - addresses Canary item in TTS and STS Models to port to MLX-Audio (Roadmap) #1

Blaizzy · 2026-03-07T20:53:58Z

Awesome, this was one of the top items in our backlog.

Could you add a model readme (with inference examples) in the canary folder and link it in the main readme?

mm65x · 2026-03-08T12:58:24Z

Added a README in the canary folder with usage examples and linked it from the main README's STT table.

mm65x · 2026-03-08T13:04:15Z

Also, thank you for this library! I'm building a local ASR app for Mac and mlx-audio has been a great option. I've got a couple more models in the pipeline that I'd like to contribute and will open PRs for them too

lucasnewman · 2026-03-08T16:44:49Z

Please run the formatter: pre-commit run --all so we can clear tests, otherwise looks great!

mm65x · 2026-03-08T19:20:25Z

done, ran the formatter

lucasnewman

🚀

add canary stt model (nvidia canary-1b-v2)

4f3efa1

mm65x mentioned this pull request Mar 7, 2026

TTS and STS Models to port to MLX-Audio (Roadmap) #1

Open

26 tasks

mm65x added 4 commits March 7, 2026 19:35

clean up comments

73b7b6f

add unit tests

7a3c60a

cleanup

7f161e0

cleanup

f3db244

add readme and docs

3a61ee9

run formatter

9a056b0

lucasnewman approved these changes Mar 8, 2026

lucasnewman merged commit e3f3f2b into Blaizzy:main Mar 8, 2026
10 checks passed

lucasnewman merged 7 commits into Blaizzy:main from mm65x:add-canary-stt
