add canary stt model (nvidia canary-1b-v2) #550
Merged
lucasnewman merged 7 commits into Blaizzy:main on Mar 8, 2026
26 tasks
Owner
Awesome, this was one of the top items in our backlog. Could you add a model README (with inference examples) in the canary folder and link it in the main README?
Contributor
Author
Added a README in the canary folder with usage examples and linked it from the main README's STT table.
Contributor
Author
Also, thank you for this library! I'm building a local ASR app for Mac, and mlx-audio has been a great option. I've got a couple more models in the pipeline that I'd like to contribute, and I will open PRs for them too.
Collaborator
Please run the formatter:
Contributor
Author
Done, ran the formatter.
Context
Canary is listed as a planned STT model in the roadmap (#1). NVIDIA's canary-1b-v2 is a top performer on the Open ASR Leaderboard (7.15% avg WER) with support for 25 EU languages plus Russian and Ukrainian, including cross-language translation.
Description
Adds a complete Canary model implementation for mlx-audio's STT pipeline. The model uses a FastConformer encoder (reusing the existing parakeet conformer) paired with a Transformer decoder with cross-attention for autoregressive text generation. Weights are loaded from safetensors converted from NVIDIA's .nemo format.
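The encoder/decoder split described above can be illustrated with a minimal greedy decoding loop. This is only a sketch under stated assumptions, not the code in this PR: `greedy_decode`, `toy_step`, the token ids, and the cache layout are all hypothetical names invented for illustration. It shows the general pattern: the encoder output is computed once, the decoder is called once per generated token, and a cache carries state between steps so only the newest token is fed in.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical special-token ids, chosen only for this sketch.
BOS_ID = 0
EOS_ID = 1
VOCAB_SIZE = 8

StepFn = Callable[[Sequence[int], Sequence[int], Optional[dict]], Tuple[List[float], dict]]


def greedy_decode(
    encoder_out: Sequence[int],
    step_fn: StepFn,
    prompt: Sequence[int],
    eos_id: int = EOS_ID,
    max_tokens: int = 32,
) -> List[int]:
    """Generate tokens one at a time, always picking the highest-scoring id."""
    generated: List[int] = []
    cache: Optional[dict] = None
    next_input: Sequence[int] = list(prompt)  # first step consumes the whole prompt
    for _ in range(max_tokens):
        logits, cache = step_fn(next_input, encoder_out, cache)
        next_id = max(range(len(logits)), key=logits.__getitem__)  # argmax
        if next_id == eos_id:
            break
        generated.append(next_id)
        next_input = [next_id]  # later steps feed only the newest token
    return generated


def toy_step(new_tokens, encoder_out, cache):
    """Stand-in decoder step: 'attends' to encoder_out by position.

    A real decoder would run self-attention over cached keys/values and
    cross-attention over the encoder states; this toy just copies them.
    """
    if cache is None:
        cache = {"prompt_len": len(new_tokens), "seen": 0}
    cache["seen"] += len(new_tokens)
    n_generated = cache["seen"] - cache["prompt_len"]
    target = encoder_out[n_generated] if n_generated < len(encoder_out) else EOS_ID
    logits = [0.0] * VOCAB_SIZE
    logits[target] = 1.0
    return logits, cache


result = greedy_decode([5, 3, 7], toy_step, prompt=[BOS_ID])
print(result)
```

In a real model the prompt would carry the task and language tokens, and the cache would hold per-layer key/value tensors rather than counters.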
Changes in the codebase
- mlx_audio/stt/models/canary/canary.py: model class with generate(), sanitize() for NeMo weight mapping, and audio preprocessing
- mlx_audio/stt/models/canary/decoder.py: transformer decoder with self-attention, cross-attention, fixed positional encoding, and KV-cache
- mlx_audio/stt/models/canary/config.py: model configuration dataclasses
- mlx_audio/stt/models/canary/tokenizer.py: sentencepiece tokenizer wrapper with canary prompt format
- mlx_audio/stt/models/canary/__init__.py: module exports
- mlx_audio/stt/utils.py: register "canary" in MODEL_REMAPPING

Changes outside the codebase
None.
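For readers unfamiliar with the "fixed positional encoding" listed for decoder.py, here is a minimal sketch of the standard sinusoidal formulation from the original Transformer design. The PR text does not state which formula the decoder uses, so treating it as sinusoidal is an assumption, and `sinusoidal_positions` is a hypothetical helper written in plain Python rather than MLX.

```python
import math
from typing import List


def sinusoidal_positions(max_len: int, d_model: int, base: float = 10000.0) -> List[List[float]]:
    """Build a (max_len x d_model) table of fixed, non-learned position vectors.

    Even columns hold sin(pos / base^(i / d_model)); odd columns hold the
    matching cos. Assumes d_model is even.
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even for paired sin/cos columns")
    table: List[List[float]] = []
    for pos in range(max_len):
        row = [0.0] * d_model
        for i in range(0, d_model, 2):
            angle = pos / (base ** (i / d_model))
            row[i] = math.sin(angle)
            row[i + 1] = math.cos(angle)
        table.append(row)
    return table


table = sinusoidal_positions(max_len=4, d_model=4)
print(table[0])  # position 0: every sin column is 0.0, every cos column is 1.0
```

Because the table depends only on position and dimension, it can be computed once and indexed by the current decoding step, which pairs naturally with a KV-cache that feeds one new token at a time.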
Additional information
Encoder reuses parakeet/conformer.py directly, no duplication
Checklist