Add Irodori-TTS: Japanese TTS model port to MLX by yoshphys · Pull Request #591 · Blaizzy/mlx-audio

yoshphys · 2026-03-21T10:20:21Z

Summary

Port Aratako/Irodori-TTS-500M to mlx-audio as a new TTS model (irodori_tts)
Japanese TTS based on Echo TTS architecture, using Rectified Flow diffusion + DACVAE codec (48kHz, 128-dim latents)
Adds "irodori_tts" entry to MODEL_REMAPPING in tts/utils.py

New files

| File | Description |
| --- | --- |
| `models/irodori_tts/model.py` | IrodoriDiT architecture (JointAttention, LowRankAdaLN, SwiGLU, RoPE) |
| `models/irodori_tts/irodori_tts.py` | TTS wrapper (Model class, DACVAE loading, generate pipeline) |
| `models/irodori_tts/config.py` | IrodoriDiTConfig, SamplerConfig, ModelConfig |
| `models/irodori_tts/sampling.py` | Euler sampler with CFG (independent/alternating/joint modes) |
| `models/irodori_tts/text.py` | Japanese text normalization + HuggingFace tokenizer wrapper |
| `models/irodori_tts/convert.py` | Weight conversion script: PyTorch → MLX fp16 (DiT + DACVAE) |
| `models/irodori_tts/README.md` | Usage docs, memory requirements, conversion instructions |
| `tests/test_irodori_tts.py` | 28 unit tests (all passing) |
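
For readers unfamiliar with the sampler row above, here is a minimal pure-Python sketch of Euler integration for a rectified-flow model with classifier-free guidance (CFG). It is not the code in `sampling.py`. The model signature, the time convention (noise at t=1, data at t=0), the default scales, and the way the three mode names are interpreted are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical model signature: (x, t, text, speaker) -> velocity.
# Passing None for text or speaker means "drop that condition".
VelocityFn = Callable[[Vector, float, Optional[str], Optional[str]], Vector]


def guided_velocity(model: VelocityFn, x: Vector, t: float, text: str,
                    speaker: str, step: int, mode: str,
                    text_scale: float, speaker_scale: float) -> Vector:
    """Combine conditional and condition-dropped velocities (assumed scheme)."""
    cond = model(x, t, text, speaker)
    if mode == "independent":
        # Guide on text and speaker separately: three model calls per step.
        pairs = [(model(x, t, None, speaker), text_scale),
                 (model(x, t, text, None), speaker_scale)]
    elif mode == "alternating":
        # Guide on one condition per step, switching each step: two calls.
        if step % 2 == 0:
            pairs = [(model(x, t, None, speaker), text_scale)]
        else:
            pairs = [(model(x, t, text, None), speaker_scale)]
    elif mode == "joint":
        # Drop both conditions together: two calls, one shared scale.
        pairs = [(model(x, t, None, None), text_scale)]
    else:
        raise ValueError(f"unknown CFG mode: {mode}")
    out = list(cond)
    for dropped, scale in pairs:
        out = [o + scale * (c - d) for o, c, d in zip(out, cond, dropped)]
    return out


def euler_sample(model: VelocityFn, noise: Vector, text: str, speaker: str,
                 num_steps: int = 40, mode: str = "independent",
                 text_scale: float = 3.0, speaker_scale: float = 5.0) -> Vector:
    """Integrate dx/dt = v from t=1 (noise) down to t=0 with fixed Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = 1.0 - step * dt
        v = guided_velocity(model, x, t, text, speaker, step, mode,
                            text_scale, speaker_scale)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x
```

In this sketch, `alternating` and `joint` call the model twice per step while `independent` calls it three times, which is one possible reason a cheaper mode would help on memory-constrained machines.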

Test plan

Run unit tests: `python -m unittest mlx_audio.tts.tests.test_irodori_tts -v`
Convert weights: `python -m mlx_audio.tts.models.irodori_tts.convert` (requires torch)
Run inference: `python -m mlx_audio.tts.generate --model ./Irodori-TTS-500M-fp16 --text "こんにちは"`
On 16GB machines, use `sequence_length=300` and `cfg_guidance_mode=alternating` to stay within memory limits

🤖 Generated with Claude Code

Blaizzy

Hey @yoshphys
Thanks for the contribution!

I just have a few nits:

Please share an audio sample from the port and one from the source model so we can compare.
Not needed: models/irodori_tts/convert.py
Please move tests/test_irodori_tts.py to test_models.py and follow the format there.
Upload converted models to mlx-community on Hugging Face (4bit, 5bit, 6bit, 8bit, bf16)

yoshphys · 2026-03-22T02:23:50Z

Update

Tests moved to `test_models.py`

All Irodori-TTS tests have been consolidated into test_models.py following the existing format.

`convert.py` removed

Removed from the PR (kept locally for reference, not needed by end users).

Quantized models uploaded to mlx-community

Audio comparison

Text: 「お電話ありがとうございます。ただいま電話が大変混み合っております。恐れ入りますが、発信音のあとに、ご用件をお話しください。」

| | Audio |
| --- | --- |
| Original (Aratako/Irodori-TTS-500M, PyTorch) | comparison_original.wav |
| MLX port (fp16, `sequence_length=400`, `cfg_guidance_mode=alternating`) | comparison_mlx_fp16.wav |

Port Aratako/Irodori-TTS-500M to mlx-audio. The model uses a DiT (Diffusion Transformer) with Rectified Flow sampling and DACVAE codec (48kHz, 128-dim latents).

Key components:
- IrodoriDiT: JointAttention (self+text+speaker), LowRankAdaLN, SwiGLU
- Euler sampler with CFG (independent/alternating/joint modes)
- Japanese text normalization + HuggingFace tokenizer (llm-jp/llm-jp-3-150m)
- DACVAE codec loaded from facebook/dacvae-watermarked via convert.py
- convert.py: converts PyTorch weights to MLX fp16 safetensors
- 28 unit tests covering architecture, text processing, sanitize, and smoke tests

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
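
The commit message mentions Japanese text normalization but does not list the rules applied in `text.py`. As a rough illustration of what such a step commonly looks like, the sketch below uses Unicode NFKC folding plus whitespace cleanup. The function name `normalize_japanese` and the choice of rules are assumptions, not the port's actual behavior.

```python
import re
import unicodedata


def normalize_japanese(text: str) -> str:
    """Hypothetical pre-tokenization cleanup for Japanese input."""
    # NFKC folds full-width ASCII to half-width and half-width katakana
    # (with separate voiced marks) to their standard full-width forms.
    text = unicodedata.normalize("NFKC", text)
    # Collapse runs of whitespace, including spaces that came from U+3000.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Running a step like this before tokenization means visually equivalent inputs (for example full-width and half-width digits) reach the tokenizer as the same string.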

- Remove models/irodori_tts/convert.py from tracking (reviewer: not needed)
- Delete tests/test_irodori_tts.py; move all 26 Irodori-TTS tests into tests/test_models.py following the established format (imports inside test methods, module-level stubs/helpers)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

mlx_audio/tts/models/irodori_tts/README.md

lucasnewman

See minor comment but looks good to me!

Blaizzy

LGTM, thanks!

Blaizzy requested changes Mar 21, 2026

yoshphys and others added 4 commits March 22, 2026 11:42

Fix black formatting in irodori_tts model files

7bb1ebc

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Fix black formatting in model.py and text.py

46767d6

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

yoshphys force-pushed the feature/irodori-tts branch from 7d8698f to 46767d6 Compare March 22, 2026 02:42

lucasnewman reviewed Mar 23, 2026

mlx_audio/tts/models/irodori_tts/README.md (outdated)

lucasnewman approved these changes Mar 23, 2026

lucasnewman requested a review from Blaizzy March 23, 2026 16:02

yoshphys and others added 2 commits March 24, 2026 09:59

Remove Conversion section from Irodori-TTS README

b243c65

convert.py was removed in a previous commit; the section is no longer needed. Also update DACVAE download note to reflect that weights are fetched automatically on first use. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into feature/irodori-tts

3c95188

Blaizzy approved these changes Mar 24, 2026

lucasnewman merged commit 6c513de into Blaizzy:main Mar 24, 2026
10 checks passed

lucasnewman merged 6 commits into Blaizzy:main from yoshphys:feature/irodori-tts
