add ACE-Step text-to-audio model by mm65x · Pull Request #575 · Blaizzy/mlx-audio

mm65x · 2026-03-14T21:16:11Z

Context

ACE-Step 1.5 (https://github.com/ace-step/ACE-Step-1.5) by StepFun is a state-of-the-art text-to-audio model that generates full songs with vocals and instrumentation directly from text prompts. It is designed to run efficiently on consumer hardware.

Description

adds ACE-Step to the tts pipeline. the model is a hybrid architecture utilizing an LLM (Qwen3) as a text-conditioner/planner, a Diffusion Transformer (DiT) to generate audio latents, and an Oobleck Autoencoder VAE to decode latents back to PCM audio.
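The three-stage flow described above (text conditioner, then latent generator, then decoder) can be sketched roughly as follows. This is only an illustration of the data flow, not the code in this PR: `TextToAudioPipeline`, the field names, and the toy stand-in functions are all hypothetical, and plain Python lists stand in for MLX arrays.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical type aliases for illustration; the real model works on MLX arrays.
Conditioning = List[float]
Latents = List[List[float]]
Waveform = List[float]


@dataclass
class TextToAudioPipeline:
    """Chains the three stages: conditioner -> latent generator -> decoder."""

    encode_text: Callable[[str], Conditioning]                # stands in for the Qwen3 conditioner
    generate_latents: Callable[[Conditioning, int], Latents]  # stands in for the DiT
    decode_latents: Callable[[Latents], Waveform]             # stands in for the VAE decoder

    def __call__(self, prompt: str, num_frames: int) -> Waveform:
        cond = self.encode_text(prompt)
        latents = self.generate_latents(cond, num_frames)
        return self.decode_latents(latents)


# Toy stand-ins so the sketch runs end to end; they do no real modelling.
def toy_encode(prompt: str) -> Conditioning:
    return [float(len(prompt))]


def toy_generate(cond: Conditioning, num_frames: int) -> Latents:
    # Two latent channels per frame, both copied from the conditioning value.
    return [[cond[0], cond[0]] for _ in range(num_frames)]


def toy_decode(latents: Latents) -> Waveform:
    # Each latent frame expands to 4 output samples (a made-up upsampling factor).
    samples: Waveform = []
    for frame in latents:
        samples.extend([sum(frame) / len(frame)] * 4)
    return samples


pipeline = TextToAudioPipeline(toy_encode, toy_generate, toy_decode)
audio = pipeline("hi", num_frames=3)
```

The point of the structure is that each stage only depends on the output of the previous one, which is what allows the conditioner to be swapped from PyTorch to MLX without touching the DiT or VAE.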

the upstream repo provides a partial MLX backend for the DiT and VAE, but relies on PyTorch and transformers for the AceStepConditionEncoder and the text/lyric embedding phases. this PR fully ports the remaining components (AceStepConditionEncoder and AceStepLyricEncoder) to pure MLX and integrates the text prompting via mlx-lm, making the entire generation pipeline 100% native MLX with zero PyTorch runtime dependencies.

also includes a conversion script since upstream only distributes pytorch weights.
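One step such a conversion typically involves is changing the convolution weight layout: PyTorch stores `Conv1d` weights as (out_channels, in_channels, kernel_size), while MLX's `Conv1d` expects (out_channels, kernel_size, in_channels). The sketch below shows that transposition on nested lists so it runs without dependencies. It is not the PR's `convert.py`; the function names and the key names are hypothetical, and a real script would transpose arrays (for example with axes `(0, 2, 1)`) and handle other layer types as well.

```python
from typing import Dict, List, Set

Weight3D = List[List[List[float]]]


def conv1d_torch_to_mlx(weight: Weight3D) -> Weight3D:
    """Reorder (out, in, kernel) -> (out, kernel, in)."""
    # For each output channel, zip(*rows) turns `in` rows of length `kernel`
    # into `kernel` rows of length `in`.
    return [[list(tap) for tap in zip(*out_channel)] for out_channel in weight]


def convert_state_dict(state: Dict[str, object], conv1d_keys: Set[str]) -> Dict[str, object]:
    """Transpose only the weights the caller marks as Conv1d; copy the rest as-is."""
    return {
        name: conv1d_torch_to_mlx(value) if name in conv1d_keys else value
        for name, value in state.items()
    }


# Hypothetical checkpoint with one Conv1d layer: out=1, in=2, kernel=3.
torch_style = {
    "decoder.conv.weight": [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]],
    "decoder.conv.bias": [0.5],
}
mlx_style = convert_state_dict(torch_style, {"decoder.conv.weight"})
```

Passing the set of convolution keys explicitly avoids guessing from shapes alone, since not every 3-D tensor in a checkpoint is a convolution weight.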

Changes in the codebase

mlx_audio/tts/models/acestep/acestep.py - model implementation and pipeline logic
mlx_audio/tts/models/acestep/conditioner.py - pure MLX port of AceStepConditionEncoder and AceStepLyricEncoder
mlx_audio/tts/models/acestep/config.py - config dataclasses
mlx_audio/tts/models/acestep/convert.py - pt -> safetensors conversion script
mlx_audio/tts/models/acestep/dit.py - MLX DiT decoder
mlx_audio/tts/models/acestep/generate_utils.py - MLX diffusion loops
mlx_audio/tts/models/acestep/vae.py - MLX VAE decoder
mlx_audio/tts/models/acestep/README.md - setup + usage
mlx_audio/tts/models/__init__.py, mlx_audio/tts/utils.py - registration
mlx_audio/tts/tests/test_acestep.py - unit tests for the pure MLX condition encoder

Changes outside the codebase

none.

Additional information

fully strips out the need for transformers and diffusers during runtime
users need to run the conversion script for now (instructions in the README)

Checklist

Tests added/updated
Documentation updated
Issue referenced - closes Model Request: ACE-Step #114

Blaizzy · 2026-03-14T22:50:09Z

Awesome work @mm65x!

However, I already created a PR for this: #499

It's just missing a couple of things and some design decisions for supporting SFX models.

To avoid duplicate work, I would recommend checking it out and sending a PR to that branch if you have any of the missing pieces working.

Blaizzy · 2026-03-14T22:51:37Z

How about we close this and collaborate on #499?

mm65x · 2026-03-14T23:24:19Z

oops! closing in favor of #499 which already has this fully implemented with more robust token handling and audio features

Blaizzy · 2026-03-14T23:37:32Z

No worries!

My bad. Since SFX models are new and this model is quite complex (lots of moving pieces), I put it on hold for a couple of weeks to push our Swift SDK and improve inference here.

Blaizzy · 2026-03-15T11:27:17Z

Will share updates in #499 later today or tomorrow.

mm65x added 5 commits March 14, 2026 21:15

add ACE-Step TTA model (music generation) with full native MLX port

a0163cb

format acestep files

bc089b9

add acestep unit tests

9fd7507

fix acestep tensor shapes

38f4c34

fix condition encoder empty timbre token mismatch

8fbb9d7

mm65x closed this Mar 14, 2026

mm65x reopened this Mar 15, 2026

mm65x closed this Mar 15, 2026

mm65x wants to merge 5 commits into Blaizzy:main from mm65x:add-acestep-tta
