fix(tts): allow streamed generation to save audio to disk by mvdirty · Pull Request #608 · Blaizzy/mlx-audio

mvdirty · 2026-03-27T14:59:19Z

mvdirty note: This work was specified by me, and I am a human. The changes were implemented, documentation was updated, and this PR description was (except for this note) prepared by my clanker, mvagent. I have pretty deep development experience, but not in Python, and this is me dipping my toes in just enough to solve a need. The changes seem pretty clean, and we have performed a good range of both manual and automated testing of them, but let me know if anything needs updating to be more idiomatic of either Python or the mlx-audio repo style.

Context

mlx_audio.tts.generate supports normal file output and also supports low-latency streaming playback with --stream, but the streaming path skipped file writes entirely. That made it hard to use mlx-audio in workflows that need both immediate listening and a saved file for later playback or downstream processing.

This change adds an explicit --save flag for streaming mode so users can opt into both behaviors at once without changing the existing default behavior of --stream.

Description

The TTS CLI save/stream behavior is controlled in mlx_audio/tts/generate.py.

Previously:

- non-streaming generation wrote files to disk
- --stream queued audio to the live player
- the streaming path bypassed the normal output-writing logic

This change introduces --save for use with --stream and preserves the usual mlx-audio file naming and output handling:

- --stream still streams audio during generation and does not write files by default
- --stream --save writes streamed output to disk after generation completes
- --stream --save preserves the usual numbered segment naming such as audio_000.wav
- --stream --save --join_audio writes one combined file such as audio.wav
- output_path and file_prefix are preserved in the streamed save path
- --save without --stream is rejected with a parser error
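
As a rough illustration of the flag rules above, here is a minimal argparse sketch. It is not the code from mlx_audio/tts/generate.py: the helper names build_parser and parse_args, the help strings, and the error wording are assumptions made for this example, and only the three flags discussed here are modelled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser: only the flags discussed in this PR.
    parser = argparse.ArgumentParser(prog="mlx_audio.tts.generate")
    parser.add_argument(
        "--stream", action="store_true", help="Play audio while it is generated"
    )
    parser.add_argument(
        "--save", action="store_true", help="With --stream, also write audio to disk"
    )
    parser.add_argument(
        "--join_audio", action="store_true", help="Write one combined file"
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.save and not args.stream:
        # parser.error prints the usage text and exits with status 2.
        # The message wording here is an assumption, not the PR's text.
        parser.error("--save requires --stream")
    return args
```

Validating after parse_args keeps --stream on its own unchanged, so the existing default (stream without writing files) is unaffected.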

Changes in the codebase

mlx_audio/tts/generate.py
- add a --save CLI flag
- reject --save unless --stream is also present
- buffer streamed audio for later writing while keeping live playback unchanged
- preserve existing numbered output naming for segmented generation
- preserve existing join_audio, output_path, and file_prefix behavior when saving streamed output
- factor the final concatenated write path through a small helper
mlx_audio/tts/tests/test_generate.py
- add parser validation coverage for --save
- add regression coverage for streamed save output naming
- add regression coverage for multi-segment streamed output
- add regression coverage for output_path / file_prefix handling with streamed save
- confirm --stream without --save still does not write files
README.md
- add CLI examples for --stream, --stream --save, and --join_audio
- document default numbered output naming and the --join_audio behavior
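
The buffering and naming behavior listed above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the PR's implementation: collect_segments and plan_output_files are hypothetical helpers, chunks are modelled as (segment_idx, samples) pairs of plain Python floats rather than MLX arrays, and the "audio" prefix and "wav" format defaults are inferred from the example file names in this description. The sketch only plans which samples go to which path; the actual audio write is left out.

```python
import os
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical chunk shape: (segment_idx, samples) as plain floats.
Chunk = Tuple[int, List[float]]


def collect_segments(chunks: Iterable[Chunk]) -> Dict[int, List[float]]:
    """Buffer streamed chunks per segment, preserving arrival order."""
    buffered: Dict[int, List[float]] = {}
    for segment_idx, samples in chunks:
        buffered.setdefault(segment_idx, []).extend(samples)
    return buffered


def plan_output_files(
    buffered: Dict[int, List[float]],
    file_prefix: str = "audio",
    output_path: Optional[str] = None,
    join_audio: bool = False,
    audio_format: str = "wav",
) -> Dict[str, List[float]]:
    """Map each output file path to the samples it should contain."""
    directory = output_path or ""
    if join_audio:
        # One combined file, segments concatenated in index order.
        joined = [s for idx in sorted(buffered) for s in buffered[idx]]
        return {os.path.join(directory, f"{file_prefix}.{audio_format}"): joined}
    # One numbered file per segment, e.g. audio_000.wav, audio_001.wav.
    return {
        os.path.join(directory, f"{file_prefix}_{idx:03d}.{audio_format}"): buffered[idx]
        for idx in sorted(buffered)
    }
```

In this model, live playback would consume each chunk as it arrives while collect_segments keeps a copy, and the planned files would be written once generation completes.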

Changes outside the codebase

None.

Additional information

The change is intentionally localized to mlx_audio/tts/generate.py; AudioPlayer remains playback-only.
Host-side smoke testing confirmed:
- streamed playback begins quickly
- streamed audio can be saved to disk
- output_path, file_prefix, and join_audio combinations behave correctly
- black --check passed on changed Python files
- isort --check-only passed on changed Python files
- pytest -s mlx_audio/tts/tests/test_generate.py passed in full
Local sandbox validation was limited because the MLX runtime library (libmlx.so) is not available there, but:
- git diff --check passed
- black --check passed on changed Python files
- isort --check-only passed on changed Python files
- pytest reached collection on the new test file, then failed because libmlx.so is unavailable at runtime

Checklist

Tests added/updated
Documentation updated

Why:
- streaming playback skipped all disk writes, which blocked workflows that need live audio and a saved file from the same generation
- stream-save should preserve the usual numbered or joined output naming so output_path and file_prefix remain predictable

What:
- add a --save flag for streamed TTS generation and reject it without --stream
- buffer streamed audio per segment and write it with the existing naming and join_audio conventions
- document the new CLI behavior and add targeted regression tests for segment naming and output_path/file_prefix handling

Notes:
- --stream alone still plays streamed audio without writing files
- black and isort checks pass on the changed Python files
- local pytest reaches collection but fails in this sandbox because libmlx.so is unavailable at runtime
- user host smoke tests confirmed streamed playback plus saved output
- host pytest exposed and corrected a missing segment_idx field in the test double used by the new regression tests

Refs: n/a

mvdirty force-pushed the fix/tts-stream-save branch from 2bfbfdf to 1ff19f4 on March 27, 2026 15:08

Merge branch 'main' into fix/tts-stream-save

712de80

mvdirty wants to merge 2 commits into Blaizzy:main from mvdirty:fix/tts-stream-save
