fix(tts): allow streamed generation to save audio to disk #608

Open
mvdirty wants to merge 2 commits into Blaizzy:main from mvdirty:fix/tts-stream-save

mvdirty commented Mar 27, 2026

mvdirty note: This work was specified by me, and I am a human. The changes were implemented, documentation was updated, and this PR description was (except for this note) prepared by my clanker, mvagent. I have pretty deep development experience, but not in Python, and this is me dipping my toes in just enough to solve a need. The changes seem pretty clean, and we have run a good range of both manual and automated tests on them, but let me know if anything needs updating to be more idiomatic for either Python or the mlx-audio repo style.


Context

mlx_audio.tts.generate supports normal file output and also supports low-latency streaming playback with --stream, but the streaming path skipped file writes entirely. That made it hard to use mlx-audio in workflows that need both immediate listening and a saved file for later playback or downstream processing.

This change adds an explicit --save flag for streaming mode so users can opt into both behaviors at once without changing the existing default behavior of --stream.

Description

The TTS CLI save/stream behavior is controlled in mlx_audio/tts/generate.py.

Previously:

  • non-streaming generation wrote files to disk
  • --stream queued audio to the live player
  • the streaming path bypassed the normal output-writing logic

This change introduces --save for use with --stream and preserves the usual mlx-audio file naming and output handling:

  • --stream still streams audio during generation and does not write files by default
  • --stream --save writes streamed output to disk after generation completes
  • --stream --save preserves the usual numbered segment naming such as audio_000.wav
  • --stream --save --join_audio writes one combined file such as audio.wav
  • output_path and file_prefix are preserved in the streamed save path
  • --save without --stream is rejected with a parser error
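
As a rough illustration of the flag rules listed above, here is a minimal argparse sketch. It is not the parser in mlx_audio/tts/generate.py: `build_parser`, `parse_args`, the help strings, and the error message are hypothetical, and only the flags named in this description are modeled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in parser; the real CLI has many more options.
    parser = argparse.ArgumentParser(prog="tts-generate-sketch")
    parser.add_argument("--stream", action="store_true",
                        help="play audio while it is being generated")
    parser.add_argument("--save", action="store_true",
                        help="with --stream, also write the audio to disk")
    parser.add_argument("--join_audio", action="store_true",
                        help="write one combined file instead of segments")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.save and not args.stream:
        # parser.error() prints usage to stderr and exits with status 2.
        parser.error("--save requires --stream")
    return args
```

Validating after `parse_args` keeps `--stream` on its own unchanged, which matches the stated goal of not altering the existing default behavior.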

Changes in the codebase

  • mlx_audio/tts/generate.py

    • add a --save CLI flag
    • reject --save unless --stream is also present
    • buffer streamed audio for later writing while keeping live playback unchanged
    • preserve existing numbered output naming for segmented generation
    • preserve existing join_audio, output_path, and file_prefix behavior when saving streamed output
    • factor the final concatenated write path through a small helper
  • mlx_audio/tts/tests/test_generate.py

    • add parser validation coverage for --save
    • add regression coverage for streamed save output naming
    • add regression coverage for multi-segment streamed output
    • add regression coverage for output_path / file_prefix handling with streamed save
    • confirm --stream without --save still does not write files
  • README.md

    • add CLI examples for --stream, --stream --save, and --join_audio
    • document default numbered output naming and the --join_audio behavior
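
The buffering and naming behavior described for generate.py above could be sketched roughly as follows. This is an assumption-laden illustration, not the code in the PR: `SegmentBuffer`, `plan_outputs`, and the `audio_format` parameter are hypothetical names, samples are plain Python lists rather than MLX arrays, and numbering files by position in sorted segment order is a guess at how names like audio_000.wav are produced.

```python
import os
from collections import defaultdict


class SegmentBuffer:
    """Collects streamed chunks per segment so they can be written later."""

    def __init__(self):
        self._chunks = defaultdict(list)

    def add(self, segment_idx, samples):
        # Playback would consume the same chunk elsewhere; here we only copy it.
        self._chunks[segment_idx].append(list(samples))

    def segments(self):
        # One flat sample list per segment, ordered by segment index.
        return [
            [s for chunk in self._chunks[idx] for s in chunk]
            for idx in sorted(self._chunks)
        ]


def plan_outputs(segments, file_prefix="audio", output_path=".",
                 join_audio=False, audio_format="wav"):
    """Return (path, samples) pairs describing what would be written."""
    if join_audio:
        # One combined file, e.g. audio.wav
        joined = [s for seg in segments for s in seg]
        name = f"{file_prefix}.{audio_format}"
        return [(os.path.join(output_path, name), joined)]
    # Numbered segment files, e.g. audio_000.wav, audio_001.wav
    return [
        (os.path.join(output_path, f"{file_prefix}_{i:03d}.{audio_format}"), seg)
        for i, seg in enumerate(segments)
    ]
```

Separating the naming plan from the actual write makes the output_path and file_prefix handling easy to test without a working audio backend.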

Changes outside the codebase

None.

Additional information

  • The change is intentionally localized to mlx_audio/tts/generate.py; AudioPlayer remains playback-only.
  • Host-side smoke testing confirmed:
    • streamed playback begins quickly
    • streamed audio can be saved to disk
    • output_path, file_prefix, and join_audio combinations behave correctly
    • black --check passed on changed Python files
    • isort --check-only passed on changed Python files
    • pytest -s mlx_audio/tts/tests/test_generate.py passes fully
  • Local sandbox validation was partially limited because MLX runtime libraries are not available here (libmlx.so), but:
    • git diff --check passed
    • black --check passed on changed Python files
    • isort --check-only passed on changed Python files
    • pytest collects the new test file but cannot run it here because the MLX runtime (libmlx.so) is missing

Checklist

  • Tests added/updated
  • Documentation updated

Why:
- streaming playback skipped all disk writes, which blocked workflows
  that need live audio and a saved file from the same generation
- stream-save should preserve the usual numbered or joined output
  naming so output_path and file_prefix remain predictable

What:
- add a --save flag for streamed TTS generation and reject it
  without --stream
- buffer streamed audio per segment and write it with the existing
  naming and join_audio conventions
- document the new CLI behavior and add targeted regression tests
  for segment naming and output_path/file_prefix handling

Notes:
- --stream alone still plays streamed audio without writing files
- black and isort checks pass on the changed Python files
- local pytest reaches collection but fails in this sandbox because
  libmlx.so is unavailable at runtime
- user host smoke tests confirmed streamed playback plus saved output
- host pytest exposed and corrected a missing segment_idx field in the
  test double used by the new regression tests

Refs: n/a

mvdirty force-pushed the fix/tts-stream-save branch from 2bfbfdf to 1ff19f4 on March 27, 2026 15:08