fix(tts): allow streamed generation to save audio to disk #608
Open
mvdirty wants to merge 2 commits into Blaizzy:main from
Why:
- streaming playback skipped all disk writes, which blocked workflows that need live audio and a saved file from the same generation
- stream-save should preserve the usual numbered or joined output naming so output_path and file_prefix remain predictable

What:
- add a --save flag for streamed TTS generation and reject it without --stream
- buffer streamed audio per segment and write it with the existing naming and join_audio conventions
- document the new CLI behavior and add targeted regression tests for segment naming and output_path/file_prefix handling

Notes:
- --stream alone still plays streamed audio without writing files
- black and isort checks pass on the changed Python files
- local pytest reaches collection but fails in this sandbox because libmlx.so is unavailable at runtime
- user host smoke tests confirmed streamed playback plus saved output
- host pytest exposed and corrected a missing segment_idx field in the test double used by the new regression tests

Refs: n/a
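For readers who want a concrete picture of the "reject it without --stream" rule, here is a minimal sketch using the standard library's `argparse`. It is not the code from this PR: the trimmed-down parser, the `build_parser` and `parse_args` helper names, the help strings, and the error message wording are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser; the real CLI defines many more options.
    parser = argparse.ArgumentParser(prog="mlx_audio.tts.generate")
    parser.add_argument(
        "--stream", action="store_true", help="Play audio while it is generated"
    )
    parser.add_argument(
        "--save", action="store_true", help="Also write streamed audio to disk"
    )
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.save and not args.stream:
        # ArgumentParser.error prints a usage message to stderr and exits
        # with status 2. The message text here is an assumption.
        parser.error("--save requires --stream")
    return args
```

Checking the combination after `parse_args` keeps `--stream` on its own unchanged, so existing invocations behave as before.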
2bfbfdf to 1ff19f4
mvdirty note: This work was specified by me, and I am a human. The changes were implemented, documentation was updated, and this PR description was (except for this note) prepared by my clanker, mvagent. I have pretty deep development experience, but not in Python, and this is me dipping my toes in just enough to solve a need. The changes seem pretty clean, and we have performed a good range of both manual and automated testing of them, but let me know if anything needs updating to be more idiomatic of either Python or the mlx-audio repo style.
Context
`mlx_audio.tts.generate` supports normal file output and also supports low-latency streaming playback with `--stream`, but the streaming path skipped file writes entirely. That made it hard to use mlx-audio in workflows that need both immediate listening and a saved file for later playback or downstream processing.

This change adds an explicit `--save` flag for streaming mode so users can opt into both behaviors at once without changing the existing default behavior of `--stream`.

Description
The TTS CLI save/stream behavior is controlled in `mlx_audio/tts/generate.py`.

Previously:
- `--stream` queued audio to the live player

This change introduces `--save` for use with `--stream` and preserves the usual mlx-audio file naming and output handling:
- `--stream` still streams audio during generation and does not write files by default
- `--stream --save` writes streamed output to disk after generation completes
- `--stream --save` preserves the usual numbered segment naming such as `audio_000.wav`
- `--stream --save --join_audio` writes one combined file such as `audio.wav`
- `output_path` and `file_prefix` are preserved in the streamed save path
- `--save` without `--stream` is rejected with a parser error

Changes in the codebase
- `mlx_audio/tts/generate.py`
  - … `--save` CLI flag
  - … `--save` unless `--stream` is also present
  - … `join_audio`, `output_path`, and `file_prefix` behavior when saving streamed output
- `mlx_audio/tts/tests/test_generate.py`
  - … `--save`
  - … `output_path`/`file_prefix` handling with streamed save
  - … `--stream` without `--save` still does not write files
- `README.md`
  - … `--stream`, `--stream --save`, and `--join_audio`
  - … `--join_audio` behavior

Changes outside the codebase
None.
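
The naming rules listed in the description (`audio_000.wav` per segment, `audio.wav` when joined, with `output_path` and `file_prefix` respected) can be sketched as a small pure function. This is an illustration only, not the PR's implementation: the `planned_output_files` name, its parameters and defaults, and the zero-based three-digit suffix are assumptions inferred from the example file names above.

```python
import os
from typing import List, Optional


def planned_output_files(
    num_segments: int,
    file_prefix: str = "audio",
    output_path: Optional[str] = None,
    join_audio: bool = False,
    audio_format: str = "wav",
) -> List[str]:
    """Return the file paths a streamed save would write (illustrative only)."""
    # Assumption: with no output_path, files go to the current directory.
    directory = output_path or "."
    if join_audio:
        # One combined file for the whole generation.
        return [os.path.join(directory, f"{file_prefix}.{audio_format}")]
    # One numbered file per buffered segment, zero-based and zero-padded.
    return [
        os.path.join(directory, f"{file_prefix}_{index:03d}.{audio_format}")
        for index in range(num_segments)
    ]
```

Keeping the path computation separate from playback is one way to make regression tests on naming cheap, since no audio needs to be generated to check it.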
Additional information
- … `mlx_audio/tts/generate.py`; `AudioPlayer` remains playback-only.
- … `output_path`, `file_prefix`, and `join_audio` combinations behave correctly
- `black --check` passed on changed Python files
- `isort --check-only` passed on changed Python files
- … `libmlx.so`), but:
  - `git diff --check` passed
  - `black --check` passed on changed Python files
  - `isort --check-only` passed on changed Python files

Checklist