Fix Qwen3-TTS streaming memory leak. by orbitalquark · Pull Request #585 · Blaizzy/mlx-audio

orbitalquark · 2026-03-17T03:00:10Z

When running the mlx-server and sending streaming "/v1/audio/speech" requests with the Qwen3-TTS model, memory usage would grow by 3-4 GB per request until it was culled at around 10 GB. Then the cycle would repeat.

The Qwen3-TTS streaming model does a good job of keeping memory usage down during the stream, but it never makes one final mx.clear_cache() call after yielding the last streaming chunk, so the buffers cached for that request stay allocated.

This PR fixes the leak.
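For readers unfamiliar with the pattern, here is a minimal sketch of the idea, not the actual diff from this PR. `stream_chunks` and its `clear_cache` parameter are hypothetical names used only for illustration; the only real MLX call is `mx.clear_cache()`, and it is imported lazily so the sketch also runs on machines without MLX. The sketch puts the final cleanup in a `finally` block, which is an assumption about a reasonable design rather than a description of how the model code is written.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def _default_clear_cache() -> None:
    """Release MLX's cached buffers if MLX is installed; otherwise do nothing."""
    try:
        import mlx.core as mx
    except ImportError:
        return
    mx.clear_cache()


def stream_chunks(
    chunks: Iterable[T],
    clear_cache: Callable[[], None] = _default_clear_cache,
) -> Iterator[T]:
    """Yield chunks one by one, then clear the cache once at the end.

    Hypothetical wrapper: the real streaming generator lives inside the
    Qwen3-TTS model code and produces audio chunks itself.
    """
    try:
        for chunk in chunks:
            yield chunk
    finally:
        # Runs after the last chunk has been yielded, and also when the
        # consumer closes the generator early (e.g. a dropped connection).
        clear_cache()
```

With this shape, the cleanup runs exactly once per request regardless of whether the client consumes the whole stream or disconnects partway through.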

lucasnewman

Thanks!

Fix Qwen3-TTS streaming memory leak.

7199bb0

lucasnewman approved these changes Mar 17, 2026

lucasnewman merged commit 8083120 into Blaizzy:main Mar 17, 2026
10 checks passed

lucasnewman merged 1 commit into Blaizzy:main from orbitalquark:fix-qwen3-tts-streaming-memory-leak
