tts : fix n_ubatch + make WavTokenizer cache-less #13713

ggerganov · 2025-05-22T18:56:20Z

- Set the WavTokenizer n_ubatch == n_batch, since it is computing embeddings.
- Use llama_encode() instead of llama_decode() with the WavTokenizer context.
- The WavTokenizer does not need a KV cache, so it is no longer created. This saves about 800 MB of memory.
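
The first point can be sketched in isolation. The snippet below is only an illustration, not the code changed by this PR: `ctx_params_sketch`, `make_embedding_params()` and `n_passes()` are hypothetical stand-ins, and the default values are illustrative. It assumes the usual meaning of the two fields (n_batch is the logical batch size, n_ubatch is the physical micro-batch size) and shows that with n_ubatch == n_batch an input submitted as one batch is processed in a single pass rather than being split.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the context parameters. Only the two fields
// named in the PR description are modeled; this is not the real struct.
struct ctx_params_sketch {
    uint32_t n_batch  = 2048; // logical batch size (illustrative default)
    uint32_t n_ubatch = 512;  // physical micro-batch size (illustrative default)
};

// Hypothetical helper: parameters for an embedding-style context, where the
// micro-batch is made as large as the batch (n_ubatch == n_batch).
ctx_params_sketch make_embedding_params(uint32_t n_batch) {
    assert(n_batch > 0);
    ctx_params_sketch p;
    p.n_batch  = n_batch;
    p.n_ubatch = n_batch;
    return p;
}

// Number of micro-batch passes needed for n_tokens submitted as one batch
// (ceiling division by the micro-batch size).
uint32_t n_passes(const ctx_params_sketch & p, uint32_t n_tokens) {
    return (n_tokens + p.n_ubatch - 1) / p.n_ubatch;
}
```

With the illustrative defaults, `n_passes()` reports that 2048 tokens are split into several micro-batches, while the result of `make_embedding_params(2048)` handles them in one pass. In llama.cpp itself the corresponding settings are the n_batch and n_ubatch context parameters, and the embeddings are then computed with llama_encode() as described above.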

ggml-ci

ggerganov mentioned this pull request May 22, 2025

Eval bug: llama-tts fails (abort) with longer lines #13712

Closed

tts : fix n_ubatch + make WavTokenizer cache-less

108d484

ggml-ci

ggerganov force-pushed the gg/tts-fix-ubatch branch from b814fb7 to 108d484 Compare May 22, 2025 18:58

github-actions bot added the examples label May 22, 2025

ggerganov merged commit 8a1d206 into master May 22, 2025
51 of 53 checks passed
