Support for OuteTTS 1.0 #12794

edwko · 2025-04-07T09:54:32Z

Since v1.0 has simplified processing, this implementation provides full feature support.

Changes and Features

JSON Speaker Loading:
- Added support for the new JSON speaker format, which includes an interface version.
- OuteTTS 1.0 is supported using interface version 3.
Text Chunking for Long Inputs:
- Enables processing of very long input texts by splitting them.
- Splitting respects minimum and maximum word limits per chunk (min = 10, max = 30).
- Supports multilingual text.
- Can be disabled via --tts-no-text-chunking (default: enabled).
Text Preprocessing & Prompt:
- While optional, a light cleanup and normalization step is included to improve output quality.
- Added the new prompt handling required by v1.0.
Code Organization:
- Implementation is located in: tts-outetts-v1.cpp.
- A default speaker is embedded as JSON in a header file: default_speaker.h.
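
For readers unfamiliar with the chunking rule, here is a rough sketch of one way the min/max word limits could be applied. It is an illustration only, not the code in tts-outetts-v1.cpp: the function name `chunk_text`, the whitespace-based word split, and the choice of `.`, `!` and `?` as sentence ends are all assumptions, and languages written without spaces would need a different tokenizer.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, not taken from the PR: splits text into chunks of
// whitespace-separated words. A chunk is closed at a sentence end once it
// holds at least min_words, or unconditionally once it reaches max_words.
std::vector<std::string> chunk_text(const std::string & text,
                                    std::size_t min_words = 10,
                                    std::size_t max_words = 30) {
    std::vector<std::string> chunks;
    std::string current;
    std::size_t count = 0;

    std::istringstream words(text);
    std::string word;
    while (words >> word) {
        if (!current.empty()) {
            current += ' ';
        }
        current += word;
        ++count;

        // Assumed sentence terminators; real multilingual text needs more.
        const char last = word.back();
        const bool sentence_end = last == '.' || last == '!' || last == '?';

        if ((sentence_end && count >= min_words) || count >= max_words) {
            chunks.push_back(current);
            current.clear();
            count = 0;
        }
    }
    // Whatever is left becomes the final chunk, even if shorter than min_words.
    if (!current.empty()) {
        chunks.push_back(current);
    }
    return chunks;
}
```

With these defaults, a short prompt such as "Hello, how are you doing?" stays in a single chunk, while a long unpunctuated run is cut every 30 words.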

TODO / Help Needed

DAC (Descript Audio Codec) Integration:
- The decoder layers from DAC need to be implemented:
  descript-audio-codec/dac/model/dac.py
- Model used:
  weights_24khz_1.5kbps_v1.0.pth
- DAC is supported by the transformers library and can be converted to safetensors, which might help implementation.
  Also, see this PR I submitted to fix a dependency issue in the conversion script for compatibility with newer PyTorch versions:
  transformers PR #36393
- Requesting assistance from @ngxson and @ggerganov for implementing this part.

Example Commands

Default generation, which uses the default speaker and chunked text automatically:

build/bin/llama-tts-outetts-v1 -m "path/to/model.gguf" -p "A very very long text"

With text chunking disabled:

build/bin/llama-tts-outetts-v1 -m "path/to/model.gguf" -p "Hello, how are you doing?" --tts-no-text-chunking

With custom speaker file:

build/bin/llama-tts-outetts-v1 -m "path/to/model.gguf" -p "A very very long text" --tts-speaker-file "path/to/speaker.json"

ngxson · 2025-04-07T11:39:11Z

> The decoder layers from DAC need to be implemented

FYI, currently we're missing Snake1d which should be implemented via #12487
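
For context, the Snake activation in the DAC reference code computes x + sin^2(alpha * x) / alpha, with a learned per-channel alpha. The following is a minimal CPU sketch of that formula for a single channel; it is not the ggml operator from #12487, and the function name and signature here are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Snake activation applied element-wise to one channel:
//   y = x + sin^2(alpha * x) / alpha
// eps guards against division by zero when alpha is 0.
std::vector<float> snake1d(const std::vector<float> & x, float alpha, float eps = 1e-9f) {
    std::vector<float> y(x.size());
    const float inv_alpha = 1.0f / (alpha + eps);
    for (std::size_t i = 0; i < x.size(); ++i) {
        const float s = std::sin(alpha * x[i]);
        y[i] = x[i] + inv_alpha * s * s;
    }
    return y;
}
```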

ggerganov · 2025-04-22T12:52:19Z

Does DAC replace WavTokenizer?

edwko · 2025-04-22T17:22:44Z

> Does DAC replace WavTokenizer?

Yes, since this model is multilingual, DAC is a better fit for reconstructing audio across languages.

Horschig · 2025-04-25T09:15:19Z

It would be really great if this got merged. However, I was wondering whether it would also be possible to add multilingual support to llama-server?

foldl · 2025-05-19T06:51:19Z

FYI: OuteTTS 1.0 is supported by chatllm.cpp. You can find DAC and SNAC implementations there.

edwko added 2 commits April 7, 2025 09:29

OuteTTS 1.0 support

f420015

Revert tts.cpp

38126e9

edwko marked this pull request as draft April 7, 2025 09:55

github-actions bot added the examples and python (python script changes) labels Apr 7, 2025

Som-anon mentioned this pull request Apr 7, 2025

Eval bug: OuteTTS 0.3 and up crashes #12807

Closed

jhen0409 mentioned this pull request Jun 19, 2025

OuteTTS - isVocoderEnable is false mybigday/llama.rn#152

Open
