zeropointnine/tts-toy
demo_h264.mp4

Description

Interactive console app that plays text-to-speech audio using Orpheus-3B. Can be used to "vocalize" your favorite chatbot's text responses.

When in "chat mode", uses a system prompt to elicit the model's special vocalizations like <laugh>, <sigh>, <gasp>, etc.
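These vocalization tags arrive inline in the generated text. A minimal sketch of spotting them (the tag list here is just the examples above, not the model's full set, and this helper is illustrative, not part of the app):

```python
import re

# Only the tags mentioned above; real Orpheus finetunes may support more.
VOCAL_TAG_RE = re.compile(r"<(laugh|sigh|gasp)>")

def find_vocalizations(text: str) -> list[str]:
    """Return the vocalization tag names found in a chat response."""
    return VOCAL_TAG_RE.findall(text)
```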

Requires the Orpheus model to be served locally (see below).

Core decoding logic adapted from orpheus-tts-local by isaiahbjork.

Setup

1. Install Project

git clone https://github.com/zeropointnine/tts-toy.git
cd tts-toy

Create and activate a virtual environment (requires Python 3.13+). Eg:

python -m venv venv
venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Optionally, install PyTorch with CUDA support on top.

2. Set up local LLM server with the Orpheus model

Download a quantized build of the finetuned Orpheus-3B model (eg, a Q8 or Q4 GGUF).

Run an LLM server and load the Orpheus model, much as you would when serving any LLM.

Example command using llama.cpp (LM Studio also works):

llama-server.exe -m path/to/Orpheus-3b-FT-Q8_0.gguf -ngl 99 -c 4096 --host 0.0.0.0

3. Edit config.json

Required:

Set orpheus_llm.url to your LLM server's completions endpoint.

For llama-server, that would normally be http://127.0.0.1:8080/v1/completions.
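Put together, the entry might look like this (the surrounding shape of config.json is an assumption here; check the shipped file for the full schema):

```json
{
  "orpheus_llm": {
    "url": "http://127.0.0.1:8080/v1/completions"
  }
}
```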

Required for LLM chat functionality:

Update the properties of the chatbot_llm object:

The url should point to a chat/completions-compatible endpoint (eg, OpenRouter).

Populate either api_key or api_key_environment_variable, as needed.

Lastly, the inner request_dict object can hold properties that will be merged into the service request's JSON data (eg, "model", "temperature", etc).
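The chatbot_llm pieces above fit together roughly like this sketch (function names and merge order are assumptions for illustration, not the app's actual code):

```python
import os

def resolve_api_key(chatbot_llm: dict) -> str:
    """Prefer an explicit api_key; fall back to the named environment variable."""
    if chatbot_llm.get("api_key"):
        return chatbot_llm["api_key"]
    return os.environ[chatbot_llm["api_key_environment_variable"]]

def build_request_payload(chatbot_llm: dict, messages: list[dict]) -> dict:
    """Base chat/completions payload, with request_dict entries merged on top."""
    payload = {"messages": messages, "stream": True}
    payload.update(chatbot_llm.get("request_dict", {}))
    return payload
```

So a request_dict of {"model": "...", "temperature": 0.7} simply adds those fields to the outgoing JSON body.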

4. Run

python app.py

Usage notes

The chat LLM system prompt can be edited using system_prompt.txt

Performance notes

Bear in mind that Orpheus model inference plus SNAC decoding is not a lightweight task.

If you're having trouble achieving stutter-free audio, try offloading Orpheus LLM inference to another machine on the local network.

Anecdotally, my dev system (Ryzen 7700 + 3080 Ti) generates audio about 1.5x faster than real-time, using the Orpheus-3B Q8 model with the LLM server running on the same machine. An M1 MacBook Pro was tested with the LLM server on a different machine.
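"1.5x faster than real-time" means each second of wall-clock time yields about 1.5 seconds of audio, so generation stays ahead of playback. As quick arithmetic (a hypothetical helper, not part of the app):

```python
def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Seconds of audio produced per second of wall-clock generation time.
    Values above 1.0 mean stutter-free streaming playback is feasible."""
    return audio_seconds / wall_seconds

# At 1.5x, a 60-second clip takes about 40 seconds to generate.
```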

Known issues

There is a prompt-toolkit-related bug that manifests inconsistently in macOS Terminal, corrupting the display. If you are affected by this, please leave details under Issues.

Updates

2025-04-30

  • New command, "!redraw", redraws the display (useful if display is corrupted by unexpected console debug text, etc).

2025-04-23

  • Orpheus gen refactor; updated colors; buffer underflow recovery logic

2025-04-21

  • Syntax for setting voice has changed to: "!voice=tara", etc. This allows for arbitrary voice names when using custom Orpheus finetunes.

2025-04-20

  • The sentence or phrase of the currently playing audio segment now highlights in realtime.

2025-04-13

  • User settings now persist.

2025-04-11

  • Can now save audio output to disk. Toggle with !save. This opens up some use cases.

2025-04-09

  • TTS text now displays in sync with the audio segment being played. Toggle with !sync.

2025-04-08

  • Chat response now streams, allowing for audio generation to begin after the first several words are received.

Todo

  • Web service layer for audio generation?
  • Voice cloning (will have to wait for official support first)
