Demo video: `demo_h264.mp4`
Interactive console app that plays text-to-speech audio using Orpheus-3B. Can be used to "vocalize" your favorite chatbot's text responses.
When in "chat mode", uses a system prompt to elicit the model's special vocalizations like `<laugh>`, `<sigh>`, `<gasp>`, etc.
Requires setting up the Orpheus model to be served locally (see below).
Core decoding logic adapted from orpheus-tts-local by isaiahbjork.
`git clone [github repo clone url]`
`cd [repo name]`
Initialize a virtual environment and activate it (requires Python 3.13+). E.g.:
`python -m venv venv`
`venv\Scripts\activate`
Install dependencies:
`pip install -r requirements.txt`
Optionally, install PyTorch with CUDA support on top.
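For example, using the official PyTorch wheel index (the CUDA 12.1 index shown here is just one option; pick the index that matches your driver from pytorch.org):
`pip install torch --index-url https://download.pytorch.org/whl/cu121`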
Download a quantized build of the finetuned Orpheus-3B model. For example: here (Q8) or here (Q4).
Run an LLM server and select the Orpheus model, just as you would when serving any LLM.
Example command using llama.cpp (LM Studio also works):
`llama-server.exe -m path/to/Orpheus-3b-FT-Q8_0.gguf -ngl 99 -c 4096 --host 0.0.0.0`
Required: Edit `orpheus_llm.url` to point to your LLM server's endpoint. For llama-server, that would normally be `http://127.0.0.1:8080/v1/completions`.
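As a rough sketch, assuming the setting lives in a Python-dict-style config (the surrounding structure is an assumption; only the `url` property is documented here):

```python
# Hypothetical sketch; only "url" is a documented property.
orpheus_llm = {
    "url": "http://127.0.0.1:8080/v1/completions",  # llama-server's default completions endpoint
}
```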
Required for LLM chat functionality: Update the properties of the `chatbot_llm` object. The `url` should be a `chat/completions`-compatible endpoint (e.g., an OpenRouter service). Populate either `api_key` or `api_key_environment_variable` as needed. Lastly, the inner `request_dict` object can be populated with properties which will get merged into the service request's JSON data (e.g., "model", "temperature", etc.). A sketch follows.
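A hedged sketch of what `chatbot_llm` might look like, again assuming a Python-dict config; the endpoint, key names, model, and temperature shown are illustrative, not project defaults:

```python
# Hypothetical sketch; property names come from the section above, values are examples.
chatbot_llm = {
    # Any chat/completions-compatible endpoint, e.g. OpenRouter:
    "url": "https://openrouter.ai/api/v1/chat/completions",
    "api_key": "",  # set the key directly here...
    "api_key_environment_variable": "OPENROUTER_API_KEY",  # ...or name an env var holding it
    "request_dict": {
        # Everything here is merged into the JSON body of each chat request.
        "model": "openai/gpt-4o-mini",
        "temperature": 0.7,
    },
}
```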
`python app.py`
The chat LLM system prompt can be edited via `system_prompt.txt`.
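As a purely illustrative example (not the shipped prompt), a line like the following in `system_prompt.txt` is the kind of instruction that elicits the vocal tags mentioned earlier:

```
You may include vocal tags such as <laugh>, <sigh>, and <gasp> in your replies where they feel natural.
```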
Bear in mind that Orpheus model inference plus SNAC decoding is not a lightweight task.
If you're having trouble achieving stutter-free audio, try offloading Orpheus LLM inference duties to another machine on the local network.
Anecdotally, my dev system (Ryzen 7700 + 3080 Ti) generates audio about 1.5x faster than real time, using the Orpheus-3B Q8 model with the LLM server running on the same machine. On an M1 MacBook Pro with the LLM server on a different machine.
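Offloading amounts to pointing `orpheus_llm.url` at the remote machine; for example (the LAN address is illustrative): `"url": "http://192.168.1.50:8080/v1/completions"`.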
There is a prompt-toolkit-related bug that manifests inconsistently in Mac Terminal and corrupts the display. If you are afflicted by this, please leave details under Issues.
2025-04-30
- New command, "!redraw", redraws the display (useful if display is corrupted by unexpected console debug text, etc).
2025-04-23
- Orpheus gen refactor; updated colors; buffer underflow recovery logic
2025-04-21
- Syntax for setting voice has changed to: "!voice=tara", etc. This allows for arbitrary voice names when using custom Orpheus finetunes.
2025-04-20
- The sentence or phrase of the currently playing audio segment now highlights in realtime.
2025-04-13
- User settings now persist.
2025-04-11
- Can now save audio output to disk. Toggle with `!save`. This opens up some use cases.
2025-04-09
- TTS text now displays in sync with the audio segment being played. Toggle with `!sync`.
2025-04-08
- Chat response now streams, allowing for audio generation to begin after the first several words are received.
- Web service layer for audio generation?
- Voice cloning (will have to wait for official support first)