---
slug: echokit-30-days-day-20-local-gpt-sovits
title: "Day 20: Running GPT-SoVITS Locally as EchoKit’s TTS Provider | The First 30 Days with EchoKit"
tags: [echokit30days, tts]
---

Over the past few days, we’ve been switching EchoKit between different cloud-based TTS providers and voice styles. It’s fun, it’s flexible, and it really shows how modular the EchoKit pipeline is.

But today, I want to go one step further.

**Today is about running TTS fully locally.**
No hosted APIs. No external requests. Just an open-source model running on your own machine — and EchoKit talking through it.

For Day 20, I’m using **GPT-SoVITS** as EchoKit’s local TTS provider.

## What Is GPT-SoVITS?

**GPT-SoVITS** is an open-source text-to-speech and voice cloning system that combines:

* A GPT-style text encoder for linguistic understanding
* SoVITS-based voice synthesis for natural prosody and timbre

Compared to traditional TTS systems, GPT-SoVITS stands out for two reasons.

First, it produces **very natural, expressive speech**, especially for longer sentences and conversational content.

Second, it supports **high-quality voice cloning** from relatively short reference audio, which has made it popular in open-source voice communities.

Most importantly for us:
**GPT-SoVITS can run entirely on your own hardware.**

## Running GPT-SoVITS Locally

To make GPT-SoVITS easier to run locally, we also ported it to a **Rust-based implementation**.

This significantly simplifies local deployment and makes it much easier to integrate with EchoKit.

> Check out [Build and run a GPT-SoVITS server](https://echokit.dev/docs/server/gpt-sovits) for details. The steps below are for a MacBook (Apple Silicon).

First, install the LibTorch dependencies:

```bash
curl -LO https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.4.0.zip
unzip libtorch-macos-arm64-2.4.0.zip
```
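
Before moving on, it’s worth a quick check that the archive unpacked into the layout the later steps expect: a `libtorch/` folder containing `lib/` and `include/`, plus a `build-version` file recording the release.

```bash
# Quick sanity check on the extracted LibTorch folder
ls libtorch
# Should report the 2.4.0 release
cat libtorch/build-version
```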

Then, tell the system where to find LibTorch:

```bash
export DYLD_LIBRARY_PATH=$(pwd)/libtorch/lib:$DYLD_LIBRARY_PATH
export LIBTORCH=$(pwd)/libtorch
```
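
These exports only apply to the current shell session. If you expect to rebuild or relaunch the server later, you can persist them in your shell profile. This is optional; the snippet below assumes zsh (the macOS default) and that LibTorch was unzipped into your home directory, so adjust the path to wherever you actually extracted it.

```bash
# Optional: persist the LibTorch paths for future shells (zsh shown here).
# Replace $HOME/libtorch with the directory where you actually unzipped LibTorch.
echo 'export LIBTORCH=$HOME/libtorch' >> ~/.zshrc
echo 'export DYLD_LIBRARY_PATH=$LIBTORCH/lib:$DYLD_LIBRARY_PATH' >> ~/.zshrc
source ~/.zshrc
```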

Next, clone the source code and build the GPT-SoVITS API server:

```bash
git clone https://github.com/second-state/gsv_tts
git clone https://github.com/second-state/gpt_sovits_rs

cd gsv_tts
cargo build --release
```
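
The two repositories are cloned side by side; as far as I can tell, `gsv_tts` pulls in `gpt_sovits_rs` from the sibling checkout, so keep them under the same parent directory. Once the release build finishes, the server binary should be sitting at `target/release/gsv_tts`:

```bash
# Inside the gsv_tts directory: confirm the release binary was built
ls -lh target/release/gsv_tts
```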

Then, download the required models.
Since I’m running GPT-SoVITS locally on my MacBook, I’m using the **CPU versions**:

```bash
cd resources
curl -L -o t2s.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/t2s.cpu.pt
curl -L -o vits.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/vits.cpu.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/ssl_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/bert_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/g2pw_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/mini-bart-g2p.pt
```
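
Before starting the server, it’s worth confirming that all six `.pt` files actually landed in `resources/` with non-trivial sizes, since a failed download often leaves a tiny or empty file behind:

```bash
# Still inside resources/: list the six model files and their sizes
ls -lh t2s.pt vits.pt ssl_model.pt bert_model.pt g2pw_model.pt mini-bart-g2p.pt
```

The launch command in the next step uses the relative path `target/release/gsv_tts`, so change back to the repository root (`cd ..`) before running it.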

Finally, start the GPT-SoVITS API server:

```bash
TTS_LISTEN=0.0.0.0:9094 nohup target/release/gsv_tts &
```
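
Because the server is launched with `nohup`, its output is appended to `nohup.out` in the current directory. Two quick ways to confirm it came up on port 9094:

```bash
# Follow the server’s startup log written by nohup
tail -f nohup.out

# Confirm a process is listening on the configured port
lsof -iTCP:9094 -sTCP:LISTEN
```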

## Configure EchoKit to Use the Local TTS Provider

At this point, GPT-SoVITS is running as a local service and exposing a simple HTTP API.

Once the service is up, EchoKit only needs an endpoint that accepts text and returns audio.

Update the TTS section in the EchoKit server configuration:

```toml
[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"
```
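
Before restarting EchoKit, you can also hit the endpoint directly to make sure audio comes back. The exact request format is defined by the `gsv_tts` server, so treat the JSON body below as an assumption based on the config above (a `speaker` name plus a `text` field) rather than the documented API; check the gsv_tts repository if the request is rejected.

```bash
# Rough smoke test. The JSON field names and the output format are assumptions,
# not taken from the gsv_tts docs; adjust them to match the actual API.
curl -X POST http://localhost:9094/v1/audio/stream_speech \
  -H "Content-Type: application/json" \
  -d '{"speaker": "cooper", "text": "Hello from my local GPT-SoVITS server."}' \
  --output test_speech.bin
```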

Restart the EchoKit server, connect the service to the device, and EchoKit will start using the new local TTS provider.

## A Fully Local Voice AI Pipeline

With today’s setup, we can now run **the entire voice AI pipeline locally**:

* **ASR**: local speech-to-text
* **LLM**: local open-source language models
* **TTS**: GPT-SoVITS running on your own machine

That means:

* No cloud dependency
* No external APIs
* No vendor lock-in

Just a complete, end-to-end voice AI system you can understand, modify, and truly own.

---

Want to get your own EchoKit device and make it unique?

* [EchoKit Box](https://echokit.dev/echokit_box.html)
* [EchoKit DIY](https://echokit.dev/echokit_diy.html)

Join the [EchoKit Discord](https://discord.gg/Fwe3zsT5g3) to share your custom voices and see how others are personalizing their voice AI agents.