---
slug: echokit-30-days-day-20-local-gpt-sovits
title: "Day 20: Running GPT-SoVITS Locally as EchoKit’s TTS Provider | The First 30 Days with EchoKit"
tags: [echokit30days, tts]
---

Over the past few days, we’ve been switching EchoKit between different cloud-based TTS providers and voice styles. It’s fun, it’s flexible, and it really shows how modular the EchoKit pipeline is.

But today, I want to go one step further.

**Today is about running TTS fully locally.**
No hosted APIs. No external requests. Just an open-source model running on your own machine, with EchoKit talking through it.

For Day 20, I’m using **GPT-SoVITS** as EchoKit’s local TTS provider.

## What Is GPT-SoVITS?

**GPT-SoVITS** is an open-source text-to-speech and voice cloning system that combines:

* A GPT-style autoregressive model that converts the input text, prompted with a short reference clip, into semantic tokens carrying content and prosody
* A SoVITS-based decoder that turns those tokens into audio with natural prosody and the reference speaker’s timbre

Compared to traditional TTS systems, GPT-SoVITS stands out for two reasons.

First, it produces **very natural, expressive speech**, especially for longer sentences and conversational content.

Second, it supports **high-quality voice cloning** from only a short reference clip (the project advertises zero-shot cloning from about five seconds of audio), which has made it popular in open-source voice communities.

Most importantly for us:
**GPT-SoVITS can run entirely on your own hardware.**

## Running GPT-SoVITS Locally

To make GPT-SoVITS easier to run locally, we ported it to a **Rust-based implementation**.

This simplifies local deployment considerably: instead of setting up a Python environment, you build a single server binary with Cargo that links against LibTorch. It also makes GPT-SoVITS much easier to integrate with EchoKit.

> For details, see [Build and run a GPT-SoVITS server](https://echokit.dev/docs/server/gpt-sovits). The steps below were run on an Apple Silicon MacBook.

First, download and unpack LibTorch, the C++ distribution of PyTorch (here, the CPU build 2.4.0 for macOS arm64):

```bash
curl -LO https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.4.0.zip
unzip libtorch-macos-arm64-2.4.0.zip
```

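Because that archive is built for Apple Silicon, it can be worth confirming the machine before going further. The snippet below is a small sketch; the `is_apple_silicon` helper is hypothetical, not part of EchoKit or LibTorch. It compares `uname -s` and `uname -m` against `Darwin` and `arm64`. Note that a terminal running under Rosetta reports `x86_64` even on an Apple Silicon Mac.

```bash
# Hypothetical helper: succeeds only on Apple Silicon macOS, the platform
# the libtorch-macos-arm64 archive is built for.
# os and arch default to the current machine; pass them explicitly to test.
is_apple_silicon() {
  local os="${1:-$(uname -s)}"
  local arch="${2:-$(uname -m)}"
  [ "$os" = "Darwin" ] && [ "$arch" = "arm64" ]
}

if is_apple_silicon; then
  echo "OK: this Mac matches the macos-arm64 LibTorch build"
else
  echo "This machine reports $(uname -s)/$(uname -m); download a LibTorch build for it instead" >&2
fi
```

On any other platform, pick the matching archive from the PyTorch downloads page rather than the one used here.
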
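Then, tell the system where to find LibTorch. `LIBTORCH` tells the Rust build where the unpacked archive is, and `DYLD_LIBRARY_PATH` lets the macOS dynamic linker find its shared libraries at run time. Run these commands from the directory where you unzipped the archive, since both paths are built from `$(pwd)`. Exported variables only apply to the current terminal session, so use the same terminal for the remaining steps:

```bash
export DYLD_LIBRARY_PATH="$(pwd)/libtorch/lib${DYLD_LIBRARY_PATH:+:$DYLD_LIBRARY_PATH}"
export LIBTORCH="$(pwd)/libtorch"
```
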
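Before running `cargo build`, a quick check can catch a wrong `LIBTORCH` value early. This is a sketch built around a hypothetical `check_libtorch` helper. It assumes the standard layout of the unpacked archive (an `include/` directory for headers and `lib/libtorch.dylib` among the shared libraries on macOS) and only inspects the filesystem, so it cannot tell whether the LibTorch version is the one `gsv_tts` expects.

```bash
# Hypothetical helper: sanity-check the LibTorch setup before building.
# Assumes the standard archive layout: include/ for headers and
# lib/ for the shared libraries (libtorch.dylib on macOS).
check_libtorch() {
  if [ -z "${LIBTORCH:-}" ]; then
    echo "LIBTORCH is not set; run the export commands first" >&2
    return 1
  fi
  if [ ! -d "$LIBTORCH/include" ] || [ ! -d "$LIBTORCH/lib" ]; then
    echo "$LIBTORCH does not look like an unpacked LibTorch (no include/ or lib/)" >&2
    return 1
  fi
  if [ ! -f "$LIBTORCH/lib/libtorch.dylib" ]; then
    echo "libtorch.dylib not found in $LIBTORCH/lib" >&2
    return 1
  fi
  echo "LibTorch found at $LIBTORCH"
}
```

Paste it into the same terminal and run `check_libtorch`: it prints the LibTorch path on success and returns a non-zero status with a hint otherwise.
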
Next, clone both repositories into the same parent directory, then build the GPT-SoVITS API server from `gsv_tts`:

```bash
git clone https://github.com/second-state/gsv_tts
git clone https://github.com/second-state/gpt_sovits_rs

cd gsv_tts
cargo build --release
```

Then, download the required models into `gsv_tts/resources`. Since I’m running GPT-SoVITS on my MacBook, I’m using the **CPU versions** of the `t2s` and `vits` models, saved as `t2s.pt` and `vits.pt`:

```bash
cd resources
curl -L -o t2s.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/t2s.cpu.pt
curl -L -o vits.pt https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/vits.cpu.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/ssl_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/bert_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/g2pw_model.pt
curl -LO https://huggingface.co/L-jasmine/GPT_Sovits/resolve/main/v2pro/mini-bart-g2p.pt
```

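A failed download does not always look like one: without `-f`, `curl` saves an HTTP error page under the requested file name and still exits successfully. The sketch below uses a hypothetical `check_models` helper to confirm that each of the six files exists, is non-empty, and does not start with `<`, which would point to an HTML or XML page rather than a PyTorch model file.

```bash
# Hypothetical helper: verify the model files downloaded above.
# Run from gsv_tts/resources, or pass that directory as the first argument.
check_models() {
  local dir="${1:-.}"
  local problems=0
  local f
  for f in t2s.pt vits.pt ssl_model.pt bert_model.pt g2pw_model.pt mini-bart-g2p.pt; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      problems=$((problems + 1))
    elif [ "$(head -c 1 "$dir/$f")" = "<" ]; then
      # A leading "<" usually means an error page was saved instead of a
      # model, which is not how PyTorch model files begin.
      echo "looks like an HTML/XML page, not a model: $dir/$f" >&2
      problems=$((problems + 1))
    fi
  done
  [ "$problems" -eq 0 ] && echo "all model files present in $dir"
}
```

Run `check_models` while still in `gsv_tts/resources` and re-download any file it reports.
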
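Finally, return to the `gsv_tts` directory and start the GPT-SoVITS API server in the background. `TTS_LISTEN=0.0.0.0:9094` makes it listen on port 9094 on all network interfaces:

```bash
cd ..  # back to gsv_tts, since the previous step ended in resources/
TTS_LISTEN=0.0.0.0:9094 nohup target/release/gsv_tts &
```
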
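The server may need some time after launch before it accepts connections. This sketch, built around a hypothetical `wait_for_tts` helper, polls it with `curl` until any HTTP response comes back. It assumes only that the server speaks HTTP on the configured port: the path `/` may well return a 404, which still counts as "up" because `curl` without `-f` exits successfully on any HTTP status. Since the server was started with `nohup` from a terminal, its output is appended to `nohup.out` in the `gsv_tts` directory.

```bash
# Hypothetical helper: poll the local GPT-SoVITS server until it answers.
# Usage: wait_for_tts [host:port] [attempts]
wait_for_tts() {
  local addr="${1:-127.0.0.1:9094}"
  local attempts="${2:-30}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # curl exits 0 on any HTTP status (no -f); it exits non-zero
    # when the connection is refused or times out.
    if curl -s -o /dev/null --max-time 2 "http://$addr/"; then
      echo "GPT-SoVITS server is answering at $addr"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "no answer from $addr after $attempts attempts; check nohup.out" >&2
  return 1
}
```

Run `wait_for_tts` with no arguments to try `127.0.0.1:9094` up to 30 times. To stop the server later, `pkill -f target/release/gsv_tts` matches it by its command line.
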
## Configure EchoKit to Use the Local TTS Provider

At this point, GPT-SoVITS is running as a local service with a simple HTTP API. That is all EchoKit needs: an endpoint that accepts text and returns audio.

Update the TTS section in the EchoKit server configuration. `platform = "StreamGSV"` selects the streaming GPT-SoVITS provider, `url` points at the server you just started, and `speaker` picks the voice. `localhost` works when the EchoKit server runs on the same machine; otherwise, use the IP address of the machine running GPT-SoVITS:

```toml
[tts]
platform = "StreamGSV"
url = "http://localhost:9094/v1/audio/stream_speech"
speaker = "cooper"
```

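The `url` has to match the address the server was started with. `0.0.0.0` in `TTS_LISTEN` means "listen on every interface" and is not really an address a client should connect to, so a client on the same machine uses `localhost` instead. The hypothetical `tts_url_from_listen` helper below sketches that mapping: it derives the EchoKit `url` from a `host:port` listen address, keeps the `/v1/audio/stream_speech` path from the config above, and accepts an optional host override. It does not handle IPv6 addresses.

```bash
# Hypothetical helper: build the EchoKit [tts] url from a TTS_LISTEN value.
# 0.0.0.0 (or an empty host) is replaced by localhost unless a host is given as $2.
tts_url_from_listen() {
  local listen="$1"
  local host="${2:-}"
  local port="${listen##*:}"
  if [ -z "$host" ]; then
    host="${listen%:*}"
    if [ "$host" = "0.0.0.0" ] || [ -z "$host" ]; then
      host="localhost"
    fi
  fi
  echo "http://$host:$port/v1/audio/stream_speech"
}

# prints http://localhost:9094/v1/audio/stream_speech when TTS_LISTEN is unset or 0.0.0.0:9094
tts_url_from_listen "${TTS_LISTEN:-0.0.0.0:9094}"
```

For an EchoKit server on another machine, pass the GPT-SoVITS machine's address as the second argument, e.g. `tts_url_from_listen 0.0.0.0:9094 192.168.1.20` (placeholder IP).
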
Restart the EchoKit server and reconnect your EchoKit device to it. From then on, EchoKit speaks through the local GPT-SoVITS server.

## A Fully Local Voice AI Pipeline

With today’s setup, we can now run **the entire voice AI pipeline locally**:

* **ASR**: local speech-to-text
* **LLM**: local open-source language models
* **TTS**: GPT-SoVITS running on your own machine

That means:

* No cloud dependency
* No external APIs
* No vendor lock-in

Just a complete, end-to-end voice AI system you can understand, modify, and truly own.

---

Want to get your own EchoKit device and make it unique?

* [EchoKit Box](https://echokit.dev/echokit_box.html)
* [EchoKit DIY](https://echokit.dev/echokit_diy.html)

Join the [EchoKit Discord](https://discord.gg/Fwe3zsT5g3) to share your custom voices and see how others are personalizing their voice AI agents.
