[](https://huggingface.co/hexgrad/Kokoro-82M/tree/c3b0d86e2a980e027ef71c28819ea02e351c2667)
Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) text-to-speech model
@@ -14,8 +14,7 @@ Dockerized FastAPI wrapper for [Kokoro-82M](https://huggingface.co/hexgrad/Kokor
- automatic chunking/stitching for long texts
- simple audio generation web ui utility
-
<details open>
-
<summary><b>Quick Start</b></summary>
+
## Quick Start
The service can be accessed through either the API endpoints or the Gradio web interface.
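As a rough illustration of the API route, the sketch below builds an OpenAI-style speech request with only the Python standard library. The base URL and port, the `/v1/audio/speech` path (borrowed from OpenAI's speech API), the `kokoro` model name, and the `af` voice name are all assumptions for illustration and are not confirmed by this README excerpt; check the running service before relying on them.

```python
import json
import urllib.request

# Hypothetical base URL: the API port is not stated in this section.
BASE_URL = "http://localhost:8000"


def build_speech_request(text, voice="af", response_format="mp3", speed=1.0):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {
        "model": "kokoro",  # assumed model identifier
        "input": text,
        "voice": voice,  # assumed voice name
        "response_format": response_format,
        "speed": speed,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/audio/speech",  # path assumed from OpenAI's speech API
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="output.mp3"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text)) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling `synthesize("Hello world")` would only work against a running service that actually exposes this route; `build_speech_request` can be inspected offline to see the payload shape.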
@@ -48,9 +47,10 @@ The service can be accessed through either the API endpoints or the Gradio web i
Access the interactive web UI at http://localhost:7860 after starting the service. Features include:
- Voice/format/speed selection
@@ -141,9 +160,9 @@ If you only want the API, just comment out everything in the docker-compose.yml
Currently, voices created via the API are accessible here, but voice combination/creation has not yet been added.
</details>
-
+
## Processing Details
<details>
-
<summary><b>Performance Benchmarks</b></summary>
+
<summary>Performance Benchmarks</summary>
Benchmarking was performed on generation via the local API using text lengths up to feature-length books (~1.5 hours output), measuring processing time and realtime factor. Tests were run on:
- Windows 11 Home w/ WSL2
@@ -163,7 +182,7 @@ Key Performance Metrics:
- Average Processing Rate: 137.67 tokens/second (cl100k_base)
</details>
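The two benchmark metrics named above can be computed from three measurements per run. This is a minimal sketch, not the benchmark script itself: the function names are hypothetical, and the realtime factor is assumed here to mean seconds of audio produced per second of processing. Token counts would come from a tokenizer such as tiktoken's `cl100k_base` encoding, which is left out so the example stays dependency-free.

```python
def realtime_factor(audio_seconds, processing_seconds):
    """Seconds of audio produced per second of processing (assumed definition)."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def tokens_per_second(token_count, processing_seconds):
    """Average number of input tokens processed per second."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return token_count / processing_seconds


# Made-up measurements for one run, for illustration only.
run = {"tokens": 1377, "audio_seconds": 90.0, "processing_seconds": 10.0}
rtf = realtime_factor(run["audio_seconds"], run["processing_seconds"])
rate = tokens_per_second(run["tokens"], run["processing_seconds"])
print(f"realtime factor: {rtf:.1f}x, rate: {rate:.2f} tokens/s")
```

A realtime factor above 1 under this definition means audio is generated faster than it plays back.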
<details>
-
<summary><b>GPU Vs. CPU</b></summary>
+
<summary>GPU Vs. CPU</summary>
```bash
# GPU: Requires NVIDIA GPU with CUDA 12.1 support
@@ -172,35 +191,29 @@ docker compose up --build
# CPU: ~10x slower than GPU inference
docker compose -f docker-compose.cpu.yml up --build
```
-
</details>
-
<details>
-
<summary><b>Features</b></summary>
-
- OpenAI-compatible API endpoints (with optional Gradio Web UI)
- Automatically splits and stitches at sentence boundaries to reduce artifacts and maintain performance
-
- Voice Combination:
-
- Averages model weights of any existing voicepacks
-
- Saves generated voicepacks for future use
+
*Note: CPU Inference is currently a very basic implementation, and not heavily tested*
+
</details>
+
<details>
+
<summary>Natural Boundary Detection</summary>
-
*Note: CPU Inference is currently a very basic implementation, and not heavily tested*
+
- Automatically splits and stitches at sentence boundaries
+
- Helps reduce artifacts and allows long-form processing, since the base model is currently configured for only about 30 seconds of output
</details>
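The splitting and stitching described above can be sketched roughly as follows. This is an illustrative approximation, not the project's implementation: the regex-based sentence splitter, the `max_chars` budget (a stand-in for the roughly 30-second output limit), and the function names are all assumptions, and audio segments are modelled as plain lists of samples.

```python
import re

# Split after ., !, or ? followed by whitespace (a simplifying assumption;
# abbreviations such as "e.g." would be split incorrectly).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_text(text, max_chars=200):
    """Group whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars is kept intact as its own chunk.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}" if current else sentence
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stitch(audio_segments):
    """Concatenate per-chunk audio segments (lists of samples) in order."""
    stitched = []
    for segment in audio_segments:
        stitched.extend(segment)
    return stitched
```

Each chunk would be synthesized separately and the resulting audio passed to `stitch` in the original order, so cuts fall at sentence boundaries rather than mid-sentence.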
+
## Model and License
+
<details open>
-
<summary><b>Model</b></summary>
+
<summary>Model</summary>
This API uses the [Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M) model from HuggingFace.
Visit the model page for more details about training, architecture, and capabilities. I have no affiliation with any of their work, and produced this wrapper for ease of use and personal projects.
</details>
-
<details>
-
<summary><b>License</b></summary>
-
+
<summary>License</summary>
This project is licensed under the Apache License 2.0 - see below for details:
- The Kokoro model weights are licensed under Apache 2.0 (see [model page](https://huggingface.co/hexgrad/Kokoro-82M))
@@ -209,3 +222,6 @@ This project is licensed under the Apache License 2.0 - see below for details:
The full Apache 2.0 license text can be found at: https://www.apache.org/licenses/LICENSE-2.0