|
| 1 | +--- |
| 2 | +title: "Making Synthetic Voices From Scratch" |
| 3 | +excerpt: "Creating a voice for a text-to-speech system usually requires a real person to spend hours recording audio. That’s expensive and time-consuming, and for many languages and accents no voices exist at all." |
| 4 | +date: "2025-06-26T00:00:00.000Z" |
| 5 | +author: |
| 6 | + name: JarbasAl |
| 7 | + picture: "https://avatars.githubusercontent.com/u/33701864" |
| 8 | +coverImage: "https://github.com/OpenVoiceOS/ovos_assets/blob/master/tts.png?raw=true" |
| 9 | +ogImage: |
| 10 | + url: "https://github.com/OpenVoiceOS/ovos_assets/blob/master/tts.png?raw=true" |
| 11 | +--- |
| 12 | + |
| 13 | +## Making Synthetic Voices From Scratch |
| 14 | + |
| 15 | +### What’s the problem? |
| 16 | + |
| 17 | +Creating a voice for a text-to-speech (TTS) system usually requires a real person to spend hours recording audio. That’s expensive and time-consuming, and for many languages and accents no voices exist at all, especially for open-source or offline use. |
| 18 | + |
| 19 | +### What did we do? |
| 20 | + |
| 21 | +We developed a technique that allows us to **create synthetic voices completely from scratch**, even if we don’t have recordings from a real person. These voices: |
| 22 | + |
| 23 | +* Work **offline**, even on small devices like a Raspberry Pi, |
| 24 | +* Can speak any language for which a good donor TTS system (an existing voice to start from) is available, |
| 25 | +* Are fully customizable in sound and tone. |
| 26 | + |
| 27 | +### How does it work? |
| 28 | + |
| 29 | +1. **Start with an existing voice** - We use an existing TTS voice (from any source) to generate a large set of synthetic speech-and-text pairs. |
| 30 | + |
| 31 | +2. **Transform it into a new voice** - We apply a special voice conversion process to change the sound of the voice to something new, like a different gender, age, or accent. |
| 32 | + |
| 33 | +3. **Train a compact model** - With this synthetic data, we train a new voice model that sounds natural, speaks fluently, and runs entirely offline. |
| 34 | + |
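The three steps above can be read as a simple data pipeline. The Python sketch below is only an illustration of that data flow, not the actual OVOS implementation: `build_synthetic_dataset`, `TrainingPair`, `fake_donor_tts`, and `fake_convert_voice` are hypothetical names, and the two stand-in functions replace a real donor TTS engine and a real voice conversion model so that the example runs without extra dependencies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Audio is modelled as a plain list of float samples to keep the sketch
# dependency-free; a real pipeline would use waveform arrays or audio files.
Audio = List[float]


@dataclass
class TrainingPair:
    """One (text, audio) example for training the new voice."""
    text: str
    audio: Audio


def build_synthetic_dataset(
    sentences: Iterable[str],
    donor_tts: Callable[[str], Audio],
    convert_voice: Callable[[Audio], Audio],
) -> List[TrainingPair]:
    """Steps 1 and 2: synthesize with the donor voice, then convert it."""
    pairs: List[TrainingPair] = []
    for text in sentences:
        text = text.strip()
        if not text:
            continue  # skip empty lines in the text corpus
        donor_audio = donor_tts(text)            # step 1: donor speech
        new_audio = convert_voice(donor_audio)   # step 2: new voice identity
        pairs.append(TrainingPair(text=text, audio=new_audio))
    return pairs  # step 3 would feed these pairs to a TTS trainer


# Hypothetical stand-ins so the sketch is runnable (assumptions, not real models).
def fake_donor_tts(text: str) -> Audio:
    # Pretend each word becomes one "sample" equal to its length.
    return [float(len(word)) for word in text.split()]


def fake_convert_voice(audio: Audio) -> Audio:
    # Pretend conversion is a simple scaling of every sample.
    return [sample * 0.5 for sample in audio]


dataset = build_synthetic_dataset(
    ["Olá mundo", "   ", "Bom dia a todos"],
    donor_tts=fake_donor_tts,
    convert_voice=fake_convert_voice,
)
print(len(dataset), "training pairs ready for step 3")
```

In a real setup, the two callables would wrap whichever donor TTS and voice conversion tools are chosen, and the resulting pairs would be written to disk in the format expected by the training tool used for the compact model.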
| 35 | +### Why is this special? |
| 36 | + |
| 37 | +* We can create a new voice **without needing anyone to record lines**. |
| 38 | +* The voices don’t rely on cloud services; they work **100% offline**. |
| 39 | +* Each voice can be **customized** to sound unique or to match a character, personality, or accent. |
| 40 | + |
| 41 | +### What about ethics? |
| 42 | + |
| 43 | +We take voice rights seriously. |
| 44 | + |
| 45 | +* If we’re using a real person’s voice, we always get **clear permission**. |
| 46 | +* If no permission is available, we use **public domain recordings** or create **original voices** that don’t copy anyone. |
| 47 | +* Our process actually makes the voice **less recognizable**, which helps protect privacy and avoid impersonation risks. |
| 48 | + |
| 49 | +### Real-world example |
| 50 | + |
| 51 | +We applied this method to **European Portuguese**, a language that had no good offline voice options. In a short time, we built **four brand-new, high-quality voices** with no recordings needed, and they all run on small local devices. |
| 52 | + |
| 53 | +> 💡 Did we mention OpenVoiceOS now has a Hugging Face account? Find all our TTS voices and more at [huggingface.co/OpenVoiceOS](https://huggingface.co/OpenVoiceOS) |
| 54 | +
|
| 55 | +--- |
| 56 | + |
| 57 | +### 🧠 In short: |
| 58 | + |
| 59 | +> We’ve found a way to build natural-sounding, offline-ready synthetic voices, **without needing a real speaker**. It’s fast, ethical, and opens the door for more voices in more languages, for everyone. |
| 60 | +
|
| 61 | +--- |
| 62 | + |
| 63 | +## Help Us Build Voice for Everyone |
| 64 | + |
| 65 | +If you believe that voice assistants should be open, inclusive, and user-controlled, we invite you to support OVOS: |
| 66 | + |
| 67 | +- **💸 Donate**: Your contributions help us pay for infrastructure, development, and legal protections. |
| 68 | + |
| 69 | +- **📣 Contribute Open Data**: Speech models need diverse, high-quality data. If you can share voice samples, transcripts, or datasets under open licenses, let's collaborate. |
| 70 | + |
| 71 | +- **🌍 Help Translate**: OVOS is global by nature. Translators make our platform accessible to more communities every day. |
| 72 | + |
| 73 | +We're not building this for profit. We're building it for people. And with your help, we can ensure open voice has a future that is transparent, private, and community-owned. |
| 74 | + |
| 75 | +👉 [Support the project here](https://www.openvoiceos.org/contribution) |
| 76 | + |