| 1 | +# ❓ FAQ |
| 2 | + |
| 3 | +--- |
| 4 | + |
| 5 | +## ⚙️ General |
| 6 | + |
| 7 | +--- |
| 8 | + |
| 9 | +### Why use Coqui XTTS as the TTS model? |
| 10 | + |
| 11 | +Three main reasons: |
| 12 | + |
| 13 | +- **Custom Voices**. |
| 14 | + |
| 15 | + You can use any voice you want: your own, Cyn, Vito Corleone, Justin Timberlake — whatever. |
| 16 | +I believe that’s way more flexible than static, pre-generated voices. |
| 17 | + |
| 18 | +- **License**. |
| 19 | + |
| 20 | + Some TTS models have extremely restrictive licenses. For example, Silero TTS is under CC BY-NC-SA 4.0, which means: |
| 21 | + - You can’t use it commercially. |
| 22 | + - Even worse: I’d be forced to release this entire framework under the same license. |
| 23 | +That’s a no-go. |
| 24 | + |
| 25 | +- **Simplicity**. |
| 26 | + |
| 27 | + The Python server is **just ~100 lines of code**. |
| 28 | +It supports 10+ languages out of the box. |
| 29 | +You send a UnityWebRequest with text, and it sends back a .wav file. |
| 30 | +That’s it. No ONNX, no weird tensors, no converting text into token IDs, etc. |
| 31 | + |
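To make the "text in, .wav out" idea concrete, here is a minimal sketch of that request/response shape using only the Python standard library. It is not the actual UnityNeuroSpeech server: the JSON field `text`, the `/` route, and the `synthesize` and `request_speech` helpers are all assumptions made for illustration. `synthesize` returns a placeholder beep where the real server would call the XTTS model, and the Python client stands in for the UnityWebRequest that Unity would send.

```python
import http.client
import io
import json
import math
import struct
import threading
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer


def synthesize(text: str, sample_rate: int = 22050) -> bytes:
    """Placeholder for the real TTS call: returns a short beep as WAV bytes.

    Hypothetical stand-in; a real server would run the XTTS model here.
    """
    duration = 0.05 * max(len(text), 1)  # assumption: longer text, longer audio
    frame_count = int(sample_rate * duration)
    frames = b"".join(
        struct.pack("<h", int(8000 * math.sin(2 * math.pi * 440 * i / sample_rate)))
        for i in range(frame_count)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buf.getvalue()


class TTSHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the JSON body, e.g. {"text": "Hello"} (field name is an assumption).
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        text = payload.get("text", "")
        if not text:
            self.send_error(400, "missing 'text'")
            return
        audio = synthesize(text)
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

    def log_message(self, *args):
        pass  # keep the console quiet for this demo


def request_speech(host: str, port: int, text: str) -> bytes:
    """Client side: POST the text and return the WAV bytes from the response."""
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        body = json.dumps({"text": text}).encode("utf-8")
        conn.request("POST", "/", body=body,
                     headers={"Content-Type": "application/json"})
        return conn.getresponse().read()
    finally:
        conn.close()


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for the demo.
    server = HTTPServer(("127.0.0.1", 0), TTSHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    wav_bytes = request_speech("127.0.0.1", server.server_port, "Hello")
    print(len(wav_bytes), "bytes of WAV audio received")
    server.shutdown()
```

On the Unity side, the equivalent of `request_speech` would be a UnityWebRequest that posts the text and reads the audio from the response. The server itself never needs to know anything about Unity.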
| 32 | + |
| 33 | +But if you hate Python or Coqui XTTS, you can integrate your favorite model. |
| 34 | +Modularity is the point. I won’t stop you 🤠 |
| 35 | + |
| 36 | +--- |
| 37 | + |
| 38 | +## 🐍 Python |
| 39 | + |
| 40 | +--- |
| 41 | + |
| 42 | +### Why use Python for TTS? |
| 43 | + |
| 44 | +I didn’t really have a choice. |
| 45 | + |
| 46 | +**Coqui XTTS only offers a Python API**, and there are no stable C#, C++, C, TypeScript, JavaScript or Go bindings. |
| 47 | +If there had been a solid alternative in another language, I would’ve gladly used it — but this is what we’ve got. |
| 48 | + |
| 49 | +--- |
| 50 | + |
| 51 | +### Why is Python.exe using so much RAM? |
| 52 | +Because it’s running a full neural TTS model — locally — in Python. |
| 53 | +That’s the price you pay for realistic voices and multilingual support. |
| 54 | +If it bothers you, try switching to a smaller model or optimizing the server logic yourself. |
| 55 | + |
| 56 | +--- |
| 57 | + |
| 58 | +## ❗ Most Important Question |
| 59 | + |
| 60 | +--- |
| 61 | + |
| 62 | +### Why is it so hard to configure and get started? |
| 63 | + |
| 64 | +A totally fair question — especially after you’ve gone through all the setup steps and seen how many pieces are involved. |
| 65 | + |
| 66 | +Here’s the truth: |
| 67 | +> UnityNeuroSpeech is the first framework, not just for Unity but for game development in general, that lets you talk to AI directly inside your game. |
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +The only out-of-the-box solution for Unity in this whole stack is [whisper.unity](https://github.com/Macoron/whisper.unity). |
| 72 | +And even then — you still need to create a separate `SetupWhisperPath.cs` script to make it work properly in builds. Yes, you can try using **StreamingAssets**, and whisper.unity supports that. |
| 73 | +But I personally prefer not to scatter files across multiple Unity folders. |
| 74 | +If that’s critical for your project — feel free to try it with StreamingAssets. |
| 75 | + |
| 76 | +--- |
| 77 | + |
| 78 | +Now let’s talk about Text-to-Speech... |
| 79 | + |
| 80 | +Frankly, there are no really “clean” or “easy” TTS solutions — not just for Unity, **but even for C# as a whole**. |
| 81 | +(If you’re curious about the deep rabbit hole of TTS licensing and tech, check out the [Python Server](server.md) page.) |
| 82 | + |
| 83 | +--- |
| 84 | + |
| 85 | +As for **Ollama**, I integrated it into Unity myself — using Microsoft’s `Microsoft.Extensions` libraries. |
| 86 | + |
| 87 | +--- |
| 88 | + |
| 89 | +Yes — it does feel like a lot at first. |
| 90 | + |
| 91 | +But don’t worry — **UnityNeuroSpeech is actively being improved**, and with every update, the setup process will become simpler and more automated. |
| 92 | + |
| 93 | +--- |
| 94 | + |
| 95 | +And maybe one day, this kind of tech will be built into Unity out of the box. |
| 96 | +You’ll click a button, and boom — talking AI agents everywhere. |
| 97 | + |
| 98 | +But when that day comes... **it won’t be unique anymore.** |
| 99 | + |
| 100 | +So if you want to stand out with a game that features **real, emotional voice interaction powered by AI** — now’s your chance. |
| 101 | +Go for it 😎 |