This guide covers how to configure text-to-speech models using the Editor UI provided by Unity-Sherpa-ONNX.
Open the settings window from Project Settings > Sherpa-ONNX > TTS.
The TTS settings window has three areas:
- Import section — download and extract model archives by URL
- Profile list — manage multiple TTS profiles
- Profile detail — configure model paths, parameters, and deployment options
| Type | Description |
|---|---|
| Vits | VITS-based models (including Piper voices) |
| Matcha | MatchaTTS acoustic model + vocoder |
| Kokoro | Kokoro multi-voice model |
| Kitten | Kitten TTS model |
| ZipVoice | Zipformer-based voice synthesis |
| Pocket | PocketTTS multi-component model |
Model type is auto-detected from the archive name during import:
| Archive name prefix / keyword | Detected type |
|---|---|
| `vits-*` | Vits |
| `matcha-*` | Matcha |
| `kokoro-*` | Kokoro |
| `*kitten*` | Kitten |
| `*zipformer*`, `*zip-voice*`, `*zipvoice*` | ZipVoice |
| `*pocket*` | Pocket |
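The detection rules in the table above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not the plugin's actual code: the function name `detect_model_type`, the rule ordering, and the case-insensitive comparison are assumptions.

```python
from typing import Optional

# Rules taken from the table above: prefixes are checked first, then keywords.
PREFIX_RULES = [("vits-", "Vits"), ("matcha-", "Matcha"), ("kokoro-", "Kokoro")]
KEYWORD_RULES = [
    ("kitten", "Kitten"),
    ("zipformer", "ZipVoice"),
    ("zip-voice", "ZipVoice"),
    ("zipvoice", "ZipVoice"),
    ("pocket", "Pocket"),
]


def detect_model_type(archive_name: str) -> Optional[str]:
    """Return the detected type, or None when no rule matches."""
    name = archive_name.lower()  # assumption: matching ignores case
    for prefix, model_type in PREFIX_RULES:
        if name.startswith(prefix):
            return model_type
    for keyword, model_type in KEYWORD_RULES:
        if keyword in name:
            return model_type
    return None
```

When no rule matches, the sketch returns `None`; in the Editor the equivalent fallback would be to pick the model type manually in the profile detail.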
- Click Import from URL to expand the import section
- Paste the model archive URL (`.tar.bz2`, `.tar.gz`, or `.zip`)
- For Matcha models, select a vocoder from the dropdown (Vocos 22 kHz is recommended)
- Optionally enable Use int8 models if the archive contains quantized variants
- Click Import
The importer downloads the archive, extracts it to Assets/StreamingAssets/SherpaOnnx/tts-models/{name}/, creates a profile, and auto-configures all model paths.
Pre-trained models are available at the sherpa-onnx TTS models page.
Click the + button below the profile list. A new profile named "New Profile" is added and selected.
Select a profile and click the - button. The profile and its model directory are deleted.
Use the Active profile dropdown above the list to select which profile the runtime TTS system will use. This value is serialized to tts-settings.json at build time.
| Field | Description |
|---|---|
| Profile name | Display name; also used as the model folder name |
| Model type | Vits, Matcha, Kokoro, Kitten, ZipVoice, or Pocket |
| Model source | Local, Remote, or LocalZip (see Deployment Options) |
| Field | Default | Description |
|---|---|---|
| Speaker ID | 0 | Speaker index for multi-speaker models |
| Speed | 1.0 | Playback speed multiplier |
| Field | Default | Description |
|---|---|---|
| Rule FSTs | (empty) | Comma-separated paths to .fst text normalization files |
| Rule FARs | (empty) | Comma-separated paths to .far text normalization files |
| Max sentences | 1 | Maximum sentences processed per call |
| Silence scale | 0.2 | Scale factor for silence between sentences |
| Field | Default | Description |
|---|---|---|
| Threads | 1 | Number of inference threads |
| Provider | cpu | ONNX Runtime execution provider |
Each model type shows its own section with paths to .onnx files, token files, lexicons, and type-specific parameters (noise scale, length scale, etc.). These fields are filled automatically during import.
If a model directory exists, the Auto-configure paths button appears at the top of the detail panel. Clicking it scans the model folder and fills all path fields automatically:
- Finds `.onnx` model files
- Locates `tokens.txt`, lexicon files, `voices.bin`
- Detects `espeak-ng-data` and `dict` subdirectories
- Finds `.fst` and `.far` text normalization rules
- Sets default scale parameters for the detected model type
When both normal and int8-quantized .onnx files exist in the model directory, a toggle button appears:
- Use int8 models (blue) — switch to quantized variants for faster inference
- Use normal models (grey) — switch back to full-precision models
Int8 variants are detected by int8 in the file name (e.g. model.int8.onnx). The switcher updates the relevant path fields and preserves all other settings.
Matcha models require a separate vocoder. The vocoder selector appears in the Matcha settings section and during import:
| Vocoder | Description |
|---|---|
| Vocos 22 kHz | Recommended; fast and compact |
| HiFi-GAN v1 | Classic HiFi-GAN vocoder |
| HiFi-GAN v2 | Improved variant |
| HiFi-GAN v3 | Latest variant |
Click Download to fetch the selected vocoder. The old vocoder file is replaced automatically.
With the Local source, model files stay in Assets/StreamingAssets/SherpaOnnx/tts-models/{profileName}/ and are included in the build as-is.
With the Remote source, set the Base URL in the Remote section. At runtime, the app downloads the model archive from:
{baseUrl}/{profileName}.zip
Use this when models are too large to ship with the app binary.
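As a quick illustration of the URL pattern, the following Python helper composes the download address. The function name is hypothetical, and trimming a trailing slash from the base URL is an assumption; the plugin may treat it differently.

```python
def remote_archive_url(base_url: str, profile_name: str) -> str:
    """Build {baseUrl}/{profileName}.zip without doubling the slash."""
    # Assumption: a trailing slash on the base URL is tolerated and trimmed.
    return f"{base_url.rstrip('/')}/{profile_name}.zip"
```

For example, a base URL of `https://cdn.example.com/tts` and a profile named `vits-en` would resolve to `https://cdn.example.com/tts/vits-en.zip`, so the archive on the server has to be named after the profile.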
With the LocalZip source, model files are zipped automatically at build time and placed in StreamingAssets. On first launch, the app extracts the archive to persistentDataPath.
Use the Pack to zip (test) button to verify the zip process in the Editor. Delete zip removes the test archive.
The build processor handles zip/restore automatically:
- Pre-build: zips the model folder, backs up originals
- Post-build: restores originals, removes zip files
The cache section configures object pooling for TTS playback:
| Pool | Default size | Description |
|---|---|---|
| OfflineTts | 4 | Raw audio buffer pool (float arrays) |
| AudioClip | 4 | Unity AudioClip object pool |
| AudioSource | 2 | AudioSource component pool for parallel playback |
Each pool can be enabled or disabled independently.
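For readers unfamiliar with object pooling, the idea behind these settings can be shown with a small generic pool in Python. This is a conceptual sketch only; the class `Pool` and its behavior (creating a new item when the pool is empty, dropping extras beyond the configured size) are assumptions and do not describe the plugin's implementation.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """Fixed-capacity pool: reuse released items, create new ones when empty."""

    def __init__(self, factory: Callable[[], T], size: int, enabled: bool = True):
        self._factory = factory
        self._size = size
        self._enabled = enabled
        self._free: List[T] = []

    def acquire(self) -> T:
        # Reuse an idle item when possible; otherwise build a fresh one.
        if self._enabled and self._free:
            return self._free.pop()
        return self._factory()

    def release(self, item: T) -> None:
        # Keep at most `size` idle items; extras are simply dropped.
        if self._enabled and len(self._free) < self._size:
            self._free.append(item)
```

A disabled pool in this sketch degrades to plain allocation, which mirrors the per-pool enable switch described above.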
Directory layout:

    Assets/StreamingAssets/SherpaOnnx/
      tts-settings.json          # Serialized profiles and cache config
      tts-models/
        {profileName}/           # Model files for each profile
          model.onnx
          tokens.txt
          ...
| Issue | Solution |
|---|---|
| "Auto-configure paths" button missing | Import or manually place model files in the profile's model directory |
| Int8 switch button not shown | No int8 variant found; ensure both model.onnx and model.int8.onnx exist |
| Vocoder download fails | Check network connection; vocoder files are hosted on GitHub releases |
| Pack to zip fails | Ensure the model directory exists and contains files |
