Speech Dataset Maker is a Windows application designed to simplify the creation of high-quality speech datasets for training Text-to-Speech (TTS) models. It is especially convenient for directly preparing datasets in Piper format.
- User-Friendly Interface: Intuitive GUI for easy dataset creation.
- Direct Export for TTS: Export recordings directly in Piper format without a separate export step.
- Multiple Dataset Support: Define different datasets with different sentences and technical settings.
- Automatic Silence Trimming: Automatically removes silence at the beginning and end of recordings.
- Unicode & RTL/LTR Support: Works with any language, including right-to-left scripts.
- Editable Metadata: Edit the text before saving to the metadata file.
- Download the setup file from the Releases page.
- Install the application.
- Ensure .NET 8.0 Desktop Runtime is installed on your system.
- Windows operating system
- .NET 8.0 Desktop Runtime (57 MB)
Each dataset needs two files before you can start recording:
- a `.json` file for configurations
- a `.tsv` file with ID–sentence pairs
You can add or edit these files in the dataset folder of the application. A link to that folder is provided in the interface.
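If you prepare the `.tsv` file by hand, it can help to check it before opening the app. The sketch below is not part of Speech Dataset Maker. It assumes a layout this README does not spell out: one pair per line, the ID in the first column, the sentence in the second, no header row, UTF-8 encoding. The function name `load_sentences` is hypothetical.

```python
import csv


def load_sentences(tsv_path):
    """Read ID-sentence pairs from a tab-separated file.

    Assumed layout (not confirmed by the app): ID in column 1,
    sentence in column 2, no header row, UTF-8 text.
    """
    pairs = []
    seen_ids = set()
    with open(tsv_path, encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for line_number, row in enumerate(reader, start=1):
            if not any(cell.strip() for cell in row):
                continue  # skip blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {line_number}: expected 2 columns, got {len(row)}"
                )
            sentence_id, sentence = row[0].strip(), row[1].strip()
            if sentence_id in seen_ids:
                raise ValueError(
                    f"line {line_number}: duplicate ID {sentence_id!r}"
                )
            seen_ids.add(sentence_id)
            pairs.append((sentence_id, sentence))
    return pairs
```

Calling `load_sentences("sentences.tsv")` (the file name is only an example) returns a list of `(id, sentence)` tuples, or raises `ValueError` naming the first line that has the wrong number of columns or a repeated ID.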
- Select the desired microphone from the list of available devices.
- Select the dataset in the interface.
- Unrecorded sentences will appear in the text box — you can edit them before recording.
- When ready, click Record and read the sentence aloud.
- Use the Play button to check your recording.
- If satisfied, click Save. The next sentence will appear.
- The app automatically trims silence and updates metadata.
- Recordings are directly stored in Piper format, ready for TTS training.
- Click Output Folder to view the dataset at any stage.
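To illustrate what the Save step does conceptually, here is a minimal Python sketch of silence trimming and of appending a metadata row. It is not the app's actual code. The amplitude threshold, the function names, and the `id|text` row layout (the LJSpeech-style metadata that Piper's training tools read) are assumptions made for illustration.

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples quieter than `threshold`.

    `samples` is a sequence of signed 16-bit PCM values. The default
    threshold of 500 is an arbitrary illustrative value, not the app's.
    """
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # the whole clip is below the threshold
    # Keep everything between the first and last loud sample,
    # including quiet samples in the middle of the utterance.
    return list(samples[loud[0]:loud[-1] + 1])


def append_metadata(metadata_path, sentence_id, text):
    """Append one `id|text` row (assumed LJSpeech-style layout)."""
    if "|" in text or "\n" in text:
        raise ValueError("text must not contain '|' or a line break")
    with open(metadata_path, "a", encoding="utf-8", newline="\n") as handle:
        handle.write(f"{sentence_id}|{text}\n")


if __name__ == "__main__":
    clip = [0, 3, -2, 900, -1200, 40, 700, 1, 0]
    print(trim_silence(clip))  # [900, -1200, 40, 700]
```

A per-sample threshold is the simplest possible approach. Practical trimmers usually measure energy over short windows and keep a little padding around the speech so that soft consonants are not cut off.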
Ideas and inspiration for this project were adapted from Piper Recording Studio.
This project is licensed under the MIT License. See the LICENSE file for details.
