Piper TTS is great; I use it successfully on macOS and Windows and have trained my own voices. However, there's one feature I'm missing: there's no tag or function to insert a pause of a given duration, for example something like `<silence=100>` to indicate a 100 ms pause. I use Piper TTS to generate audiobooks, and for my needs it's good enough. Sometimes, though, the narration is hard to follow because the text flows on without noticeable breaks.
Nowadays, you can feed the text into an AI model and ask it to insert silence tags wherever your instructions call for them, for example after dialogue or at the beginning of a chapter. A simple tag that defines a silence at a specific point would be enough to greatly improve the clarity of the narration.
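To illustrate what I mean, here is a minimal sketch of how such a tag could be handled as a preprocessing step. It assumes the `<silence=N>` syntax proposed above (N in milliseconds) and is not an existing Piper API: the text is split into chunks and pause lengths, each chunk would be synthesized separately, and each pause would be filled with zero-valued samples. The function names, the 22050 Hz sample rate, and the 16-bit mono PCM format are my assumptions; the real rate should come from the voice's config.

```python
import re
from typing import List, Union

# Hypothetical tag syntax from this request: <silence=N>, where N is milliseconds.
SILENCE_TAG = re.compile(r"<silence=(\d+)>")

# A segment is either a text chunk (str) or a pause length in ms (int).
Segment = Union[str, int]


def split_on_silence_tags(text: str) -> List[Segment]:
    """Split text into text chunks and pause durations, in order."""
    segments: List[Segment] = []
    pos = 0
    for match in SILENCE_TAG.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            segments.append(chunk)
        segments.append(int(match.group(1)))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        segments.append(tail)
    return segments


def silence_samples(duration_ms: int, sample_rate: int = 22050) -> bytes:
    """Return duration_ms of silence as 16-bit mono PCM (assumed format)."""
    num_samples = sample_rate * duration_ms // 1000
    return b"\x00\x00" * num_samples  # two zero bytes per 16-bit sample
```

For example, `split_on_silence_tags('"Hello," she said.<silence=500> Chapter two.')` yields `['"Hello," she said.', 500, 'Chapter two.']`, so the synthesizer would speak the first chunk, insert 500 ms of silence, then speak the second.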
Additionally, it would be useful to have an option for randomized sentence pauses. Currently, `--sentence-silence` only accepts a fixed value. An option that picks each sentence pause at random from a user-defined range (e.g., between 300 and 600 ms) could make the output sound more natural and human-like.
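A rough sketch of the randomized variant, assuming the range is given in milliseconds as in the 300 to 600 ms example (converting to whatever unit the actual option uses is left to the implementation). The function names and the optional seed are hypothetical; a uniform draw per sentence is just the simplest choice.

```python
import random
from typing import List, Optional


def random_sentence_silence(min_ms: int, max_ms: int,
                            rng: Optional[random.Random] = None) -> int:
    """Pick one pause length in ms, uniformly from [min_ms, max_ms] inclusive."""
    if min_ms > max_ms:
        raise ValueError("min_ms must not exceed max_ms")
    if rng is None:
        rng = random.Random()
    return rng.randint(min_ms, max_ms)


def pauses_for_sentences(num_sentences: int, min_ms: int, max_ms: int,
                         seed: Optional[int] = None) -> List[int]:
    """Draw one pause per sentence; a fixed seed makes the result reproducible."""
    rng = random.Random(seed)
    return [random_sentence_silence(min_ms, max_ms, rng)
            for _ in range(num_sentences)]
```

Passing a seed would let an audiobook be regenerated with identical pauses, which helps when re-rendering a single chapter after a fix.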