Regarding the size of the training data, in most cases you can build a reasonable custom neural voice with 500 utterances. In our tests, adding more training data does not necessarily improve the naturalness of the voice itself in most languages, as measured by the mean opinion score (MOS). However, more training data that covers more word instances makes it more likely that you can reduce the DSAT ratio of the voice, that is, the proportion of dissatisfying speech such as glitches. We provide [some DSAT examples](https://github.com/Azure-Samples/Cognitive-Speech-TTS/blob/master/CustomVoice/DSAT-examples.md) to help you learn what DSAT speech sounds like.
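To make "covers more word instances" and "DSAT ratio" concrete, here is a minimal sketch under stated assumptions. The helpers `tokenize`, `word_coverage`, and `dsat_ratio` are hypothetical illustrations, not part of the Custom Neural Voice service. Coverage is assumed to mean the share of word occurrences in some target text that also appear somewhere in the training script. The DSAT ratio is assumed to be the share of evaluated sentences a listener flagged as dissatisfying. The tokenizer is a naive English-only regex.

```python
import re


def tokenize(text: str) -> list[str]:
    # Naive lowercase word tokenizer (assumption: English text only).
    return re.findall(r"[a-z']+", text.lower())


def word_coverage(training_utterances: list[str], target_text: str) -> float:
    """Hypothetical metric: fraction of word occurrences in target_text
    whose word appears at least once in the training script."""
    seen: set[str] = set()
    for utterance in training_utterances:
        seen.update(tokenize(utterance))
    target_words = tokenize(target_text)
    if not target_words:
        return 0.0
    covered = sum(1 for word in target_words if word in seen)
    return covered / len(target_words)


def dsat_ratio(flags: list[bool]) -> float:
    """Hypothetical metric: share of evaluated sentences flagged as
    dissatisfying (for example, containing a glitch)."""
    return sum(flags) / len(flags) if flags else 0.0


if __name__ == "__main__":
    small = ["The weather is sunny today.", "Please call me tomorrow."]
    larger = small + ["Your flight to Seattle departs at noon."]
    target = "Your flight departs tomorrow at noon."

    print(round(word_coverage(small, target), 3))  # 0.167
    print(word_coverage(larger, target))           # 1.0
    print(dsat_ratio([False, True, False, False])) # 0.25
```

In this toy example, adding one utterance raises coverage of the target sentence from one word in six to all six. That is the kind of gain the guidance above associates with a lower chance of glitches. The sketch does not model naturalness, which the text notes is measured separately by MOS.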