Description
Describe the bug
- Kokoro does not distinguish between an abbreviation such as "etc." in the middle of a sentence and the same abbreviation at the end of one. There is no pause at all after the abbreviation, even when it ends the sentence (which happens fairly often). It would be nice if Kokoro detected a capital letter at the start of the next word and made a full-stop pause before it (proper names and capitalized abbreviations could still cause false positives, though).
- It would be nice if Kokoro slightly raised the tone of her voice when reading "e.g.," so that it sounds more natural.
- Kokoro seems to ignore parentheses. It would be nice if Kokoro made a comma-length pause and lowered her tone while reading plain text within parentheses (spelled-out code would perhaps need a different approach).
Example of text
Include the platform, version numbers of your docker, etc. Whether its GPU (Nvidia or other) or CPU, Mac, Linux, Windows, etc.
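A minimal sketch of the capital-letter heuristic suggested above, written as standalone Python. It is not Kokoro's code: the function names and the short abbreviation pattern are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical pattern covering a few common abbreviations; a real list
# would be longer. This is an assumption, not something taken from Kokoro.
ABBREVIATION_RE = re.compile(r"\b(?:etc|e\.g|i\.e|vs)\.")


def abbreviation_ends_sentence(text, abbr_end):
    """Guess whether the abbreviation ending at index abbr_end closes a sentence.

    Heuristic: it does when nothing follows it, or when the next
    non-space character is an uppercase letter.
    """
    rest = text[abbr_end:].lstrip()
    if not rest:
        return True
    return rest[0].isupper()


def find_sentence_final_abbreviations(text):
    """Return (abbreviation, start index) pairs that appear to end a sentence."""
    hits = []
    for match in ABBREVIATION_RE.finditer(text):
        if abbreviation_ends_sentence(text, match.end()):
            hits.append((match.group(), match.start()))
    return hits


if __name__ == "__main__":
    sample = (
        "Include the platform, version numbers of your docker, etc. "
        "Whether its GPU (Nvidia or other) or CPU, Mac, Linux, Windows, etc."
    )
    for abbr, pos in find_sentence_final_abbreviations(sample):
        print(f"{abbr!r} at index {pos} looks sentence-final")
```

With the example text above, both occurrences of "etc." are flagged as sentence-final. As noted, a proper name right after a mid-sentence abbreviation would be flagged incorrectly.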
- Kokoro sometimes chunks text in the wrong place when it contains abbreviations, perhaps because it does not distinguish a sentence-ending period from the period after an abbreviation. This causes an unwanted long pause in the middle of some sentences.
Example of text
Keep this in mind:
These predictions are based on historical data and might not reflect the actual tide times for every year.
Tidal patterns can vary depending on the specific location within Auckland (e.g., inner harbor vs. outer harbor).
......
Logs
11:01:40 PM | INFO | text_processor:222 | Yielding chunk 1: 'Keep this in mind: These predictions are based on ...' (249 tokens)
11:01:40 PM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Keep this in mind: These predictions are based on historical data and might not reflect the actual t...'
11:01:49 PM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([325200])
11:01:49 PM | INFO | text_processor:259 | Yielding final chunk 2: 'outer harbor). The predicted high tide time is for...' (134 tokens)
11:01:49 PM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'outer harbor). The predicted high tide time is for a specific point in space, and actual tidal condi...'
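In the logs above, chunk 2 starts at "outer harbor).", which suggests the split happened right after "vs.". A rough sketch of a sentence splitter that skips abbreviation periods follows. It is a standalone illustration, not the actual `text_processor` logic, and `ABBREVIATIONS` and `split_sentences` are hypothetical names.

```python
import re

# Hypothetical abbreviation list for illustration; not Kokoro's actual data.
ABBREVIATIONS = ("etc.", "e.g.", "i.e.", "vs.")

# A candidate boundary: ., ! or ?, optional closing bracket/quote, then whitespace.
BOUNDARY_RE = re.compile(r"[.!?][)\"']*\s+")


def split_sentences(text):
    """Split text into sentences without breaking after a mid-sentence abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY_RE.finditer(text):
        candidate = text[start:match.end()].rstrip()
        next_char = text[match.end():match.end() + 1]
        # Skip the boundary when it is an abbreviation period and the next
        # word is not capitalized (so it is unlikely to start a sentence).
        if candidate.lower().endswith(ABBREVIATIONS) and not next_char.isupper():
            continue
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = (
        "Tidal patterns can vary depending on the specific location within "
        "Auckland (e.g., inner harbor vs. outer harbor). The predicted high "
        "tide time is for a specific point in space."
    )
    for sentence in split_sentences(sample):
        print(repr(sentence))
```

With this approach the text would only be cut after "outer harbor).", so "vs. outer harbor" stays in one chunk.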
Branch / Deployment used
kokoro_v1:252
Operating System
Docker Engine v28.1.1
The Docker container is running locally in WSL on Windows 10, version 10.0.19045 (build 19045).
All Kokoro processing is offloaded to the CPU.
Additional context
The setup was done a few days ago.
BTW, great job guys!