Issues: no pause after "etc." (and other abbreviations) at the end of a sentence, no pause around parentheses, wrong chunking at abbreviations in the middle of a sentence. #308

@Mark4Grey

Describe the bug

  • Kokoro does not distinguish between an abbreviation like "etc." in the middle of a sentence and one at the end. There is no pause at all after the abbreviation, even when it ends the sentence (which happens fairly often). It would be nice if Kokoro detected a capital letter at the start of the next word and made a full-stop pause before it (names and capitalized abbreviations may still cause issues, though).
  • It would be nice if Kokoro slightly raised the tone of her voice, for naturalness, when reading "e.g.,".
  • Kokoro seems to ignore parentheses. It would be nice if Kokoro made a comma-length pause and lowered her tone while reading text within parentheses, at least for plain text (spelled-out code would perhaps need a different approach).

Example of text
Include the platform, version numbers of your docker, etc. Whether its GPU (Nvidia or other) or CPU, Mac, Linux, Windows, etc.
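
The capital-letter heuristic from the first point can be sketched roughly as follows. This is only an illustration, not Kokoro's actual text processing: `ABBREVIATIONS` and `split_sentences` are hypothetical names, and the abbreviation list is deliberately tiny.

```python
import re

# Hypothetical, deliberately short list; a real one would be much longer.
ABBREVIATIONS = {"etc.", "e.g.", "i.e.", "vs."}


def split_sentences(text: str) -> list[str]:
    """Split on sentence punctuation, but treat the period of a known
    abbreviation as a sentence end only if the next word is capitalized
    (or the text ends there)."""
    sentences = []
    start = 0
    # Candidate boundaries: ., ! or ? followed by whitespace or end of text.
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        # Last whitespace-delimited token before the candidate boundary.
        token = text[start:end].split()[-1].lower().lstrip("(\"'")
        rest = text[end:].lstrip()
        # Abbreviation followed by a non-capitalized word: not a boundary.
        if token in ABBREVIATIONS and rest and not rest[0].isupper():
            continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


example = (
    "Include the platform, version numbers of your docker, etc. "
    "Whether its GPU (Nvidia or other) or CPU, Mac, Linux, Windows, etc."
)
for sentence in split_sentences(example):
    print(sentence)
```

With the example text above, this yields two sentences, so a full-stop pause could be placed after the first "etc.". As noted, an abbreviation followed by a proper name (say, "vs. Auckland") would still be split incorrectly by this heuristic.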

  • Kokoro sometimes chunks text in the wrong place when abbreviations are present, perhaps because it does not distinguish a sentence-ending period from the period after an abbreviation. This causes an unwanted long pause in the middle of some sentences.

Example of text
Keep this in mind:
These predictions are based on historical data and might not reflect the actual tide times for every year.
Tidal patterns can vary depending on the specific location within Auckland (e.g., inner harbor vs. outer harbor).

......
Logs
11:01:40 PM | INFO | text_processor:222 | Yielding chunk 1: 'Keep this in mind: These predictions are based on ...' (249 tokens)
11:01:40 PM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'Keep this in mind: These predictions are based on historical data and might not reflect the actual t...'
11:01:49 PM | DEBUG | kokoro_v1:252 | Got audio chunk with shape: torch.Size([325200])
11:01:49 PM | INFO | text_processor:259 | Yielding final chunk 2: 'outer harbor). The predicted high tide time is for...' (134 tokens)
11:01:49 PM | DEBUG | kokoro_v1:245 | Generating audio for text with lang_code 'a': 'outer harbor). The predicted high tide time is for a specific point in space, and actual tidal condi...'
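
In the log above, chunk 2 starts with 'outer harbor).', which suggests the split landed right after "vs.". A minimal sketch of chunking that only breaks at real sentence ends is shown below. It is an assumption-laden illustration, not the project's `text_processor`: `safe_break_points`, `chunk_text`, and the `max_words` budget (word count standing in for the token count) are all hypothetical.

```python
import re

# Hypothetical: abbreviations whose period should not end a chunk
# when the following word starts with a lowercase letter.
NON_FINAL = re.compile(r"\b(?:etc|e\.g|i\.e|vs)\.$", re.IGNORECASE)


def safe_break_points(text: str) -> list[int]:
    """Return offsets at which a chunk may end."""
    points = []
    # Candidate: ., ! or ?, optionally followed by ")", then space or end.
    for m in re.finditer(r"[.!?]\)?(?=\s|$)", text):
        before = text[: m.start() + 1]
        after = text[m.end():].lstrip()
        # Period belongs to an abbreviation in mid-sentence: skip it.
        if NON_FINAL.search(before) and after[:1].islower():
            continue
        points.append(m.end())
    return points


def chunk_text(text: str, max_words: int = 30) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words
    words; a single sentence longer than the budget is kept whole."""
    chunks = []
    start = 0
    last_ok = None  # most recent safe break point after `start`
    for point in safe_break_points(text):
        too_long = len(text[start:point].split()) > max_words
        if too_long and last_ok is not None:
            chunks.append(text[start:last_ok].strip())
            start = last_ok
        last_ok = point
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


sample = (
    "Keep this in mind: These predictions are based on historical data "
    "and might not reflect the actual tide times for every year. "
    "Tidal patterns can vary depending on the specific location within "
    "Auckland (e.g., inner harbor vs. outer harbor)."
)
for chunk in chunk_text(sample):
    print(repr(chunk))
```

With this sample, the second chunk begins at "Tidal patterns" instead of at "outer harbor).", so the pause between chunks would fall between sentences.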

Branch / Deployment used
kokoro_v1:252

Operating System
Docker Engine v28.1.1
The Docker container runs locally in WSL on Windows 10, version 10.0.19045 (build 19045).
All Kokoro processing is offloaded to the CPU.

Additional context
The setup was done a few days ago.

BTW, great job guys!
