Description
Subtitle Edit version: [Subtitle Edit 4.0.14]
Windows: [fill in version, e.g. Windows 10 22H2]
AllTalk: [v1.9c] running on http://127.0.0.1:7851
TTS Engine: AllTalk TTS (Coqui XTTSv2 based)
Language: Czech (cs)
Problem
When generating TTS via AllTalk from Subtitle Edit, Czech diacritics (ř, č, ž, etc.) are corrupted.
Example input in Subtitle Edit:
Takže čeština tu funguje suprově, co ty tomu říkáš?
What AllTalk receives (visible in console log):
Take etina tu funguje suprov, co ty tomu íká?
However, AllTalk itself handles Czech correctly when called directly via API.
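The two strings above can be compared character by character to see exactly what is lost. The following is a minimal Python sketch (Python is used only for illustration and is not part of Subtitle Edit or the reproduction). It shows that the logged text matches what remains when every character above U+00FF is dropped, while í and á, which lie within the Latin-1 range, survive. This is an observation about the pattern only; it does not confirm which code path in Subtitle Edit produces it.

```python
# Strings copied from the report above.
sent = "Takže čeština tu funguje suprově, co ty tomu říkáš?"
logged = "Take etina tu funguje suprov, co ty tomu íká?"

# Characters that never arrive, and non-ASCII characters that do arrive.
dropped = sorted({ch for ch in sent if ch not in logged})
kept = sorted({ch for ch in logged if ord(ch) > 0x7F})

# Simulate a conversion that silently discards anything outside Latin-1
# (code points above U+00FF). This is an assumed mechanism, not a known one.
latin1_only = sent.encode("latin-1", errors="ignore").decode("latin-1")

print("dropped:", dropped)
print("kept:   ", kept)
print("matches log:", latin1_only == logged)
```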
Evidence / Reproduction
Calling AllTalk API directly via PowerShell works correctly and preserves diacritics:
$body = @{
    text_input            = "Takže čeština tu funguje suprově, co ty tomu říkáš?"
    text_filtering        = "none"
    character_voice_gen   = "female_01.wav"
    narrator_enabled      = "false"
    narrator_voice_gen    = "male_01.wav"
    text_not_inside       = "character"
    language              = "cs"
    output_file_name      = "ps_test"
    output_file_timestamp = "true"
    autoplay              = "false"
    autoplay_volume       = "0.8"
}
Invoke-RestMethod -Method Post `
    -Uri "http://127.0.0.1:7851/api/tts-generate" `
    -ContentType "application/x-www-form-urlencoded; charset=utf-8" `
    -Body $body
...AllTalk console output (correct):
[AllTalk TTSGen] Takže čeština tu funguje suprově, co ty tomu říkáš?
Root Cause (suspected)
Subtitle Edit likely encodes the POST request body using the system codepage
(Encoding.Default / CP-1250) instead of UTF-8 when sending
application/x-www-form-urlencoded data, causing non-ASCII characters
to be corrupted before reaching AllTalk.
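To illustrate why the encoding chosen for the form body matters, here is a small Python sketch (illustrative only, not Subtitle Edit code). It percent-encodes the same sentence once as UTF-8 and once as CP-1250, then decodes both the way a server that assumes UTF-8 would. How the invalid bytes finally appear in a log depends on the server's error handling, so the exact corrupted output here is not expected to match the AllTalk console.

```python
from urllib.parse import quote_plus, unquote_plus

text = "Takže čeština tu funguje suprově, co ty tomu říkáš?"

# The same text produces different bytes on the wire depending on the encoding.
utf8_body = quote_plus(text, encoding="utf-8")      # ž becomes %C5%BE
cp1250_body = quote_plus(text, encoding="cp1250")   # ž becomes %9E

print(utf8_body)
print(cp1250_body)

# A receiver that assumes UTF-8 recovers the text only from the UTF-8 body.
decoded_utf8 = unquote_plus(utf8_body, encoding="utf-8")
decoded_cp1250 = unquote_plus(cp1250_body, encoding="utf-8", errors="replace")

print(decoded_utf8 == text)    # round trip is lossless
print(decoded_cp1250 == text)  # non-ASCII characters are damaged
```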
Suggested Fix
Ensure UTF-8 encoding is used for application/x-www-form-urlencoded POST bodies
and explicitly set Content-Type: application/x-www-form-urlencoded; charset=utf-8
in the HTTP request headers.
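As a rough sketch of what the suggested fix amounts to, the following Python snippet builds (but does not send) a request whose body is percent-encoded from UTF-8 bytes and whose Content-Type declares the charset. The function name `build_tts_request` is hypothetical, the field names are taken from the PowerShell reproduction above, and the actual change would need to be made in Subtitle Edit's own C# HTTP code, which is not shown here.

```python
from urllib.parse import urlencode, parse_qs
from urllib.request import Request

CONTENT_TYPE = "application/x-www-form-urlencoded; charset=utf-8"


def build_tts_request(url: str, fields: dict) -> Request:
    """Hypothetical helper: form-encode fields as UTF-8 and label the charset."""
    # urlencode percent-encodes the UTF-8 bytes of each value, so the
    # resulting body is pure ASCII and safe to send as-is.
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(url, data=body, headers={"Content-Type": CONTENT_TYPE}, method="POST")


text = "Takže čeština tu funguje suprově, co ty tomu říkáš?"
request = build_tts_request(
    "http://127.0.0.1:7851/api/tts-generate",
    {"text_input": text, "language": "cs"},  # subset of the fields, for brevity
)

# Decoding the body as UTF-8 form data returns the original sentence.
round_trip = parse_qs(request.data.decode("ascii"), encoding="utf-8")
print(round_trip["text_input"][0] == text)
```

The body is encoded explicitly rather than left to a platform default, so the result does not depend on the system code page.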
Related
Similar encoding issues have been reported in: