AllTalk TTS: Czech diacritics lost (non-UTF8 form encoding in request) #10133

@karkojk

Subtitle Edit version: [Subtitle Edit 4.0.14]
Windows: [fill in version, e.g. Windows 10 22H2]
AllTalk: [v1.9c] running on http://127.0.0.1:7851
TTS Engine: AllTalk TTS (Coqui XTTSv2 based)
Language: Czech (cs)

Problem

When generating TTS via AllTalk from Subtitle Edit, Czech diacritics (ř, č, ž, etc.) are corrupted.

Example input in Subtitle Edit:

Takže čeština tu funguje suprově, co ty tomu říkáš?

What AllTalk receives (visible in console log):

Take etina tu funguje suprov, co ty tomu íká?

However, AllTalk itself handles Czech correctly when called directly via API.

Evidence / Reproduction

Calling AllTalk API directly via PowerShell works correctly and preserves diacritics:

$body = @{
  text_input           = "Takže čeština tu funguje suprově, co ty tomu říkáš?"
  text_filtering       = "none"
  character_voice_gen  = "female_01.wav"
  narrator_enabled     = "false"
  narrator_voice_gen   = "male_01.wav"
  text_not_inside      = "character"
  language             = "cs"
  output_file_name     = "ps_test"
  output_file_timestamp= "true"
  autoplay             = "false"
  autoplay_volume      = "0.8"
}

Invoke-RestMethod -Method Post `
  -Uri "http://127.0.0.1:7851/api/tts-generate" `
  -ContentType "application/x-www-form-urlencoded; charset=utf-8" `
  -Body $body
...

AllTalk console output (correct):

[AllTalk TTSGen] Takže čeština tu funguje suprově, co ty tomu říkáš?

Root Cause (suspected)

Subtitle Edit likely encodes the POST request body using the system codepage
(Encoding.Default / CP-1250) instead of UTF-8 when sending
application/x-www-form-urlencoded data, causing non-ASCII characters
to be corrupted before reaching AllTalk.
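
To illustrate the suspected mechanism, here is a minimal Python sketch (not Subtitle Edit code) that percent-encodes the example sentence with two different sender encodings and then decodes it as UTF-8, as a server would. The `roundtrip` helper and the choice of `errors="replace"` are assumptions for illustration. The exact garbling in the AllTalk console depends on the real sender encoding and on how the server handles invalid bytes, so this does not try to reproduce the logged string character for character.

```python
from urllib.parse import urlencode, parse_qs

TEXT = "Takže čeština tu funguje suprově, co ty tomu říkáš?"


def roundtrip(text: str, sender_encoding: str) -> str:
    """Percent-encode `text` as a form field using `sender_encoding`,
    then decode the body the way a UTF-8 server would (hypothetical helper)."""
    body = urlencode({"text_input": text}, encoding=sender_encoding)
    # errors="replace" turns undecodable bytes into U+FFFD; a real server
    # may drop or substitute them differently.
    fields = parse_qs(body, encoding="utf-8", errors="replace")
    return fields["text_input"][0]


utf8_result = roundtrip(TEXT, "utf-8")
cp1250_result = roundtrip(TEXT, "cp1250")

print(utf8_result == TEXT)    # UTF-8 on both sides round-trips
print(cp1250_result == TEXT)  # mismatched encodings do not
print(cp1250_result)
```

If the sender and receiver agree on UTF-8, the text survives intact; any single-byte codepage on the sending side leaves the ASCII letters readable and damages only the accented ones, which matches the general shape of the symptom reported above.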

Suggested Fix

Ensure UTF-8 encoding is used for application/x-www-form-urlencoded POST bodies
and explicitly set Content-Type: application/x-www-form-urlencoded; charset=utf-8
in the HTTP request headers.
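
As a rough sketch of what the suggested fix amounts to on the wire, the Python snippet below builds (but does not send) a form-encoded POST whose body is percent-encoded from UTF-8 bytes and whose header declares the charset explicitly. `build_tts_request` is a hypothetical helper, only two of the form fields from the PowerShell example are included for brevity, and the actual change in Subtitle Edit would be made in its own HTTP client code.

```python
from urllib.parse import urlencode
from urllib.request import Request

FORM_CONTENT_TYPE = "application/x-www-form-urlencoded; charset=utf-8"


def build_tts_request(
    text: str,
    url: str = "http://127.0.0.1:7851/api/tts-generate",
) -> Request:
    """Build a POST request with a UTF-8 percent-encoded form body.

    Hypothetical helper; the remaining AllTalk fields are omitted here.
    """
    fields = {"text_input": text, "language": "cs"}
    # Percent-encode using UTF-8 bytes; the result is pure ASCII.
    body = urlencode(fields, encoding="utf-8").encode("ascii")
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": FORM_CONTENT_TYPE},
    )


req = build_tts_request("Takže čeština tu funguje suprově, co ty tomu říkáš?")
print(req.get_header("Content-type"))
print(req.data)
```

The printed body shows each accented character as a multi-byte UTF-8 escape (for example `ž` as `%C5%BE`), which is the form the direct PowerShell call above is expected to produce.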

Related

Similar encoding issues have been reported in:
