-
GPT is very prone to merging lines, at least in 3.5. It took quite a few iterations to arrive at the current prompt, which more or less eliminates desyncs. Feeding it lines with a clear indication of where it should fill in the translation helps keep it on track (it just has to fill in the blanks). Validation/retry could probably fix desyncs even with a looser format, but it more than doubles the token count for the batch, since it has to resend the whole message chain, so it is unlikely to be a net win on that front! :-) My long-term goal is to allow GPT to merge lines when that helps it produce a more fluent translation, then fix up the timings afterwards. It doesn't seem able to do that itself, unfortunately. GPT-4 might be able to, but it's so much more expensive that I haven't experimented with it much.
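For illustration, the "fill in the blanks" idea might be sketched like this (the `#N` / `Original>` / `Translation>` markers here are my own invention for the example, not necessarily the project's actual prompt format):

```python
# Sketch of a "fill in the blanks" batch prompt: each source line is numbered
# and paired with an empty Translation> slot, so the model only has to complete
# the blanks rather than restructure the text. Marker names are illustrative.

def build_prompt(lines):
    """Format a batch of subtitle lines as numbered fill-in-the-blank entries."""
    entries = []
    for index, text in enumerate(lines, start=1):
        entries.append(f"#{index}\nOriginal>\n{text}\nTranslation>\n")
    return "\n".join(entries)

print(build_prompt(["Hello there.", "How are you?"]))
```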
-
It does actually use an index rather than a timestamp in the translation requests :-) Using the index alone still didn't prevent desyncs, though, so the prompt also requires the model to adhere to a strict format in the response, which essentially compels it to keep lines distinct.
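As a sketch of what index-based validation could look like (the function names and `#N` / `Translation>` markers are my own for the example, not the project's API), any merged or dropped line shows up as a missing index, so the batch can be flagged for retry:

```python
import re

# Illustrative validation of a strict indexed response format: every entry the
# model returns must carry its line number, so merged or dropped lines appear
# as missing indices rather than silently shifting the mapping.

RESPONSE_PATTERN = re.compile(
    r"#(\d+)\s*\nTranslation>\s*\n(.*?)(?=\n#\d+|\Z)", re.DOTALL
)

def parse_translations(response, expected_indices):
    """Return {index: translation} plus the list of indices missing from the response."""
    found = {
        int(match.group(1)): match.group(2).strip()
        for match in RESPONSE_PATTERN.finditer(response)
    }
    missing = [i for i in expected_indices if i not in found]
    return found, missing

reply = "#1\nTranslation>\nBonjour.\n#2\nTranslation>\nSalut."
found, missing = parse_translations(reply, [1, 2, 3])
# Index 3 is missing here, so this batch would be a candidate for retry
```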
-
gpt-5, gpt-5-mini and gpt-5-nano all seem to be working OK (not supported in the official release yet - coming soon!). gpt-5-mini seems to be the only one worth using, though: gpt-5 is very slow (presumably wasting a lot of tokens on over-thinking) and gpt-5-nano is just terrible; its translations read like old-style machine-translated subs 😬
-
I got this idea from Subtitle Edit's "Auto-translate via copy-paste" function, which processes the .SRT file so that it ends up like this:
Then you can just toss it into a translator like DeepL and get:
The software then maps the translation back to the original timestamps, since each asterisk-separated block corresponds 1-1 with a subtitle line.
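The round trip could be sketched like this (function names are my own; assuming a lone `*` on its own line as the separator, per Subtitle Edit's format):

```python
# Sketch of the copy-paste round trip: join subtitle texts with a lone
# asterisk line, translate the blob, then split it back. The 1-1 mapping
# only survives if the translator preserves every separator.

SEPARATOR = "\n*\n"

def join_for_translation(texts):
    """Concatenate subtitle texts into one blob for the translator."""
    return SEPARATOR.join(texts)

def map_back(translated_blob, timestamps):
    """Re-attach translated blocks to their original timestamps."""
    parts = [part.strip() for part in translated_blob.split(SEPARATOR)]
    if len(parts) != len(timestamps):
        # The desync case: the translator merged or split lines across asterisks
        raise ValueError("separator count changed during translation")
    return list(zip(timestamps, parts))

blob = join_for_translation(["Hello.", "Goodbye."])
pairs = map_back(blob, ["00:00:01 --> 00:00:02", "00:00:03 --> 00:00:04"])
```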
I've been experimenting with this approach with ChatGPT. The translation is often flawless, but the problem is that it often combines lines across the asterisks, which desyncs the mapping back. Would subtrans's functionality - batching, validating, re-translating - help enough with this that it becomes a non-problem?
If it works, token consumption would be greatly reduced.