Conversation

@nattsw (Contributor) commented Mar 11, 2025

Requires: #248

We are seeing a lot of:

Failed to machine-translate Post#1708161 to it: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

Timeouts tend to happen when we send large chunks of content to the LLM for translation. They can be triggered in two ways:

  1. Site user presses 🌐 to translate a post manually (controller API); or
  2. We send old posts for backfilling (job).

This PR solves (2); (1) will be handled in a different PR. We will loosely split the content into chunks before translating. An example post raw is meta's /t/354449.
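
Roughly, the splitting idea looks like the sketch below. This is a minimal illustration only: the paragraph-boundary heuristic, the chunk-size limit, and the helper name are assumptions, not the actual DiscourseTranslator::ContentSplitter implementation.

# Hypothetical sketch of loose chunking: split on blank lines (paragraph
# boundaries) and pack paragraphs into chunks under an assumed size limit.
# Oversized single paragraphs are kept whole in this sketch.
def split_for_translation(raw, max_chunk_chars: 3000)
  chunks = []
  current = +""

  raw.split(/\n\n+/).each do |paragraph|
    if current.length + paragraph.length + 2 > max_chunk_chars && !current.empty?
      chunks << current
      current = +""
    end
    current << "\n\n" unless current.empty?
    current << paragraph
  end

  chunks << current unless current.empty?
  chunks
end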

@nattsw force-pushed the split-raw-before-translate branch from 7580b96 to 32fae6d on March 11, 2025 08:14
text = text_for_translation(translatable)
chunks = DiscourseTranslator::ContentSplitter.split(text)
chunks
  .map { |chunk| ::DiscourseAi::Translator.new(chunk, target_locale_sym).translate }
A reviewer (Contributor) commented on the diff above:
Not sure if you considered it but I'm wondering if we should translate chunks in parallel.

@nattsw (Contributor, Author) replied Mar 11, 2025

I did ... 🤔 but we may hit rate limits
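
For context on the trade-off being discussed, a bounded-parallel version might look roughly like the sketch below. This is purely illustrative and was not adopted; the concurrency cap and use of plain threads are assumptions, and the PR keeps sequential translation to avoid provider rate limits.

# Illustrative only: translate chunks with a small concurrency cap to reduce
# the chance of hitting rate limits. MAX_IN_FLIGHT is an assumed value.
MAX_IN_FLIGHT = 2

translations = chunks.each_slice(MAX_IN_FLIGHT).flat_map do |batch|
  batch
    .map { |chunk| Thread.new { ::DiscourseAi::Translator.new(chunk, target_locale_sym).translate } }
    .map(&:value)
end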

@nattsw (Contributor, Author) commented Mar 11, 2025

I'll add some limits to the splitter.

@xfalcox force-pushed the split-raw-before-translate branch from bb3d1f3 to d9a0322 on March 25, 2025 14:06
@xfalcox merged commit 23b83fb into main on Mar 25, 2025 (6 checks passed)
@xfalcox deleted the split-raw-before-translate branch on March 25, 2025 18:21
nattsw added a commit to discourse/discourse-ai that referenced this pull request Jun 23, 2025
In discourse/discourse-translator#249 we introduced splitting of content (post.raw) before sending it for translation, as we were using a sync API.

Now that we're streaming thanks to #1424, we'll chunk based on LlmModel.max_output_tokens.
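
A rough sketch of that idea follows, assuming a crude characters-per-token ratio; the helper name is hypothetical and the actual chunking logic in discourse-ai may compute this differently.

# Rough sketch: derive a per-chunk character budget from the model's
# max_output_tokens, assuming ~4 characters per token. The resulting budget
# would then be fed to the content splitter.
def chunk_char_budget(llm_model, chars_per_token: 4)
  llm_model.max_output_tokens * chars_per_token
end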
