FIX: Split content for translation before sending #249
Conversation
```ruby
text = text_for_translation(translatable)
chunks = DiscourseTranslator::ContentSplitter.split(text)
chunks
  .map { |chunk| ::DiscourseAi::Translator.new(chunk, target_locale_sym).translate }
```
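The `ContentSplitter` implementation itself isn't visible in this hunk. A minimal sketch of what such a splitter could look like, assuming a fixed character budget per chunk and loose splitting on paragraph boundaries (the class name matches the diff, but the budget and boundary rules here are assumptions):

```ruby
# Minimal sketch of a content splitter. MAX_CHUNK_SIZE is an assumed
# value, not the PR's actual limit.
module DiscourseTranslator
  module ContentSplitter
    MAX_CHUNK_SIZE = 3000

    def self.split(text, max_size: MAX_CHUNK_SIZE)
      return [text] if text.length <= max_size

      chunks = []
      current = +""

      # Split loosely on paragraph boundaries (keeping the separators) so
      # markup is less likely to be cut mid-structure. A single paragraph
      # longer than the budget still becomes one oversized chunk.
      text.split(/(\n\n+)/).each do |part|
        if current.length + part.length > max_size && !current.empty?
          chunks << current
          current = +""
        end
        current << part
      end

      chunks << current unless current.empty?
      chunks
    end
  end
end
```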
Not sure if you considered it, but I'm wondering if we should translate chunks in parallel.
I did ... 🤔 but we may hit rate limits.

I'll add some limits to the splitter.
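For reference, if chunks were later translated in parallel, bounding the number of in-flight requests is one way to mitigate the rate-limit concern raised above. A hypothetical sketch (not part of this PR) using plain Ruby threads, where `MAX_CONCURRENCY` is an assumed tuning knob:

```ruby
# Hypothetical: translate a few chunks at a time to stay under provider
# rate limits. The Translator call mirrors the diff above.
MAX_CONCURRENCY = 3

translated =
  chunks.each_slice(MAX_CONCURRENCY).flat_map do |slice|
    slice
      .map { |chunk| Thread.new { ::DiscourseAi::Translator.new(chunk, target_locale_sym).translate } }
      .map(&:value) # Thread#value joins the thread and returns its result
  end
```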
In discourse/discourse-translator#249 we introduced splitting content (post.raw) prior to sending it for translation, as we were using a sync API. Now that we're streaming thanks to #1424, we'll chunk based on the LlmModel.max_output_tokens.
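A rough illustration of how a token-based budget could drive the splitting, assuming a simple characters-per-token approximation (the heuristic, the helper name, and the `max_size:` keyword are hypothetical; the real implementation may count tokens differently):

```ruby
# Hypothetical sketch: derive a character budget from the model's
# max_output_tokens using a rough ~4 characters-per-token heuristic.
CHARS_PER_TOKEN = 4 # assumed approximation, not a real tokenizer

def chunk_budget_for(llm_model)
  llm_model.max_output_tokens * CHARS_PER_TOKEN
end

# Split the raw content so each chunk fits within the derived budget.
chunks = DiscourseTranslator::ContentSplitter.split(text, max_size: chunk_budget_for(llm_model))
```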
Requires: #248
We are seeing a lot of timeouts when we send large chunks over to our LLM for translation. Timeouts may happen in two ways:

This PR solves (2). (1) will be handled in a different PR. We will loosely split content up into chunks. An example post raw is meta's /t/354449.
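For illustration, end-to-end usage of the splitting approach might look like the following; the trailing `.join` reassembly step is an assumption about how the translated chunks are recombined:

```ruby
# Hypothetical end-to-end flow: split a long post body, translate each
# chunk, then reassemble the translated text.
text = text_for_translation(translatable) # e.g. a very long post.raw
chunks = DiscourseTranslator::ContentSplitter.split(text)
chunks
  .map { |chunk| ::DiscourseAi::Translator.new(chunk, target_locale_sym).translate }
  .join
```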