Skip to content

Conversation

@nattsw
Copy link
Contributor

@nattsw nattsw commented Mar 12, 2025

In 6c197c5, we changed to translate raw instead of cooked. This was largely due to DiscourseAI.

However some tests have shown that the original decision of translating cooked is likely due to translation providers like Google or Azure (and the rest) stripping newlines when translating. However these translator providers keep html as-is, so translating cooked is reasonable.

This PR reverts the other providers to use cooked, but keeps DiscourseAI as the only provider using raw.

The following is a trivial example, but in some more complex cases, the absence of certain characters (besides newline) removed by the provider when dealing with raw is a bit more jarring.

Original post written in Japanese (2img 2para)

Screenshot 2025-03-12 at 6 33 41 PM

Post translated using raw (Google)

Screenshot 2025-03-12 at 6 33 30 PM

Post translated using cooked (Google)

Screenshot 2025-03-12 at 6 33 20 PM

@nattsw
Copy link
Contributor Author

nattsw commented Mar 12, 2025

Fixing tests

text_for_translation(translatable, raw: true),
target_locale_sym,
).translate
DiscourseTranslator::TranslatedContentNormalizer.normalize(translatable, translated)
Copy link

@dbattersby dbattersby Mar 12, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normalizing will cook the response using PrettyText when it's a Post, but not topics from what I understand. Would there ever be a case when the topic title has HTML I wonder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question:

The normalizer™ takes the post/topic and its translation, and applies cook if it is a post, and applies cleanup if it is a topic. Technically speaking the topic title can contain html but the case has been covered here: #224

Copy link

@dbattersby dbattersby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, very nice 👍

@nattsw nattsw merged commit e974006 into main Mar 12, 2025
6 checks passed
@nattsw nattsw deleted the revert-cooked-raw branch March 12, 2025 12:44
@nattsw
Copy link
Contributor Author

nattsw commented Mar 12, 2025

Thank you @dbattersby

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants