@nattsw nattsw commented Feb 5, 2025

Currently, Topic and Post have detected_language and translations in custom fields, e.g.

post.custom_fields[DiscourseTranslator::DETECTED_LANG_CUSTOM_FIELD]
post.custom_fields[DiscourseTranslator::TRANSLATED_CUSTOM_FIELD]
topic.custom_fields[DiscourseTranslator::DETECTED_LANG_CUSTOM_FIELD]
topic.custom_fields[DiscourseTranslator::TRANSLATED_CUSTOM_FIELD]

We are moving these into four tables/models:

post has_one :content_locale, class_name: "DiscourseTranslator::PostLocale"
post has_many :translations, class_name: "DiscourseTranslator::PostTranslation"
topic has_one :content_locale, class_name: "DiscourseTranslator::TopicLocale"
topic has_many :translations, class_name: "DiscourseTranslator::TopicTranslation"

Since much of this logic is duplicated, it is implemented on Post and Topic through a Concern, and any future translatable content can include the same concern.

This PR also removes the previous N+1 query that occurred when determining whether the 🌐 translate button should appear for each post.

@nattsw force-pushed the custom-fields-to-table branch from 4febe61 to 5c67b2d on February 6, 2025 19:08
@nattsw force-pushed the custom-fields-to-table branch from 5c67b2d to 069d043 on February 6, 2025 19:31
@nattsw marked this pull request as ready for review on February 7, 2025 09:26
Comment on lines +4 to +5
module Translatable
  extend ActiveSupport::Concern
Contributor Author

This concern is included in the Post and Topic models.

Comment on lines +14 to +31
def set_detected_locale(locale)
  # locales should be "en-US" instead of "en_US" per https://www.rfc-editor.org/rfc/rfc5646#section-2.1
  locale = locale.to_s.gsub("_", "-")
  (content_locale || build_content_locale).update!(detected_locale: locale)
end

def set_translation(locale, text)
  locale = locale.to_s.gsub("_", "-")
  translations.find_or_initialize_by(locale: locale).update!(translation: text)
end

def translation_for(locale)
  translations.find_by(locale: locale)&.translation
end

def detected_locale
  content_locale&.detected_locale
end
Contributor Author

With the new tables, the related translation metadata is read through the models themselves instead of by accessing custom fields directly.
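
As a rough illustration of the accessor behaviour in the excerpt above, here is a minimal plain-Ruby sketch. It assumes nothing about the real models: `InMemoryTranslatable` and `normalize` are hypothetical names, and a Hash stands in for the translations table. It only mirrors what the quoted methods show, namely that writes normalise the locale while `translation_for` looks up its argument as given.

```ruby
# Plain-Ruby stand-in for a translatable record (hypothetical class name).
# A Hash replaces the translations table; no ActiveRecord is involved.
class InMemoryTranslatable
  attr_reader :detected_locale

  def initialize
    @translations = {}
  end

  # Mirrors the excerpt: underscores become hyphens before storing.
  def set_detected_locale(locale)
    @detected_locale = normalize(locale)
  end

  def set_translation(locale, text)
    @translations[normalize(locale)] = text
  end

  # Like the excerpt, the lookup does not normalise its argument.
  def translation_for(locale)
    @translations[locale]
  end

  private

  # Hypothetical helper wrapping the gsub call from the excerpt.
  def normalize(locale)
    locale.to_s.gsub("_", "-")
  end
end

post = InMemoryTranslatable.new
post.set_detected_locale(:pt_BR)
post.set_translation("en_GB", "Great post")
post.detected_locale          # "pt-BR"
post.translation_for("en-GB") # "Great post"
post.translation_for("en_GB") # nil, because the stored key is hyphenated
```

Callers in this sketch need to pass the hyphenated form when reading, since only the write path normalises.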

  t.string :locale, null: false
  t.text :translation, null: false
  t.timestamps
end
Contributor Author

Creating these tables without indexes first.

Comment on lines +21 to +30
bounds = DB.query_single(<<~SQL, model:)
  SELECT
    COALESCE(MIN(id), 0) as min_id,
    COALESCE(MAX(id), 0) as max_id
  FROM #{model}_custom_fields
  WHERE name IN ('post_detected_lang', 'translated_text')
SQL

start_id = bounds[0]
max_id = bounds[1]
Contributor Author

@nattsw nattsw Feb 7, 2025

Batches only cover the ID range of custom fields that hold translation-related data.

The code below handles both post_detected_lang and translated_text in the same batch run.

)
SELECT 1
SQL
start_id += BATCH_SIZE
Contributor Author

The looping is covered by a test below, in translation_tables_spec.rb.
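
The batching idea can be sketched in plain Ruby as follows. This is not the migration's actual loop, whose condition is not shown in the excerpts. `each_batch`, the `<=` bound, and the `BATCH_SIZE` value are all assumptions made for illustration. It only shows how stepping `start_id` by `BATCH_SIZE` walks the range between the minimum and maximum IDs.

```ruby
# Hypothetical batch size; the real constant's value is not shown in the PR excerpts.
BATCH_SIZE = 1000

# Yields half-open [start_id, end_id) windows covering min_id..max_id.
# Returns an Enumerator when called without a block.
def each_batch(min_id, max_id, batch_size = BATCH_SIZE)
  return enum_for(:each_batch, min_id, max_id, batch_size) unless block_given?

  start_id = min_id
  while start_id <= max_id
    yield start_id, start_id + batch_size
    start_id += batch_size
  end
end

each_batch(10, 25, 10).to_a # [[10, 20], [20, 30]]
each_batch(0, 0, 10).to_a   # [[0, 10]], the COALESCE fallback when no rows match
```

In this sketch, each window would translate into a `WHERE id >= start_id AND id < end_id` filter, so rows outside the translation-related ID range are never scanned.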


nattsw commented Feb 7, 2025

(After approval, I will merge at an appropriate time)


def set_detected_locale(locale)
  # locales should be "en-US" instead of "en_US" per https://www.rfc-editor.org/rfc/rfc5646#section-2.1
  locale = locale.to_s.gsub("_", "-")
Member

Nitpicking, but do we expect more than one _ to - substitution? If not, maybe just .sub is enough?

Contributor Author

@nattsw nattsw Feb 7, 2025

I read from https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names (Microsoft is one of our translation providers) that ca-ES-valencia is an example of a locale with two dashes.

Though the Microsoft API only returns values like “tlh-Latn” with one dash, I am being a bit over-cautious in case other providers (AI?) return such locales.
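
To make the difference concrete, a small sketch: `String#sub` replaces only the first match, while `String#gsub` replaces all of them. The underscored input `"ca_ES_valencia"` is a made-up example of what a provider might return, and `normalize_locale` is a hypothetical helper name wrapping the call from the PR.

```ruby
# Hypothetical helper wrapping the normalisation used in the PR.
def normalize_locale(locale)
  locale.to_s.gsub("_", "-")
end

"ca_ES_valencia".sub("_", "-")     # "ca-ES_valencia"  (only the first underscore)
normalize_locale("ca_ES_valencia") # "ca-ES-valencia"  (every underscore)

# With a single separator the two behave the same.
"tlh_Latn".sub("_", "-")           # "tlh-Latn"
normalize_locale("tlh_Latn")       # "tlh-Latn"
```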

Member

@ZogStriP ZogStriP left a comment

Nice 👍


nattsw commented Feb 7, 2025

Thank you so much @ZogStriP. I will check again and merge on Monday.

@nattsw nattsw merged commit 7fc45d5 into main Feb 10, 2025
5 checks passed
@nattsw nattsw deleted the custom-fields-to-table branch February 10, 2025 04:29
@nattsw nattsw mentioned this pull request Feb 11, 2025
nattsw added a commit that referenced this pull request Feb 11, 2025
Currently (before the custom-fields-to-tables migrations), locales are saved inconsistently, sometimes hyphenated ("pt-PT") and sometimes underscored ("pt_BR"), because the API returns the former format and we save the latter through I18n.locale.

For example, we are seeing the following in the custom fields, which means the table migrations (#201) have also inherited the discrepancies.

```
#<PostCustomField:0x00007faffb49f798
 id: 12321231,
 post_id: 1231241,
 name: "translated_text",
 value: "{\"en_GB\":\"\\u003cp\\u003eGreat post my friend \\u00...",  # < locale is underscored
...>

# and

#<PostCustomField:0x00007faffb49dfd8
 id: 12313123,
 post_id: 123123,
 name: "post_detected_lang",
 value: "pt-PT", # < locale is hyphenated
 ...>
```

This commit adds a migration to convert all values to the hyphenated version, ensures we save the hyphenated ones to the db, and introduces a `locale_matches?` on the translatable models.
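
For illustration only, here is one way the underscored keys of a `translated_text` JSON value could be hyphenated in plain Ruby. This is a sketch, not the migration from the commit, whose implementation is not shown here; `hyphenate_locale_keys` is a hypothetical helper name.

```ruby
require "json"

# Hypothetical helper: rewrites locale keys such as "en_GB" to "en-GB"
# in a JSON object string, leaving the translated values untouched.
# If both "en_GB" and "en-GB" are present, the later key wins.
def hyphenate_locale_keys(json_value)
  JSON.parse(json_value).transform_keys { |key| key.tr("_", "-") }.to_json
end

fixed = hyphenate_locale_keys('{"en_GB":"<p>Great post my friend</p>"}')
JSON.parse(fixed).keys # ["en-GB"]
```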