-
Notifications
You must be signed in to change notification settings - Fork 52
DEV: Move translation custom fields into their own tables #201
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
4febe61 to
5c67b2d
Compare
5c67b2d to
069d043
Compare
| module Translatable | ||
| extend ActiveSupport::Concern |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These concerns are applied in the Post and Topic models.
| def set_detected_locale(locale) | ||
| # locales should be "en-US" instead of "en_US" per https://www.rfc-editor.org/rfc/rfc5646#section-2.1 | ||
| locale = locale.to_s.gsub("_", "-") | ||
| (content_locale || build_content_locale).update!(detected_locale: locale) | ||
| end | ||
|
|
||
| def set_translation(locale, text) | ||
| locale = locale.to_s.gsub("_", "-") | ||
| translations.find_or_initialize_by(locale: locale).update!(translation: text) | ||
| end | ||
|
|
||
| def translation_for(locale) | ||
| translations.find_by(locale: locale)&.translation | ||
| end | ||
|
|
||
| def detected_locale | ||
| content_locale&.detected_locale | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of accessing custom fields directly, with the new tables, we can access the related translation metadata from the models themselves.
| t.string :locale, null: false | ||
| t.text :translation, null: false | ||
| t.timestamps | ||
| end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Creating these tables without indexes first.
| bounds = DB.query_single(<<~SQL, model:) | ||
| SELECT | ||
| COALESCE(MIN(id), 0) as min_id, | ||
| COALESCE(MAX(id), 0) as max_id | ||
| FROM #{model}_custom_fields | ||
| WHERE name IN ('post_detected_lang', 'translated_text') | ||
| SQL | ||
|
|
||
| start_id = bounds[0] | ||
| max_id = bounds[1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only work in batches between custom fields that have translation-related data.
For the following below, we're dealing with both post_detected_lang and translated_text in the same batch run.
| ) | ||
| SELECT 1 | ||
| SQL | ||
| start_id += BATCH_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The looping coverage is handled by a test below translation_tables_spec.rb
|
(After approval, I will merge at an appropriate time) |
|
|
||
| def set_detected_locale(locale) | ||
| # locales should be "en-US" instead of "en_US" per https://www.rfc-editor.org/rfc/rfc5646#section-2.1 | ||
| locale = locale.to_s.gsub("_", "-") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpicking, but do we expect more than one _ to - substitution? If no, maybe just .sub is enough?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read from https://learn.microsoft.com/en-us/globalization/locale/standard-locale-names (Microsoft is one of our translation providers) that ca-ES-valencia is an example of a locale that can have double dash.
Though the Microsoft API only returns stuff like “tlh-Latn” one dash, I am being a bit over-cautious if other providers (AI?) return such cases.
ZogStriP
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice 👍
|
Thank you so much @ZogStriP. I will check again and merge on Monday. |
Currently (before the custom fields to tables migrations), locales are sometimes saved as "pt-PT" and "pt_BR" due to the API returning the former and us saving the latter through I18n.locale. e.g. we are seeing the following in the custom fields, which would mean that the table migrations (#201) also have inherited the discrepancies. ``` #<PostCustomField:0x00007faffb49f798 id: 12321231, post_id: 1231241, name: "translated_text", value: "{\"en_GB\":\"\\u003cp\\u003eGreat post my friend \\u00...", # < locale is underscored ...> # and #<PostCustomField:0x00007faffb49dfd8 id: 12313123, post_id: 123123, name: "post_detected_lang", value: "pt-PT", # < locale is hyphenated ...> ``` This commit adds a migration to convert all values to the hyphenated version, ensures we save the hyphenated ones to the db, and introduces a `locale_matches?` on the translatable models.
Currently,
TopicandPosthave detected_language and translations in custom fields, e.g.We are moving this into 4 tables/models
Since there are a lot of duplicates, this is implemented on the
PostandTopicusing aConcern, and any future translatable content can inherit this concern.This PR also gets rid of the previous N+1 which happens when determining if the 🌐 translate button should appear for each post.