Use gettext for recurring phrases, fix a bunch of i18n issues!!!!! #684

johnzhou721 · 2025-06-28T22:57:21Z

Fixes #683, fixes #692, fixes #689 (3rd through plugin).

Motivation

The existing model for maintaining translations of recurring phrases (databags) causes several issues: translators often do not get the necessary context; the strings are not full sentences or entire labels; and the strings for other languages aren't necessarily updated when a new English key is added. In addition, databags are an ad-hoc format, so there is no way to track whether a translation is a genuine human one or a machine translation, or how much of the databag content has been translated.

gettext is the standard solution for most translation issues, and lektor-i18n-plugin provides functionality to translate templates properly. This PR contains the necessary setup, with a companion PR on lektor-i18n-plugin (beeware/lektor-i18n-plugin#6) containing some flags that ensure strings are sorted, for safety.

While we're updating the lektor-i18n-plugin, since a new release will take some time, some unrelated i18n issues have also been addressed.

Summary

In this PR:

Adds the setup needed for string extraction to work.
Converts the templates to use the gettext format; some unlocalized and/or unpluralized strings have been fixed along the way. Some further l10n fixes are also included: for example, when generating a superpower list we no longer hard-code ", " but allow the PO file to customize the item separator.
Manually inputs all strings in the existing databags into the PO files.
- Since the lack of context means that some translations won’t be 100% correct even when done by native speakers, precautions have been taken to ensure that these likely-not-the-best translations, and/or translations not produced by native speakers of that language (judging from commit history), are marked as FUZZY.
Updates the plugin version, and makes some miscellaneous manual modifications to the top of the PO files that are easier to explain along with the plugin updates below.
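The customizable item separator mentioned in the list above can be illustrated in plain Python. This is only a minimal sketch using the standard-library `gettext` module, not the template code from this PR; the function `join_items`, the `"list separator"` context string, and the `FakeChinese` class are hypothetical names made up for the example.

```python
import gettext


def join_items(items, translations):
    """Join items with a separator that translators can override in the PO file."""
    # pgettext looks the separator up under a context, so translators can see
    # what the otherwise meaningless ", " msgid is used for.
    separator = translations.pgettext("list separator", ", ")
    return separator.join(items)


class FakeChinese(gettext.NullTranslations):
    """Stand-in for a compiled catalog; a real one would be loaded from an MO file."""

    def pgettext(self, context, message):
        if (context, message) == ("list separator", ", "):
            return "、"
        return message


english = gettext.NullTranslations()  # no catalog: msgids come back unchanged
print(join_items(["GUI", "packaging", "mobile"], english))
print(join_items(["GUI", "packaging", "mobile"], FakeChinese()))
```

Treating the separator as a translatable string keeps the joining logic identical for every language and leaves the punctuation choice to translators.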

Since the plugin isn’t updated often and making a release is complicated, I decided to fix every bug I could find in the plugin, to avoid having to update the lektor-i18n-plugin repo again. The plugin changes are listed below; some of them aren’t really relevant to this PR.

Ensuring that all POT generation steps sort the keys (related to Website translation).
Using xgettext to combine POT files, as it can properly combine the headers and context.
Filling out the Project-Id-Version header entry when generating PO files from the POT for the first time, since we’re in a position to know what the project name is in the plugin.
- This was needed for local work, as the PO editor I used to fill in the strings would refuse to save any file without Project-Id-Version. However, later msgmerge runs simply reuse the original header, so I have updated the headers of all the languages manually to match what they would look like if the PO files were reinitialized (except for a comment block at the top saying the file is distributed under the same license as the main project).
Template translation now supports pgettext and npgettext methods.
There used to be a bug (also present in upstream lektor-i18n-plugin) due to a limitation of msginit: if your content language is not English, extra automatically populated strings would appear in the English version of the PO file. This bug is now fixed. (Not really relevant to our own website; just trying to fix more bugs here.)
Conversely, each time the English PO file was updated, the new msgids weren’t automatically filled into the msgstr fields (unlike when the PO files are first initialized). This is now resolved.
- This has a functional impact on our website: if a string exists and a plural form is later added for it, msgmerge will fill the new plural field with the previous singular msgstr, creating the illusion of plurals not working, which would otherwise require manually updating the English PO file.
Fixed a bug in lr file translation – see Two Seemingly Untranslated Strings #689.
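The English self-fill behavior described in the list above can be modeled roughly as follows. This is a sketch under stated assumptions, not the plugin's actual code: entries are plain dictionaries rather than polib objects, and the dictionary shape, the function name `fill_english_entry`, and the sample strings are all hypothetical.

```python
def fill_english_entry(entry):
    """Return a copy of a PO-like entry whose translations mirror its msgids.

    `entry` is a plain dict standing in for a PO entry (hypothetical shape):
    {"msgid": str, "msgid_plural": str or None,
     "msgstr": str, "msgstr_plural": dict}
    """
    filled = dict(entry)
    if entry.get("msgid_plural"):
        # English has two plural forms: index 0 is singular, index 1 is plural.
        filled["msgstr"] = ""
        filled["msgstr_plural"] = {0: entry["msgid"], 1: entry["msgid_plural"]}
    else:
        filled["msgstr"] = entry["msgid"]
        filled["msgstr_plural"] = {}
    return filled


# An entry as it might look after a plural form was added and the old
# singular translation was carried over into both plural slots.
stale = {
    "msgid": "%(num)d speaker",
    "msgid_plural": "%(num)d speakers",
    "msgstr": "",
    "msgstr_plural": {0: "%(num)d speaker", 1: "%(num)d speaker"},
}
print(fill_english_entry(stale)["msgstr_plural"])
```

Overwriting the English entries from the msgids on every update, rather than only on first initialization, is what keeps a stale singular form from masking a newly added plural.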

Answers about Huge Diffs on Files

The huge diff in the POT file is caused by using xgettext to combine POTs; the POT file is now consistent with how our actual PO files are formatted, which is a good thing. The diff there will be one-off.
There’s a big diff in the Chinese PO file. While fixing a plugin bug related to filling in msgstrs, I used polib; since I didn’t trust its formatting, I reformatted all the PO files at the end of each build, which produced unnecessary diffs (since Weblate wraps differently).
- This no longer happens and I have tested the behavior.
- Nevertheless it’d be good to have strings wrapped the right way in gettext conventions so I rewrapped all the PO files just for consistency.

Things I marked as Fuzzy or Need Editing

When GitHub blame shows that the phrase was changed in a PR that is not purely a translation PR. This raises the possibility of machine translation (not always, since there might be polyglots, but let's be on the safe side).
When plural forms have been either machine-translated or I’ve just copied the singular form over.
When the previous databag storage wasn’t an entire recurring label or sentence. Translators probably didn't have much context back then.
When older extracted strings didn’t include the punctuation and I manually added it everywhere (since different languages use different conventions). Sometimes I tried to add the semicolon (or other punctuation) appropriate for that language instead of the English one.

Editing Done

All introduced strings have been re-audited and unfuzzied in Simplified Chinese.
Some strings in Traditional Chinese were Google-translated.

Questions

Should I credit the translators properly by putting the databag-translator's name and email address into the po file, if no one so far had translated it on Weblate? Is it a privacy concern to commit... email addresses and (potentially full) names??
In the lektorproject, the locale for pt_BR is br and not pt_BR. What is the purpose of that?
https://github.com/beeware/beeware.github.io/blob/lektor/BeeWare.lektorproject#L17-L20 (EDIT -- normalized by me already)

johnzhou721

Some notes. I'm really sweating right now after slaving away a solid 3 hours of my life that I'll never get back but am happy to give away... so need to step away from this for now, updating all the po files by hand is going to be hard, so @freakboy3742 or any adults w/ a credit card who signed up for a deepL key, maybe we should just deepL all the strings? Or I'll copy them in tomorrow or after dinner today.

johnzhou721 · 2025-06-28T23:04:23Z

BeeWare.lektorproject

 lektor-gravatar = 0.1.3
 lektor-markdown-admonition = 0.3.1
-git+https://github.com/beeware/lektor-i18n-plugin@v0.5.4 =
+../lektor-i18n-plugin =


Need to replace with the new version after beeware/lektor-i18n-plugin#6 gets merged. That is just for extra safety; it seems to work fine without it, but you never know when the internal implementations of any of those programs will change...

../ is my local checkout that I used to update the PO files.

johnzhou721 · 2025-06-28T23:05:37Z

babel.cfg

@@ -1,2 +1,3 @@
 [jinja2: **/templates/**.html]
 encoding = utf-8
+trimmed = True


See https://stackoverflow.com/questions/68868257/how-do-i-enable-the-trimmed-policy-in-jinja2.
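For context, the `trimmed` option collapses line breaks and the whitespace around them inside `{% trans %}` blocks, so re-indenting a template does not change the extracted msgid. The snippet below is a rough standard-library approximation of that normalization, not Jinja2's actual code; `trim_message` and the sample block are hypothetical.

```python
import re

# A line break together with any whitespace on either side of it.
_LINEBREAK_WS = re.compile(r"\s*\n\s*")


def trim_message(text):
    """Strip the message and replace each wrapped line break with one space."""
    return _LINEBREAK_WS.sub(" ", text.strip())


block = """
    Posted by
    {{ author }} on
    {{ date }}
"""
print(repr(trim_message(block)))
```

Without this normalization, the indentation of the template would become part of the msgid that translators see.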

i18n/contents+ar.po

johnzhou721 · 2025-06-28T23:06:56Z

packages/lektor_beeware_plugin/lektor_beeware_plugin.py

 @pass_context
 def translate(context, string, bag_name="translate"):
+    if bag_name == 'translate':
+        raise RuntimeError("Use the new gettext system instead")


This is only for the duration of this PR; eventually I will replace all the other bags as well.

johnzhou721 · 2025-06-28T23:07:12Z

templates/blog-post.html

    <h1>{{ this.title }}</h1>
-    <p>{{ "posted_by"|trans }}
-        {% if this.mastodon_handle %}
+    {% set author_link %}


[sweating intensifies]

johnzhou721 · 2025-06-28T23:07:35Z

templates/event.html

+{% elif this.event_type == "keynote" %}
+  <p>{% trans title=this.title, url=this.url, talk_title=this.talk_title %}{{ speakers_list }} will be keynoting at {{ title }}, giving a presentation entitled "<a href="{{ url }}">{{ talk_title }}</a>".{% endtrans %}</p>
+{% elif this.event_type == "tutorial" %}
+  <p>{% trans title=this.title, url=this.url, talk_title=this.talk_title %}{{ speakers_list }} will be presenting a tutorial at {{ title }} entitled "<a href="{{ url }}">{{ talk_title }}</a>".{% endtrans %}</p>


sorry for the duplication here... but I realize I can't do away with it.

templates/project.html

johnzhou721 · 2025-06-29T00:05:54Z

@HalfWhitt I apologize that some code you wrote for translation bags might get deleted. The bird has flown away... (continuing the extended metaphor in the other thread)

johnzhou721 · 2025-06-29T01:44:09Z

Yikes... looks like if you're doing _(name) when name is a variable, it does not get extracted... this happens with the badge macro. Which means this PR must be reviewed extremely carefully for such things.
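The underlying reason is that extraction is static: the extractor reads the source without running it, so it can only record msgids that appear as string literals. The toy extractor below illustrates the idea on Python source using the standard-library `ast` module; it is not Babel's implementation, and `extract_literal_msgids` and the sample code are hypothetical.

```python
import ast


def extract_literal_msgids(source):
    """Collect msgids from _("...") calls whose first argument is a string literal."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "_"
            and node.args
            and isinstance(node.args[0], ast.Constant)
            and isinstance(node.args[0].value, str)
        ):
            found.append(node.args[0].value)
    return found


sample = '''
name = "Maintainer"
a = _("Contributor")   # literal: the extractor can see this msgid
b = _(name)            # variable: the msgid is only known at run time
'''
print(extract_literal_msgids(sample))
```

Any string that only reaches `_()` through a variable therefore has to be marked for extraction somewhere else as a literal.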

gettexting

johnzhou721 · 2025-06-29T01:47:22Z

FYI -- marked some arabic strings as fuzzy b/c cannot figure out where to put periods correctly... i'm using textedit to edit the po files directly, spent some time yak shaving to use emacs po-mode but realized I don't know how to use emacs...

johnzhou721 · 2025-06-29T02:09:38Z

Yikes. More strings not extracted out of event.html...

johnzhou721 · 2025-06-29T02:42:51Z

Isolating event.html into a separate directory and deleting random components shows that deleting

{% for slug in this.speaker %}
    {%
        do speaker_names.update(
            {slug: members.filter(F._slug == slug).first().name}
        )
    %}
{% endfor %}

will cause babel to extract properly.

johnzhou721 · 2025-06-29T14:13:52Z

Progress: python-babel/babel#1216 reports a simple MWE for this situation.

johnzhou721 · 2025-06-29T16:12:06Z

Given that this seems to be a bug, I'm going to work around it by separating the logic into a separate macro. To pass Python dicts around, I will put filters that convert to and from JSON in our plugin.

[sweats]
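A pair of filters like the ones described might look roughly like this. It is a minimal sketch, not the code added to the plugin: the names `to_json` and `from_json` and the sample speaker data are assumptions, and registering the functions as Jinja2 filters in the plugin is not shown.

```python
import json


def to_json(value):
    """Serialize a dict (or any JSON-compatible value) to a string."""
    # ensure_ascii=False keeps non-ASCII names readable in the output.
    return json.dumps(value, ensure_ascii=False, sort_keys=True)


def from_json(text):
    """Parse a string produced by to_json back into Python data."""
    return json.loads(text)


# Hypothetical slug-to-name mapping, standing in for speaker_names.
speaker_names = {"jane": "Jane Doe", "li": "李雷"}
encoded = to_json(speaker_names)
assert from_json(encoded) == speaker_names
print(encoded)
```

Passing the mapping as a string means a macro can build it in one place and return it to the calling template, which can then decode it where the translated text is rendered.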

johnzhou721 · 2025-06-29T16:22:26Z

And... now the messages are extracted! Lektorbuilding to update the translation files and then committing.

johnzhou721 · 2025-06-30T02:13:09Z

FYI beeware/lektor-i18n-plugin#5 caused a huge diff in 42a8f3c because xgettext seems to output things differently, but a cursory diff shows that this is actually good: the format of the POT and the PO files is now identical, apart from the translated strings.

johnzhou721 · 2025-06-30T02:24:24Z

@freakboy3742 @HalfWhitt (latter -- since you started the freeform) I'd like some preliminary comments on this before I go copy all the strings into the po files and suggestions on how to work around the issue linked #684 (comment) ? 'cause this is going to produce huge, multithousand-line diffs.

Read the above comments though, if you have time, especially the last one. Thanks

johnzhou721 · 2025-06-30T02:58:29Z

Maybe CI is using Ubuntu 24.04 with gettext 0.21 while I'm using gettext 0.25 on macOS and we're hitting the test case over here at https://github.com/translate/translate/pull/5439/files which differentiates b/w <=0.23 and >0.23...

See: https://launchpad.net/ubuntu/noble/+package/gettext

OK... time to go yak shaving tomorrow to install gettext 0.21...

johnzhou721 · 2025-06-30T02:59:03Z

Hmm... maybe let me try my ubuntu 22.04 vm tomorrow. See how that works.

johnzhou721 · 2025-06-30T02:25:52Z

i18n/contents.pot


-#: https://beeware.org/ (content/contents+en.lr:button-block.label)


FYI beeware/lektor-i18n-plugin#5 caused a huge diff in 42a8f3c because xgettext seems to output things differently, but a cursory diff shows that this is actually good: the format of the POT and the PO files is now identical, apart from the translated strings.

Not in the specific hash but looking at these changes

johnzhou721 · 2025-06-30T15:23:37Z

Maybe CI is using Ubuntu 24.04 with gettext 0.21 while I'm using gettext 0.25 on macOS and we're hitting the test case over here at https://github.com/translate/translate/pull/5439/files which differentiates b/w <=0.23 and >0.23...

See: https://launchpad.net/ubuntu/noble/+package/gettext

OK... time to go yak shaving tomorrow to install gettext 0.21...

This is resolved now.

github-actions · 2025-08-04T08:52:24Z

Visit the preview URL for this PR (updated for commit 2ebe489):

https://beeware-org--pr684-jinjai18n-04pcm3hg.web.app

(expires Mon, 11 Aug 2025 08:51:55 GMT)

🔥 via Firebase Hosting GitHub Action 🌎

Sign: b0da44bc067e7d9a4255c77cb2c5fce572218cec

johnzhou721 · 2025-08-06T00:06:30Z

Oh… talking about how Ubuntu 24.04 keeps wanting to rewrap the Weblate output when building this PR but not on main.

Might be related to the POT formatting changes… I’d need to get a minimal example working to see what is going on.

johnzhou721 · 2025-08-06T05:45:37Z

Okay... I think it's fairly clear that the rewrapping bug may be an error in Translate Toolkit. I have made a minimal working example and posted it at translate/translate#5661

@freakboy3742 So what I'm actually asking for next besides potential ways the bug is happening (which I already filed over there) is for you to take a look at the plugin changes and see if you're okay with them. Doing this one step at a time, let's get that reviewed and merged first before we discuss more issues about this PR here. Thanks!

johnzhou721 · 2025-08-08T19:18:19Z

Fwiw before I get on an airplane: the line wrapping thing is a translate toolkit / weblate bug…not sure why it’s not happening to main yet.

(probably because there hasn’t been a bunch of translation update commits in awhile… seems that wrap only happens when the pot gets altered.)

johnzhou721 · 2025-08-17T04:47:42Z

FWIW: I had no idea why this line-wrapping behavior happened. msgmerge seems to rewrap less, but msgcat rewraps every time. One of the old commits in lektor-i18n-plugin ran msgcat on the final PO file (to reformat everything once it was done), and that is what reformats. I've pasted old-formatted strings into the new zh_CN PO file, cleaned, and rebuilt, and the old-formatted strings are now left unchanged. So all this rewrapping is (hopefully) a one-off, probably caused by mismatched lektor plugin reinstalls, but I'm not reverting the rewrap because it seems consistent with what gettext would usually produce anyway.

@freakboy3742 -- this is ready for you to take a look at. sorry for the noise
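One way to confirm that a diff is only rewrapping is to compare the PO files after joining wrapped string lines. The helper below is a simplified sketch, not a full PO parser: it ignores comments and flags, leans on Python string-literal parsing for the quoted parts (which covers the common escapes but is an approximation of PO quoting), and `unwrap_po` is a hypothetical name.

```python
import ast


def unwrap_po(po_text):
    """Return (keyword, value) pairs with wrapped string lines joined together."""
    entries = []
    for raw_line in po_text.splitlines():
        line = raw_line.strip()
        if line.startswith('"') and entries:
            # Continuation line: append to the most recent keyword's value.
            keyword, value = entries[-1]
            entries[-1] = (keyword, value + ast.literal_eval(line))
        elif line.startswith(("msgctxt", "msgid", "msgstr")):
            keyword, _, quoted = line.partition(" ")
            entries.append((keyword, ast.literal_eval(quoted)))
    return entries


wrapped = '''msgid ""
"Hello, "
"world"
msgstr "你好，世界"
'''
unwrapped = '''msgid "Hello, world"
msgstr "你好，世界"
'''
print(unwrap_po(wrapped) == unwrap_po(unwrapped))
```

If the comparison holds for the old and new versions of a file, the diff changes presentation only and no translation was altered.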

johnzhou721 · 2025-09-07T18:33:59Z

@freakboy3742 I'm sort of sorry for the messy summary... but I have no idea how to explain the changes I made in a more concise way... could a look be taken at this? Thank you very much.

freakboy3742 · 2025-09-08T23:13:15Z

I'm under a crunch preparing for a conference trip, so I'm going to be short on time for the next couple of weeks. I'll take a look if I get a chance, but I wouldn't bet on that happening soon.

However - as you've already identified - the PR is its own worst enemy. How can you explain the changes more concisely? Start by summarizing what you have done. What have you done? Why is it required? Why have you taken the approach you have? I read the PR summary here, and my eyes glaze over. I still only have the vaguest idea what you're trying to accomplish, and the mechanism by which it has been achieved. The first block of substantive content in the summary should not be a multi-paragraph ramble about the nature of fuzziness. If fuzziness is an issue - then start with "I've done X; this has some issues with fuzziness, details to follow" (or similar).

johnzhou721 · 2025-09-08T23:14:19Z

@freakboy3742 Thanks for the tips for preparing a good PR summary. I will do some extensive refactoring.

johnzhou721 · 2025-09-09T01:05:29Z

@freakboy3742 I have updated the PR summary to be more concise and a bit more logically organized in terms of what this does. Feel free to take a look at this and the plugin PR if you have a chance.

Note -- sorry for the weird formatting in the description. I used Microsoft Word to write the Markdown code, but it pasted as an image, so I pasted it into TextEdit and then pasted the code into this PR.

Thank ya

johnzhou721 · 2025-09-27T21:40:50Z

@freakboy3742 It's been a few weeks, have you gotten time to take a look at this yet? Thanks! No worries if you didn't.

johnzhou721 · 2025-10-11T14:18:09Z

Hey @kattni Thanks for reviewing. Could you please reapply preview as that would make some discussion easier over at lektor-i18n-plugin?

johnzhou721 · 2025-11-03T22:30:28Z

@kattni FYI: I haven't added _("") to the new docs landing page yet, since the number of added strings in this PR is already a bit big. I'm planning to resolve that and other untranslated template-provided text in a future PR that will fix #699. Is this okay with you?

kattni · 2025-11-03T22:58:51Z

Hello, John. I have an update regarding the website.

First of all, I want to say that your translation updates have been good work. They have also been complicated work, including a lot of moving parts. They solve an important problem with the current Lektor situation. However, the thing we've learned from the last twelve months of working with Lektor is that it's no longer working for us.

To that end, I am looking into shifting the website to MkDocs. This shift would provide us with a much more accessible option, significantly smoother translation tooling, as well as uniformity across the website and all documentation. I am still in the proof of concept phase with this, however it is not at a point where additional assistance will be helpful. Overall, there are very few potential blockers. I should know for sure soon whether this is the solution we're going to use.

So, for now, let's put a pin in these changes. At this point, given that we ultimately need to consider finding a replacement for Lektor, it does not, at the moment, make sense to continue working on these changes, or to continue the review process. If we reach a point where it turns out Lektor is, in fact, our only viable option, we can come back to this.

We'll leave this PR open for now.

johnzhou721 added 4 commits June 28, 2025 17:17

switch to gettext

331ee2b

verify that all |trans statements without bag specification are replaced

86d0e9f

translate most chinese strings

af41d4f

more context

d9499db

johnzhou721 commented Jun 28, 2025


Update templates/project.html

13540dd

start filling in arabic translations, also correct badge macro

e6b1e27

gettexting

work done today... still no fix for missing strings

2936cf9

johnzhou721 added 6 commits June 29, 2025 11:23

workaround a translaiton bug

e6ba812

restore pluralization

88d637a

update the plugin a bit

42a8f3c

add plural forms

5dc5542

plural forms (filled in missing ones, need the sprinting mentor part)

69ac875

add comment

07ed969

use an older gettext on a gnu/linux vm

84e158c

johnzhou721 commented Jun 30, 2025


johnzhou721 marked this pull request as ready for review June 30, 2025 15:23

freakboy3742 added the preview Approved for an automated preview label Aug 4, 2025

Merge branch 'lektor' into jinjai18n

c641e15

johnzhou721 and others added 5 commits August 17, 2025 15:21

misc fixups

c8125e8

Merge branch 'lektor' into jinjai18n

3d94bab

reupdate using ubuntu conventions

ffe1cca

reformat using msgcat

394c235

Merge branch 'lektor' into jinjai18n

93a8a63

johnzhou721 added 2 commits September 16, 2025 18:08

Merge branch 'lektor' into jinjai18n

e6deb66

Merge branch 'lektor' into jinjai18n

522d17c

freakboy3742 requested a review from kattni October 10, 2025 06:24

johnzhou721 and others added 6 commits October 15, 2025 19:29

Merge remote-tracking branch 'upstream/lektor' into jinjai18n

bbd2154

Rewrap on Ubuntu

e173e2b

Merge commit 'fa337ce' into jinjai18n

74e364c

Merge remote-tracking branch 'origin/jinjai18n' into jinjai18n

bbfeaa9

Update translatio files.

53353a6

Merge branch 'lektor' into jinjai18n

9a314e2

kattni mentioned this pull request Nov 3, 2025

v0.5.5 beeware/lektor-i18n-plugin#6

Open

4 tasks

