Closed
37 commits
fbb64f7
Ensure everything is sorted whenever a command that generates po / po…
johnzhou721 Jun 28, 2025
51807ff
Apply suggestions from code review
johnzhou721 Jun 29, 2025
8769755
Update CHANGELOG.md
johnzhou721 Jun 29, 2025
2f74b35
use xgettext
johnzhou721 Jun 29, 2025
2089826
Update CHANGELOG.md
johnzhou721 Jun 29, 2025
8d88d70
don't change the date
johnzhou721 Jun 30, 2025
92431fd
Update lektor_i18n.py
johnzhou721 Jun 30, 2025
96a49c0
add version + package name
johnzhou721 Jun 30, 2025
34c9032
fix
johnzhou721 Jun 30, 2025
3e846be
changelog
johnzhou721 Jun 30, 2025
3c6f297
Better docs
johnzhou721 Jun 30, 2025
d71e9dc
Update CHANGELOG.md
johnzhou721 Jun 30, 2025
29acb7b
a fixup that i forgot to commit
johnzhou721 Jul 2, 2025
b40cb57
pgettext, npgettext
johnzhou721 Jul 2, 2025
e3e7162
simplify code, resolve bug w/ non-english content, fill in english
johnzhou721 Jul 2, 2025
9b39a94
depend on polib
johnzhou721 Jul 2, 2025
78f7e1e
fix path
johnzhou721 Jul 2, 2025
7c563f2
regression fix
johnzhou721 Jul 2, 2025
a04fb16
Update CHANGELOG.md
johnzhou721 Jul 2, 2025
a6be921
Update lektor_i18n.py
johnzhou721 Jul 3, 2025
a9e841b
documenation
johnzhou721 Jul 3, 2025
bd9bc39
Update CHANGELOG.md
johnzhou721 Jul 3, 2025
aba93aa
Update README.md
johnzhou721 Jul 3, 2025
c0806b9
Update lektor_i18n.py
johnzhou721 Jul 3, 2025
65717bb
Update README.md
johnzhou721 Jul 3, 2025
b63def8
Improve docs
johnzhou721 Jul 3, 2025
9fb78d0
Apply suggestions from code review
johnzhou721 Jul 4, 2025
845cf14
fixup
johnzhou721 Jul 4, 2025
53e7737
Apply suggestions from code review
johnzhou721 Jul 22, 2025
34a5f26
Fix issue with extraction
johnzhou721 Jul 22, 2025
d44f885
changelog
johnzhou721 Jul 22, 2025
5339dc2
Update lektor_i18n.py
johnzhou721 Jul 22, 2025
fdfa3d7
Update lektor_i18n.py
johnzhou721 Jul 22, 2025
0638bed
fuzzy handle
johnzhou721 Jul 22, 2025
f27756f
Update lektor_i18n.py
johnzhou721 Jul 22, 2025
7f56076
simp logic
johnzhou721 Jul 23, 2025
066b6e5
Update README.md
johnzhou721 Aug 3, 2025
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,15 @@
# Changelog

## 0.5.5

* POT content is now sorted by path when merging POTs from multiple sources (i.e., templates and content).
* `xgettext` is used to merge POT files instead of `msgcat`, which produces a better header and merges identical strings from different sources.
* The initially generated PO files will now have a header compatible with GNOME's Translation Editor, since they will have a non-placeholder `Project-Id-Version`. Existing users hitting this problem will need to fill in the `Project-Id-Version` header manually.
* Translations in templates now provide `pgettext` and `npgettext` methods.
* Sites with non-English content no longer require manually deleting the pre-filled translations from the English PO file.
* When updating translated PO files, the content-language PO file strings are automatically filled with the message IDs. Side effects with plurals are documented.
* A bug with discriminating between Markdown headings and Lektor block separations has been fixed; the older heuristic of checking for colons in the previous line is replaced with a check for 3 dashes on the stripped line.

## 0.5.4

* POT content is now sorted by path.
26 changes: 20 additions & 6 deletions README.md
@@ -40,6 +40,20 @@ A `babel.cfg` must be created in your project root with the following content:
[jinja2: **/templates/**.html]
encoding = utf-8

If you plan to extract from your templates, and the templates use functionality provided by Jinja2 extensions, list
those extensions on an additional line in the config file. For example, if you use ``do`` statements, the
configuration file should be:
```
[jinja2: **/templates/**.html]
encoding = utf-8
extensions = jinja2.ext.do
```

#### Whitespace Trimming during Extraction

If you're using `{% trans %}` blocks in your template files, note that the `trimmed` policy is enabled for Jinja's i18n extension, so whitespace is trimmed at the beginning and end of those
blocks. For PyBabel's extraction to produce matching strings, add `trimmed = True` to the jinja2 section of the `babel.cfg` configuration file.
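
To see why the two settings have to agree, here is a minimal sketch that approximates what trimming does to the text of a `{% trans %}` block. The helper name `trim_trans_block` is hypothetical, and the regular expression is an approximation of Jinja's behaviour, not its actual implementation. The point is only that the message ID looked up at render time differs from an untrimmed one that extraction would otherwise record.

```python
import re

# Assumption: approximates the "trimmed" handling, which collapses line
# breaks (and the whitespace around them) into single spaces.
_LINEBREAK_WS = re.compile(r"\s*\n\s*")


def trim_trans_block(text: str) -> str:
    """Hypothetical helper: strip both ends, then collapse line breaks."""
    return _LINEBREAK_WS.sub(" ", text.strip())


# The body of a trans block as it might appear in a template.
raw = """
    Hello,
    world!
"""

trimmed_msgid = trim_trans_block(raw)
print(repr(raw))            # what an untrimmed extraction would record
print(repr(trimmed_msgid))  # what a trimmed lookup would use
```

If the catalog only contains the untrimmed string, a lookup with the trimmed string finds nothing, and the text is rendered untranslated.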

### Translatable fields

In order for a field to be marked as translatable, an option has to be set in the field definition. Both blocks and flowblocks fields are translatable.
@@ -96,12 +110,6 @@ For example:

As with the previous example, `body` and `title` field content will be translated. However, in this example, `image` and `image_position` will not.

### Non-english content

Due to a limitation of `msginit`, it is difficult to translate a site when the primary language is set to anything but English.

If your default content language is not English, you will have to edit the first `contents-en.po` file and remove the translations.

## Installation

### Prerequisites
@@ -166,6 +174,12 @@ All translation files (`contents-*.po`) are then compiled and merged with the or

You must run `lektor build` once to generate the list of `contents-xx.po` files. After that, once a translation change is applied to a `contents-xx.po` file, the site must be built again for the changes to be applied to the associated `contents-xx.lr` file. This results in the changes being rendered on the site.

### Plural Forms

If you're using `{% pluralize %}`, `ngettext`, or similar in your Jinja templates, make sure you fill in the plural forms in the PO headers manually, then make sure you have the correct
number of `msgstr[x]` entries. The plugin automatically fills `msgstr`s into the PO file of your source language (which `msginit` only does for English), but since it doesn't parse plural forms,
any non-English PO file will not have its plural message strings filled in. These must be filled in manually in the source-language PO file if the singular and plural strings alone do not suffice.
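
The filling behaviour for plural entries can be sketched without polib, using a plain dictionary in place of a PO entry's plural slots. This is an illustration under assumptions, not the plugin's code: `fill_plural_slots` is a hypothetical name, and the rule shown (index 0 receives the singular `msgid`, every other empty index receives `msgid_plural`) is only adequate for languages with two plural forms, which is why other source languages need manual attention.

```python
def fill_plural_slots(msgid, msgid_plural, msgstr_plural):
    """Return a copy of msgstr_plural with empty slots filled from the message IDs."""
    filled = dict(msgstr_plural)
    for idx, text in filled.items():
        if not text:
            # Slot 0 is the singular form; all remaining slots get the plural ID.
            filled[idx] = msgid if int(idx) == 0 else msgid_plural
    return filled


# Two-form language: both slots can be derived from the message IDs.
slots = fill_plural_slots("%(n)s page", "%(n)s pages", {0: "", 1: ""})
print(slots)

# Three-form language: slot 2 is filled with the plural ID, which may be wrong.
three = fill_plural_slots("a", "b", {0: "", 1: "x", 2: ""})
print(three)
```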

### Project file

You must modify the `.lektorproject` file to include the expected languages.
130 changes: 102 additions & 28 deletions lektor_i18n.py
@@ -5,10 +5,11 @@
import re
import tempfile
import time
from os.path import exists, join, relpath
from os.path import exists, join, relpath, basename
from pprint import PrettyPrinter
from textwrap import dedent
from urllib.parse import urljoin
import polib

from lektor.context import get_ctx
from lektor.db import Page
@@ -55,6 +56,14 @@ def ngettext(self, *x):
        self.init_translator()
        return self.translator.ngettext(*x)

    def pgettext(self, *x):
        self.init_translator()
        return self.translator.pgettext(*x)

    def npgettext(self, *x):
        self.init_translator()
        return self.translator.npgettext(*x)
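
For reference, the standard library's `gettext` translation objects expose `pgettext` and `npgettext` methods with the same names (Python 3.8 and later). The sketch below uses `gettext.NullTranslations`, which has no catalog and therefore returns the message IDs unchanged. It is a stand-in for illustration and makes no claim about which translator class the plugin instantiates; the context strings `"menu"` and `"basket"` are made up.

```python
import gettext

# A translator with no catalog: every lookup falls back to the message ID.
translator = gettext.NullTranslations()

# pgettext(context, message): the context disambiguates identical strings.
label = translator.pgettext("menu", "Open")

# npgettext(context, singular, plural, n): context plus plural selection.
one = translator.npgettext("basket", "%d item", "%d items", 1)
many = translator.npgettext("basket", "%d item", "%d items", 3)

print(label, "|", one % 1, "|", many % 3)
```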


class Translations:
"""Memory of translations"""
@@ -136,15 +145,29 @@ def write_pot(self, pot_filename, language):
f.write(self.as_pot(language, header))

@staticmethod
def merge_pot(from_filenames, to_filename):
msgcat = locate_executable("msgcat")
if msgcat is None:
msgcat = "/usr/bin/msgcat"
cmdline = [msgcat, "--use-first"]
def merge_pot(from_filenames, to_filename, projectname):
# Get the POT Creation Date of the first file and inject it later.
pattern = r'("POT-Creation-Date:\s*)(\d{4}-\d{2}-\d{2}.*)(\\n")'
with open(from_filenames[0], 'r', encoding='utf-8') as f:
original_file1 = f.read()
date1 = re.search(pattern, original_file1).group(2)

xgettext = locate_executable("xgettext")
if xgettext is None:
xgettext = "/usr/bin/xgettext"
cmdline = [xgettext, "--sort-by-file", "--package-name=" + projectname, "--package-version=1.0"]
cmdline.extend(from_filenames)
cmdline.extend(("-o", to_filename))
reporter.report_debug_info("msgcat cmd line", cmdline)
reporter.report_debug_info("xgettext cmd line", cmdline)
portable_popen(cmdline).wait()

# Inject the creation date back into the produced file
with open(to_filename, 'r', encoding='utf-8') as f:
finishedfile_orig = f.read()
replacement = r'\g<1>' + date1 + r'\g<3>'
finishedcontent = re.sub(pattern, replacement, finishedfile_orig, count=1)
with open(to_filename, 'w', encoding='utf-8') as f:
f.write(finishedcontent)
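
The date-preservation step can be tried in isolation on in-memory strings. The sketch below reuses the regular expression from `merge_pot`; the two header snippets and the helper name `preserve_creation_date` are invented for illustration. Unlike the code above, it returns the merged text unchanged when the first file has no `POT-Creation-Date` line, which is an added assumption about the desired behaviour.

```python
import re

PATTERN = r'("POT-Creation-Date:\s*)(\d{4}-\d{2}-\d{2}.*)(\\n")'


def preserve_creation_date(first_pot: str, merged_pot: str) -> str:
    """Copy the POT-Creation-Date of first_pot into merged_pot (hypothetical helper)."""
    match = re.search(PATTERN, first_pot)
    if match is None:
        return merged_pot  # nothing to preserve
    original_date = match.group(2)
    # A function replacement sidesteps backslash handling in the date text.
    return re.sub(
        PATTERN,
        lambda m: m.group(1) + original_date + m.group(3),
        merged_pot,
        count=1,
    )


# Invented header lines; "\\n" is the literal backslash-n found in PO headers.
first = '"POT-Creation-Date: 2025-06-28 10:00+0000\\n"\n'
merged = (
    '"Project-Id-Version: demo 1.0\\n"\n'
    '"POT-Creation-Date: 2025-07-22 09:30+0000\\n"\n'
)

result = preserve_creation_date(first, merged)
print(result)
```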

@staticmethod
def parse_templates(to_filename):
@@ -158,6 +181,51 @@ def parse_templates(to_filename):

translations = Translations() # let's have a singleton

def clear_entry(entry):
    entry.msgstr = ''
    if entry.msgstr_plural:
        for idx in entry.msgstr_plural:
            entry.msgstr_plural[idx] = ''
    if 'fuzzy' in entry.flags:
        entry.flags.remove('fuzzy')

def clear_translations(po_filepath, save_path=None):
    po = polib.pofile(po_filepath)
    for entry in po:
        clear_entry(entry)

    po.save(save_path or po_filepath)

def fill_translations(po_filepath, save_path=None):
    po = polib.pofile(po_filepath)

    for entry in po:
        # If we fuzzy-matched, we'd need to properly re-fill
        # the entries so we clear. Particularly important is
        # that when you add the plural form of a string...
        # msgmerge seems to fill the plural field with the
        # singular one, and mark it fuzzy... incorrect within
        # source language.
        if entry.fuzzy:
            clear_entry(entry)

        # Actually fill in the entries with msgid within the
        # source language.
        if not entry.msgstr:
            entry.msgstr = entry.msgid

        need_plural_fill = False
        if entry.msgstr_plural:
            for idx in entry.msgstr_plural:
                if not entry.msgstr_plural[idx]:
                    need_plural_fill = True
        if need_plural_fill and '+en.po' in basename(po_filepath):
            for idx in entry.msgstr_plural:
                if not entry.msgstr_plural[idx]:
                    entry.msgstr_plural[idx] = entry.msgid if int(idx) == 0 else entry.msgid_plural

    po.save(save_path or po_filepath)


class POFile:
FILENAME_PATTERN = "contents+{}.po"
@@ -186,6 +254,8 @@ def _msg_init(self):
]
reporter.report_debug_info("msginit cmd line", cmdline)
portable_popen(cmdline, cwd=self.i18npath).wait()
clear_translations(os.path.join(self.i18npath, self.FILENAME_PATTERN.format(self.language)))
self.reformat()

def _msg_merge(self):
"""Merges an existing <language>.po file with .pot file"""
@@ -201,6 +271,11 @@ def _msg_merge(self):
]
reporter.report_debug_info("msgmerge cmd line", cmdline)
portable_popen(cmdline, cwd=self.i18npath).wait()

    def reformat(self):
        msgcat = locate_executable("msgcat")
        if msgcat is None:
            msgcat = "/usr/bin/msgcat"
        filename = self.FILENAME_PATTERN.format(self.language)
        cmdline = [msgcat, filename, "-o", filename]
        portable_popen(cmdline, cwd=self.i18npath).wait()

def _prepare_locale_dir(self):
"""Prepares the i18n/<language>/LC_MESSAGES/ to store the .mo file;
@@ -238,20 +313,6 @@ def compile(self):
self._msg_fmt(locale_dirname)


def line_starts_new_block(line, prev_line):
"""
Detect a new block in a Lektor document. Blocks are delimited by a line
containing 3 or more dashes. This actually matches the definition of a
markdown level 2 heading, so this function returns False if no colon was
found in the line before, e.g. it isn't a new block with a key: value pair
before.
"""
if not prev_line or ":" not in prev_line:
return False # could be a Markdown heading
line = line.strip()
return line == "-" * len(line) and len(line) >= 3


def split_paragraphs(document):
if isinstance(document, (list, tuple)):
document = "".join(document) # list of lines
@@ -394,19 +455,30 @@ def __parse_source_structure(lines):
blocks = []
count_lines_block = 0 # counting the number of lines of the current block
is_content = False
prev_line = None
flow_level = 3
for line in lines:
stripped_line = line.strip()
if not stripped_line: # empty line
blocks.append(("raw", "\n"))
continue
# line like "---*" or a new block tag
if line_starts_new_block(stripped_line, prev_line) or block2re.search(
stripped_line
):
# New block tag.
# The following two ifs determine the start of a new "block" of content that we can further
# parse. Special care is needed, as the number of allowed -s dictates whether it's a Markdown heading
# or a flow / field separation.
if block2re.search(stripped_line):
count_lines_block = 0
is_content = False
blocks.append(("raw", line))
# Count the amount of preceding #s, as that determines the amount of -s allowed
# before it gets counted as a Markdown heading.
flow_level = len(stripped_line) - len(stripped_line.lstrip('#'))
# You're allowed to have between 3 and your maximum allowed number of -s.
elif stripped_line == '-' * len(stripped_line) and 3 <= len(stripped_line) <= flow_level:
count_lines_block = 0
is_content = False
blocks.append(("raw", line))
# If there's less -s than the flow level, back down on the amount of allowed -s.
flow_level = len(stripped_line)
else:
count_lines_block += 1
match = command_re.search(stripped_line)
@@ -423,7 +495,6 @@ def __parse_source_structure(lines):
is_content = True
if is_content:
blocks.append(("translatable", line))
prev_line = line
# join neighbour blocks of same type
newblocks = []
for type, data in blocks:
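
The separator rule in this hunk can be exercised on its own. The sketch below is a simplified stand-in, not the plugin's parser: `FLOW_TAG` is an assumed approximation of `block2re` (which is not shown in this diff), and `find_separators` is a hypothetical helper that only reports which lines would start a new block.

```python
import re

# Assumption: a stand-in for block2re, matching tags like "#### text ####".
FLOW_TAG = re.compile(r"^#+\s*[\w-]+\s*#+$")


def find_separators(lines):
    """Return the indices of lines treated as flow tags or field separators."""
    flow_level = 3  # top level: only exactly 3 dashes separate fields
    hits = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        if not stripped:
            continue
        if FLOW_TAG.search(stripped):
            hits.append(i)
            # The number of leading #s caps how many dashes count as a separator.
            flow_level = len(stripped) - len(stripped.lstrip("#"))
        elif stripped == "-" * len(stripped) and 3 <= len(stripped) <= flow_level:
            hits.append(i)
            # A shorter dash run means we dropped back to an outer level.
            flow_level = len(stripped)
    return hits


top_level = ["title: Hi", "---", "body:", "", "Heading", "-------", "text", "---", "x: 1"]
in_flow = ["#### text ####", "body:", "----", "---"]

print(find_separators(top_level))
print(find_separators(in_flow))
```

In the first document the seven-dash line under "Heading" is left alone as a Markdown heading underline, while both three-dash lines are treated as field separators. A heading underlined with exactly three dashes would still be misread, which is a limitation of this kind of heuristic.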
@@ -558,7 +629,7 @@ def on_after_build_all(self, builder, **extra):
reporter.report_generic(f"{relpath(pots[0], builder.env.root_path)} generated")
pots = [p for p in pots if os.path.exists(p)] # only keep existing ones
if len(pots) > 1:
translations.merge_pot(pots, contents_pot_filename)
translations.merge_pot(pots, contents_pot_filename, self.env.project.name)
reporter.report_generic(
f"Merged POT files "
f"{', '.join(relpath(p, builder.env.root_path) for p in pots)}"
@@ -567,3 +638,6 @@ def on_after_build_all(self, builder, **extra):
for language in self.translations_languages:
po_file = POFile(language, self.i18npath)
po_file.generate()
if language == self.content_language:
fill_translations(os.path.join(po_file.i18npath, po_file.FILENAME_PATTERN.format(po_file.language)))
po_file.reformat()
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -19,6 +19,9 @@ authors = [
maintainers = [
{name="BeeWare Team", email="[email protected]"},
]
dependencies = [
"polib",
]

[project.optional-dependencies]
# Extras used by developers *of* briefcase are pinned to specific versions to