Open
Changes from 36 commits
69 commits
73b5ac8
Add test for current behaviour of msgfmt.py
s-ball Dec 2, 2018
5fb1575
Final fix for bpo-9741, with tests proving it.
s-ball Dec 2, 2018
b1968e9
Update docstrings for the script and make function
s-ball Dec 2, 2018
0bc4ad3
Give make a simpler and more consistent interface.
s-ball Dec 4, 2018
a9e67b4
Add a Misc/NEWS.d entry.
s-ball Dec 5, 2018
6c59d6c
Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…
s-ball Dec 5, 2018
1ce22c0
Merge branch 'main' into multi_inputs
s-ball Jan 23, 2025
863bd97
Merge branch 'main' into multi_inputs
s-ball Jan 23, 2025
4390ede
Fix an import error in test_i18n.py .
s-ball Jan 23, 2025
008ea27
Fix another import error.
s-ball Jan 23, 2025
8744743
Merge branch 'main' into multi_inputs
s-ball Jan 23, 2025
93a6eb7
Merge branch 'main' into multi_inputs
s-ball Feb 23, 2025
ba26b80
Revert version number to 1.2
s-ball Feb 23, 2025
150cea1
Merge branch 'main' into multi_inputs
s-ball Feb 23, 2025
d9afabb
Merge branch 'main' into multi_inputs
s-ball Feb 24, 2025
9004387
Merge branch 'main' into multi_inputs
s-ball Feb 24, 2025
7556f79
Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…
s-ball Feb 27, 2025
80947d1
Merge branch 'main' into multi_inputs
s-ball Feb 27, 2025
12acb83
Merge branch 'main' into multi_inputs
s-ball Feb 27, 2025
1ecc1f3
Fix a merge error
s-ball Feb 27, 2025
e59ba68
Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…
s-ball Feb 27, 2025
4ffd20a
Merge branch 'main' into multi_inputs
s-ball Feb 27, 2025
7505f2b
Merge branch 'main' into multi_inputs
s-ball Feb 28, 2025
2c27120
Merge branch 'main' into multi_inputs
s-ball Feb 28, 2025
1f4e5ac
fix details
merwok Feb 28, 2025
4170796
Merge branch 'main' into multi_inputs
s-ball Mar 1, 2025
46c08c5
Move tests for the gh-79516 issue to test_msgfmt.py
s-ball Mar 1, 2025
24d89a6
Cosmetic improvements after review.
s-ball Mar 1, 2025
17b4e05
Fix an import error
s-ball Mar 1, 2025
bfc8a44
Merge branch 'main' into multi_inputs
s-ball Mar 1, 2025
106dd40
Apply suggestions from code review
s-ball Mar 2, 2025
9cb9395
Cosmetic improvements after review.
s-ball Mar 2, 2025
08bc8d7
In test_msgfmt move data files to the data folder.
s-ball Mar 2, 2025
9d992cd
Remove duplicate tests from test_msgfmt.
s-ball Mar 2, 2025
31fd434
Merge branch 'main' into multi_inputs
s-ball Mar 2, 2025
916aec7
Merge branch 'main' into multi_inputs
s-ball Mar 2, 2025
51fcf09
Rename data files for test_msgfmt.Test_multi_input
s-ball Mar 3, 2025
d51ad50
Apply suggestions from code review
s-ball Mar 3, 2025
677f720
Cosmetic improvements after review.
s-ball Mar 3, 2025
b4ea80a
compile_messages now accepts several input files
s-ball Mar 4, 2025
3120add
whitespace nit
merwok Mar 4, 2025
d642923
Merge branch 'main' into multi_inputs
s-ball Mar 4, 2025
9d91f12
Make explicit how mo files can be re-generated.
s-ball Mar 4, 2025
4d83cb7
Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…
s-ball Mar 4, 2025
421272b
Apply suggestions from code review
s-ball Mar 4, 2025
09b97d9
Update Lib/test/test_tools/test_msgfmt.py
s-ball Mar 4, 2025
12cae51
Merge branch 'main' into multi_inputs
s-ball Mar 5, 2025
3760851
Merge branch 'main' into multi_inputs
s-ball Mar 15, 2025
d45039c
Generate json files for MultiInputTest data files.
s-ball Mar 16, 2025
dde5ef1
Apply suggestions from code review
s-ball Mar 16, 2025
bb6e0c5
Cosmetic improvements (long lines...)
s-ball Mar 16, 2025
797990a
Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…
s-ball Mar 16, 2025
c95af16
Merge branch 'main' into multi_inputs
s-ball Mar 17, 2025
905ec70
Revert an unwanted change.
s-ball Mar 17, 2025
b7a7c48
Simplifies the general logic of msgfmt.py
s-ball Mar 18, 2025
1b3b73e
Update Tools/i18n/msgfmt.py
s-ball Mar 18, 2025
1db2c66
Changes per review.
s-ball Mar 18, 2025
3a6e1ef
Merge branch 'multi_inputs' of https://github.com/s-ball/cpython into…
s-ball Mar 18, 2025
743bdc5
Removes the now unused get_names function.
s-ball Mar 19, 2025
50e7145
Tests for extension corner cases for msgfmt.py
s-ball Mar 19, 2025
caa8955
Merge branch 'main' into multi_inputs
s-ball Apr 4, 2025
a4f1769
Fix a parameter inversion in test_msgfmt.py
s-ball Apr 4, 2025
11f6e69
Merge branch 'main' into multi_inputs
s-ball Apr 4, 2025
0ed3107
Merge branch 'main' into multi_inputs
s-ball Apr 8, 2025
d1e0a26
Merge branch 'main' into multi_inputs
s-ball Apr 8, 2025
213afcb
Merge branch 'main' into multi_inputs
s-ball May 20, 2025
4d54e50
Merge main into multi_inputs
s-ball May 22, 2025
ab97edd
Merge branch 'main' into multi_inputs
s-ball May 22, 2025
9fd1d57
Merge branch 'main' into multi_inputs
s-ball Oct 4, 2025
2 changes: 2 additions & 0 deletions Lib/test/test_tools/msgfmt_data/.gitattributes
@@ -0,0 +1,2 @@
file1_fr.po eol=crlf
file2_fr.po eol=lf
@merwok (Member), Mar 3, 2025:

Could you add a note somewhere (in tests or in a readme file here) explaining how to recreate the mo files?

And what do you think of naming the files file1_fr_crlf.po and file1_fr_lf.po?

Author:

The mo files are simply recreated by the update_catalog_snapshots function, which is triggered by passing the argument --snapshot-update to the test. This is exactly how the other .mo files in msgfmt_data are re-created. Do you really think it deserves a special message?

Anyway, I agree with you on the other point: the file names should make the EOL mode explicit.

Member:

Ah, I hadn’t seen the function / argument in the test!
Good that it exists. But maybe this is making my point? Someone else looking at the test data files (Python dev or redistributor – they care about source files) may not find the answer quickly.

Member:

Adding one comment in the test_msgfmt.py file like # regenerate files in Lib/test/test_tools/msgfmt_data for the benefit of people grepping would make me satisfied!
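
For readers who land on the data files first, the regeneration loop discussed in this thread can be sketched roughly as follows. This is only an illustration, not the code in test_msgfmt.py: `regenerate_snapshots`, `compile_po` and `fake_compile` are hypothetical names, and the real helper (update_catalog_snapshots) may handle the merged file12_fr.mo differently.

```python
import tempfile
from pathlib import Path


def regenerate_snapshots(data_dir, compile_po):
    """Recompile every .po file in data_dir to a sibling .mo file."""
    written = []
    for po_file in sorted(Path(data_dir).glob("*.po")):
        mo_file = po_file.with_suffix(".mo")
        # compile_po is a hypothetical callback; in practice it would run msgfmt.py.
        compile_po(po_file, mo_file)
        written.append(mo_file.name)
    return written


def fake_compile(po_file, mo_file):
    # Stand-in for the real compiler so the sketch runs anywhere.
    mo_file.write_bytes(b"compiled from " + po_file.name.encode())


with tempfile.TemporaryDirectory() as tmp:
    for name in ("file1_fr.po", "file2_fr.po"):
        Path(tmp, name).write_text('msgid ""\nmsgstr ""\n')
    result = regenerate_snapshots(tmp, fake_compile)

print(result)
```

Passing the compiler as a callback keeps the loop testable without invoking the script.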

Binary file added Lib/test/test_tools/msgfmt_data/file12_fr.mo
Binary file not shown.
Binary file added Lib/test/test_tools/msgfmt_data/file1_fr.mo
Binary file not shown.
29 changes: 29 additions & 0 deletions Lib/test/test_tools/msgfmt_data/file1_fr.po
@@ -0,0 +1,29 @@
# French translations for python package.
# Copyright (C) 2018 THE python\'S COPYRIGHT HOLDER
# This file is distributed under the same license as the python package.
# s-ball <[email protected]>, 2018.
#
msgid ""
msgstr ""
"Project-Id-Version: python 3.8\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-11-30 23:46+0100\n"
"PO-Revision-Date: 2018-11-30 23:47+0100\n"
"Last-Translator: s-ball <[email protected]>\n"
"Language-Team: French\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"

#: file1.py:6
msgid "Hello!"
msgstr "Bonjour !"

#: file1.py:7
#, python-brace-format
msgid "{n} horse"
msgid_plural "{n} horses"
msgstr[0] "{n} cheval"
msgstr[1] "{n} chevaux"
Binary file added Lib/test/test_tools/msgfmt_data/file2_fr.mo
Binary file not shown.
26 changes: 26 additions & 0 deletions Lib/test/test_tools/msgfmt_data/file2_fr.po
@@ -0,0 +1,26 @@
# French translations for python package.
# Copyright (C) 2018 THE python'S COPYRIGHT HOLDER
# This file is distributed under the same license as the python package.
# s-ball <[email protected]>, 2018.
#
msgid ""
msgstr ""
"Project-Id-Version: python 3.8\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2018-11-30 23:57+0100\n"
"PO-Revision-Date: 2018-11-30 23:57+0100\n"
"Last-Translator: s-ball <[email protected]>\n"
"Language-Team: French\n"
"Language: fr\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=2; plural=(n > 1);\n"

#: file2.py:6
msgid "It's over."
msgstr "C'est terminé."

#: file2.py:7
msgid "Bye..."
msgstr "Au revoir ..."
51 changes: 50 additions & 1 deletion Lib/test/test_tools/test_msgfmt.py
@@ -1,5 +1,7 @@
"""Tests for the Tools/i18n/msgfmt.py tool."""

import filecmp
import os
import shutil
import sys
import unittest
from gettext import GNUTranslations
@@ -91,6 +93,7 @@ def test_generic_syntax_error(self):
err = res.err.decode('utf-8')
self.assertIn('Syntax error', err)


class CLITest(unittest.TestCase):

def test_help(self):
@@ -121,6 +124,52 @@ def test_nonexistent_file(self):
assert_python_failure(msgfmt, 'nonexistent.po')


class Test_multi_input(unittest.TestCase):
"""Tests for the issue https://github.com/python/cpython/issues/79516
msgfmt.py shall accept multiple input files
"""

def test_no_outputfile(self):
Member:

This test looks redundant. test_both_without_outputfile supersedes it, and this is MultiInputTest.

"""Test script without -o option - 1 single file"""
with temp_cwd(None):
shutil.copy(data_dir / 'file2_fr.po', '.')
assert_python_ok(msgfmt, 'file2_fr.po')
self.assertTrue(
filecmp.cmp(data_dir / 'file2_fr.mo', 'file2_fr.mo'),
'Wrong compiled file2_fr.mo')

def test_both_with_outputfile(self):
"""Test script with -o option and 2 input files

The current behaviour is to merge entries having distinct ids
and keep last one if the same id occurs in multiple files.

Here the first file has Windows endings (crlf) while the second has
Unix endings (lf)
"""
with temp_cwd(None):
assert_python_ok(msgfmt, '-o', 'file12.mo',
data_dir / 'file1_fr.po',
data_dir / 'file2_fr.po')
self.assertTrue(
filecmp.cmp(data_dir / 'file12_fr.mo', 'file12.mo'),
'Wrong compiled file12.mo')

def test_both_without_outputfile(self):
"""Test script without -o option and 2 input files"""

with temp_cwd(None):
shutil.copy(data_dir / 'file1_fr.po', '.')
shutil.copy(data_dir / 'file2_fr.po', '.')
assert_python_ok(msgfmt, 'file1_fr.po', 'file2_fr.po')
self.assertTrue(
filecmp.cmp(data_dir / 'file1_fr.mo', 'file1_fr.mo'),
'Wrong compiled file1_fr.mo')
self.assertTrue(
filecmp.cmp(data_dir / 'file2_fr.mo', 'file2_fr.mo'),
'Wrong compiled file2_fr.mo')


def update_catalog_snapshots():
for po_file in data_dir.glob('*.po'):
mo_file = po_file.with_suffix('.mo')
@@ -0,0 +1,2 @@
:program:`msgfmt.py` is now able to merge more than one po file into a compiled mo
file. When an entry exists in more than one input file, the last file wins.
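
The "last file wins" rule in this entry behaves like successive dictionary updates. The sketch below is an illustration only: `merge_catalogs` is a hypothetical helper, not a function of msgfmt.py, and the byte strings are made-up stand-ins for parsed entries.

```python
def merge_catalogs(*catalogs):
    """Merge message dicts in order; later catalogs override earlier ones."""
    merged = {}
    for catalog in catalogs:
        merged.update(catalog)
    return merged


# The empty key stands for the header entry, which every po file carries.
file1 = {b"": b"header from file1", b"Hello!": b"Bonjour !"}
file2 = {b"": b"header from file2", b"Bye...": b"Au revoir ..."}

merged = merge_catalogs(file1, file2)
print(merged[b""])
print(sorted(merged))
```

Entries with distinct ids are all kept; only the shared header key is taken from the second file.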
88 changes: 60 additions & 28 deletions Tools/i18n/msgfmt.py
@@ -6,9 +6,9 @@
This program converts a textual Uniforum-style message catalog (.po file) into
a binary GNU catalog (.mo file). This is essentially the same function as the
GNU msgfmt program, however, it is a simpler implementation. Currently it
does not handle plural forms but it does handle message contexts.
handles plural forms and message contexts, but does not generate a hash table.

Usage: msgfmt.py [OPTIONS] filename.po
Usage: msgfmt.py [OPTIONS] filename.po [filename.po ...]

Options:
-o file
@@ -23,6 +23,14 @@
-V
--version
Display version information and exit.

If more than one input file is given and an output file is passed with the
-o option, then all the input files are merged. If keys are repeated (common
for the "" key of the header), the one from the last file is used.

If more than one input file is given and no -o option is present, then
every input file is compiled into its corresponding mo file (same name with
mo replacing po).
Member:

I feel like this should be under the -o section otherwise it may be missed as it looks unrelated.

"""

import os
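
The two modes described in the usage text above can be summarised as a mapping from output file to the inputs compiled into it. This is a rough sketch under stated assumptions: `plan_outputs` is a hypothetical name, and deriving the output name with `Path.with_suffix` is an assumption here, not necessarily what the script does for inputs without a .po extension.

```python
from pathlib import Path


def plan_outputs(inputs, outfile=None):
    """Map each output file name to the list of inputs compiled into it."""
    if outfile is not None:
        # With -o: every input is merged into the single output file.
        return {outfile: list(inputs)}
    # Without -o: each input is compiled to its own .mo file.
    return {str(Path(name).with_suffix(".mo")): [name] for name in inputs}


print(plan_outputs(["file1_fr.po", "file2_fr.po"], "file12.mo"))
print(plan_outputs(["file1_fr.po", "file2_fr.po"]))
```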
@@ -47,29 +55,27 @@ def usage(code, msg=''):
sys.exit(code)


def add(ctxt, id, str, fuzzy):
def add(ctxt, id, str, fuzzy, messages):
"Add a non-fuzzy translation to the dictionary."
global MESSAGES
if not fuzzy and str:
if ctxt is None:
MESSAGES[id] = str
messages[id] = str
else:
MESSAGES[b"%b\x04%b" % (ctxt, id)] = str
messages[b"%b\x04%b" % (ctxt, id)] = str


def generate():
def generate(messages):
"Return the generated output."
global MESSAGES
# the keys are sorted in the .mo file
keys = sorted(MESSAGES.keys())
keys = sorted(messages.keys())
offsets = []
ids = strs = b''
for id in keys:
# For each string, we need size and file offset. Each string is NUL
# terminated; the NUL does not count into the size.
offsets.append((len(ids), len(id), len(strs), len(MESSAGES[id])))
offsets.append((len(ids), len(id), len(strs), len(messages[id])))
ids += id + b'\0'
strs += MESSAGES[id] + b'\0'
strs += messages[id] + b'\0'
output = ''
# The header is 7 32-bit unsigned integers. We don't use hash tables, so
# the keys start right after the index tables.
@@ -98,18 +104,44 @@ def generate():
return output
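
For reviewers less familiar with the .mo layout that generate() produces, the sketch below rebuilds the same structure (7-word header, two index tables, NUL-terminated strings, no hash table) and reads it back with the standard gettext module. It is an independent illustration, not the PR's code: `build_mo` is a hypothetical name, and the explicit little-endian packing is a choice made here for reproducibility.

```python
import io
import struct
from gettext import GNUTranslations


def build_mo(messages):
    """Serialize a {bytes: bytes} mapping into GNU .mo bytes (no hash table)."""
    keys = sorted(messages)
    ids = strs = b""
    offsets = []
    for key in keys:
        # (offset of id, length of id, offset of str, length of str);
        # lengths exclude the terminating NUL.
        offsets.append((len(ids), len(key), len(strs), len(messages[key])))
        ids += key + b"\0"
        strs += messages[key] + b"\0"
    # 7 header words, then two index tables of 2 words per entry.
    keystart = 7 * 4 + 16 * len(keys)
    valuestart = keystart + len(ids)
    koffsets, voffsets = [], []
    for o1, l1, o2, l2 in offsets:
        koffsets += [l1, o1 + keystart]
        voffsets += [l2, o2 + valuestart]
    header = struct.pack(
        "<Iiiiiii",
        0x950412DE,             # magic number
        0,                      # format version
        len(keys),              # number of entries
        7 * 4,                  # offset of the key index table
        7 * 4 + 8 * len(keys),  # offset of the value index table
        0, 0,                   # hash table size and offset (unused)
    )
    index = struct.pack("<%dI" % (4 * len(keys)), *koffsets, *voffsets)
    return header + index + ids + strs


messages = {
    b"": b"Content-Type: text/plain; charset=UTF-8\n",
    b"Hello!": "Bonjour !".encode("utf-8"),
    # A context and an id are joined with \x04, as in add().
    b"menu\x04Open": b"Ouvrir",
}
catalog = GNUTranslations(io.BytesIO(build_mo(messages)))
print(catalog.gettext("Hello!"))
print(catalog.pgettext("menu", "Open"))
```

Because the function takes the dictionary as a parameter, catalogs merged from several inputs can be serialized the same way as a single one.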


def make(filename, outfile):
ID = 1
STR = 2
CTXT = 3
def make(filenames, outfile):
""" Compiles one or several po file(s).

filenames is a string or an iterable of strings representing input file(s)
outfile is a string for the name of an output file or None.

If it is not None, the output file receives a merge of the input files.
If it is None, then filenames must be a string and the name of the output
file is obtained by replacing the po extension with mo.
Both ways are for compatibility reasons with previous behaviour.
"""
messages = {}
if isinstance(filenames, str):
infile, outfile = get_names(filenames, outfile)
process(infile, messages)
elif outfile is None:
raise TypeError("outfile cannot be None with more than one infile")
else:
for filename in filenames:
infile, _ = get_names(filename, outfile)
process(infile, messages)
output = generate(messages)
writefile(outfile, output)

def get_names(filename, outfile):
Member:

It is no longer used.

# Compute .mo name from .po name and arguments
if filename.endswith('.po'):
infile = filename
else:
infile = filename + '.po'
if outfile is None:
outfile = os.path.splitext(infile)[0] + '.mo'
return infile, outfile

def process(infile, messages):
ID = 1
STR = 2
CTXT = 3

try:
with open(infile, 'rb') as f:
@@ -140,7 +172,7 @@ def make(filename, outfile):
lno += 1
# If we get a comment line after a msgstr, this is a new entry
if l[0] == '#' and section == STR:
add(msgctxt, msgid, msgstr, fuzzy)
add(msgctxt, msgid, msgstr, fuzzy, messages)
section = msgctxt = None
fuzzy = 0
# Record a fuzzy mark
@@ -152,13 +184,13 @@ def make(filename, outfile):
# Now we are in a msgid or msgctxt section, output previous section
if l.startswith('msgctxt'):
if section == STR:
add(msgctxt, msgid, msgstr, fuzzy)
add(msgctxt, msgid, msgstr, fuzzy, messages)
section = CTXT
l = l[7:]
msgctxt = b''
elif l.startswith('msgid') and not l.startswith('msgid_plural'):
if section == STR:
add(msgctxt, msgid, msgstr, fuzzy)
add(msgctxt, msgid, msgstr, fuzzy, messages)
if not msgid:
# See whether there is an encoding declaration
p = HeaderParser()
@@ -213,21 +245,19 @@ def make(filename, outfile):
sys.exit(1)
# Add last entry
if section == STR:
add(msgctxt, msgid, msgstr, fuzzy)

# Compute output
output = generate()
add(msgctxt, msgid, msgstr, fuzzy, messages)

def writefile(outfile, output):
try:
with open(outfile,"wb") as f:
f.write(output)
except IOError as msg:
print(msg, file=sys.stderr)


def main():
def main(argv):
Member:

Suggested change
def main(argv):
def main():

Why change this? sys.argv is more common in stdlib/tools in my experience.

Author:

I thought that passing argv that way allowed simpler tests for the decoding of the command line. But as you are more experienced in CPython than I am, I shall follow your advice :-)

Member:

That’s right but only for tests that import and call a main function, not when calling as a script 🙂
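
A common middle ground for the trade-off discussed in this thread is an optional argv parameter that falls back to sys.argv. The sketch below only illustrates that pattern and is not what the PR adopts; this `main` is a simplified stand-in that parses the same getopt option string and returns the result instead of compiling anything.

```python
import getopt
import sys


def main(argv=None):
    """Parse options; fall back to sys.argv when no list is passed."""
    if argv is None:
        argv = sys.argv[1:]
    opts, args = getopt.getopt(argv, "hVo:",
                               ["help", "version", "output-file="])
    outfile = None
    for opt, arg in opts:
        if opt in ("-o", "--output-file"):
            outfile = arg
    return outfile, args


# A test that imports the module can call main() with an explicit list.
print(main(["-o", "file12.mo", "file1_fr.po", "file2_fr.po"]))
```

As noted in the thread, this only helps tests that import and call the function; tests that run the script as a subprocess gain nothing from it.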

try:
opts, args = getopt.getopt(sys.argv[1:], 'hVo:',
opts, args = getopt.getopt(argv, 'hVo:',
['help', 'version', 'output-file='])
except getopt.error as msg:
usage(1, msg)
@@ -247,10 +277,12 @@ def main():
print('No input file given', file=sys.stderr)
print("Try `msgfmt --help' for more information.", file=sys.stderr)
return

for filename in args:
make(filename, outfile)
if outfile is None:
for filename in args:
make(filename, None)
else:
make(args, outfile)


if __name__ == '__main__':
main()
main(sys.argv[1:])