Merged
15 changes: 11 additions & 4 deletions CONTRIBUTING.md
@@ -10,10 +10,6 @@ Instructions:

## Testing

### vcrpy

In addition to [`pytest`](https://docs.pytest.org/), we also use the [`vcrpy`](https://vcrpy.readthedocs.io/) library when writing our tests.

### tox

To run the tests, install the project dependencies in a [virtual environment](https://docs.python.org/3/library/venv.html#module-venv)
@@ -41,6 +37,17 @@ pip install "<package_name>"
pip freeze > requirements.txt
```

### vcrpy

In addition to [`pytest`](https://docs.pytest.org/), we also use the [`vcrpy`](https://vcrpy.readthedocs.io/) library when writing our tests.

If you need to update or regenerate a cassette for a test, e.g. [`tests/cassettes/test_translate_missing_messages_without_sorting.yml`](https://github.com/hypercision/i18ntools/blob/main/tests/cassettes/test_translate_missing_messages_without_sorting.yml), then:

- delete the cassette yml file
- update the `os.environ["TRANSLATOR_API_SUBSCRIPTION_KEY"]` line in the test so it is set to a real API key (but do not commit this change)
- run the tests with `tox`. This will regenerate the cassette yml file
- revert the `os.environ["TRANSLATOR_API_SUBSCRIPTION_KEY"]` line in the test so it is no longer a real API key

### Editable installation

Alternatively, you can perform an [editable installation](https://setuptools.pypa.io/en/latest/userguide/development_mode.html)
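Because the regeneration steps above temporarily put a real `TRANSLATOR_API_SUBSCRIPTION_KEY` into a test, it can be worth checking a regenerated cassette before committing it. The sketch below is a hypothetical helper, not part of i18ntools. It assumes the key is sent in Azure Translator's `Ocp-Apim-Subscription-Key` request header (the cassette added in this PR records only `Ocp-Apim-Subscription-Region`) and simply scans the YAML text for that header name and, optionally, for the key value itself. It flags the header name even if its value was replaced with a placeholder, erring on the side of caution.

```python
from pathlib import Path

# Assumption: Azure Translator receives the key in this request header.
SECRET_HEADER = "Ocp-Apim-Subscription-Key"


def cassette_leaks_key(cassette_path, api_key=None, header=SECRET_HEADER):
    """Hypothetical pre-commit check for a regenerated vcrpy cassette.

    Returns True if the cassette mentions the subscription-key header
    (case-insensitively) or, when api_key is given, contains the key itself.
    """
    text = Path(cassette_path).read_text(encoding="utf-8")
    if header.lower() in text.lower():
        return True
    # Only search for the raw key when one was supplied.
    return bool(api_key) and api_key in text
```

For example, calling it on the regenerated cassette with the real key as `api_key` should return `False` before the file is committed.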
101 changes: 96 additions & 5 deletions src/i18ntools/parse_i18n_file.py
@@ -1,27 +1,33 @@
#!/usr/bin/env python
"""Parses an i18n Java properties file and returns the data as a dictionary.

The benefit of this method over using configparser is that the whitespace in
If called with remove_backslashes=False, then the whitespace in
multiline values is preserved.

If called with remove_backslashes=True, then configparser is used
and the whitespace and backslashes in multiline values are removed.

Note that this method does not work properly for multiline translations
with an "=" character in them.

See related question: https://stackoverflow.com/questions/76047202
"""

import argparse
import configparser
import tempfile
from pathlib import Path


def parse_i18n_file(file_path):
def parse_i18n_file(file_path, remove_backslashes=False):
    """Parses an i18n Java properties file and returns the data as a dictionary.

    Note that this method does not work properly for multiline translations
    with an "=" character in them.

    Keyword arguments:
    file_path -- filepath of the i18n Java properties file to parse
    remove_backslashes -- when true, the data returned will not have the
    backslashes used in multiline values.
    Multiline values will be transformed into single line values.
    """
    if not Path(file_path).exists():
        raise FileNotFoundError(f"File {file_path} does not exist")
@@ -60,6 +66,82 @@ def parse_i18n_file(file_path):
            f"It has at least one duplicate key: {duplicate_keys}"
        )

    if remove_backslashes:
        # Now that we've ensured the file has no duplicate properties, return
        # the data as a dictionary with multiline values transformed into
        # single line values.
        return parse_i18n_file_without_backslashes(file_path)

    return data


def convert_properties_to_ini(input_path, ini_path):
    """Reads a properties file and writes it as an .ini file with a [DEFAULT] section
    header to make it compatible with configparser.

    Keyword arguments:
    input_path -- filepath of the i18n Java properties file to convert
    ini_path -- filepath of the output .ini file
    """
    with (
        open(input_path, "r", encoding="utf-8") as infile,
        open(ini_path, "w", encoding="utf-8") as outfile,
    ):
        # Add a dummy section header
        outfile.write("[DEFAULT]\n")
        outfile.writelines(infile.readlines())


def merge_multiline_string(multiline_string: str) -> str:
    """Takes a multiline string as input, removes the backslashes at the
    end of each line, and returns a single line string.

    Keyword arguments:
    multiline_string -- the input string potentially containing multiple lines
    with backslash continuations.
    """
    # Split the string into lines and strip any leading/trailing whitespace
    # from each line
    lines = multiline_string.splitlines()
    # Remove the backslash from the end of each line
    processed_lines = [line.rstrip("\\").strip() for line in lines]
    # Join the lines into a single string, filtering out any empty lines
    # to avoid leading/trailing spaces
    merged_string = " ".join(line for line in processed_lines if line)
    return merged_string


def parse_i18n_file_without_backslashes(file_path):
    """Parses an i18n Java properties file and returns the data as a dictionary.
    Multiline values will be transformed into single line values with the
    backslashes removed.

    Note that this method does not work properly for multiline translations
    with an "=" character in them.

    Keyword arguments:
    file_path -- filepath of the i18n Java properties file to parse
    """
    # Use a temporary file that is automatically cleaned up
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".ini", encoding="utf-8", delete=True
    ) as temp_ini:
        # Convert the properties file into a temporary .ini file
        convert_properties_to_ini(file_path, temp_ini.name)

        # Parse the temporary .ini file
        # Use RawConfigParser to avoid any interpolation or automatic conversions
        config = configparser.RawConfigParser(empty_lines_in_values=False)
        # Override the optionxform method to prevent lowercase conversion of the keys
        config.optionxform = str  # type: ignore

        config.read(temp_ini.name, encoding="utf-8")

    data = {}
    for key, value in config["DEFAULT"].items():
        merged_string = merge_multiline_string(value)
        data[key] = merged_string

    return data


@@ -80,8 +162,17 @@ def main():
            "Can be specified as a relative or absolute file path."
        ),
    )
    parser.add_argument(
        "-r",
        "--remove_backslashes",
        action="store_true",
        help=(
            "the data returned will not have the "
            "backslashes used in multiline values."
        ),
    )
    args = parser.parse_args()
    result = parse_i18n_file(args.input_file)
    result = parse_i18n_file(args.input_file, args.remove_backslashes)
    for key, value in result.items():
        print("key", key)
        print("value", value)
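To see what `remove_backslashes=True` does to a value, the pipeline added above (prepend a `[DEFAULT]` header, parse with `RawConfigParser`, then run `merge_multiline_string`) can be reproduced in memory. This is a sketch, not the module's code: `parse_properties_text_without_backslashes` is a hypothetical name, and it uses `configparser`'s `read_string` instead of the temporary file that `parse_i18n_file_without_backslashes` writes. As with the PR's version, continuation lines must be indented for `configparser` to treat them as part of the previous value.

```python
import configparser


def merge_multiline_string(multiline_string: str) -> str:
    # Same logic as the helper added in this PR: drop trailing backslashes,
    # strip each line, and join the non-empty lines with single spaces.
    lines = multiline_string.splitlines()
    processed_lines = [line.rstrip("\\").strip() for line in lines]
    return " ".join(line for line in processed_lines if line)


def parse_properties_text_without_backslashes(text: str) -> dict:
    """Hypothetical in-memory variant of parse_i18n_file_without_backslashes."""
    config = configparser.RawConfigParser(empty_lines_in_values=False)
    config.optionxform = str  # keep keys case-sensitive
    # Prepend the dummy section header, as convert_properties_to_ini does.
    config.read_string("[DEFAULT]\n" + text)
    return {
        key: merge_multiline_string(value)
        for key, value in config["DEFAULT"].items()
    }


sample = (
    "errors.customSubmitTS=The customSubmitTS parameter is missing.\n"
    "lyrics.line=I want to see you knocking at the door. \\\n"
    "    I wanna leave you out there waiting in the downpour.\n"
)
result = parse_properties_text_without_backslashes(sample)
print(result["lyrics.line"])
```

Reading from a string is a design choice of this sketch: it avoids reopening a `NamedTemporaryFile` by name while it is still open, which, per the `tempfile` documentation, a plain `open()` cannot do on Windows when `delete=True`.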
19 changes: 17 additions & 2 deletions src/i18ntools/translate.py
@@ -109,6 +109,7 @@ def translate_file(
    output_file_path=None,
    input_lang=default_lang,
    translator_region=default_region,
    remove_backslashes=False,
):
    if not Path(input_file_path).exists():
        raise FileNotFoundError(f"File {input_file_path} does not exist")
@@ -120,7 +121,7 @@ def translate_file(
        output_file_path = get_default_filepath(input_file_path, output_lang)

    # Parse the input file into a dictionary
    input_data = parse_i18n_file(input_file_path)
    input_data = parse_i18n_file(input_file_path, remove_backslashes)

    # Open the input file in read mode to read its contents
    with open(input_file_path, "r", encoding="utf-8") as f:
@@ -214,9 +215,23 @@ def main():
        default=default_region,
        help="region of the Azure translator resource. Defaults to eastus2",
    )
    parser.add_argument(
        "-rbs",
        "--remove_backslashes",
        action="store_true",
        help=(
            "any backslashes from multiline values in the input file "
            "will not be included in the text that gets translated."
        ),
    )
    args = parser.parse_args()
    translate_file(
        args.input_file, args.to, args.output_file, args.from_lang, args.region
        args.input_file,
        args.to,
        args.output_file,
        args.from_lang,
        args.region,
        args.remove_backslashes,
    )


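The new `-rbs`/`--remove_backslashes` option is a plain `store_true` flag, so it is `False` unless given and is then passed through to `translate_file`. The trimmed-down sketch below shows that wiring; only the flag definition is copied from this PR, while the `input_file` positional and `translate_file_stub` are hypothetical stand-ins for the real parser and function. Passing the value by keyword, as the sketch does, is an alternative to the positional call in the PR that keeps working if parameters are ever reordered.

```python
import argparse


def translate_file_stub(input_file_path, remove_backslashes=False):
    # Hypothetical stand-in for translate_file: report what it received.
    return {"input": input_file_path, "remove_backslashes": remove_backslashes}


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of the new flag")
    parser.add_argument("input_file")  # hypothetical positional argument
    parser.add_argument(
        "-rbs",
        "--remove_backslashes",
        action="store_true",
        help=(
            "any backslashes from multiline values in the input file "
            "will not be included in the text that gets translated."
        ),
    )
    return parser


def run(argv):
    args = build_parser().parse_args(argv)
    # Forward the flag by keyword rather than by position.
    return translate_file_stub(
        args.input_file, remove_backslashes=args.remove_backslashes
    )


print(run(["messages.properties"]))
print(run(["messages.properties", "-rbs"]))
```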
15 changes: 13 additions & 2 deletions src/i18ntools/translate_missing.py
@@ -30,6 +30,7 @@ def translate_missing_messages(
    output_file_path=None,
    input_lang=default_lang,
    translator_region=default_region,
    remove_backslashes=False,
):
    if not Path(input_file_path).exists():
        raise FileNotFoundError(f"File {input_file_path} does not exist")
@@ -46,8 +47,8 @@ def translate_missing_messages(
        raise FileNotFoundError(f"File {output_file_path} does not exist")

    # Parse the input file and output file into a dictionary
    input_data = parse_i18n_file(input_file_path)
    output_data = parse_i18n_file(output_file_path)
    input_data = parse_i18n_file(input_file_path, remove_backslashes)
    output_data = parse_i18n_file(output_file_path, remove_backslashes)

    # Find any i18n messages missing from the output file
    # and put those keys and values in the payload_data dictionary
@@ -148,6 +149,15 @@ def main():
        default=default_region,
        help="region of the Azure translator resource. Defaults to eastus2",
    )
    parser.add_argument(
        "-rbs",
        "--remove_backslashes",
        action="store_true",
        help=(
            "any backslashes from multiline values in the input file "
            "will not be included in the text that gets translated."
        ),
    )
    parser.add_argument(
        "-s",
        "--sort",
@@ -166,6 +176,7 @@ def main():
        args.output_file,
        args.from_lang,
        args.region,
        args.remove_backslashes,
    )


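The comment in `translate_missing_messages` describes collecting the messages that exist in the input file but not in the output file into `payload_data`. That code is outside this diff, so the following is only a sketch of the idea: `find_missing_messages` and `build_request_body` are hypothetical names, the sample messages are illustrative, and the body format (a JSON list of `{"text": ...}` objects in input order) is inferred from the request recorded in the cassette added by this PR.

```python
import json


def find_missing_messages(input_data, output_data):
    # Hypothetical: keep input entries whose keys are absent from the
    # output file, preserving the input file's order.
    return {
        key: value
        for key, value in input_data.items()
        if key not in output_data
    }


def build_request_body(payload_data):
    # Assumed shape, matching the request recorded in the cassette:
    # a JSON list of {"text": ...} objects.
    return json.dumps([{"text": value} for value in payload_data.values()])


input_data = {
    "session.removed": "{0} session removed.",
    "instructor.disabled": "Instructor is disabled",
}
output_data = {"session.removed": "{0} Sitzung entfernt."}

payload_data = find_missing_messages(input_data, output_data)
print(build_request_body(payload_data))
# prints [{"text": "Instructor is disabled"}]
```

`json.dumps` escapes non-ASCII characters by default, which is consistent with the `\u2019` that appears in the recorded request body.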
67 changes: 67 additions & 0 deletions tests/cassettes/test_translate_file_without_backslashes.yml
@@ -0,0 +1,67 @@
interactions:
- request:
    body: '[{"text": "Property [{0}] of class [{1}] with value [{2}] is less than
      minimum value [{3}]"}, {"text": "I want to see you knocking at the door. I wanna
      leave you out there waiting in the downpour. Singing that you\u2019re sorry,
      dripping on the hall floor."}, {"text": "The customSubmitTS parameter is missing.
      It must be present and of type Date."}, {"text": "{0} session removed."}, {"text":
      "The trial period has ended for your account and you can no longer use the application."},
      {"text": "Instructor is disabled"}, {"text": " Attendance actions made on this
      page will also be made for every session in this group."}, {"text": "Errors:
      {0}. \\n\\n Sessions successfully removed: {1}"}]'
    headers:
      Accept:
      - '*/*'
      Accept-Encoding:
      - gzip, deflate
      Connection:
      - keep-alive
      Content-Length:
      - '690'
      Content-Type:
      - application/json
      Ocp-Apim-Subscription-Region:
      - eastus2
      User-Agent:
      - python-requests/2.32.5
    method: POST
    uri: https://api.cognitive.microsofttranslator.com/translate?api-version=3.0&from=en&to=de
  response:
    body:
      string: "[{\"translations\":[{\"text\":\"Die Eigenschaft [{0}] der Klasse [{1}]
        mit Wert [{2}] ist kleiner als der Mindestwert [{3}]\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"Ich
        will dich an der T\xFCr klopfen sehen. Ich will dich drau\xDFen im Wolkenbruch
        warten lassen. Singend, dass es dir leid tut, tropfend auf den Flurboden.\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"Der
        customSubmitTS-Parameter fehlt. Es muss anwesend und vom Typ Datum sein.\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"{0}
        Sitzung entfernt.\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"Die Testphase
        f\xFCr dein Konto ist beendet und du kannst die Anwendung nicht mehr nutzen.\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"Der
        Ausbilder ist behindert\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"F\xFCr jede Sitzung in dieser Gruppe werden auch die auf dieser Seite vorgenommenen
        Anwesenheitsma\xDFnahmen vorgenommen.\",\"to\":\"de\"}]},{\"translations\":[{\"text\":\"Fehler:
        {0}. \\\\n\\\\n Sitzungen erfolgreich entfernt: {1}\",\"to\":\"de\"}]}]"
    headers:
      Connection:
      - keep-alive
      Content-Type:
      - application/json; charset=utf-8
      Date:
      - Wed, 14 Jan 2026 20:16:28 GMT
      Strict-Transport-Security:
      - max-age=31536000; includeSubDomains
      Transfer-Encoding:
      - chunked
      access-control-expose-headers:
      - X-RequestId,X-Metered-Usage,X-MT-System
      x-content-type-options:
      - nosniff
      x-envoy-upstream-service-time:
      - '438'
      x-metered-usage:
      - '571'
      x-mt-system:
      - Microsoft
      x-requestid:
      - 5e1ef095-341f-4bca-b80b-b07d6089bf28.EUWE.0114T2016
    status:
      code: 200
      message: OK
version: 1
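The recorded response is a JSON list with one entry per submitted text, each holding a `translations` list whose items carry `text` and `to`. A minimal sketch of pairing those results back with the message keys follows, assuming that results come back in the same order as the request texts. `map_translations` and the two keys are hypothetical; the two translated strings are copied from the cassette.

```python
import json


def map_translations(keys, response_text, target_lang):
    """Hypothetical helper: pair each message key with its translation."""
    results = json.loads(response_text)
    if len(results) != len(keys):
        raise ValueError(f"expected {len(keys)} results, got {len(results)}")
    translated = {}
    for key, result in zip(keys, results):
        # Each result holds one translation per requested target language.
        for item in result["translations"]:
            if item["to"] == target_lang:
                translated[key] = item["text"]
                break
    return translated


# Two entries copied from the recorded response.
response_text = (
    '[{"translations":[{"text":"{0} Sitzung entfernt.","to":"de"}]},'
    '{"translations":[{"text":"Der Ausbilder ist behindert","to":"de"}]}]'
)
keys = ["session.removed", "instructor.disabled"]
print(map_translations(keys, response_text, "de"))
```

Checking the result count first makes a mismatched request and response fail loudly instead of silently pairing the wrong key with a translation.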