Commit 364d42c

Merge pull request #56 from pescheckit/feature_fixed-all-bugs-added-more-skipping-and-config
Major Update: Configuration System, Gitignore Support, Auto Language Detection + Code Quality
2 parents ab93eff + b68f664 commit 364d42c

37 files changed: +5271 -384 lines changed

README.md

Lines changed: 11 additions & 5 deletions
@@ -73,11 +73,14 @@ export AZURE_OPENAI_API_VERSION='2024-02-01'
 
 ### Basic Translation
 ```bash
-# Translate to German
+# Translate to German (default: shows warnings/errors only)
 gpt-po-translator --folder ./locales --lang de
 
-# Multiple languages
-gpt-po-translator --folder ./locales --lang de,fr,es --bulk
+# With progress information
+gpt-po-translator --folder ./locales --lang de -v
+
+# Multiple languages with verbose output
+gpt-po-translator --folder ./locales --lang de,fr,es -v --bulk
 ```
 
 ### Different AI Providers
@@ -131,15 +134,18 @@ This helps you:
 | Option | Description |
 |--------|-------------|
 | `--folder` | Path to .po files |
-| `--lang` | Target languages (e.g., `de,fr,es`) |
+| `--lang` | Target languages (e.g., `de,fr,es`, `fr_CA`, `pt_BR`) |
 | `--provider` | AI provider: `openai`, `azure_openai`, `anthropic`, `deepseek` |
-| `--bulk` | Enable batch translation (recommended) |
+| `--bulk` | Enable batch translation (recommended for large files) |
 | `--bulksize` | Entries per batch (default: 50) |
 | `--model` | Specific model to use |
 | `--list-models` | Show available models |
 | `--fix-fuzzy` | Translate fuzzy entries |
 | `--folder-language` | Auto-detect languages from folders |
 | `--no-ai-comment` | Disable AI tagging |
+| `-v, --verbose` | Show progress information (use `-vv` for debug) |
+| `-q, --quiet` | Only show errors |
+| `--version` | Show version and exit |
 
 ## 🛠️ Development

docker-entrypoint.sh

Lines changed: 6 additions & 0 deletions
@@ -14,6 +14,12 @@ if [ $# -eq 0 ]; then
 echo " Format: -v /host/path:/container/path"
 echo " The '/container/path' is what you'll use with the --folder parameter."
 echo
+echo "Configuration:"
+echo " The tool automatically loads configuration from pyproject.toml files found in:"
+echo " • Mounted volume directories"
+echo " • The target translation folder and its parent directories"
+echo " See examples/docker-pyproject.toml for Docker-optimized configuration."
+echo
 echo "Examples:"
 echo " # Translate files in the current directory to German"
 echo " docker run -v $(pwd):/data -e OPENAI_API_KEY=<your_key> ghcr.io/pescheckit/python-gpt-po --folder /data --lang de"

docs/usage.md

Lines changed: 195 additions & 2 deletions
@@ -104,7 +104,7 @@ Below is a detailed explanation of all command-line arguments:
 *Behind the scenes:* The tool recursively scans this folder and processes every file ending with `.po`.
 
 - **`--lang <language_codes>`**
-*Description:* A comma-separated list of ISO 639-1 language codes (e.g., `de,fr`).
+*Description:* A comma-separated list of ISO 639-1 language codes (e.g., `de,fr`) or locale codes (e.g., `fr_CA,pt_BR`).
 *Behind the scenes:* The tool filters PO files by comparing these codes with the file metadata and folder names (if `--folder-language` is enabled).
 
 ### Optional Options
@@ -173,13 +173,206 @@ Below is a detailed explanation of all command-line arguments:
 
 - **`--folder-language`**
 *Description:* Enables inferring the target language from the folder structure.
-*Behind the scenes:* The tool inspects the path components (directory names) of each PO file and matches them against the provided language codes.
+*Behind the scenes:* The tool inspects the path components (directory names) of each PO file and matches them against the provided language codes. Supports locale codes (e.g., folder `fr_CA` matches `-l fr_CA` for Canadian French, or falls back to `-l fr` for standard French).
 
 - **`--no-ai-comment`**
 *Description:* Disables the automatic addition of 'AI-generated' comments to translated entries.
 *Behind the scenes:* **By default (without this flag), every translation made by the AI is marked with a `#. AI-generated` comment in the PO file.** This flag prevents that marking, making AI translations indistinguishable from human translations in the file.
 *Note:* AI tagging is enabled by default for tracking, compliance, and quality assurance purposes.
 
+- **`-v, --verbose`**
+*Description:* Increases output verbosity. Can be used multiple times for more detail.
+*Behind the scenes:* Controls the logging level:
+- No flag: Shows only warnings and errors (default)
+- `-v`: Shows info messages including progress tracking
+- `-vv`: Shows debug messages for troubleshooting
+*Note:* Progress tracking shows translation progress for both single and bulk modes.
+
+- **`-q, --quiet`**
+*Description:* Reduces output to only show errors.
+*Behind the scenes:* Sets logging level to ERROR, suppressing all info and warning messages.
+
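The flag-to-level mapping described for `-v`, `-vv` and `-q` can be sketched with the standard `argparse` and `logging` modules. This is a minimal illustration, not the project's actual code: `resolve_log_level` and `build_parser` are hypothetical names, and letting `-q` take precedence over `-v` is an assumption.

```python
import argparse
import logging


def resolve_log_level(verbose: int = 0, quiet: bool = False) -> int:
    """Map the verbosity flags to a logging level (hypothetical helper)."""
    if quiet:  # assumption: -q wins when both flags are given
        return logging.ERROR
    if verbose >= 2:  # -vv
        return logging.DEBUG
    if verbose == 1:  # -v
        return logging.INFO
    return logging.WARNING  # no flag: warnings and errors only


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that only knows the two verbosity flags."""
    parser = argparse.ArgumentParser(prog="gpt-po-translator")
    # action="count" turns -v into 1 and -vv into 2
    parser.add_argument("-v", "--verbose", action="count", default=0)
    parser.add_argument("-q", "--quiet", action="store_true")
    return parser


args = build_parser().parse_args(["-vv"])
logging.basicConfig(level=resolve_log_level(args.verbose, args.quiet))
```

Counting repeated `-v` flags keeps the command line short while still giving three distinct levels.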
+- **`--version`**
+*Description:* Shows the program version and exits.
+*Behind the scenes:* Displays the current version from package metadata.
+
+---
+
+## Locale and Regional Variant Handling
+
+### Overview
+
+The tool now fully supports locale codes (e.g., `fr_CA`, `pt_BR`, `en_US`) in addition to simple language codes. This allows you to translate content for specific regional variants of a language.
+
+### How Locale Matching Works
+
+The tool uses a smart matching system that:
+1. **First tries exact match**: `fr_CA` matches `fr_CA`
+2. **Then tries format conversion**: `fr_CA` matches `fr-CA` (underscore ↔ hyphen)
+3. **Finally tries base language fallback**: `fr_CA` matches `fr`
+
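The three matching steps can be sketched as one small predicate. This is an illustrative sketch only: `locale_matches` and `_normalize` are hypothetical names, the case-insensitive comparison is an assumption, and the fallback is taken to run in one direction (a found `fr_CA` satisfies a requested `fr`, not the reverse).

```python
def _normalize(code: str) -> str:
    """Lower-case a code and use underscores so fr-CA and fr_CA compare equal."""
    return code.replace("-", "_").lower()


def locale_matches(found: str, requested: str) -> bool:
    """Return True if a code found in a file or folder satisfies a requested code."""
    # Step 1: exact match, e.g. fr_CA vs fr_CA
    if found == requested:
        return True
    # Step 2: format conversion, e.g. fr_CA vs fr-CA
    if _normalize(found) == _normalize(requested):
        return True
    # Step 3: base language fallback, e.g. found fr_CA vs requested fr
    base_language = _normalize(found).split("_")[0]
    return base_language == _normalize(requested)


for requested in ("fr_CA", "fr-CA", "fr", "pt"):
    print(requested, locale_matches("fr_CA", requested))
```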
+### Language Detection Priority
+
+When a PO file is processed, the language is determined in this order:
+1. **File metadata**: The `Language` field in the PO file header
+2. **Folder structure** (with `--folder-language`): Directory names in the file path
+
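The detection order above can be sketched as follows, assuming the PO header has already been parsed into a plain mapping. `detect_language` is a hypothetical helper, not the tool's API, and for brevity it matches folder names exactly instead of applying the locale fallback rules.

```python
from pathlib import PurePosixPath
from typing import Mapping, Optional, Sequence


def detect_language(
    metadata: Mapping[str, str],
    po_path: str,
    requested: Sequence[str],
    use_folder_language: bool = False,
) -> Optional[str]:
    """Return the language of a PO file, or None if it cannot be determined."""
    # Priority 1: the Language field in the PO header
    header_language = (metadata.get("Language") or "").strip()
    if header_language:
        return header_language
    # Priority 2: directory names, only when --folder-language is enabled
    if use_folder_language:
        for part in PurePosixPath(po_path).parent.parts:
            if part in requested:
                return part
    return None


print(detect_language({}, "locales/fr_CA/messages.po", ["fr_CA"], True))
```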
+### Examples
+
+**Working with Canadian French:**
+```bash
+# Translate specifically to Canadian French
+gpt-po-translator --folder ./locales --lang fr_CA
+
+# With detailed language name for better AI context
+gpt-po-translator --folder ./locales --lang fr_CA --detail-lang "Canadian French"
+
+# Process files in fr_CA folders
+gpt-po-translator --folder ./locales --lang fr_CA --folder-language
+```
+
+**Working with Brazilian Portuguese:**
+```bash
+# Translate to Brazilian Portuguese (different vocabulary from European Portuguese)
+gpt-po-translator --folder ./locales --lang pt_BR --detail-lang "Brazilian Portuguese"
+
+# Fall back to European Portuguese
+gpt-po-translator --folder ./locales --lang pt
+```
+
+### What the AI Sees
+
+The language code or detail name is passed directly to the AI in the translation prompt:
+
+| Command | AI Sees in Prompt |
+|---------|-------------------|
+| `-l fr` | "Translate to fr" |
+| `-l fr_CA` | "Translate to fr_CA" |
+| `-l fr_CA --detail-lang "Canadian French"` | "Translate to Canadian French" |
+| `-l pt_BR --detail-lang "Brazilian Portuguese"` | "Translate to Brazilian Portuguese" |
+
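The table can be read as "use the detail name if one was given, otherwise the raw code". A minimal sketch of that rule follows. `prompt_targets` is a hypothetical function, and both the positional pairing of comma-separated values and the error on a length mismatch are assumptions rather than documented behavior.

```python
from typing import Dict, Optional


def prompt_targets(lang: str, detail_lang: Optional[str] = None) -> Dict[str, str]:
    """Map each language code to the phrase that would appear in the prompt."""
    codes = [code.strip() for code in lang.split(",") if code.strip()]
    if detail_lang is None:
        names = codes  # no detail names: the AI sees the raw code
    else:
        names = [name.strip() for name in detail_lang.split(",")]
        if len(names) != len(codes):
            # assumption: names and codes must pair up one to one
            raise ValueError("--detail-lang must list one name per --lang code")
    return {code: f"Translate to {name}" for code, name in zip(codes, names)}


print(prompt_targets("fr_CA", "Canadian French"))
```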
+### Folder Language Behavior
+
+With `--folder-language`, the tool matches folder names against your `-l` parameter:
+
+| Folder | `-l` Parameter | Result |
+|--------|----------------|--------|
+| `locales/fr_CA/` | `fr_CA` | Translates to Canadian French |
+| `locales/fr_CA/` | `fr` | Translates to standard French (fallback) |
+| `locales/pt_BR/` | `pt_BR` | Translates to Brazilian Portuguese |
+| `locales/pt_BR/` | `pt` | Translates to European Portuguese (fallback) |
+
+### Best Practices
+
+1. **For regional variants**, always use the full locale code:
+```bash
+gpt-po-translator --folder ./locales --lang fr_CA,pt_BR,en_US
+```
+
+2. **Add detail names** for better AI understanding:
+```bash
+gpt-po-translator --folder ./locales --lang fr_CA,pt_BR \
+--detail-lang "Canadian French,Brazilian Portuguese"
+```
+
+3. **Use folder detection** for projects with locale-based directory structure:
+```bash
+# Processes files in locales/fr_CA/, locales/pt_BR/, etc.
+gpt-po-translator --folder ./locales --lang fr_CA,pt_BR --folder-language
+```
+
+---
+
+## Performance and Progress Tracking
+
+### Overview
+
+The tool provides intelligent performance warnings and progress tracking to help you manage large translation tasks efficiently.
+
+### Performance Modes
+
+1. **Single Mode (Default)**: Makes one API call per translation
+- Better for small files (< 30 entries)
+- More accurate for context-sensitive translations
+- Shows progress for each entry with `-v` flag
+
+2. **Bulk Mode (`--bulk`)**: Batches multiple translations per API call
+- Recommended for large files (> 30 entries)
+- Significantly faster (up to 10x for large files)
+- Shows progress per batch with `-v` flag
+
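The difference between the two modes comes down to how many API calls a file needs. The sketch below shows the arithmetic and a simple batching generator. `api_calls_needed` and `batches` are hypothetical helpers, and only the default batch size of 50 is taken from the documentation.

```python
import math
from typing import Iterator, List, Sequence


def api_calls_needed(entries: int, bulk: bool = False, bulksize: int = 50) -> int:
    """Single mode makes one call per entry; bulk mode makes one call per batch."""
    if not bulk:
        return entries
    return math.ceil(entries / bulksize)


def batches(items: Sequence[str], bulksize: int = 50) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `bulksize` entries."""
    for start in range(0, len(items), bulksize):
        yield list(items[start:start + bulksize])


print(api_calls_needed(548))             # single mode
print(api_calls_needed(548, bulk=True))  # bulk mode with the default batch size
```

For a 548-entry file, single mode needs 548 calls and bulk mode with batches of 50 needs 11, which is where most of the speed-up comes from.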
+### Automatic Performance Warnings
+
+When processing files with more than 30 entries in single mode, the tool will:
+1. Display a performance warning with time estimates
+2. Recommend switching to bulk mode
+3. For very large files (>100 entries), provide a 10-second countdown to cancel
+
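The thresholds above can be expressed as a small decision function. This is a sketch under stated assumptions: `Advice` and `single_mode_advice` are hypothetical names, and only the numbers 30, 100 and 10 come from the documentation. Time estimates are left out because the text does not say how they are computed.

```python
from dataclasses import dataclass

WARN_THRESHOLD = 30        # entries above which single mode triggers a warning
COUNTDOWN_THRESHOLD = 100  # entries above which a cancel countdown is shown
COUNTDOWN_SECONDS = 10


@dataclass
class Advice:
    warn: bool
    recommend_bulk: bool
    countdown_seconds: int


def single_mode_advice(entries: int, bulk: bool) -> Advice:
    """Decide which performance hints apply to a file (hypothetical helper)."""
    if bulk or entries <= WARN_THRESHOLD:
        return Advice(warn=False, recommend_bulk=False, countdown_seconds=0)
    countdown = COUNTDOWN_SECONDS if entries > COUNTDOWN_THRESHOLD else 0
    return Advice(warn=True, recommend_bulk=True, countdown_seconds=countdown)


print(single_mode_advice(548, bulk=False))
```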
+Example warning:
+```
+2024-01-15 10:30:45 - WARNING - PERFORMANCE WARNING
+2024-01-15 10:30:45 - WARNING - Current mode: SINGLE (1 API call per translation)
+2024-01-15 10:30:45 - WARNING - This will make 548 separate API calls
+2024-01-15 10:30:45 - WARNING - Estimated time: ~14 minutes
+2024-01-15 10:30:45 - WARNING -
+2024-01-15 10:30:45 - WARNING - Recommendation: Use BULK mode for faster processing
+2024-01-15 10:30:45 - WARNING - Command: add --bulk --bulksize 50
+2024-01-15 10:30:45 - WARNING - Estimated time with bulk: ~2 minutes
+2024-01-15 10:30:45 - WARNING - Speed improvement: 7x faster
+```
+
+### Progress Tracking
+
+Enable progress tracking with the `-v` flag:
+
+```bash
+# See progress for each file and translation
+gpt-po-translator --folder ./locales --lang fr -v
+
+# Output includes:
+# - File processing status
+# - Translation progress (X/Y entries)
+# - Percentage completion
+# - Batch progress (in bulk mode)
+```
+
+Example progress output:
+```
+2024-01-15 10:31:00 - INFO - Processing: ./locales/fr/messages.po (45 entries)
+2024-01-15 10:31:01 - INFO - [SINGLE 1/45] Translating entry...
+2024-01-15 10:31:02 - INFO - [SINGLE 2/45] Translating entry...
+2024-01-15 10:31:10 - INFO - Progress: 10/45 entries completed (22.2%)
+```
+
+### Verbosity Levels
+
+Control output detail with verbosity flags:
+
+| Flag | Level | Shows |
+|------|-------|-------|
+| (default) | WARNING | Performance warnings, errors |
+| `-v` | INFO | Progress tracking, status updates |
+| `-vv` | DEBUG | Detailed API calls, responses |
+| `-q` | ERROR | Only critical errors |
+
+### Best Practices for Large Files
+
+1. **Always use bulk mode for files > 100 entries**:
+```bash
+gpt-po-translator --folder ./locales --lang fr --bulk --bulksize 50 -v
+```
+
+2. **Adjust batch size based on content**:
+- Short entries (1-5 words): `--bulksize 100`
+- Medium entries (sentences): `--bulksize 50` (default)
+- Long entries (paragraphs): `--bulksize 20`
+
+3. **Monitor progress for long-running tasks**:
+```bash
+# Run with progress tracking
+gpt-po-translator --folder ./large-project --lang de,fr,es --bulk -v
+```
+
 ---
 
 ## AI Translation Tracking

pyproject.toml

Lines changed: 88 additions & 9 deletions
@@ -22,16 +22,17 @@ requires-python = ">=3.9"
 license = {text = "MIT"}
 dependencies = [
 "polib==1.2.0",
-"openai==1.58.1",
-"python-dotenv==1.0.0",
-"pytest==8.2.2",
+"openai==1.99.9",
+"python-dotenv==1.0.1",
+"pytest==8.3.4",
 "tenacity==9.0.0",
 "setuptools-scm==8.1.0",
 "pycountry==24.6.1",
-"anthropic==0.48.0",
+"anthropic==0.63.0",
 "requests==2.32.3",
-"responses==0.25.6",
-"isort==6.0.1",
+"responses==0.25.8",
+"isort==5.13.2",
+"tomli==2.2.1",
 ]
 classifiers = [
 "Development Status :: 5 - Production/Stable",
@@ -57,8 +58,86 @@ classifiers = [
 [project.scripts]
 gpt-po-translator = "python_gpt_po.main:main"
 
-[tool.flake8]
-max-line-length = 120
-
 [tool.isort]
 line_length = 120
+
+[tool.gpt-po-translator]
+# Configuration for gpt-po-translator
+
+# ===== FILE SCANNING =====
+# Whether to respect .gitignore files (enabled by default)
+respect_gitignore = true
+
+# Additional patterns to ignore (beyond .gitignore)
+ignore_patterns = [
+"*.pyc",
+"__pycache__/",
+"*.egg-info/",
+".pytest_cache/",
+".coverage",
+".tox/",
+".mypy_cache/",
+"htmlcov/",
+]
+
+# Default patterns that are always ignored (can be overridden by setting to empty list)
+default_ignore_patterns = [
+".git/",
+".venv/",
+"venv/",
+"env/",
+".env/",
+"node_modules/",
+".cache/",
+"build/",
+"dist/",
+"*.egg-info/",
+"__pycache__/",
+".pytest_cache/",
+".tox/",
+".mypy_cache/",
+]
+
+# ===== TRANSLATION BEHAVIOR =====
+# Default verbosity level (0=WARNING, 1=INFO, 2=DEBUG)
+default_verbosity = 1
+
+# Default batch size for bulk mode
+default_batch_size = 50
+
+# Enable bulk mode by default
+default_bulk_mode = false
+
+# Whether to mark AI-generated translations with comments by default
+mark_ai_generated = true
+
+# Whether to use folder-based language detection by default
+folder_language_detection = false
+
+# Whether to fix fuzzy entries by default
+fix_fuzzy_entries = false
+
+# ===== PROVIDER DEFAULTS =====
+# Default provider to use if multiple API keys are available
+# Options: "openai", "anthropic", "groq", "together", "xai"
+# default_provider = "openai"
+
+# Default models for each provider (will be used if no model is specified)
+default_models = { openai = "gpt-4o-mini", anthropic = "claude-3-5-sonnet-20241022" }
+
+# ===== PERFORMANCE =====
+# Maximum retries for failed translations
+max_retries = 3
+
+# Timeout for API requests (seconds)
+request_timeout = 120
+
+# ===== OUTPUT =====
+# Skip files that are already fully translated
+skip_translated_files = true
+
+# Show progress indicators during translation
+show_progress = true
+
+# Show detailed summary at the end
+show_summary = true
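
A `[tool.gpt-po-translator]` table like the one added above could be located and read roughly as follows. This is a sketch, not the project's loader: `find_pyproject`, `load_tool_config` and the `DEFAULTS` subset are hypothetical, and the "nearest `pyproject.toml` wins" rule is an assumption based on the entrypoint's note about parent directories. The `tomli` fallback mirrors the new dependency, since `tomllib` only ships with Python 3.11 and later.

```python
from pathlib import Path
from typing import Any, Dict, Optional

try:
    import tomllib  # standard library on Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API

# Illustrative subset of the keys shown in the diff
DEFAULTS: Dict[str, Any] = {
    "respect_gitignore": True,
    "default_batch_size": 50,
    "default_bulk_mode": False,
}


def find_pyproject(start: Path) -> Optional[Path]:
    """Walk from `start` up through its parents and return the first pyproject.toml."""
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / "pyproject.toml"
        if candidate.is_file():
            return candidate
    return None


def load_tool_config(start: Path) -> Dict[str, Any]:
    """Merge the [tool.gpt-po-translator] table over the defaults."""
    config = dict(DEFAULTS)
    pyproject = find_pyproject(start)
    if pyproject is not None:
        with pyproject.open("rb") as handle:  # TOML parsers require binary mode
            data = tomllib.load(handle)
        config.update(data.get("tool", {}).get("gpt-po-translator", {}))
    return config
```

Calling `load_tool_config(Path("./locales"))` would return the defaults overridden by whatever the nearest `pyproject.toml` defines.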
