Commit b1cf6a4

Merge pull request #65 from pescheckit/feature_added-context-with-msgctxt
Added context with msgctxt
2 parents a9a6d6c + dcf703b commit b1cf6a4

6 files changed: 388 additions & 55 deletions

README.md

Lines changed: 23 additions & 26 deletions
@@ -4,7 +4,7 @@
 ![PyPI](https://img.shields.io/pypi/v/gpt-po-translator?label=gpt-po-translator)
 ![Downloads](https://pepy.tech/badge/gpt-po-translator)

-**Translate gettext (.po) files using AI models.** Supports OpenAI, Azure OpenAI, Anthropic/Claude, DeepSeek, and Ollama (local) with automatic AI translation tagging.
+**Translate gettext (.po) files using AI models.** Supports OpenAI, Azure OpenAI, Anthropic/Claude, and DeepSeek with automatic AI translation tagging and context-aware translations.

 ## 🚀 Quick Start

@@ -21,8 +21,8 @@ gpt-po-translator --folder ./locales --bulk

 ## ✨ Key Features

-- **Multiple AI providers** - OpenAI, Azure OpenAI, Anthropic/Claude, DeepSeek, and Ollama (local)
-- **Privacy option** - Use Ollama for local, offline translations with no cloud API
+- **Multiple AI providers** - OpenAI, Azure OpenAI, Anthropic/Claude, DeepSeek, Ollama
+- **Context-aware translations** - Automatically uses `msgctxt` for better accuracy with ambiguous terms
 - **AI translation tracking** - Auto-tags AI-generated translations with `#. AI-generated` comments
 - **Bulk processing** - Efficient batch translation for large files
 - **Smart language detection** - Auto-detects target languages from folder structure
@@ -70,26 +70,6 @@ export AZURE_OPENAI_ENDPOINT='https://your-resource.openai.azure.com/'
 export AZURE_OPENAI_API_VERSION='2024-02-01'
 ```

-### Or Use Ollama (Local, No API Key Needed)
-
-```bash
-# 1. Install Ollama
-curl -fsSL https://ollama.com/install.sh | sh
-
-# 2. Pull a model
-ollama pull qwen2.5 # Best for multilingual (Arabic, Chinese, etc.)
-# OR
-ollama pull llama3.2 # Fast for European languages
-
-# 3. Translate (no API key required!)
-gpt-po-translator --provider ollama --folder ./locales
-
-# For non-Latin scripts, use qwen2.5 WITHOUT --bulk
-gpt-po-translator --provider ollama --model qwen2.5 --folder ./locales --lang ar
-```
-
-> **💡 Important:** For Ollama with **non-Latin languages** (Arabic, Chinese, Japanese, etc.), **omit the `--bulk` flag**. Single-item translation is more reliable because the model doesn't have to format responses as JSON.
-
 ## 💡 Usage Examples

 ### Basic Translation
@@ -115,7 +95,7 @@ gpt-po-translator --provider deepseek --folder ./locales --lang de
 # Use Azure OpenAI with auto-detection
 gpt-po-translator --provider azure_openai --folder ./locales --bulk

-# Use Ollama (local, private, free) - omit --bulk for non-Latin scripts
+# Use Ollama (local, see docs/usage.md for setup)
 gpt-po-translator --provider ollama --folder ./locales
 ```

@@ -127,8 +107,7 @@ docker run -v $(pwd):/data \
 ghcr.io/pescheckit/python-gpt-po:latest \
 --folder /data --bulk

-# With Ollama (local, no API key needed)
-# Note: Omit --bulk for better quality with non-Latin scripts
+# With Ollama (see docs/usage.md for full setup guide)
 docker run --rm \
 -v $(pwd):/data \
 --network host \
@@ -145,6 +124,24 @@ docker run -v $(pwd):/data \
 --provider azure_openai --folder /data --lang de
 ```

+## 🎯 Context-Aware Translations
+
+**Automatically uses `msgctxt` for better accuracy:**
+
+```po
+msgctxt "button"
+msgid "Save"
+msgstr "" → "Speichern" (button action)
+
+msgctxt "money"
+msgid "Save"
+msgstr "" → "Sparen" (save money)
+```
+
+The tool extracts context from your PO files and passes it to the AI for more accurate translations of ambiguous terms.
+
+**Tip:** Use detailed context for best results: `msgctxt "status label (not verb)"` works better than just `msgctxt "status"`.
+
 ## 🏷️ AI Translation Tracking

 **All AI translations are automatically tagged** for transparency and compliance:
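
To illustrate the extraction step described in the new README section, here is a minimal Python sketch. It is not code from this commit: `read_entries`, `ENTRY_RE`, and `PO_TEXT` are hypothetical names, and the pattern assumes single-line, unescaped PO strings. A real tool would more likely rely on a PO parsing library such as `polib`.

```python
import re

# Two untranslated entries that share a msgid but differ in msgctxt.
PO_TEXT = '''\
msgctxt "button"
msgid "Save"
msgstr ""

msgctxt "money"
msgid "Save"
msgstr ""
'''

# Simplified pattern (assumption): no multi-line strings, no escaped quotes.
ENTRY_RE = re.compile(
    r'(?:msgctxt "(?P<ctx>[^"]*)"\n)?'
    r'msgid "(?P<id>[^"]*)"\n'
    r'msgstr "(?P<str>[^"]*)"'
)


def read_entries(po_text):
    """Return (msgctxt, msgid) pairs for entries whose msgstr is empty."""
    return [
        (match.group("ctx"), match.group("id"))
        for match in ENTRY_RE.finditer(po_text)
        if match.group("str") == ""
    ]


entries = read_entries(PO_TEXT)
for msgctxt, msgid in entries:
    print(f"context={msgctxt!r} msgid={msgid!r}")
```

Keeping the context next to the `msgid` is what lets the two "Save" entries be translated differently; entries without `msgctxt` yield `None` for the context.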

docs/usage.md

Lines changed: 49 additions & 0 deletions
@@ -751,6 +751,55 @@ Both modes use the same preservation logic, ensuring consistent behavior.

 ---

+## Context-Aware Translations with msgctxt
+
+### Overview
+
+The tool automatically uses `msgctxt` (message context) from PO entries to provide context to the AI, improving translation accuracy for ambiguous terms.
+
+### How It Works
+
+When a PO entry includes `msgctxt`, it's automatically passed to the AI:
+
+```po
+msgctxt "button"
+msgid "Save"
+msgstr ""
+```
+
+The AI receives:
+```
+CONTEXT: button
+IMPORTANT: Choose the translation that matches this specific context and usage.
+
+Translate to German: Save
+```
+
+Result: **"Speichern"** (button action) instead of **"Sparen"** (to save money)
+
+### Best Practices
+
+**✓ Good - Detailed, Explicit Context:**
+```po
+msgctxt "status: not Halten (verb), but Angehalten/Wartend (state)"
+msgid "Hold"
+msgstr "" → "Angehalten" ✓
+```
+
+**⚠️ Limited - Simple Context:**
+```po
+msgctxt "status"
+msgid "Hold"
+msgstr "" → "Halten" (may still be wrong)
+```
+
+**Key Points:**
+- **Be explicit** - Describe what you want AND what you don't want
+- **Provide examples** - Include similar terms or expected word forms
+- **Human review still needed** - msgctxt improves results but doesn't guarantee perfection
+
+---
+
 ## Behind the Scenes: API Calls and Post-Processing

 - **Provider-Specific API Calls:**
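
The prompt layout documented in the added "How It Works" section can be sketched in Python as follows. This is an illustration under assumptions, not the project's implementation: `build_prompt`, `CONTEXT_HINT`, and the parameter names are hypothetical, and only the text layout is taken from the documentation in this diff.

```python
from typing import Optional

# Wording copied from the documented prompt; the constant name is hypothetical.
CONTEXT_HINT = (
    "IMPORTANT: Choose the translation that matches this specific "
    "context and usage."
)


def build_prompt(msgid: str, target_language: str,
                 msgctxt: Optional[str] = None) -> str:
    """Build a translation prompt, prefixing msgctxt when the entry has one."""
    request = f"Translate to {target_language}: {msgid}"
    if not msgctxt:
        # Entries without context get the plain request only.
        return request
    return f"CONTEXT: {msgctxt}\n{CONTEXT_HINT}\n\n{request}"


print(build_prompt("Save", "German", msgctxt="button"))
```

Falling back to the plain request when `msgctxt` is missing means entries without context would be handled as before, which seems consistent with the feature being described as automatic.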
