Last updated: 2026-01-23 23:36
This wiki contains translation quality benchmarks for various LLM models across 19 languages.
Important: These benchmarks evaluate translation quality on challenging literary texts featuring complex vocabulary, stylistic devices, and nuanced expressions. Performance on simpler content (technical documentation, news articles, or straightforward informative texts) is typically 15-25% higher.
| Indicator | Score Range | Label |
|---|---|---|
| 🟢 | 9.0-10 | Excellent |
| 🟡 | 7.0-8.9 | Good |
| 🟠 | 5.0-6.9 | Acceptable |
| 🔴 | 3.0-4.9 | Poor |
| ⚫ | 1.0-2.9 | Failed |
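As a quick reference, here is a minimal sketch of the score-to-indicator mapping. The half-open band edges are inferred from how the tables label fractional scores (6.9 is marked 🟠 while 7.0 is 🟡); this is an illustration, not the benchmark system's actual code.

```python
def score_indicator(avg_score: float) -> tuple[str, str]:
    """Map a 1-10 average score to its indicator emoji and label.

    Band edges are inferred from the published tables; a sketch,
    not the TranslateBookWithLLM implementation itself.
    """
    bands = [
        (9.0, "🟢", "Excellent"),
        (7.0, "🟡", "Good"),
        (5.0, "🟠", "Acceptable"),
        (3.0, "🔴", "Poor"),
    ]
    for threshold, emoji, label in bands:
        if avg_score >= threshold:
            return emoji, label
    return "⚫", "Failed"  # 1.0-2.9 (and anything lower) falls through here

print(score_indicator(7.6))  # ('🟡', 'Good')
print(score_indicator(6.9))  # ('🟠', 'Acceptable')
```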
Overall performance across all tested languages. Each model was run on 95 tests (19 target languages × 5 source texts):
| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|---|---|---|---|---|---|---|
| 1 | google/gemini-3-flash-preview | 🟡 7.6 | 7.9 | 7.4 | 7.3 | 95 |
| 2 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 3 | google/gemini-2.0-flash-001 | 🟡 7.4 | 8.0 | 7.3 | 7.2 | 95 |
| 4 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 5 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 6 | translategemma:27b-it-q8_0 | 🟡 7.1 | 7.5 | 7.1 | 6.6 | 95 |
| 7 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 8 | translategemma:27b | 🟠 6.9 | 7.4 | 7.0 | 6.4 | 95 |
| 9 | translategemma:12b-it-q4_K_M | 🟠 6.8 | 7.3 | 6.8 | 6.2 | 95 |
| 10 | translategemma:12b | 🟠 6.8 | 7.2 | 6.8 | 6.1 | 95 |
| 11 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 12 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 13 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 14 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 15 | glm-4.7-flash:latest | 🟠 6.4 | 7.0 | 6.6 | 5.8 | 95 |
| 16 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 17 | translategemma:4b | 🟠 6.1 | 6.5 | 6.3 | 5.3 | 95 |
| 18 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 19 | translategemma:4b-it-q4_K_M | 🟠 6.0 | 6.5 | 6.2 | 5.2 | 95 |
| 20 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 21 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 22 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 23 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 24 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |
Best translation quality by target language (top 15 of the 19 tested):
| Rank | Language | Native | Avg Score | Best Model | Tests |
|---|---|---|---|---|---|
| 1 | Spanish | Español | 🟡 7.4 | glm-4.7-flash:latest | 120 |
| 2 | French | Français | 🟡 7.2 | mistralai/mistral-medium-3.1 | 120 |
| 3 | Portuguese | Português | 🟡 7.2 | glm-4.7-flash:latest | 120 |
| 4 | Italian | Italiano | 🟡 7.0 | google/gemini-2.0-flash-001 | 120 |
| 5 | Chinese (Simplified) | 简体中文 | 🟠 6.9 | glm-4.7-flash:latest | 120 |
| 6 | Chinese (Traditional) | 繁體中文 | 🟠 6.9 | glm-4.7-flash:latest | 120 |
| 7 | German | Deutsch | 🟠 6.8 | ministral-3:14b | 120 |
| 8 | Russian | Русский | 🟠 6.8 | mistralai/mistral-medium-3.1 | 120 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.4 | mistralai/mistral-medium-3.1 | 120 |
| 10 | Polish | Polski | 🟠 6.3 | google/gemini-3-flash-preview | 120 |
| 11 | Ukrainian | Українська | 🟠 6.2 | google/gemini-3-flash-preview | 120 |
| 12 | Arabic | العربية | 🟠 6.2 | gemma3:27b-it-qat | 120 |
| 13 | Japanese | 日本語 | 🟠 6.0 | glm-4.7-flash:latest | 120 |
| 14 | Thai | ไทย | 🟠 6.0 | mistralai/mistral-medium-3.1 | 120 |
| 15 | Hindi | हिन्दी | 🟠 5.8 | google/gemini-3-flash-preview | 120 |
- Total Models Tested: 24
- Total Languages: 19
- Total Translations: 2280
- Evaluator Model: anthropic/claude-haiku-4.5
- Source Language: English
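These totals are mutually consistent. A short sanity check, assuming 5 source texts per model-language pair (the figure implied by dividing the per-model and per-language test counts):

```python
models, languages = 24, 19
texts_per_pair = 5  # inferred: 95 tests per model ÷ 19 languages

tests_per_model = languages * texts_per_pair   # 95, as in the model ranking table
tests_per_language = models * texts_per_pair   # 120, as in the language tables
total_translations = models * tests_per_model  # 2280, as reported above

assert (tests_per_model, tests_per_language, total_translations) == (95, 120, 2280)
```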
Per-language results grouped by script family, covering all 19 tested languages.
European languages (Latin script):
| Language | Avg Score | Best Model |
|---|---|---|
| Spanish | 🟡 7.4 | glm-4.7-flash:latest |
| French | 🟡 7.2 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.2 | glm-4.7-flash:latest |
| Italian | 🟡 7.0 | google/gemini-2.0-flash-001 |
| German | 🟠 6.8 | ministral-3:14b |
| Polish | 🟠 6.3 | google/gemini-3-flash-preview |
Asian languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Chinese (Simplified) | 🟠 6.9 | glm-4.7-flash:latest |
| Chinese (Traditional) | 🟠 6.9 | glm-4.7-flash:latest |
| Vietnamese | 🟠 6.4 | mistralai/mistral-medium-3.1 |
| Japanese | 🟠 6.0 | glm-4.7-flash:latest |
| Thai | 🟠 6.0 | mistralai/mistral-medium-3.1 |
| Hindi | 🟠 5.8 | google/gemini-3-flash-preview |
| Korean | 🟠 5.7 | google/gemini-2.0-flash-001 |
| Bengali | 🟠 5.2 | google/gemini-3-flash-preview |
| Tamil | 🟠 5.2 | mistralai/mistral-medium-3.1 |
Cyrillic-script languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Russian | 🟠 6.8 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 6.2 | google/gemini-3-flash-preview |
Right-to-left languages:
| Language | Avg Score | Best Model |
|---|---|---|
| Arabic | 🟠 6.2 | gemma3:27b-it-qat |
| Hebrew | 🟠 5.2 | gemma3:27b-it-qat |
Detailed result pages:
- By Language: All Languages
- By Model: All Models
- Benchmark Documentation: How to Run Benchmarks
- Raw Data: Download JSON
Generated by TranslateBookWithLLM benchmark system