hydropix edited this page Jan 24, 2026 · 23 revisions

Translation Quality Benchmark

Last updated: 2026-01-23 23:36

This wiki contains translation quality benchmarks for 24 LLMs across 19 languages.

Important: These benchmarks evaluate translation quality on challenging literary texts featuring complex vocabulary, stylistic devices, and nuanced expressions. Performance on simpler content (technical documentation, news articles, or straightforward informative texts) is typically 15-25% higher.

Score Legend

| Indicator | Range | Label |
|-----------|-------|------------|
| 🟢 | 9-10 | Excellent |
| 🟡 | 7-8 | Good |
| 🟠 | 5-6 | Acceptable |
| 🔴 | 3-4 | Poor |
| ⚫ | 1-2 | Failed |
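The bands can be expressed as a small lookup. The exact cutoffs for fractional scores are inferred from the ranking tables (a 6.9 average is shown as 🟠 while a 7.1 is 🟡, and a 2.5 as ⚫), so each band effectively covers scores up to the next integer threshold. A sketch, not part of the benchmark code:

```python
def score_indicator(score: float) -> str:
    """Map a 0-10 average score to the legend's color band.

    Thresholds inferred from the tables: fractional scores fall into
    the band below the next cutoff (6.9 -> 🟠, 7.1 -> 🟡, 2.5 -> ⚫).
    """
    if score >= 9:
        return "🟢"  # Excellent
    if score >= 7:
        return "🟡"  # Good
    if score >= 5:
        return "🟠"  # Acceptable
    if score >= 3:
        return "🔴"  # Poor
    return "⚫"       # Failed

print(score_indicator(7.6))  # 🟡
```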

Model Rankings

Overall performance across all tested languages:

| Rank | Model | Avg Score | Accuracy | Fluency | Style | Tests |
|------|-------|-----------|----------|---------|-------|-------|
| 1 | google/gemini-3-flash-preview | 🟡 7.6 | 7.9 | 7.4 | 7.3 | 95 |
| 2 | mistralai/mistral-medium-3.1 | 🟡 7.5 | 7.9 | 7.3 | 7.3 | 95 |
| 3 | google/gemini-2.0-flash-001 | 🟡 7.4 | 8.0 | 7.3 | 7.2 | 95 |
| 4 | gemma3:27b-it-qat | 🟡 7.1 | 7.6 | 7.0 | 6.8 | 95 |
| 5 | gemma3:27b | 🟡 7.1 | 7.7 | 7.1 | 6.8 | 95 |
| 6 | translategemma:27b-it-q8_0 | 🟡 7.1 | 7.5 | 7.1 | 6.6 | 95 |
| 7 | ministral-3:14b | 🟠 6.9 | 7.4 | 6.9 | 6.6 | 95 |
| 8 | translategemma:27b | 🟠 6.9 | 7.4 | 7.0 | 6.4 | 95 |
| 9 | translategemma:12b-it-q4_K_M | 🟠 6.8 | 7.3 | 6.8 | 6.2 | 95 |
| 10 | translategemma:12b | 🟠 6.8 | 7.2 | 6.8 | 6.1 | 95 |
| 11 | qwen3:30b | 🟠 6.7 | 7.4 | 6.7 | 6.4 | 95 |
| 12 | gemma3:12b | 🟠 6.7 | 7.3 | 6.6 | 6.4 | 95 |
| 13 | qwen3:30b-instruct | 🟠 6.6 | 7.2 | 6.6 | 6.2 | 95 |
| 14 | mistral-small:24b | 🟠 6.4 | 7.2 | 6.4 | 6.2 | 95 |
| 15 | glm-4.7-flash:latest | 🟠 6.4 | 7.0 | 6.6 | 5.8 | 95 |
| 16 | ministral-3 | 🟠 6.3 | 7.0 | 6.2 | 5.9 | 95 |
| 17 | translategemma:4b | 🟠 6.1 | 6.5 | 6.3 | 5.3 | 95 |
| 18 | qwen3:14b | 🟠 6.0 | 6.7 | 6.0 | 5.7 | 95 |
| 19 | translategemma:4b-it-q4_K_M | 🟠 6.0 | 6.5 | 6.2 | 5.2 | 95 |
| 20 | qwen3:4b | 🟠 5.9 | 6.7 | 5.8 | 5.5 | 95 |
| 21 | gemma3:4b | 🟠 5.7 | 6.5 | 5.8 | 5.3 | 95 |
| 22 | qwen3:8b | 🟠 5.5 | 6.3 | 5.4 | 5.2 | 95 |
| 23 | llama3.1:8b | 🔴 4.2 | 4.9 | 4.2 | 3.8 | 95 |
| 24 | llama3.2 | ⚫ 2.5 | 3.4 | 2.5 | 2.3 | 95 |
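The ranking is a descending sort on the average score. A minimal sketch with a few rows copied from the table (ties among equal rounded averages, e.g. ranks 4-6 at 7.1, are presumably broken by the unrounded values, which are not published here):

```python
# Hypothetical reconstruction of the ranking: sort (model, avg)
# records by average score, descending. Sample rows from the table.
rows = [
    ("llama3.2", 2.5),
    ("google/gemini-3-flash-preview", 7.6),
    ("mistralai/mistral-medium-3.1", 7.5),
    ("llama3.1:8b", 4.2),
]
ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for rank, (model, avg) in enumerate(ranked, start=1):
    print(rank, model, avg)
```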

Language Rankings (Top 15)

Best translation quality by target language:

| Rank | Language | Native | Avg Score | Best Model | Tests |
|------|----------|--------|-----------|------------|-------|
| 1 | Spanish | Español | 🟡 7.4 | glm-4.7-flash:latest | 120 |
| 2 | French | Français | 🟡 7.2 | mistralai/mistral-medium-3.1 | 120 |
| 3 | Portuguese | Português | 🟡 7.2 | glm-4.7-flash:latest | 120 |
| 4 | Italian | Italiano | 🟡 7.0 | google/gemini-2.0-flash-001 | 120 |
| 5 | Chinese (Simplified) | 简体中文 | 🟠 6.9 | glm-4.7-flash:latest | 120 |
| 6 | Chinese (Traditional) | 繁體中文 | 🟠 6.9 | glm-4.7-flash:latest | 120 |
| 7 | German | Deutsch | 🟠 6.8 | ministral-3:14b | 120 |
| 8 | Russian | Русский | 🟠 6.8 | mistralai/mistral-medium-3.1 | 120 |
| 9 | Vietnamese | Tiếng Việt | 🟠 6.4 | mistralai/mistral-medium-3.1 | 120 |
| 10 | Polish | Polski | 🟠 6.3 | google/gemini-3-flash-preview | 120 |
| 11 | Ukrainian | Українська | 🟠 6.2 | google/gemini-3-flash-preview | 120 |
| 12 | Arabic | العربية | 🟠 6.2 | gemma3:27b-it-qat | 120 |
| 13 | Japanese | 日本語 | 🟠 6.0 | glm-4.7-flash:latest | 120 |
| 14 | Thai | ไทย | 🟠 6.0 | mistralai/mistral-medium-3.1 | 120 |
| 15 | Hindi | हिन्दी | 🟠 5.8 | google/gemini-3-flash-preview | 120 |

Quick Stats

  • Total Models Tested: 24
  • Total Languages: 19
  • Total Translations: 2280
  • Evaluator Model: anthropic/claude-haiku-4.5
  • Source Language: English
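The figures above are mutually consistent: 24 models × 95 tests each equals 19 languages × 120 tests each, i.e. 2280 translations. A quick check (the "5 runs per model/language pair" is implied by 95 ÷ 19, not stated on this page):

```python
# Consistency check on the quick stats; all figures come from this page.
models, languages, translations = 24, 19, 2280
tests_per_model = 95       # from the Model Rankings table
tests_per_language = 120   # from the Language Rankings table

assert models * tests_per_model == translations        # 24 * 95 = 2280
assert languages * tests_per_language == translations  # 19 * 120 = 2280
assert tests_per_model == languages * 5  # implies 5 runs per model/language pair
print("stats consistent:", translations, "translations")
```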

Categories

By Language Category

European Major Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Spanish | 🟡 7.4 | glm-4.7-flash:latest |
| French | 🟡 7.2 | mistralai/mistral-medium-3.1 |
| Portuguese | 🟡 7.2 | glm-4.7-flash:latest |
| Italian | 🟡 7.0 | google/gemini-2.0-flash-001 |
| German | 🟠 6.8 | ministral-3:14b |
| Polish | 🟠 6.3 | google/gemini-3-flash-preview |

Asian Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Chinese (Simplified) | 🟠 6.9 | glm-4.7-flash:latest |
| Chinese (Traditional) | 🟠 6.9 | glm-4.7-flash:latest |
| Vietnamese | 🟠 6.4 | mistralai/mistral-medium-3.1 |
| Japanese | 🟠 6.0 | glm-4.7-flash:latest |
| Thai | 🟠 6.0 | mistralai/mistral-medium-3.1 |
| Hindi | 🟠 5.8 | google/gemini-3-flash-preview |
| Korean | 🟠 5.7 | google/gemini-2.0-flash-001 |
| Bengali | 🟠 5.2 | google/gemini-3-flash-preview |
| Tamil | 🟠 5.2 | mistralai/mistral-medium-3.1 |

Cyrillic Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Russian | 🟠 6.8 | mistralai/mistral-medium-3.1 |
| Ukrainian | 🟠 6.2 | google/gemini-3-flash-preview |

Semitic Languages

| Language | Avg Score | Best Model |
|----------|-----------|------------|
| Arabic | 🟠 6.2 | gemma3:27b-it-qat |
| Hebrew | 🟠 5.2 | gemma3:27b-it-qat |


Generated by TranslateBookWithLLM benchmark system
