
# mistralai/mistral-medium-3.1

**Ollama Model ID:** `mistralai/mistral-medium-3.1`
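For reference, here is a minimal sketch of how a translation request could be sent to this model through a locally running Ollama server. The endpoint and response shape follow Ollama's standard REST API; the prompt wording and timeout are illustrative assumptions, not part of the benchmark harness used for this page.

```python
# Minimal sketch: request a translation from a local Ollama server.
# Assumes the server is running on the default port and the model has been
# pulled under the ID shown above; the prompt text is illustrative only.
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "mistralai/mistral-medium-3.1",
        "prompt": "Translate the following passage into French:\n\n...",
        "stream": False,
    },
    timeout=300,
)
response.raise_for_status()
print(response.json()["response"])
```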


## Summary

| Metric | Value |
|---|---|
| Average Score | 🟡 7.5/10 |
| Accuracy | 7.9/10 |
| Fluency | 7.3/10 |
| Style | 7.3/10 |
| Languages Tested | 19 |
| Total Translations | 95 |
| Best Language | French (8.4) |
| Worst Language | Hindi (6.8) |

## Language Performance

### Top Languages

| Rank | Language | Overall | Accuracy | Fluency | Style |
|---|---|---|---|---|---|
| 1 | French | 🟡 8.4 | 8.4 | 8.4 | 8.0 |
| 2 | Spanish | 🟡 8.2 | 8.6 | 8.2 | 8.2 |
| 3 | Russian | 🟡 8.2 | 8.4 | 7.8 | 7.8 |
| 4 | German | 🟡 8.0 | 8.0 | 8.0 | 7.6 |
| 5 | Chinese (Simplified) | 🟡 8.0 | 8.0 | 8.0 | 7.6 |
| 6 | Italian | 🟡 7.8 | 8.2 | 7.8 | 7.6 |
| 7 | Portuguese | 🟡 7.8 | 7.8 | 7.8 | 7.4 |
| 8 | Polish | 🟡 7.8 | 8.0 | 7.8 | 7.6 |
| 9 | Chinese (Traditional) | 🟡 7.8 | 8.0 | 7.6 | 7.4 |
| 10 | Japanese | 🟡 7.4 | 7.8 | 7.0 | 7.4 |

<details>
<summary>View all 19 languages</summary>

| Rank | Language | Overall | Accuracy | Fluency | Style |
|---|---|---|---|---|---|
| 1 | French | 🟡 8.4 | 8.4 | 8.4 | 8.0 |
| 2 | Spanish | 🟡 8.2 | 8.6 | 8.2 | 8.2 |
| 3 | Russian | 🟡 8.2 | 8.4 | 7.8 | 7.8 |
| 4 | German | 🟡 8.0 | 8.0 | 8.0 | 7.6 |
| 5 | Chinese (Simplified) | 🟡 8.0 | 8.0 | 8.0 | 7.6 |
| 6 | Italian | 🟡 7.8 | 8.2 | 7.8 | 7.6 |
| 7 | Portuguese | 🟡 7.8 | 7.8 | 7.8 | 7.4 |
| 8 | Polish | 🟡 7.8 | 8.0 | 7.8 | 7.6 |
| 9 | Chinese (Traditional) | 🟡 7.8 | 8.0 | 7.6 | 7.4 |
| 10 | Japanese | 🟡 7.4 | 7.8 | 7.0 | 7.4 |
| 11 | Ukrainian | 🟡 7.4 | 8.0 | 7.2 | 7.6 |
| 12 | Vietnamese | 🟡 7.2 | 7.6 | 6.8 | 7.0 |
| 13 | Hebrew | 🟡 7.2 | 7.8 | 7.0 | 7.2 |
| 14 | Korean | 🟡 7.0 | 7.4 | 6.8 | 6.6 |
| 15 | Thai | 🟡 7.0 | 7.6 | 6.6 | 6.8 |
| 16 | Bengali | 🟡 7.0 | 7.6 | 6.6 | 7.0 |
| 17 | Tamil | 🟡 7.0 | 7.6 | 6.6 | 7.0 |
| 18 | Arabic | 🟡 7.0 | 7.6 | 6.8 | 6.8 |
| 19 | Hindi | 🟠 6.8 | 7.8 | 6.8 | 7.0 |

</details>

## Performance by Category

### European Major Languages

| Language | Overall | Accuracy | Fluency | Style |
|---|---|---|---|---|
| French | 🟡 8.4 | 8.4 | 8.4 | 8.0 |
| Spanish | 🟡 8.2 | 8.6 | 8.2 | 8.2 |
| German | 🟡 8.0 | 8.0 | 8.0 | 7.6 |
| Italian | 🟡 7.8 | 8.2 | 7.8 | 7.6 |
| Portuguese | 🟡 7.8 | 7.8 | 7.8 | 7.4 |
| Polish | 🟡 7.8 | 8.0 | 7.8 | 7.6 |

**Category Average:** 🟡 8.0

### Cyrillic Languages

| Language | Overall | Accuracy | Fluency | Style |
|---|---|---|---|---|
| Russian | 🟡 8.2 | 8.4 | 7.8 | 7.8 |
| Ukrainian | 🟡 7.4 | 8.0 | 7.2 | 7.6 |

**Category Average:** 🟡 7.8

### Asian Languages

| Language | Overall | Accuracy | Fluency | Style |
|---|---|---|---|---|
| Chinese (Simplified) | 🟡 8.0 | 8.0 | 8.0 | 7.6 |
| Chinese (Traditional) | 🟡 7.8 | 8.0 | 7.6 | 7.4 |
| Japanese | 🟡 7.4 | 7.8 | 7.0 | 7.4 |
| Vietnamese | 🟡 7.2 | 7.6 | 6.8 | 7.0 |
| Korean | 🟡 7.0 | 7.4 | 6.8 | 6.6 |
| Thai | 🟡 7.0 | 7.6 | 6.6 | 6.8 |
| Bengali | 🟡 7.0 | 7.6 | 6.6 | 7.0 |
| Tamil | 🟡 7.0 | 7.6 | 6.6 | 7.0 |
| Hindi | 🟠 6.8 | 7.8 | 6.8 | 7.0 |

**Category Average:** 🟡 7.2

### Semitic Languages

| Language | Overall | Accuracy | Fluency | Style |
|---|---|---|---|---|
| Hebrew | 🟡 7.2 | 7.8 | 7.0 | 7.2 |
| Arabic | 🟡 7.0 | 7.6 | 6.8 | 6.8 |

**Category Average:** 🟡 7.1
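The category averages above are consistent with an unweighted mean of each group's per-language Overall scores, rounded to one decimal place. A quick check, assuming that is the methodology (the page does not state the formula explicitly):

```python
# Consistency check: each Category Average appears to be the unweighted mean
# of that group's per-language Overall scores, rounded to one decimal place.
categories = {
    "European Major": [8.4, 8.2, 8.0, 7.8, 7.8, 7.8],
    "Cyrillic": [8.2, 7.4],
    "Asian": [8.0, 7.8, 7.4, 7.2, 7.0, 7.0, 7.0, 7.0, 6.8],
    "Semitic": [7.2, 7.0],
}
for name, scores in categories.items():
    print(f"{name}: {round(sum(scores) / len(scores), 1)}")
# Prints 8.0, 7.8, 7.2 and 7.1, matching the tables above.
```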


## Translation Examples

### Best Translation

**French - A Study in Scarlet** (Score: 🟢 9.0/10)

**Original:**

**Translation:**

— Vous avez servi en Afghanistan, à ce que je vois. — Mais comment diable l’avez-vous deviné ? m’exclamai-je, stupéfait. — Peu importe, répondit-il en riant sous cape. La question, à présent, concerne l’hémoglobine. Vous ne voyez sans doute pas toute la portée de ma découverte ? — C’est assurément passionnant sur le plan chimique, admis-je, mais en pratique… — En pratique ? Mais mon cher, c’est la plus grande avancée médico-légale de ces dernières années ! Ne comprenez-vous pas que cela nous fou

**Feedback:** The translation faithfully preserves the meaning and dialogue dynamics of the original, with excellent choices like 'riant sous cape' for 'chuckling to himself' and proper handling of technical terms. The French flows naturally with appropriate register shifts and maintains Holmes's characteristic deductive confidence and Watson's bewilderment throughout.

### Worst Translation

**Arabic - The Picture of Dorian Gray** (Score: 🟠 5.0/10)

**Original:**

**Translation:**

ملأ عبق الورود الغنيّ الاستوديو، وعندما هبّت نسمة الصيف الخفيفة بين أشجار الحديقة، تدفّق من الباب المفتوح عبير الياسمين الثقيل، أو رائحة الزيزفون الوردية الأرقّ. ومن زاوية الأريكة المغطاة بسرج فارسية كان مستلقيًا عليها، يدخّن كما هي عادته، سيجارًا تلو الآخر بلا حساب، استطاع اللورد هنري ووتون أن يلمح لمعان أزهار الخزامى الذهبية اللون والعسلية الرائحة.

**Feedback:** The translation captures the general sensory richness but contains significant botanical errors (jasmine and linden instead of lilac and hawthorn, laburnum misidentified as lavender) that distort Wilde's specific aesthetic imagery. The sentence structure becomes awkward and loses the flowing elegance of the original prose.


## Score Distribution

| Score Range | Count | Percentage |
|---|---|---|
| 🟢 Excellent (9-10) | 7 | 7.4% |
| 🟡 Good (7-8) | 78 | 82.1% |
| 🟠 Acceptable (5-6) | 10 | 10.5% |
| 🔴 Poor (3-4) | 0 | 0.0% |
| ⚫ Failed (1-2) | 0 | 0.0% |
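The percentages follow directly from the band counts over the 95 scored translations. A small sketch of that arithmetic, assuming all 95 translations form the denominator:

```python
# Sketch: derive the distribution percentages from the raw band counts.
# Assumes the 95 scored translations form the denominator.
counts = {
    "🟢 Excellent (9-10)": 7,
    "🟡 Good (7-8)": 78,
    "🟠 Acceptable (5-6)": 10,
    "🔴 Poor (3-4)": 0,
    "⚫ Failed (1-2)": 0,
}
total = sum(counts.values())  # 95
for band, n in counts.items():
    print(f"{band}: {n / total:.1%}")
# Prints 7.4%, 82.1%, 10.5%, 0.0% and 0.0%, matching the table above.
```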

## Performance Metrics

| Metric | Value |
|---|---|
| Average Translation Time | 2906.0 ms |
| Success Rate | 100.0% |

← Back to Home | All Models
