It would be great if all high-scoring small models on MMLU-Pro could be validated so that their scores are reliable and complete. These small models are valuable because they are fast and cheap to run, and they showcase important trends in model and distillation efficiency.
Small, high-scoring models
QwQ Family
- QwQ-32B-Preview (32B)
- QwQ-32B (32B)
Microsoft/Phi Family
- Phi-4 (14B)
- Phi-4-mini (5.6B)
- Phi3-medium-4k (14B)
Qwen Family
- Qwen2.5-32B (32B)
- Qwen2.5-14B (14B)
Google Family
- Gemma-3-27B-it (27B)
- Gemma-3-12B-it (12B)
- Gemma-2-27B-it (27B)
Mistral Family
- Mistral-Small-instruct (24B)
- Mistral-Small-base (24B)
Other Models
- SkyThought-T1 (32B)
- Reka 3 (21B)
- RRD2.5-9B (9B)
- EXAONE-3.5-32B-Instruct (32B)
- Internlm3-8B-Instruct (8B)