Validate small high-scoring models #69

@EwoutH

It would be great if all high-scoring small models on MMLU-Pro could be validated to provide reliable and complete scores. These small models are valuable as they're fast and cheap to run while showcasing important trends in model and distillation efficiency.

Small, high-scoring models

QwQ Family

  • QwQ-32B-Preview (32B)
  • QwQ-32B (32B)

Microsoft/Phi Family

  • Phi-4 (14B)
  • Phi-4-mini (5.6B)
  • Phi3-medium-4k (14B)

Qwen Family

  • Qwen2.5-32B (32B)
  • Qwen2.5-14B (14B)

Google Family

  • Gemma-3-27B-it (27B)
  • Gemma-3-12B-it (12B)
  • Gemma-2-27B-it (27B)

Mistral Family

  • Mistral-Small-instruct (24B)
  • Mistral-Small-base (24B)

Other Models

  • SkyThought-T1 (32B)
  • Reka 3 (21B)
  • RRD2.5-9B (9B)
  • EXAONE-3.5-32B-Instruct (32B)
  • Internlm3-8B-Instruct (8B)
