Marco-Bench-MIF: A Benchmark for Multilingual Instruction-Following Evaluation

Introduction

Marco-Bench-MIF is the first deeply localized multilingual benchmark designed to evaluate instruction-following capabilities across 30 languages. Unlike existing benchmarks that rely primarily on machine translation, Marco-Bench-MIF implements fine-grained cultural adaptations to provide more accurate assessment. Our research demonstrates that machine-translated data underestimates model performance by 7-22% in multilingual environments.

Key Features

Extensive Language Coverage: 30 languages spanning 6 major language families, including high-resource (English, Chinese, German) and low-resource languages (Yoruba, Nepali)
Deep Cultural Localization: Three-step process of lexical replacement, theme transformation, and pragmatic reconstruction to ensure cultural and linguistic appropriateness
Diverse Constraint Types: 541 instruction-response pairs covering single/multiple constraints, expressive/content constraints, and various instruction types
Comparative Dataset: Machine-translated and culturally-localized versions available for specific languages (Arabic, Chinese, Spanish, etc.) to enable comparative research

Dataset Access

The dataset is available through our GitHub repository and Hugging Face:

# Github
git clone https://github.com/AIDC-AI/Marco-Bench-MIF.git

# Huggingface
https://huggingface.co/datasets/AIDC-AI/Marco-Bench-MIF

Key Findings

Our benchmark evaluated 20+ LLM models and revealed:

Model scale strongly correlates with performance, with 70B+ models outperforming 8B models by 45-60%
A 25-35% performance gap exists between high-resource languages (German, Chinese) and low-resource languages (Yoruba, Nepali)
Significant differences between localized and machine-translated evaluations, especially for complex instructions

Contact

For questions or suggestions, please submit a GitHub issue or contact us:

Email: [email protected]
Project homepage: https://github.com/AIDC-AI/Marco-Bench-MIF

License

This dataset is licensed under the Apache License 2.0.

Acknowledgments

Special thanks to all annotators and translators who participated in dataset construction and validation. This project is supported by Alibaba International Digital Commerce Group.

Citation

@inproceedings{zeng-etal-2025-marco,
  title     = "Marco-Bench-{MIF}: On Multilingual Instruction-Following Capability of Large Language",
  author    = "Zeng, Bo and Lyu, Chenyang and Liu, Sinuo and Zeng, Mingyan and Wu, Minghao and Ni, Xuanfan and Shi, Tianqi and Zhao, Yu and Liu, Yefeng and Zhu, Chenyu and Li, Ruizhe and Geng, Jiahui and Li, Qing and Tong, Yu and Wang, Longyue and Luo, Weihua and Zhang, Kaifu",
  editor    = "Che, Wanxiang and Nabende, Joyce and Shutova, Ekaterina and Pilehvar, Mohammad Taher",
  booktitle = "Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
  month     = jul,
  year      = "2025",
  address   = "Vienna, Austria",
  publisher = "Association for Computational Linguistics",
  url       = "https://aclanthology.org/2025.acl-long.1172/",
  doi       = "10.18653/v1/2025.acl-long.1172",
  pages     = "24058--24072",
  ISBN      = "979-8-89176-251-0"
}

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Marco-Bench-MIF		Marco-Bench-MIF
image		image
NOTICE		NOTICE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Marco-Bench-MIF: A Benchmark for Multilingual Instruction-Following Evaluation

Introduction

Key Features

Dataset Access

Key Findings

Contact

License

Acknowledgments

Citation

About

Uh oh!

Releases

Packages

AIDC-AI/Marco-Bench-MIF

Folders and files

Latest commit

History

Repository files navigation

Marco-Bench-MIF: A Benchmark for Multilingual Instruction-Following Evaluation

Introduction

Key Features

Dataset Access

Key Findings

Contact

License

Acknowledgments

Citation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Packages