
BasqueBench

Paper

BasqueBench is a benchmark for evaluating language models on Basque tasks, that is, their ability to understand and generate Basque text. BasqueBench combines pre-existing open datasets with datasets developed exclusively for this benchmark. All the details of BasqueBench will be published in a paper soon.

The new evaluation datasets included in BasqueBench are:

| Task | Category | Homepage |
|------|----------|----------|
| ARC_eu | Question Answering | https://huggingface.co/datasets/HiTZ/ARC-eu |
| MGSM_eu | Math | https://huggingface.co/datasets/HiTZ/MGSM-eu |
| PAWS_eu | Paraphrasing | https://huggingface.co/datasets/HiTZ/PAWS-eu |
| PIQA_eu | Question Answering | https://huggingface.co/datasets/HiTZ/PIQA-eu |
| WNLI_eu | Natural Language Inference | https://huggingface.co/datasets/HiTZ/WNLI-eu |
| XCOPA_eu | Commonsense Reasoning | https://huggingface.co/datasets/HiTZ/XCOPA-eu |

The datasets included in BasqueBench that have been made public in previous publications are:

| Task | Category | Paper title | Homepage |
|------|----------|-------------|----------|
| Belebele_eu | Reading Comprehension | The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants | https://huggingface.co/datasets/facebook/belebele |
| EusExams | Question Answering | Latxa: An Open Language Model and Evaluation Suite for Basque | https://huggingface.co/datasets/HiTZ/EusExams |
| EusProficiency | Question Answering | Latxa: An Open Language Model and Evaluation Suite for Basque | https://huggingface.co/datasets/HiTZ/EusProficiency |
| EusReading | Reading Comprehension | Latxa: An Open Language Model and Evaluation Suite for Basque | https://huggingface.co/datasets/HiTZ/EusReading |
| EusTrivia | Question Answering | Latxa: An Open Language Model and Evaluation Suite for Basque | https://huggingface.co/datasets/HiTZ/EusTrivia |
| FLORES_eu | Translation | No Language Left Behind: Scaling Human-Centered Machine Translation | https://huggingface.co/datasets/facebook/flores |
| QNLIeu | Natural Language Inference | BasqueGLUE: A Natural Language Understanding Benchmark for Basque | https://huggingface.co/datasets/orai-nlp/basqueGLUE |
| XNLIeu | Natural Language Inference | XNLIeu: a dataset for cross-lingual NLI in Basque | https://huggingface.co/datasets/HiTZ/xnli-eu |
| XStoryCloze_eu | Commonsense Reasoning | Few-shot Learning with Multilingual Generative Language Models | https://huggingface.co/datasets/juletxara/xstory_cloze |

Citation

@inproceedings{baucells-etal-2025-iberobench,
    title = "{I}bero{B}ench: A Benchmark for {LLM} Evaluation in {I}berian Languages",
    author = "Baucells, Irene  and
      Aula-Blasco, Javier  and
      de-Dios-Flores, Iria  and
      Paniagua Su{\'a}rez, Silvia  and
      Perez, Naiara  and
      Salles, Anna  and
      Sotelo Docio, Susana  and
      Falc{\~a}o, J{\'u}lia  and
      Saiz, Jose Javier  and
      Sepulveda Torres, Robiert  and
      Barnes, Jeremy  and
      Gamallo, Pablo  and
      Gonzalez-Agirre, Aitor  and
      Rigau, German  and
      Villegas, Marta",
    editor = "Rambow, Owen  and
      Wanner, Leo  and
      Apidianaki, Marianna  and
      Al-Khalifa, Hend  and
      Eugenio, Barbara Di  and
      Schockaert, Steven",
    booktitle = "Proceedings of the 31st International Conference on Computational Linguistics",
    month = jan,
    year = "2025",
    address = "Abu Dhabi, UAE",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.coling-main.699/",
    pages = "10491--10519",
}

Groups and Tasks

Groups

  • basque_bench: All tasks included in BasqueBench.
  • flores_eu: All FLORES translation tasks from or to Basque.
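As an illustrative sketch of how these groups are typically run with the LM Evaluation Harness CLI (the model checkpoint below is a placeholder, not an endorsed choice):

```shell
# Evaluate every BasqueBench task in one run.
# "HiTZ/latxa-7b-v1.2" is a placeholder model identifier; substitute your own.
lm_eval --model hf \
    --model_args pretrained=HiTZ/latxa-7b-v1.2 \
    --tasks basque_bench \
    --batch_size auto

# Restrict the run to the FLORES translation directions from or to Basque.
lm_eval --model hf \
    --model_args pretrained=HiTZ/latxa-7b-v1.2 \
    --tasks flores_eu \
    --batch_size auto
```

Individual tasks from the list below can also be passed to `--tasks` as a comma-separated list.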

Tasks

The following tasks evaluate models on the BasqueBench datasets using various scoring methods.

  • arc_eu_challenge
  • arc_eu_easy
  • belebele_eus_Latn
  • eus_exams_eu
  • eus_proficiency
  • eus_reading
  • eus_trivia
  • flores_eu
  • flores_eu-ca
  • flores_eu-de
  • flores_eu-en
  • flores_eu-es
  • flores_eu-fr
  • flores_eu-gl
  • flores_eu-it
  • flores_eu-pt
  • flores_ca-eu
  • flores_de-eu
  • flores_en-eu
  • flores_es-eu
  • flores_fr-eu
  • flores_gl-eu
  • flores_it-eu
  • flores_pt-eu
  • mgsm_direct_eu
  • mgsm_native_cot_eu
  • paws_eu
  • piqa_eu
  • qnlieu
  • wnli_eu
  • xcopa_eu
  • xnli_eu
  • xnli_eu_native
  • xstorycloze_eu

Some of these tasks are taken from benchmarks already available in LM Evaluation Harness. These are:

  • belebele_eus_Latn: Belebele Basque
  • qnlieu: From BasqueGLUE

Checklist

  • Is the task an existing benchmark in the literature?
    • Have you referenced the original paper that introduced the task?
    • If yes, does the original paper provide a reference implementation?
      • Yes, original implementation contributed by author of the benchmark

If other tasks on this dataset are already supported:

  • Is the "Main" variant of this task clearly denoted?
  • Have you provided a short sentence in a README on what each new variant adds / evaluates?
  • Have you noted which, if any, published evaluation setups are matched by this variant?