Commit d5ddccd

Add support for Titulm Bangla MMLU dataset (#3317)
* Added YAML task for [task/bangla]
* brief description of the task
* Update README.md fix

---------

Co-authored-by: Baber Abbasi <[email protected]>
1 parent 690ef8b commit d5ddccd

3 files changed: +71 -0 lines changed


lm_eval/tasks/README.md

Lines changed: 1 addition & 0 deletions
@@ -26,6 +26,7 @@ provided to the individual README.md files for each subfolder.
 | [asdiv](asdiv/README.md) | Tasks involving arithmetic and mathematical reasoning challenges. | English |
 | [babi](babi/README.md) | Tasks designed as question and answering challenges based on simulated stories. | English |
 | [babilong](babilong/README.md) | Tasks designed to test whether models can find and reason over facts in long contexts. | English |
+| [bangla_mmlu](bangla/README.md) | Benchmark dataset for evaluating language models' performance on Bangla (Bengali) language tasks. Includes diverse NLP tasks to measure model understanding and generation capabilities in Bangla. | Bengali/Bangla |
 | [basque_bench](basque_bench/README.md) | Collection of tasks in Basque encompassing various evaluation areas. | Basque |
 | [basqueglue](basqueglue/README.md) | Tasks designed to evaluate language understanding in Basque language. | Basque |
 | [bbh](bbh/README.md) | Tasks focused on deep semantic understanding through hypothesization and reasoning. | English, German |

lm_eval/tasks/bangla/README.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
# Titulm Bangla MMLU

This repository contains resources related to **Titulm Bangla MMLU**, a benchmark dataset designed for evaluating Bangla language models. The dataset is used for training, development, and comparative evaluation of language models in the Bangla language.

---

## Overview

**TituLLMs** is a family of Bangla large language models (LLMs) with comprehensive benchmarking designed to advance natural language processing for the Bangla language. The benchmark dataset `Titulm Bangla MMLU` covers multiple-choice questions across a diverse range of topics in Bangla.

This dataset is primarily used to train, validate, and evaluate Bangla language models and compare their performance with other existing models.

For more details, please refer to the original research paper:
[https://arxiv.org/abs/2502.11187](https://arxiv.org/abs/2502.11187)

---

## Dataset

The `Titulm Bangla MMLU` dataset can be found on Hugging Face:
[https://huggingface.co/datasets/hishab/titulm-bangla-mmlu](https://huggingface.co/datasets/hishab/titulm-bangla-mmlu)

This dataset was used as a benchmark in the development and evaluation of TituLLMs and related models.

---
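As a rough illustration of the Dataset section, the sketch below loads one split with the Hugging Face `datasets` package. The path `hishab/titulm-bangla-mmlu`, the config name `all`, and the split names `test` and `dev` come from the task YAML in this commit; the helper names `load_bangla_mmlu` and `summarize` are hypothetical, and the field names (`question`, `options`, `answer`) are inferred from the prompt template rather than checked against the Hub.

```python
# A minimal sketch, assuming the `datasets` package is installed and the Hub is reachable.
DATASET_PATH = "hishab/titulm-bangla-mmlu"  # dataset_path in the task YAML
DATASET_NAME = "all"                        # dataset_name in the task YAML


def load_bangla_mmlu(split="test"):
    """Load one split; the task config uses 'test' for evaluation and 'dev' for few-shot examples."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_PATH, DATASET_NAME, split=split)


def summarize(doc):
    """One-line summary of a record. Field names are assumed from the prompt template."""
    return f"{doc['question'].strip()} | {len(doc['options'])} options | answer={doc['answer']}"


if __name__ == "__main__":
    test_set = load_bangla_mmlu("test")  # requires network access
    print(summarize(test_set[0]))
```

Keeping the import inside the function is only a convenience for environments without `datasets`; a top-level import works equally well.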
## Usage

The dataset is intended for use within the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) repository to evaluate and compare the performance of Bangla language models.

---
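One way to run the task described in the Usage section is through the harness's `lm_eval` command-line entry point. The sketch below only assembles the command. The model name is a placeholder, and the few-shot count and batch size are assumptions chosen for illustration, since the task config does not fix them; `build_eval_command` is a hypothetical helper.

```python
import shlex


def build_eval_command(model_name, num_fewshot=5, batch_size=8):
    """Assemble an `lm_eval` invocation for the bangla_mmlu task (defaults are illustrative)."""
    return [
        "lm_eval",
        "--model", "hf",                             # Hugging Face transformers backend
        "--model_args", f"pretrained={model_name}",  # placeholder model identifier
        "--tasks", "bangla_mmlu",                    # task name from the YAML in this commit
        "--num_fewshot", str(num_fewshot),           # assumption: not set by the task config
        "--batch_size", str(batch_size),
    ]


cmd = build_eval_command("your-org/your-bangla-model")
print(shlex.join(cmd))  # copy into a shell, or pass `cmd` to subprocess.run
```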
## Note: The dataset can also be used to evaluate other models

### Other datasets like boolq, openbookqa ... soon to be added

## Citation

If you use this dataset or model, please cite the original paper:

```bibtex
@misc{nahin2025titullmsfamilybanglallms,
      title={TituLLMs: A Family of Bangla LLMs with Comprehensive Benchmarking},
      author={Shahriar Kabir Nahin and Rabindra Nath Nandi and Sagor Sarker and Quazi Sarwar Muhtaseem and Md Kowsher and Apu Chandraw Shill and Md Ibrahim and Mehadi Hasan Menon and Tareq Al Muntasir and Firoj Alam},
      year={2025},
      eprint={2502.11187},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.11187},
}
```
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@

task: bangla_mmlu
dataset_path: hishab/titulm-bangla-mmlu
dataset_name: all
description: "The following are multiple choice questions (with answers) about range of topics in Bangla"
test_split: test
fewshot_split: dev
fewshot_config:
  sampler: first_n
output_type: multiple_choice
doc_to_text: "{{question.strip()}} A. {{options[0]}} B. {{options[1]}} C. {{options[2]}} D. {{options[3]}} Answer:"
doc_to_choice: ["A", "B", "C", "D"]
doc_to_target: answer
metric_list:
  - metric: acc
    aggregation: mean
    higher_is_better: true
  - metric: acc_norm
    aggregation: mean
    higher_is_better: true
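To make the config above more concrete, here is a small sketch of how one record might become a prompt and be scored. It mirrors the `doc_to_text` template as it appears in this diff using plain string formatting (the harness itself renders Jinja2 templates), and it approximates `acc` and `acc_norm` under the assumption that `acc_norm` divides each choice's log-likelihood by the length of the choice string. The sample record and log-likelihood values are invented for illustration, and treating `answer` as either a letter or an integer index is an assumption.

```python
CHOICES = ["A", "B", "C", "D"]  # doc_to_choice in the task config


def render_prompt(doc):
    """Approximate the doc_to_text template with an f-string."""
    question = doc["question"].strip()
    opts = doc["options"]
    return f"{question} A. {opts[0]} B. {opts[1]} C. {opts[2]} D. {opts[3]} Answer:"


def gold_index(doc):
    """Map doc_to_target to a choice index (assumption: letter label or integer index)."""
    gold = doc["answer"]
    return gold if isinstance(gold, int) else CHOICES.index(gold)


def score(doc, loglikelihoods):
    """Compute per-document acc and acc_norm from one log-likelihood per choice."""
    indices = range(len(CHOICES))
    pred = max(indices, key=lambda i: loglikelihoods[i])
    # Length normalization; every choice here is one character, so this matches `pred`.
    pred_norm = max(indices, key=lambda i: loglikelihoods[i] / len(CHOICES[i]))
    gold = gold_index(doc)
    return {"acc": float(pred == gold), "acc_norm": float(pred_norm == gold)}


# Hypothetical record and made-up model scores, for illustration only.
doc = {"question": "2 + 2 = ? ", "options": ["3", "4", "5", "6"], "answer": "B"}
print(render_prompt(doc))
print(score(doc, [-2.3, -0.4, -3.1, -2.9]))
```

Because all four choices are single letters of equal length, the two metrics should agree on every document under this normalization, so reporting both mainly keeps the output consistent with other multiple-choice tasks.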
