Official implementation of "GanitLLM: Bengali Mathematical Reasoning with Language Models"
GanitLLM is the first comprehensive framework for training and evaluating Bengali mathematical reasoning capabilities in language models. We introduce:
- Ganit Dataset: A curated dataset of 25k Bengali math problems with difficulty-stratified splits
- Curriculum GRPO (CGRPO): A novel training approach combining curriculum learning with Group Relative Policy Optimization
- GanitLLM Models: A family of models (0.6B, 1.7B, 4B parameters) achieving state-of-the-art performance on Bengali math reasoning
Key features:

- Multi-stage training pipeline: SFT → Curriculum Learning → GRPO
- Difficulty-aware data curation using ensemble LLM evaluation
- Support for both "thinking" and "instruct" model variants
- Comprehensive evaluation on MGSM, MSVAMP, and in-domain test sets
- Efficient training with LoRA, FlashAttention, and vLLM

## Requirements

- Python 3.12+
- CUDA 12.8+
- GPU with at least 24GB VRAM (for training)

## Installation

```bash
# Clone the repository
git clone https://github.com/dipta007/GanitLLM.git
cd GanitLLM
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install dependencies
make setup
# Or manually:
uv venv --python 3.12 --seed
source .venv/bin/activate
uv sync

# Install FlashAttention (used for efficient training)
MAX_JOBS=4 uv pip install flash-attn --no-build-isolation --no-cache
```

## Dataset

The Ganit dataset is available on HuggingFace:

| Name | Description | Size |
|---|---|---|
| dipta007/Ganit | GanitSFT + GanitRLVR + GanitDEV | 25k rows |

```python
from datasets import load_dataset

dataset = load_dataset("dipta007/Ganit")
```

Each example contains:

- `problem`: Bengali math problem statement
- `bengali_solution`: Answer in Bengali numerals
- `english_solution`: Answer in English numerals
- `difficulty`: One of {easy, medium, hard, olympiad}
- `messages`: Chat-formatted training data

Difficulty tags are assigned by ensemble LLM evaluation:

| Difficulty | Criteria | Count |
|---|---|---|
| Easy | >75% LLMs correct | ~4k |
| Medium | 50-75% LLMs correct | ~4k |
| Hard | 25-50% LLMs correct | ~4k |
| Olympiad | <25% LLMs correct | ~4k |
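
For example, to work with a single tier, filter on the `difficulty` field. A minimal sketch; the `train` split name is an assumption, check the dataset card for the actual splits:

```python
from datasets import load_dataset

# Split name "train" is an assumption; see the dataset card for actual splits
dataset = load_dataset("dipta007/Ganit", split="train")

# Keep only olympiad-level problems
olympiad = dataset.filter(lambda ex: ex["difficulty"] == "olympiad")
print(len(olympiad))
print(olympiad[0]["problem"])
```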

## Models

All models are available on HuggingFace:

| Model | Parameters | Training | HuggingFace Link |
|---|---|---|---|
| GanitLLM-4B_SFT_CGRPO | 4B | SFT + CGRPO | Link |
| GanitLLM-4B_SFT_GRPO | 4B | SFT + GRPO | Link |
| GanitLLM-4B_SFT | 4B | SFT | Link |
| GanitLLM-4B_CGRPO | 4B | CGRPO | Link |
| GanitLLM-1.7B_SFT_CGRPO | 1.7B | SFT + CGRPO | Link |
| GanitLLM-1.7B_SFT_GRPO | 1.7B | SFT + GRPO | Link |
| GanitLLM-1.7B_SFT | 1.7B | SFT | Link |
| GanitLLM-1.7B_CGRPO | 1.7B | CGRPO | Link |
| GanitLLM-0.6B_SFT_CGRPO | 0.6B | SFT + CGRPO | Link |
| GanitLLM-0.6B_SFT_GRPO | 0.6B | SFT + GRPO | Link |
| GanitLLM-0.6B_SFT | 0.6B | SFT | Link |
| GanitLLM-0.6B_CGRPO | 0.6B | CGRPO | Link |
Model Types:
- SFT_CGRPO: Best performing models with multi-stage training (SFT → Curriculum GRPO)
- SFT_GRPO: Models with standard SFT followed by GRPO
- SFT: Models with supervised fine-tuning only (foundation for RL training)
- CGRPO: Models trained with curriculum GRPO only (no SFT)

## Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/GanitLLM-4B_SFT_CGRPO"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
# Format the prompt
problem = "একটি দোকানে ১২টি আপেল আছে। যদি ৫টি আপেল বিক্রি হয়, তাহলে কতটি আপেল বাকি থাকবে?"
prompt = f"""A conversation takes place between the user and the assistant. The user asks questions, and the assistant solves them. The assistant first thinks about the reasoning process in the mind and then provides the user with the answer. The reasoning process and answer are enclosed within <think> </think> and <answer> </answer> tags, respectively, i.e., <think> reasoning process here </think><answer> answer here </answer>. Reasoning and answer must be in Bengali. Final answer must be a number nothing else.
Question: {problem}"""
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True)
# do_sample=True is required for temperature to take effect
outputs = model.generate(inputs.to(model.device), max_new_tokens=2048, temperature=0.7, do_sample=True)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
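
For batched inference, the models can also be served with vLLM (part of the training stack above). A minimal sketch, assuming the checkpoint loads directly into vLLM and reusing the instruction format shown earlier (the full system prompt is omitted for brevity):

```python
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer

model_name = "dipta007/GanitLLM-4B_SFT_CGRPO"
tokenizer = AutoTokenizer.from_pretrained(model_name)
llm = LLM(model=model_name)

problems = [
    "একটি দোকানে ১২টি আপেল আছে। যদি ৫টি আপেল বিক্রি হয়, তাহলে কতটি আপেল বাকি থাকবে?",
]
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "user", "content": f"Question: {p}"}],  # prepend the full system prompt in practice
        tokenize=False,
        add_generation_prompt=True,
    )
    for p in problems
]

outputs = llm.generate(prompts, SamplingParams(temperature=0.7, max_tokens=2048))
for out in outputs:
    print(out.outputs[0].text)
```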

## Training

### Supervised Fine-Tuning (SFT)

```bash
PYTHONPATH=. uv run src/training/sft.py \
--model_name Qwen/Qwen3-4B \
--dataset_path data/sft/train.jsonl \
--output_dir checkpoints/sft
```
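
The file passed as `--dataset_path` is chat-formatted JSONL following the schema above. A hypothetical row, with contents illustrative only and not taken from the dataset:

```python
import json

# Hypothetical SFT row; field names follow the dataset schema above
row = {
    "problem": "একটি দোকানে ১২টি আপেল আছে। যদি ৫টি আপেল বিক্রি হয়, তাহলে কতটি আপেল বাকি থাকবে?",
    "bengali_solution": "৭",
    "english_solution": "7",
    "difficulty": "easy",
    "messages": [
        {"role": "user", "content": "Question: ..."},
        {"role": "assistant", "content": "<think> ১২ - ৫ = ৭ </think><answer> ৭ </answer>"},
    ],
}
print(json.dumps(row, ensure_ascii=False))
```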

### GRPO

```bash
PYTHONPATH=. uv run src/training/unsloth/grpo_lora.py \
--model_name checkpoints/sft \
--dataset_path data/rl/train.jsonl \
--output_dir checkpoints/grpo
```
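
Since GRPO training here is LoRA-based (`grpo_lora.py`), the adapter can be merged into the base model before serving. A sketch using `peft`, assuming `checkpoints/grpo` holds a LoRA adapter rather than merged weights:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", device_map="auto")
model = PeftModel.from_pretrained(base, "checkpoints/grpo")  # LoRA adapter directory

# Fold the LoRA weights into the base model and save a standalone checkpoint
merged = model.merge_and_unload()
merged.save_pretrained("checkpoints/grpo_merged")
AutoTokenizer.from_pretrained("Qwen/Qwen3-4B").save_pretrained("checkpoints/grpo_merged")
```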

### Curriculum GRPO (CGRPO)

First, prepare curriculum-ordered data:

```bash
PYTHONPATH=. uv run src/data_gen/curriculum_shuffle.py \
--input data/rl/train.jsonl \
--output data/rl/train_curriculum.jsonl
```
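
Conceptually, this step orders the RL data from easy to olympiad. A minimal sketch of the idea; the actual `curriculum_shuffle.py` may also shuffle within tiers or smooth the schedule:

```python
import json

# Assumed difficulty ordering, easiest first
ORDER = {"easy": 0, "medium": 1, "hard": 2, "olympiad": 3}

with open("data/rl/train.jsonl") as f:
    rows = [json.loads(line) for line in f]

# Stable sort keeps the original order within each difficulty tier
rows.sort(key=lambda r: ORDER[r["difficulty"]])

with open("data/rl/train_curriculum.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row, ensure_ascii=False) + "\n")
```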

Then run GRPO on the curriculum-ordered data:

```bash
PYTHONPATH=. uv run src/training/unsloth/grpo_lora.py \
--model_name checkpoints/sft \
--dataset_path data/rl/train_curriculum.jsonl \
--output_dir checkpoints/cgrpo
```

## Evaluation

Evaluate on MGSM Bengali:

```bash
PYTHONPATH=. uv run src/evaluation/eval.py \
--model dipta007/GanitLLM-4B_SFT_CGRPO \
--dataset mgsm \
--output_dir results/
```
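
To inspect the benchmark itself, MGSM's Bengali split can be loaded from HuggingFace. This assumes the `juletxara/mgsm` mirror; the eval script may load it differently:

```python
from datasets import load_dataset

# Bengali subset of MGSM: grade-school problems translated from GSM8K
mgsm_bn = load_dataset("juletxara/mgsm", "bn", split="test")
print(mgsm_bn[0]["question"])
print(mgsm_bn[0]["answer_number"])
```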

## Data Processing Pipeline

To process your own Bengali math data:

```bash
# Step 1: Filter and validate Bengali content
PYTHONPATH=. uv run src/data_gen/0.filter.py --input raw_data.jsonl --output filtered.jsonl
# Step 2: Remove exact duplicates
PYTHONPATH=. uv run src/data_gen/1.deduplication.py --input filtered.jsonl --output deduped.jsonl
# Step 3: Remove near-duplicates using MinHash
PYTHONPATH=. uv run src/data_gen/2.minhash.py --input deduped.jsonl --output minhash.jsonl
# Step 4: Decontaminate against test sets
PYTHONPATH=. uv run src/data_gen/3.decontamination.py --input minhash.jsonl --output clean.jsonl
# Step 5: Tag difficulty using ensemble LLM evaluation
PYTHONPATH=. uv run src/data_gen/4.tag_difficulty.py --input clean.jsonl --output final.jsonl
```

Or run the full pipeline:

```bash
make process-dataset INPUT=raw_data.jsonl OUTPUT=final.jsonl
```
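
Step 3 removes near-duplicates with MinHash. A minimal sketch of the idea using the `datasketch` library (an assumption; the repository script may tokenize and hash differently):

```python
from datasketch import MinHash, MinHashLSH

def minhash(text: str, num_perm: int = 128) -> MinHash:
    m = MinHash(num_perm=num_perm)
    for token in text.split():
        m.update(token.encode("utf-8"))
    return m

# Two near-identical problems; the threshold below is illustrative
problems = [
    "১২টি আপেল থেকে ৫টি বিক্রি হলে কতটি আপেল বাকি থাকে?",
    "১২টি আপেল থেকে ৫টি বিক্রি হলে কয়টি আপেল বাকি থাকে?",
]

lsh = MinHashLSH(threshold=0.7, num_perm=128)
kept = []
for i, problem in enumerate(problems):
    m = minhash(problem)
    if not lsh.query(m):  # no near-duplicate indexed yet
        lsh.insert(str(i), m)
        kept.append(problem)

print(kept)  # the second, near-identical problem is dropped
```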

## Reward Design

Our GRPO training uses multi-component rewards:

- Format reward: validates the `<think>` and `<answer>` tag structure
- Bengali reasoning reward: ensures >80% Bengali text in the reasoning
- Answer correctness:
  - Bengali numeral match: +2.0
  - English numeral match: +1.0
- Length penalty: soft penalty for responses longer than 2000 tokens
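
As an illustration of the answer-correctness component, a hedged sketch (not the repository's reward code) that extracts the `<answer>` span and normalizes Bengali digits before matching:

```python
import re

# Map Bengali digits ০-৯ to ASCII 0-9
BN_TO_EN = str.maketrans("০১২৩৪৫৬৭৮৯", "0123456789")

def extract_answer(text: str) -> str | None:
    """Pull the content of the <answer>...</answer> tag, if present."""
    m = re.search(r"<answer>(.*?)</answer>", text, flags=re.DOTALL)
    return m.group(1).strip() if m else None

def answer_reward(completion: str, bengali_gold: str, english_gold: str) -> float:
    """Toy scoring: +2.0 for a Bengali-numeral match, +1.0 for English, else 0."""
    pred = extract_answer(completion)
    if pred is None:
        return 0.0
    if pred == bengali_gold.strip():
        return 2.0
    if pred.translate(BN_TO_EN) == english_gold.strip():
        return 1.0
    return 0.0

print(answer_reward("<think>১২ - ৫ = ৭</think><answer>৭</answer>", "৭", "7"))  # 2.0
print(answer_reward("<think>12 - 5 = 7</think><answer>7</answer>", "৭", "7"))  # 1.0
```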

## Results

For results and more details, please refer to our paper.

## Citation

If you find this work useful, please cite our paper:

Will be updated.

## Contact

For questions or issues, please open a GitHub issue or contact the authors.