
Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition

Description

We introduce Uni-MuMER, which fully fine-tunes the Qwen2.5-VL-3B model for the HMER task without modifying its architecture, effectively injecting domain-specific knowledge into a generalist framework. Our method integrates three data-driven tasks: Tree-Aware Chain-of-Thought (Tree-CoT) for structured spatial reasoning, Error-Driven Learning (EDL) for reducing confusion among visually similar characters, and Symbol Counting (SC) for improving recognition consistency in long expressions.

[Figure: Uni-MuMER framework overview]

Experiments on the CROHME and HME100K datasets show that Uni-MuMER achieves new state-of-the-art performance, surpassing the best lightweight specialized model, SSAN, by 16.31% and the top-performing VLM Gemini2.5-flash by 24.42% in the zero-shot setting.


📢 Updates

  • 2025-09-18: This work was accepted to NeurIPS 2025 as a Spotlight (688/21575).
  • 2025-09-09: Released the dataset (Uni-MuMER-Data and valid/test data) and the training code. [See Training]
  • 2025-06-02: Released model weights and inference scripts.

📦 Dataset Preparation

  1. Download data.zip from GitHub, Hugging Face, or Google Drive.
  2. Unzip it at the project root. After extraction, you should have:
data/
├── CROHME/
├── CROHME2023/
├── HME100K/
├── Im2LaTeXv2/
├── MathWriting/
└── MNE/
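After unzipping, a quick check like the following can confirm the layout matches the tree above (a minimal sketch; it assumes you run it from the project root):

```shell
# Verify the expected dataset directories exist under ./data
# (names taken from the tree above; run from the project root).
for d in CROHME CROHME2023 HME100K Im2LaTeXv2 MathWriting MNE; do
  if [ -d "data/$d" ]; then
    echo "ok: data/$d"
  else
    echo "missing: data/$d"
  fi
done
```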

🏃 Inference

After the dataset is in place, you can run batch inference over all three test sets with either of the two commands below.

Shell wrapper (recommended)

bash eval/eval_crohme.sh  -i <input-dir> -o <output-dir> -m <model> -b <batch_size>

Example

bash eval/eval_all.sh -m models/Uni-MuMER-3B -s test1 -b 32768

Direct Python call

python scripts/vllm_infer.py --input-dir <input-dir> --output-dir <output-dir> --model <model> --batch_size <batch_size>

Tip:

  • On multi-GPU machines, select GPUs by exporting CUDA_VISIBLE_DEVICES before running the script, e.g., export CUDA_VISIBLE_DEVICES=1,2.

  • The --batch_size argument controls the number of samples per vLLM.generate() call. The default is 32768, which caps the batch size to avoid out-of-memory (OOM) errors; lower it if you still run out of memory.
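The two tips can be combined. As a sketch (the input/output paths below are illustrative placeholders, not locations mandated by the repo), compose the command, inspect it, then execute it:

```shell
# Pin two GPUs and shrink the batch; paths below are illustrative placeholders.
export CUDA_VISIBLE_DEVICES=1,2
CMD="python scripts/vllm_infer.py \
  --input-dir data/CROHME \
  --output-dir outputs/CROHME \
  --model models/Uni-MuMER-3B \
  --batch_size 8192"
echo "$CMD"   # dry run: inspect the command, then execute with: eval "$CMD"
```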

🏋️ Training

Our training code depends on LLaMA-Factory.

For training dependencies, please refer to LLaMA-Factory or requirements_training.txt.

llamafactory-cli train train/Uni-MuMER-train.yaml
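For orientation, a full-fine-tuning config in LLaMA-Factory's SFT style typically looks like the sketch below. This is not the repository's actual train/Uni-MuMER-train.yaml; the dataset and output names are placeholders, and the dataset must be registered in LLaMA-Factory's dataset_info.json before use.

```yaml
# Illustrative sketch only -- defer to the repo's train/Uni-MuMER-train.yaml.
model_name_or_path: Qwen/Qwen2.5-VL-3B-Instruct
stage: sft
do_train: true
finetuning_type: full
dataset: uni_mumer_data        # placeholder: register in dataset_info.json
template: qwen2_vl
output_dir: saves/uni-mumer-3b # placeholder
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 1.0
bf16: true
```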

✅ TODO

  • Inference code and pretrained models.
  • Evaluation code.
  • Training code.
  • Training data.
  • Preprocess code.

🙏 Acknowledgements

Thanks to the projects this work builds on, including LLaMA-Factory and vLLM.

📝 Citation

If you find Uni-MuMER useful for your study or research, please cite our paper with:

@article{li2025unimumer,
  title   = {Uni-MuMER: Unified Multi-Task Fine-Tuning of Vision-Language Model for Handwritten Mathematical Expression Recognition},
  author  = {Li, Yu and Jiang, Jin and Zhu, Jianhua and Peng, Shuai and Wei, Baole and Zhou, Yuxuan and Gao, Liangcai},
  journal = {arXiv preprint arXiv:2505.23566},
  year    = {2025},
}
