# VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation

[**🌐 Project Page**](https://tiger-ai-lab.github.io/VisCoder) | [**📖 arXiv**](https://arxiv.org/abs/2506.03930) | [**🤗 VisCode-200K Dataset**](https://huggingface.co/datasets/TIGER-Lab/VisCode-200K) | [**🤗 VisCoder-3B**](https://huggingface.co/TIGER-Lab/VisCoder-3B) | [**🤗 VisCoder-7B**](https://huggingface.co/TIGER-Lab/VisCoder-7B)

This repository provides the training and evaluation code for our paper:

> **VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation**
> Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen

---

## 🔔 News

- **🔥 [2025-06-05] VisCoder and VisCode-200K are now publicly released! Check out our [paper](https://arxiv.org/abs/2506.03930) and [collections](https://huggingface.co/collections/TIGER-Lab/viscoder-6840333efe87c4888bc93046).**

---

## 🧠 Introduction

**VisCoder** is an open-source large language model fine-tuned for **Python visualization code generation and iterative self-correction**. It is trained on **VisCode-200K**, a large-scale instruction-tuning dataset tailored for executable plotting tasks and runtime-guided revision.

VisCoder addresses a core challenge in data analysis: generating Python code that is not only syntactically correct but also produces **visually meaningful plots**. Unlike general code generation tasks, visualization requires grounding across **natural language instructions**, **data structures**, and **rendered visual outputs**.

To enable this, **VisCode-200K** includes:

- ✅ **150K+ executable visualization examples**, validated through runtime checks and paired with plot images.
- 🔁 **45K multi-turn correction dialogues** from the Code-Feedback dataset, providing supervision for fixing faulty code based on execution feedback.


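
The exact schema is defined by the released dataset; purely as a loose illustration (every field name below is an assumption made for readability, not the actual VisCode-200K format), a chat-style training record for the correction data might look like:

```python
# Hypothetical record layout -- field names are illustrative, not the real schema.
example_record = {
    "messages": [
        {"role": "user", "content": "Plot monthly revenue as a bar chart with matplotlib."},
        # First assistant attempt: plotting code that fails at runtime.
        {"role": "assistant", "content": "import matplotlib.pyplot as plt\n..."},
        # Execution feedback turn: the runtime error is handed back to the model.
        {"role": "user", "content": "Execution failed: NameError: name 'months' is not defined. Please fix the code."},
        # Revised assistant turn: corrected code that executes and renders the plot.
        {"role": "assistant", "content": "import matplotlib.pyplot as plt\nmonths = [...]\n..."},
    ],
}
```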
We further propose a **self-debug evaluation protocol** that simulates real-world developer workflows through multiple rounds of error correction. VisCoder is benchmarked on **PandasPlotBench** against GPT-4o, GPT-4o-mini, and open-source Qwen and LLaMA models, demonstrating robust performance and strong recovery from execution failures.
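
The precise protocol (round limit, prompting format) is specified in the paper and in `eval/`; the following is only a minimal sketch of the idea, where `generate_code` stands in for an arbitrary model call and the three-round limit is an assumption:

```python
# Minimal self-debug loop: run the generated script, and on failure feed the
# runtime error back to the model for another attempt. Illustrative only.
import subprocess
import tempfile

MAX_DEBUG_ROUNDS = 3  # assumed round limit, not necessarily the paper's setting

def run_script(code: str) -> tuple[bool, str]:
    """Execute candidate plotting code in a subprocess; return (passed, stderr)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, text=True, timeout=120)
    except subprocess.TimeoutExpired:
        return False, "execution timed out"
    return proc.returncode == 0, proc.stderr

def self_debug(generate_code, instruction: str) -> tuple[str, bool]:
    """Generate code, then revise it with runtime feedback until it executes or rounds run out."""
    code = generate_code(instruction)
    passed, stderr = run_script(code)
    for _ in range(MAX_DEBUG_ROUNDS):
        if passed:
            break
        # Feed the traceback back to the model and ask for a corrected script.
        feedback = f"The previous code failed with:\n{stderr}\nPlease return a corrected version."
        code = generate_code(instruction, previous_code=code, feedback=feedback)
        passed, stderr = run_script(code)
    return code, passed
```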

---

## 📊 Main Results on PandasPlotBench

We evaluate VisCoder on **PandasPlotBench**, a benchmark for executable Python visualization code generation across three libraries: **Matplotlib**, **Seaborn**, and **Plotly**. The figure below summarizes model performance in terms of execution success and GPT-4o-judged alignment scores.



> With **self-debug**, **VisCoder-7B** achieves over **90% execution pass rate** on both **Matplotlib** and **Seaborn**, outperforming strong open-source baselines and approaching GPT-4o performance on multiple libraries.

---

## 🛠️ Training & Evaluation

We provide both training and evaluation scripts for VisCoder.

- 📦 **Training** uses the [ms-swift](https://github.com/modelscope/swift) framework with full-parameter supervised fine-tuning on VisCode-200K.
- 📊 **Evaluation** is based on [PandasPlotBench](https://github.com/JetBrains-Research/PandasPlotBench). We **augment the original evaluation** with an additional **Execution Pass Rate** metric and introduce a new **self-debug evaluation mode** that lets models revise failed generations over multiple rounds (a minimal sketch of the pass-rate computation follows this list).
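
The exact metric definitions live in `eval/`; under the plain reading of the metric (an assumption, not the repo's implementation), the Execution Pass Rate is simply the share of benchmark tasks whose generated script runs to completion:

```python
# Illustrative-only computation of an execution pass rate over benchmark tasks.
def execution_pass_rate(executed_ok: list[bool]) -> float:
    """Percentage of tasks whose generated code executed without error."""
    if not executed_ok:
        return 0.0
    return 100.0 * sum(executed_ok) / len(executed_ok)

# e.g. 72 successful executions out of 90 tasks -> 80.0
print(execution_pass_rate([True] * 72 + [False] * 18))
```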

See the following folders for details:

- [`train/`](./train): Training scripts and configurations based on ms-swift.
- [`eval/`](./eval): Evaluation scripts adapted from PandasPlotBench with our self-debug extension.

## Contact

## 📖 Citation

**BibTeX:**
```bibtex
@misc{ni2025viscoderfinetuningllmsexecutable,
  title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
  author={Yuansheng Ni and Ping Nie and Kai Zou and Xiang Yue and Wenhu Chen},
  year={2025},
  eprint={2506.03930},
  archivePrefix={arXiv},
  primaryClass={cs.SE},
  url={https://arxiv.org/abs/2506.03930},
}
```

## Website License

This website is adapted from [MathVista](https://nerfies.github.io) and [MMMU](https://mmmu-benchmark.github.io/).

<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.