Commit 743fb75

Initial webpage commit
1 parent 58c22e2 commit 743fb75

244 files changed (+3949, -480889 lines)


.gitignore

Lines changed: 0 additions & 79 deletions
This file was deleted.

LICENSE

Lines changed: 0 additions & 201 deletions
This file was deleted.

README.md

Lines changed: 5 additions & 68 deletions
@@ -1,70 +1,7 @@
-# VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation
+# VisCoder
+Homepage of VisCoder, an open-source large language model fine-tuned for Python visualization code generation and iterative self-correction.
 
-[**🌐 Project Page**](https://tiger-ai-lab.github.io/VisCoder) | [**📖 arXiv**](https://arxiv.org/abs/2506.03930) | [**🤗 VisCode-200K Dataset**](https://huggingface.co/datasets/TIGER-Lab/VisCode-200K) | [**🤗 VisCoder-3B**](https://huggingface.co/TIGER-Lab/VisCoder-3B) | [**🤗 VisCoder-7B**](https://huggingface.co/TIGER-Lab/VisCoder-7B)
+This website is adapted from [MathVista](https://nerfies.github.io) and [MMMU](https://mmmu-benchmark.github.io/).
 
-This repository provides the training and evaluation code for our paper:
-> **VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation**
-> Yuansheng Ni, Ping Nie, Kai Zou, Xiang Yue, Wenhu Chen
-
----
-
-## 🔔 News
-
-- **🔥 [2025-06-05] VisCoder and VisCode-200K are now publicly released! Check out our [paper](https://arxiv.org/abs/2506.03930) and [collections](https://huggingface.co/collections/TIGER-Lab/viscoder-6840333efe87c4888bc93046).**
----
-
-## 🧠 Introduction
-
-**VisCoder** is an open-source large language model fine-tuned for **Python visualization code generation and iterative self-correction**. It is trained on **VisCode-200K**, a large-scale instruction-tuning dataset tailored for executable plotting tasks and runtime-guided revision.
-
-VisCoder addresses a core challenge in data analysis: generating Python code that produces not only syntactically correct, but also **visually meaningful plots**. Unlike general code generation tasks, visualization requires grounding across **natural language instructions, data structures**, and **rendered visual outputs**.
-
-To enable this, **VisCode-200K** includes:
-- **150K+ executable visualization examples**, validated through runtime checks and paired with plot images.
-- 🔁 **45K multi-turn correction dialogues** from the Code-Feedback dataset, providing supervision for fixing faulty code based on execution feedback.
-
-![Alt text](assets/pipeline.png)
-
-We further propose a **self-debug evaluation protocol**, simulating real-world developer workflows through multi-round error correction. VisCoder is benchmarked on **PandasPlotBench** against GPT-4o, GPT-4o-mini, Qwen, and LLaMA, demonstrating robust performance and strong recovery from execution failures.
-
----
-## 📊 Main Results on PandasPlotBench
-
-We evaluate VisCoder on **PandasPlotBench**, a benchmark for executable Python visualization code generation across three libraries: **Matplotlib**, **Seaborn**, and **Plotly**. The figure below summarizes model performance in terms of execution success and GPT-4o-judged alignment scores.
-
-![Alt text](assets/main_results.png)
-
-> With **self-debug**, **VisCoder-7B** achieves over **90% execution pass rate** on both **Matplotlib** and **Seaborn**, outperforming strong open-source baselines and approaching GPT-4o performance on multiple libraries.
-
----
-
-## 🛠️ Training & Evaluation
-
-We provide both training and evaluation scripts for VisCoder.
-
-- 📦 **Training** is performed using the [ms-swift](https://github.com/modelscope/swift) framework with full-parameter supervised fine-tuning on VisCode-200K.
-- 📊 **Evaluation** is based on the [PandasPlotBench](https://github.com/JetBrains-Research/PandasPlotBench). We **augment the original evaluation** with an additional **Execution Pass Rate** metric and introduce a new **self-debug evaluation mode** that allows models to revise failed generations over multiple rounds.
-
-See the following folders for details:
-
-- [`train/`](./train): Training scripts and configurations based on ms-swift.
-- [`eval/`](./eval): Evaluation scripts adapted from PandasPlotBench with our self-debug extension.
-
-## Contact
-- Yuansheng Ni: [email protected]
-- Wenhu Chen: [email protected]
-
-## 📖 Citation
-
-**BibTeX:**
-```bibtex
-@misc{ni2025viscoderfinetuningllmsexecutable,
-      title={VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation},
-      author={Yuansheng Ni and Ping Nie and Kai Zou and Xiang Yue and Wenhu Chen},
-      year={2025},
-      eprint={2506.03930},
-      archivePrefix={arXiv},
-      primaryClass={cs.SE},
-      url={https://arxiv.org/abs/2506.03930},
-}
-```
+# Website License
+<a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.
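The self-debug evaluation mode described in the removed README (models revise failed generations over multiple rounds of execution feedback) can be sketched roughly as below. This is a minimal illustration, not the repository's actual implementation: `self_debug`, `generate`, and `max_rounds` are hypothetical names, and `generate` stands in for a model call that maps an instruction plus optional error feedback to a Python source string.

```python
import subprocess
import sys
import tempfile

def self_debug(generate, instruction, max_rounds=3):
    """Hypothetical multi-round self-debug loop.

    Generate code, execute it, and on failure feed the traceback back
    to the model for revision, up to `max_rounds` correction attempts.
    Returns (final_code, passed).
    """
    feedback = None
    code = ""
    for _ in range(max_rounds + 1):
        code = generate(instruction, feedback)
        # Write the candidate code to a temp file and run it in a subprocess.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        result = subprocess.run(
            [sys.executable, path], capture_output=True, text=True, timeout=60
        )
        if result.returncode == 0:
            return code, True  # execution pass
        feedback = result.stderr  # traceback becomes the next round's feedback
    return code, False  # still failing after all correction rounds
```

The per-library Execution Pass Rate would then be the fraction of benchmark tasks for which this loop returns `passed=True`.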
