Benchmarking Data Agents across the Full Data Intelligence Lifecycle

🌐 Website | 📑 Paper | 🤗 Dataset | 🐥 Twitter

📰 News

2025-12-08: 🔥 We release the DAComp dataset and the paper.

👋 Overview

DAComp is a research-grade benchmark spanning the full data intelligence workflow: repository-level data engineering (DAComp-DE), open-ended data analysis (DAComp-DA), and a Chinese-localized split (DAComp-zh). This repository also provides the accompanying baseline agents and evaluation suites.

🔍 Installation

Set up the environment using the following commands:

conda create -n dacomp python=3.12
conda activate dacomp

pip install -r requirements.txt
pip install openhands-ai
conda install -c conda-forge nodejs
conda install -c conda-forge poetry
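Before running an agent, you may want to confirm that the environment is complete. The following is a minimal, hypothetical sketch (not a script shipped with DAComp). It checks the Python version pinned above, looks for the `node` and `poetry` executables, and checks whether the `openhands` package can be found. The import name `openhands` for the `openhands-ai` package is an assumption.

```python
"""Hypothetical environment check for DAComp (not part of the repository)."""
import importlib.util
import shutil
import sys


def check_environment() -> dict[str, bool]:
    """Map each requirement to whether it appears to be satisfied."""
    return {
        # The conda env above is created with python=3.12.
        "python==3.12": sys.version_info[:2] == (3, 12),
        # nodejs and poetry come from conda-forge; look for their executables on PATH.
        "node": shutil.which("node") is not None,
        "poetry": shutil.which("poetry") is not None,
        # Assumption: `pip install openhands-ai` provides a top-level `openhands` package.
        # find_spec locates the package without importing it.
        "openhands": importlib.util.find_spec("openhands") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'MISS'} {name}")
```

Run it inside the activated `dacomp` environment. Any line marked `MISS` points to the install step to repeat.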

🚀 Quick Access to the DAComp Dataset

DAComp consists of two subsets: DA (data analysis) and DE (data engineering). You can download the dataset from the DAComp dataset page on Hugging Face.

Please use the provided scripts dacomp-da/download.py and dacomp-de/download.py to download the data automatically. Run the following from the repository root:

# --- Download DAComp-DA Dataset ---
cd dacomp-da
# Downloads English tasks into `dacomp-da/tasks` and Chinese tasks into `dacomp-da/tasks_zh`. Set repo_id and download_dir in download.py before running.
python download.py

# --- Download DAComp-DE Dataset ---
cd ../dacomp-de
# Downloads English tasks into `dacomp-de/tasks` and Chinese tasks into `dacomp-de/tasks_zh`. Set repo_id and download_dir in download.py before running.
python download.py
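To confirm that both downloads succeeded, a minimal sketch like the following (hypothetical, not part of the repository) checks that the four task directories named in the comments above exist and are non-empty. It assumes the default download locations and is meant to be run from the repository root; if you changed download_dir in download.py, edit `EXPECTED_DIRS` to match.

```python
"""Hypothetical post-download check for DAComp (not part of the repository)."""
from pathlib import Path

# Default task directories, taken from the download comments above.
EXPECTED_DIRS = [
    "dacomp-da/tasks",
    "dacomp-da/tasks_zh",
    "dacomp-de/tasks",
    "dacomp-de/tasks_zh",
]


def missing_task_dirs(repo_root: Path) -> list[str]:
    """Return the expected task directories that are absent or empty under repo_root."""
    missing = []
    for rel in EXPECTED_DIRS:
        path = repo_root / rel
        # An existing but empty directory most likely means the download failed.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    gaps = missing_task_dirs(Path("."))
    print("All task directories present." if not gaps else f"Missing or empty: {gaps}")
```

Treating empty directories as missing is a deliberate choice: a failed download can leave the folder behind with nothing in it.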

🚀 Quickstart

DAComp-DA

Agents: choose methods/da-agent (three-stage baseline), methods/spider-agent (single, image-first baseline), or OpenHands. Then fill in your model config, install the agent's requirements, and run run.py as described in that agent's README.

DAComp-DE

Agents: use methods/de-agent (OpenHands integration). Fill in your model config and install the requirements as described in its README.

⚖️ Evaluation

DAComp-DA

Standard DAComp-DA Tasks: follow dacomp-da/evaluation_suite/README.md to evaluate DA tasks.
Results: export a run to dacomp-da/evaluation_suite/agent_results by running get_results.py from the agent's folder.

DAComp-DE

Standard DAComp-DE Tasks: follow dacomp-de/evaluation_suite/README.md to evaluate DE-Impl and DE-Evol tasks.
DE-Arch Unified Evaluator: follow dacomp-de/evaluation_suite_arch/README.md to evaluate DE-Arch tasks.

📋 Leaderboard Submission

To submit your agent results to the leaderboard, please follow the instructions in the DAComp Submission Guidelines.

🙇‍♂️ Acknowledgement

We thank the OpenHands team for their valuable contributions to the open-source community.

✍️ Citation

If you find our work helpful, please cite it as follows:

@misc{lei2025dacomp,
      title={DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle}, 
      author={Fangyu Lei and Jinxiang Meng and Yiming Huang and Junjie Zhao and Yitong Zhang and Jianwen Luo and Xin Zou and Ruiyi Yang and Wenbo Shi and Yan Gao and Shizhu He and Zuo Wang and Qian Liu and Yang Wang and Ke Wang and Jun Zhao and Kang Liu},
      year={2025},
      eprint={2512.04324},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.04324}, 
}

🌱 About ByteDance Seed Team

Founded in 2023, ByteDance Seed Team is dedicated to crafting the industry's most advanced AI foundation models. The team aspires to become a world-class research team and make significant contributions to the advancement of science and society.

You can get to know us better through the following channels👇
