ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data

  📑 Paper   |   🤗 Dataset   |   🤖 Model   |   🖥️ Model Demo  

Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that can operate GUIs autonomously, showing great potential. However, developing robust CUAs requires extensive in-domain knowledge about software interfaces and operations. Unlike image–text pairs, which are widely available on the Internet, computer-use data, particularly operation trajectories, are scarce and costly to collect. Consequently, progress in this field remains constrained by both data scale and the limited transferability of existing VLMs. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline uniting automated agents with human experts. Trained on this scaled-up data, ScaleCUA can operate seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose cross-platform CUAs.


🤖 Video Demo

scalecua_demo.mp4


🎉 News

  • 2025/09/19: ScaleCUA-Data is being uploaded to Hugging Face. Please be patient.
  • 2025/09/19: We have released the models and code of ScaleCUA.

🚀 Key Features

  • ScaleCUA-Data: A large-scale, cross-platform dataset spanning 6 operating systems and 3 GUI-centric task domains.
  • ScaleCUA-Models: Cross-platform, general-purpose agents that excel at GUI-centric task completion in various environments.
  • SFT Codebase: A comprehensive training framework that supports training computer use agents based on Qwen2.5-VL and InternVL.
  • Interactive Playground: A series of realistic, interactive environments across Ubuntu, Android, and Web.
  • Online Evaluation Suite: A set of online benchmarks for evaluating agents' task-completion capabilities across various platforms.

📂 Project Structure

This repository is organized into three main components:

  • evaluation: Contains all the code and benchmarks for the end-to-end evaluation of our agents.
  • playground: Provides interactive environments (Android, Ubuntu, Web) and model implementations for users to experience the agent's capabilities firsthand.
  • agent-sft: Includes the training code, configurations, and instructions needed to reproduce ScaleCUA on the ScaleCUA-Data dataset.

⚙️ Setup

  1. Clone the repository:
    git clone https://github.com/OpenGVLab/ScaleCUA.git
    cd ScaleCUA
  2. Install dependencies:
    pip install -r requirements.txt

🎮 Playground

The Playground lets you experience the ScaleCUA agent's capabilities firsthand across the Ubuntu, Android, and Web platforms. For a complete guide, please see the [Playground].

Follow these two steps to begin:

  1. Deploy the ScaleCUA models with vLLM following our [Model Deployment] guide. We support two modes of operation: a Native Agentic Model, which uses a single model for both UI grounding and planning, and an Agentic Workflow, which uses two separate models for UI planning and grounding (see the client sketch below).

  2. Set up your environment following [Playground Environment]. We provide pre-configured, interactive virtual machines for Ubuntu, Android, and Web to simplify this process.

Now, you can try our agent in the interactive environment!
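
As a quick sanity check after deployment, the snippet below queries the served model through vLLM's OpenAI-compatible API with a screenshot and an instruction. It is a minimal sketch: the endpoint URL, served model name, screenshot path, and instruction text are placeholders, so adjust them to match your [Model Deployment] setup.

    import base64
    from openai import OpenAI

    # Assumed local vLLM endpoint; vLLM ignores the API key by default.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # Encode the current screenshot so it can be sent as an image_url payload.
    with open("screenshot.png", "rb") as f:
        screenshot_b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="ScaleCUA",  # placeholder: use the name the model was served under
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{screenshot_b64}"}},
                {"type": "text",
                 "text": "Open the Settings app and enable dark mode."},
            ],
        }],
        temperature=0.0,
    )

    # Depending on the deployed mode, the reply contains a plan and/or a
    # grounded GUI action (e.g. a click with screen coordinates).
    print(response.choices[0].message.content)

In the Agentic Workflow mode, the same request pattern applies, except that planning and grounding requests go to two different served models.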

📊 Evaluation

We provide a suite of benchmarks for end-to-end agent evaluation in a vision-only setup. ScaleCUA supports deployment with vLLM and evaluation through an OpenAI-compatible API. To run the evaluation benchmarks, please refer to the specific instructions within the [Evaluation] directory.
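
For orientation, the skeleton below shows what one end-to-end, vision-only episode typically looks like: the agent repeatedly receives a screenshot, returns an action, and the environment executes it until the agent terminates or a step budget is reached. The env interface (reset/screenshot/step/evaluate) and the query_agent helper are hypothetical placeholders, not the benchmarks' actual APIs; each harness under [Evaluation] defines its own.

    # Hypothetical skeleton of one vision-only evaluation episode.
    # The environment interface and query_agent are illustrative placeholders;
    # each benchmark harness in evaluation/ defines its own equivalents.
    MAX_STEPS = 15  # assumed per-task step budget

    def run_episode(env, task, query_agent):
        env.reset(task)                              # start the task in a fresh VM / browser
        for _ in range(MAX_STEPS):
            screenshot = env.screenshot()            # vision-only: the agent sees pixels, no DOM or a11y tree
            action = query_agent(screenshot, task)   # OpenAI-compatible call to the served ScaleCUA model
            if action.get("type") == "terminate":    # the agent decides the task is finished
                break
            env.step(action)                         # execute click / type / scroll, etc.
        return env.evaluate(task)                    # benchmark-specific success check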

Our evaluation suite covers desktop, mobile, and web environments:

  • Android: AndroidWorld, AndroidLab
  • Ubuntu: OSWorld
  • macOS: MacOSArena
  • Web: WebArena-Lite-v2 (a refined version of WebArena-Lite suited to vision-based agents)
  • Windows: WindowsAgentArena

🧠 Training

The agent-sft/ directory contains all the code and configuration files needed to train ScaleCUA from scratch using our ScaleCUA-Data. We support training Computer Use Agents with both InternVL and Qwen-VL.

💐 Acknowledgements

Thanks to the following open-source projects:

OSWorld, WindowsAgentArena, WebArena, AndroidWorld, ScienceBoard, AGUVIS, MMBench-GUI, Qwen-VL, InternVL

⚖️ License

This project is licensed under the Apache 2.0 License.

📜 Citation

If you find our work useful, please consider citing our paper:

@article{liu2025scalecua,
  title        = {ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data},
  author       = {Liu, Zhaoyang and Xie, Jingjing and Ding, Zichen and Li, Zehao and Yang, Bowen and Wu, Zhenyu and Wang, Xuehui and Sun, Qiushi and Liu, Shi and Wang, Weiyun and Ye, Shenglong and Li, Qingyun and Dong, Xuan and Yu, Yue and Lu, Chenyu and Mo, YunXiang and Yan, Yao and Tian, Zeyue and Zhang, Xiao and Huang, Yuan and Liu, Yiqian and Su, Weijie and Luo, Gen and Yue, Xiangyu and Qi, Biqing and Chen, Kai and Zhou, Bowen and Qiao, Yu and Chen, Qifeng and Wang, Wenhai},
  journal      = {arXiv preprint arXiv:2509.15221},
  year         = {2025},
  note         = {Preprint},
  url          = {https://github.com/OpenGVLab/ScaleCUA}
}
