📑 Paper | 🤗 Dataset | 🤖 Model | 🖥️ Model Demo
Vision-Language Models (VLMs) have enabled computer use agents (CUAs) that operate GUIs autonomously, showing great potential. However, developing robust CUAs requires extensive in-domain knowledge about software interfaces and operations. Unlike image–text pairs, which are widely available on the Internet, computer-use data, particularly operation trajectories, are rare and costly to collect. Consequently, progress in this field remains constrained by both data scale and the limited transferability of existing VLMs. In this work, we introduce ScaleCUA, a step toward scaling open-source CUAs. It offers a large-scale dataset spanning 6 operating systems and 3 task domains, built via a closed-loop pipeline that unites automated agents with human experts. Trained on this scaled-up data, ScaleCUA operates seamlessly across platforms. Specifically, it delivers strong gains over baselines (+26.6 on WebArena-Lite-v2, +10.7 on ScreenSpot-Pro) and sets new state-of-the-art results (94.4% on MMBench-GUI L1-Hard, 60.6% on OSWorld-G, 47.4% on WebArena-Lite-v2). These findings underscore the power of data-driven scaling for general-purpose cross-platform CUAs.

scalecua_demo.mp4
- 2025/09/19: ScaleCUA-Data is being uploaded to Hugging Face. Please be patient.
- 2025/09/19: We have released the models and code of ScaleCUA.
- ScaleCUA-Data: A large-scale cross-platform dataset spanning 6 operating systems and 3 GUI-centric task domains.
- ScaleCUA-Models: A cross-platform, general-purpose agent that excels at GUI-centric task completion in various environments.
- SFT Codebase: A comprehensive training framework that supports training computer use agents based on Qwen2.5-VL and InternVL.
- Interactive Playground: A series of realistic, interactive environments across Ubuntu, Android, and Web.
- Online Evaluation Suite: A set of online benchmarks to evaluate agents' task-completion capabilities on various platforms.
This repository is organized into three main components:
- `evaluation`: Contains all the code and benchmarks for the end-to-end evaluation of our agents.
- `playground`: Provides interactive environments (Android, Ubuntu, Web) and model implementations for users to experience the agent's capabilities firsthand.
- `agent-sft`: Includes the training code, configurations, and instructions needed to reproduce ScaleCUA on the ScaleCUA-Data dataset.
- Clone the repository:

```shell
git clone https://github.com/OpenGVLab/ScaleCUA.git
cd ScaleCUA
```

- Install dependencies:

```shell
pip install -r requirements.txt
```
The Playground lets you experience the ScaleCUA agent's capabilities firsthand across Ubuntu, Android, and Web platforms. For a complete guide, please see the [Playground].
Follow these two steps to begin:

1. Deploy the ScaleCUA models with vLLM following our [Model Deployment] guide. We support two modes of operation: a Native Agentic Model, which uses a single model for both UI grounding and planning, and an Agentic Workflow, which uses two different models for UI planning and grounding.
2. Set up your environment following the [Playground Environment] guide. We provide pre-configured, interactive virtual machines for Ubuntu, Android, and Web to simplify this process.

Now, you can try our agent in the interactive environment!
We provide a suite of benchmarks for end-to-end agent evaluation using a vision-only setup. ScaleCUA supports deployment with vLLM and evaluation through an OpenAI-compatible API. To run the evaluation benchmarks, please refer to the specific instructions within the [Evaluation].
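As a rough sketch of the vision-only setup above: once a ScaleCUA model is served with vLLM, a client can send the current screenshot plus a task instruction to the OpenAI-compatible chat endpoint. The endpoint URL, model name, and instruction below are illustrative assumptions, not the exact settings used by our evaluation scripts:

```python
import base64
import json
from urllib import request


def build_chat_request(screenshot_path: str, instruction: str, model: str) -> dict:
    """Build an OpenAI-compatible chat payload pairing a screenshot with a task instruction."""
    with open(screenshot_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,  # assumed served model name, e.g. a ScaleCUA checkpoint
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                    {"type": "text", "text": instruction},
                ],
            }
        ],
    }


def query_agent(payload: dict, base_url: str = "http://localhost:8000/v1") -> str:
    """POST the payload to a vLLM server's OpenAI-compatible endpoint (assumed local URL)."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

In an evaluation loop, each step would capture a fresh screenshot, call `query_agent`, and parse the returned action before executing it in the environment.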
Our evaluation suite covers desktop, mobile, and web environments:
- Android: AndroidWorld, AndroidLab
- Ubuntu: OSWorld
- macOS: MacOSArena
- Web: WebArenaLite-V2 (a refined version of WebArena-Lite suitable for vision-based agents)
- Windows: WindowsAgentArena
The `agent-sft/` directory contains all the necessary code and configuration files to train ScaleCUA from scratch using our ScaleCUA-Data. We support training computer use agents with both InternVL and Qwen2.5-VL.
Thanks to the following open-source projects:
OSWorld, WindowsAgentArena, WebArena, AndroidWorld, ScienceBoard, AGUVIS, MMBench-GUI, Qwen-VL, InternVL
This project is licensed under the Apache 2.0 License.
If you find our work useful, please consider citing our paper:
@article{liu2025scalecua,
title = {ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data},
author = {Liu, Zhaoyang and Xie, Jingjing and Ding, Zichen and Li, Zehao and Yang, Bowen and Wu, Zhenyu and Wang, Xuehui and Sun, Qiushi and Liu, Shi and Wang, Weiyun and Ye, Shenglong and Li, Qingyun and Dong, Xuan and Yu, Yue and Lu, Chenyu and Mo, YunXiang and Yan, Yao and Tian, Zeyue and Zhang, Xiao and Huang, Yuan and Liu, Yiqian and Su, Weijie and Luo, Gen and Yue, Xiangyu and Qi, Biqing and Chen, Kai and Zhou, Bowen and Qiao, Yu and Chen, Qifeng and Wang, Wenhai},
journal = {arXiv preprint arXiv:2509.15221},
year = {2025},
note = {Preprint},
url = {https://github.com/OpenGVLab/ScaleCUA}
}