Skip to content

Commit e9aa819

Browse files
committed
update readme & readme_cn
1 parent bd0be48 commit e9aa819

File tree

2 files changed

+134
-210
lines changed

2 files changed

+134
-210
lines changed

README.md

Lines changed: 67 additions & 106 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
55
<p align="center">
66
<!-- <a href="https://github.com/stepfun-ai/gelab-zero"><img src="https://img.shields.io/badge/💻%20GitHub-Repository-black" alt="GitHub" /></a> -->
7-
<a href="https://arxiv.org/abs/2512.15431"><img src="https://img.shields.io/badge/📄%20arXiv-Paper-red" alt="arXiv" /></a>
7+
<a href="https://arxiv.org/abs/2512.15431"><img src="https://img.shields.io/badge/arXiv-Step--GUI Technical Report-B31B1B.svg?logo=arxiv&logoColor=white" alt="arXiv" /></a>
88
<a href="https://opengelab.github.io/"><img src="https://img.shields.io/badge/🌐%20Website-Project%20Page-blue" alt="Website" /></a>
99
<a href="https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview"><img src="https://img.shields.io/badge/🤗%20Hugging%20Face-GELab--Zero--4B--preview-orange" alt="Hugging Face Model" /></a>
1010
<a href="https://huggingface.co/datasets/stepfun-ai/AndroidDaily"><img src="https://img.shields.io/badge/📚%20Hugging%20Face-AndroidDaily-yellow" alt="Hugging Face Dataset" /></a>
@@ -18,63 +18,50 @@
1818

1919
## 📰 News
2020

21-
* 🎁 **[2025-12-18]** We release our technical report on [**arXiv**](https://arxiv.org/abs/2512.15431)! Check out the details of GELab-Zero's architecture, training process, and benchmark results.
22-
* 🎁 **[2025-12-18]** We release a more powerful **API model** with enhanced performance for GUI automation tasks. [Apply for API access here](https://wvixbzgc0u7.feishu.cn/share/base/form/shrcnNStxEmuE7aY6jTW07CZHMf)!
21+
* 🎁 **[2025-12-18]** We release **Step-GUI Technical Report** on [**arXiv**](https://arxiv.org/abs/2512.15431)!
22+
* 🎁 **[2025-12-18]** We release a more powerful **API** for GUI automation tasks. [Apply for API access here](https://wvixbzgc0u7.feishu.cn/share/base/form/shrcnNStxEmuE7aY6jTW07CZHMf)!
23+
* 🎁 **[2025-12-12]** We release **MCP-Server** support for multi-device management and task distribution. See [Installation & Quick Start](#-installation-quick-start) for setup instructions.
24+
* 🎁 **[2025-12-1]** We thank the following projects and authors for providing quantization tools & tutorials: [GGUF_v1](https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF), [GGUF_v2](https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF), [EXL3](https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3), [Tutorials_CN](http://xhslink.com/o/1WrmgHGWFYh), [Tutorials_EN](https://www.youtube.com/watch?v=4BMiDyQOpos)
25+
* 🎁 **[2025-11-31]** We release a lightweight **4B** model GELab-Zero-4B-preview on [**Hugging Face**](https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview) and [**Model Scope**](https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview).
26+
* 🎁 **[2025-11-31]** We release the tasks from the [**AndroidDaily**](https://huggingface.co/datasets/stepfun-ai/AndroidDaily) benchmark.
27+
* 🎁 **[2025-11-30]** We release the current **GELab-Zero** engineering infrastructure.
28+
* 🎁 **[2025-10]** Our [**research**](https://github.com/summoneryhl/gelab-engine) paper on GELab-Engine is accepted by **NeurIPS 2025**.
2329

24-
* 🎁 **[2025-12-12]** MCP-Server ready:
25-
26-
<!-- ### Step1 启动 mcp server 以支持多设备管理和任务分发 -->
27-
### Step1 Start MCP server to support multi-device management and task distribution
28-
29-
```bash
30-
# enable mcp server
31-
python mcp_server/detailed_gelab_mcp_server.py
32-
```
33-
34-
### Step2 Import MCP tools in Chatbox
35-
<!-- images/MCP-chatbox.png -->
36-
<div style="display: flex; align-items: center; justify-content: center; width: 80%; margin: 0 auto;">
37-
<img src="images/MCP-chatbox.png" alt="MCP-Demo" style="flex: 1; height: 400px; object-fit: contain; margin-right: 1px;"/>
38-
</div>
39-
40-
41-
42-
* 🎁 **[2025-12]** We thank the following projects and authors for providing quantization tools & tutorials: [GGUF_v1](https://huggingface.co/bartowski/stepfun-ai_GELab-Zero-4B-preview-GGUF), [GGUF_v2](https://huggingface.co/noctrex/GELab-Zero-4B-preview-GGUF), [EXL3](https://huggingface.co/ArtusDev/stepfun-ai_GELab-Zero-4B-preview-EXL3), [Tutorials_CN](http://xhslink.com/o/1WrmgHGWFYh), [Tutorials_EN](https://www.youtube.com/watch?v=4BMiDyQOpos)
43-
* 🎁 **[2025-11]** We release a lightweight **4B model** on [**Hugging Face**](https://huggingface.co/stepfun-ai/GELab-Zero-4B-preview) and [**Model Scope**](https://modelscope.cn/models/stepfun-ai/GELab-Zero-4B-preview).
44-
* 🎁 **[2025-11]** We release the tasks from the [**AndroidDaily**](https://huggingface.co/datasets/stepfun-ai/AndroidDaily) benchmark.
45-
* 🎁 **[2025-11]** We release the current **GELab-Zero** engineering infrastructure.
46-
* 🎁 **[2025-10]** Our [research](https://github.com/summoneryhl/gelab-engine) paper on **GELab-Engine** is accepted by **NeurIPS 2025**.
4730

4831

4932
## 📑 Table of Contents
5033

5134
- [📖 Background](#-background)
5235
- [🎥 Application Demonstrations](#-application-demonstrations)
53-
- [📊 AndroidDaily](#-androiddaily-a-self-built-benchmark-close-to-daily-life)
5436
- [🏆 Open Benchmark](#-open-benchmark)
5537
- [🚀 Installation & Quick Start](#-installation-quick-start)
5638
- [📝 Citation](#-citation)
57-
- [📧 Contact](#-contact)
5839

59-
## 📖 Background
6040

61-
As AI experiences continue to penetrate consumer-grade terminal devices, mobile Agent research is at a critical juncture transitioning from "feasibility verification" to "large-scale application." GUI-based solutions have emerged as the optimal approach for the current stage in addressing complex mobile ecosystems and achieving scalable Agent capabilities, thanks to their universal compatibility with all apps and zero-cost integration without requiring app vendor adaptation. However, due to the highly fragmented nature of mobile application ecosystems, getting GUI Agents to truly work across different brands and device models often faces numerous engineering challenges: multi-device ADB connections, dependency installation, permission configuration, inference service deployment, task recording and replay. This means Agent developers and MCP users need to handle substantial engineering infrastructure work, making it difficult to focus on strategic innovation.
41+
## 📧 Contact
42+
43+
You can contact us and communicate with us by joining our WeChat group:
6244

63-
To address this challenge, we are open-sourcing GELab-Zero to accelerate the innovation and application deployment of GUI Agents. It consists of two main components:
45+
| WeChat Group |
46+
|:-------------------------:|
47+
| <img src="images/wechat_group2.jpeg" width="200"> |
6448

65-
- Plug-and-play complete inference engineering infrastructure that handles all the heavy lifting
66-
- A 4B GUI Agent model capable of running on local computer
6749

68-
It provides a one-click launch experience similar to open-source GUI Agent MCP, can be deployed entirely locally, and puts the entire inference pipeline under your complete control. Specific capabilities include:
6950

70-
- **Local Deployment**: Supports 4B-scale models running on consumer-grade hardware, balancing low latency with privacy.
71-
- **One-click Launch**: Provides unified deployment pipeline that automatically handles environment dependencies and device management.
72-
- **Task Distribution**: Can distribute tasks to multiple phones while recording interaction trajectories for observability and reproducibility.
73-
- **Three Agent Modes**: Covers multiple working modes including ReAct loops, multi-agent collaboration, and scheduled tasks.
51+
## 📖 Background
52+
As AI experiences increasingly penetrate consumer-grade devices, Mobile Agent research is at a critical juncture: transitioning from **"feasibility verification"** to **"large-scale application."** While GUI-based solutions offer universal compatibility, the fragmentation of mobile ecosystems imposes heavy engineering burdens that hinder innovation. GELab-Zero is designed to dismantle these barriers.
53+
54+
* ⚡️ **Out-of-the-Box Full-Stack Infrastructure**
55+
Resolves the fragmentation of the mobile ecosystem with a unified, one-click inference pipeline. It automatically handles multi-device ADB connections, dependencies, and permissions, allowing developers to focus on strategic innovation rather than engineering infrastructure.
7456

75-
These capabilities enable GELab-Zero to flexibly handle complex task flows in real-world scenarios and provide a solid foundation for future extensions.
57+
* 🖥️ **Consumer-Grade Local Deployment**
58+
Features a built-in 4B GUI Agent model **fully optimized for Mac (M-series) and NVIDIA RTX 4060**. It supports complete local execution, ensuring data privacy and low latency on standard consumer hardware.
7659

77-
For Agent developers, this infrastructure enables rapid testing of new ideas and strategies, validating interaction approaches; for enterprise users, it allows direct reuse of this infrastructure to quickly integrate MCP capabilities into product business.
60+
* 📱 **Flexible Task Distribution & Orchestration**
61+
Supports distributing tasks across multiple devices with interaction trajectory recording. It offers three versatile modes—ReAct loops, multi-agent collaboration, and scheduled tasks—to handle complex, real-world business scenarios.
62+
63+
* 🚀 **Accelerate from Prototype to Production**
64+
Empowers developers to rapidly validate interaction strategies while allowing enterprises to directly reuse the underlying infrastructure for zero-cost MCP integration, bridging the critical gap between "feasibility verification" and "large-scale application."
7865

7966
## 🎥 Application Demonstrations
8067

@@ -163,54 +150,6 @@ Task: Go to Baicizhan and help me complete the vocabulary learning task
163150
**[📹 Click to view demo video](./images/video_8.mp4)**
164151

165152

166-
## 📊 AndroidDaily: A Self-Built Benchmark Close to Daily Life
167-
168-
Current mainstream benchmarks mostly focus on productivity applications (such as email), but users' daily high-frequency usage is dominated by lifestyle service applications (such as food delivery, ride-hailing, social media, payments, etc.), and these scenarios better reflect the practical value of current GUI Agents.
169-
170-
To this end, we propose AndroidDaily: a multi-dimensional dynamic benchmark for the real world. We focus on empirical analysis of six core dimensions of modern life (food, transportation, shopping, housing, information consumption, entertainment), prioritizing popular applications that dominate these categories. This makes the tasks in the benchmark characterized by real-world interaction results (such as transaction payments, service bookings) and tight online-offline inheritance.
171-
172-
To balance evaluation comprehensiveness and execution efficiency, AndroidDaily adopts two evaluation modes:
173-
174-
### Static Testing
175-
176-
Contains 3146 actions. Provides task descriptions and step-by-step screenshots, requiring the Agent to predict the action type and action value (such as click coordinates, input text) for each step, primarily evaluating numerical accuracy. This method requires no complex engineering infrastructure and enables rapid, cost-effective large-scale model iteration and testing.
177-
178-
The action type distribution in static testing is as follows (total 3146 actions):
179-
180-
- **CLICK**: 1354 times - Click operations
181-
- **COMPLETE**: 410 times - Task completion
182-
- **AWAKE**: 528 times - App activation
183-
- **TYPE**: 371 times - Text input
184-
- **INFO**: 305 times - Information query
185-
- **WAIT**: 85 times - Wait operations
186-
- **SLIDE**: 93 times - Slide operations
187-
188-
#### AndroidDaily Static Benchmark Results
189-
190-
191-
| Model | Accuracy |
192-
| ------------------------- | ----------- |
193-
| GPT-4o | 0.196 |
194-
| Gemini-2.5-pro-thinking | 0.366 |
195-
| UI-TARS-1.5 | 0.470 |
196-
| GELab-Zero-4B-preview | **0.734** |
197-
198-
### End-to-End Benchmark
199-
200-
Contains 235 tasks. Conducted in a fully functional test environment (such as real devices or emulators), the Agent needs to autonomously execute tasks from start to finish, with overall task success rate as the evaluation metric. This setup has the highest ecological validity and truly reflects the Agent's comprehensive capabilities in complex environments.
201-
202-
The scenario distribution in the end-to-end benchmark is as follows:
203-
204-
- **Transportation**: 78 tasks (33.19%) - Ride-hailing, navigation, public transit, etc.
205-
- **Shopping**: 61 tasks (25.96%) - E-commerce shopping, payment, order management, etc.
206-
- **Social Communication**: 43 tasks (18.3%) - Messaging, social interactions, etc.
207-
- **Content Consumption**: 37 tasks (15.74%) - News reading, video watching, content bookmarking, etc.
208-
- **Local Services**: 16 tasks (6.81%) - Food delivery, on-site services, etc.
209-
210-
![Scene Distribution](./images/Scenario.png)
211-
212-
Typical tasks include ride-hailing, shopping, message sending, content bookmarking, food delivery ordering, etc. GELab-Zero-4B-preview achieves 75.86% success rate on AndroidWorld testing, demonstrating excellent performance on complex mobile tasks.
213-
214153
## 🏆 Open Benchmark
215154

216155
We conducted comprehensive evaluations of GELab-Zero-4B-preview model across multiple open-source benchmarks, covering various dimensions including GUI understanding, localization, and interaction. The comparison results with other open-source models are shown below:
@@ -528,23 +467,23 @@ Download the [Jan](https://github.com/janhq/jan/releases) client and install it.
528467

529468
Go to Settings → Model Provider → choose llama.cpp, then import the models:
530469

531-
![Import model](images/jan_1.png)
470+
<img src="images/jan_1.png" width="50%" alt="test model">
532471

533472
Select the two GGUF files you just converted:
534473

535-
![Import model](images/jan_2.png)
474+
<img src="images/jan_2.png" width="50%" alt="test model">
536475

537476
Back in the model UI, click `Start`.
538477

539478
Create a chat to verify the model runs correctly:
540479

541-
![test model](images/jan_3.png)
480+
<img src="images/jan_3.png" width="50%" alt="test model">
542481

543482
Once tokens are streaming normally, start the local API server.
544483

545484
Go to Settings → Local API Server, create an API key under server configuration, then launch the service:
546485

547-
![make API service](images/jan_4.png)
486+
<img src="images/jan_4.png" width="50%" alt="test model">
548487

549488
#### Step 3: Adjust GELab-Zero Agent model config
550489

@@ -576,36 +515,58 @@ local_model_config = {
576515

577516
---
578517

518+
### (Optional) MCP-Server Setup
519+
520+
<!-- ### Step1 启动 mcp server 以支持多设备管理和任务分发 -->
521+
#### Step 1: Start MCP server to support multi-device management and task distribution
522+
523+
```bash
524+
# enable mcp server
525+
python mcp_server/detailed_gelab_mcp_server.py
526+
```
527+
528+
#### Step 2: Import MCP tools in Chatbox
529+
<!-- images/MCP-chatbox.png -->
530+
<div style="display: flex; align-items: center; justify-content: center; width: 80%; margin: 0 auto;">
531+
<img src="images/MCP-chatbox.png" alt="MCP-Demo" style="flex: 1; height: 400px; object-fit: contain; margin-right: 1px;"/>
532+
</div>
533+
534+
535+
579536
## 📝 Citation
580537

581538
If you find GELab-Zero useful for your research, please consider citing our work :)
582539

583540
```bibtex
541+
@misc{yan2025stepguitechnicalreport,
542+
title={Step-GUI Technical Report},
543+
author={Haolong Yan and Jia Wang and Xin Huang and Yeqing Shen and Ziyang Meng and Zhimin Fan and Kaijun Tan and Jin Gao and Lieyu Shi and Mi Yang and Shiliang Yang and Zhirui Wang and Brian Li and Kang An and Chenyang Li and Lei Lei and Mengmeng Duan and Danxun Liang and Guodong Liu and Hang Cheng and Hao Wu and Jie Dong and Junhao Huang and Mei Chen and Renjie Yu and Shunshan Li and Xu Zhou and Yiting Dai and Yineng Deng and Yingdan Liang and Zelin Chen and Wen Sun and Chengxu Yan and Chunqin Xu and Dong Li and Fengqiong Xiao and Guanghao Fan and Guopeng Li and Guozhen Peng and Hongbing Li and Hang Li and Hongming Chen and Jingjing Xie and Jianyong Li and Jingyang Zhang and Jiaju Ren and Jiayu Yuan and Jianpeng Yin and Kai Cao and Liang Zhao and Liguo Tan and Liying Shi and Mengqiang Ren and Min Xu and Manjiao Liu and Mao Luo and Mingxin Wan and Na Wang and Nan Wu and Ning Wang and Peiyao Ma and Qingzhou Zhang and Qiao Wang and Qinlin Zeng and Qiong Gao and Qiongyao Li and Shangwu Zhong and Shuli Gao and Shaofan Liu and Shisi Gao and Shuang Luo and Xingbin Liu and Xiaojia Liu and Xiaojie Hou and Xin Liu and Xuanti Feng and Xuedan Cai and Xuan Wen and Xianwei Zhu and Xin Liang and Xin Liu and Xin Zhou and Yingxiu Zhao and Yukang Shi and Yunfang Xu and Yuqing Zeng and Yixun Zhang and Zejia Weng and Zhonghao Yan and Zhiguo Huang and Zhuoyu Wang and Zheng Ge and Jing Li and Yibo Zhu and Binxing Jiao and Xiangyu Zhang and Daxin Jiang},
544+
year={2025},
545+
eprint={2512.15431},
546+
archivePrefix={arXiv},
547+
primaryClass={cs.CV},
548+
url={https://arxiv.org/abs/2512.15431},
549+
}
550+
584551
@software{gelab_zero_2025,
585552
title={GELab-Zero: An Advanced Mobile Agent Inference System},
586553
author={GELab Team},
587554
year={2025},
588555
url={https://github.com/stepfun-ai/gelab-zero}
589556
}
590557
591-
@inproceedings{gelab_mt_rl,
592-
title={GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning},
593-
author={Yan, Haolong and Shen, Yeqing and Huang, Xin and Wang, Jia and Tan, Kaijun and Liang, Zhixuan and Li, Hongxin and Ge, Zheng and Yoshie, Osamu and Li, Si and others},
594-
booktitle={The Thirty-ninth Annual Conference on Neural Information Processing Systems}
558+
@misc{gelab_engine,
559+
title={GUI Exploration Lab: Enhancing Screen Navigation in Agents via Multi-Turn Reinforcement Learning},
560+
author={Haolong Yan and Yeqing Shen and Xin Huang and Jia Wang and Kaijun Tan and Zhixuan Liang and Hongxin Li and Zheng Ge and Osamu Yoshie and Si Li and Xiangyu Zhang and Daxin Jiang},
561+
year={2025},
562+
eprint={2512.02423},
563+
archivePrefix={arXiv},
564+
primaryClass={cs.CV},
565+
url={https://arxiv.org/abs/2512.02423},
595566
}
596567
597568
```
598569

599-
## 📧 Contact
600-
601-
For questions and support, please contact: [[email protected]]
602-
603-
You can contact us and communicate with us by joining our WeChat group:
604-
605-
| WeChat Group |
606-
|:-------------------------:|
607-
| <img src="images/wechat_group2.jpeg" width="200"> |
608-
609570
## ⭐ Star History
610571

611572
<div align="center">

0 commit comments

Comments
 (0)