[Paper][ACL 2026 Findings] What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning
- Our paper was accepted to ACL 2026 Findings. 🎉🎉🎉
- Our code has been released.
- Our UI Comprehension-Bench is being finalized and will be available soon. 🚧
Left: Evaluation of existing methods on UI element localization, semantic function description, and practical usage. Middle: Performance gains with correct vs. misleading UI information, compared to no UI information. Right: Comparison of UILoop against existing "Screen-to-Action" methods on the SR metric for AndroidControl-High.
We demonstrate that comprehensive UI understanding significantly enhances reasoning in existing GUI agents. Building on this insight, we propose the UILoop paradigm, which moves beyond conventional "Screen-to-Action" approaches by reframing GUI reasoning as a cyclic "Screen–UI Elements–Action" loop. Through UI Element–Driven Reinforcement Fine-Tuning, UILoop improves model comprehension of interface elements, thereby advancing multimodal GUI reasoning and interpretability.
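The loop above can be sketched as a single inference step. This is a minimal, hypothetical illustration of the data flow only: all names (`locate`, `lingualize`, `act`) are illustrative stand-ins, not the repo's actual API.

```python
# Hypothetical sketch of one "Screen -> UI Elements -> Action" pass.
# Instead of mapping a screenshot directly to an action, the agent first
# enumerates UI elements, describes their semantic function, and then
# grounds the predicted action in those descriptions.

def uiloop_step(screenshot, instruction, locate, lingualize, act):
    elements = locate(screenshot)                                    # 1) find candidate elements
    described = [(e, lingualize(screenshot, e)) for e in elements]   # 2) describe each one
    return act(screenshot, instruction, described)                   # 3) act conditioned on them

# Toy stand-ins showing the data flow (a real multimodal model replaces these):
locate = lambda img: [{"bbox": (10, 20, 80, 40), "type": "button"}]
lingualize = lambda img, e: f"a {e['type']} that submits the form"
act = lambda img, task, ctx: {"action": "click", "target": ctx[0][0]["bbox"]}

print(uiloop_step("screenshot.png", "submit the form", locate, lingualize, act))
# -> {'action': 'click', 'target': (10, 20, 80, 40)}
```

The design point is that the action is never chosen from the raw screen alone; the intermediate element descriptions make the decision inspectable.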
Statistics of Our UI Comprehension-Bench. Left: Proportion and distribution of GT UI elements; token length of their semantic descriptions. Right: Proportion of GT UI elements effectively used in action inference.
We introduce the more challenging UI Comprehension task with three dedicated evaluation metrics (UI Locate, Lingualize, Leverage) to assess how well existing methods understand UI elements. To support this, we advance community research by contributing UI Comprehension-Bench, a 26K-sample benchmark for comprehensive UI capability assessment.
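As a rough intuition for how a metric like UI Locate might be scored, here is a hedged sketch assuming the common GUI-grounding convention that a prediction counts as correct when the predicted point falls inside the ground-truth element's bounding box. The benchmark's actual protocol may differ, and the function names are illustrative.

```python
# Hypothetical UI Locate scoring sketch (not the benchmark's actual code):
# a predicted click point is a hit if it lands inside the GT bounding box.

def point_in_box(point, box):
    """box = (x1, y1, x2, y2); point = (x, y)."""
    x, y = point
    x1, y1, x2, y2 = box
    return x1 <= x <= x2 and y1 <= y <= y2

def locate_accuracy(predicted_points, gt_boxes):
    """Fraction of predicted points that fall inside their GT boxes."""
    hits = sum(point_in_box(p, b) for p, b in zip(predicted_points, gt_boxes))
    return hits / len(gt_boxes)

preds = [(50, 30), (200, 200)]
boxes = [(10, 20, 80, 40), (0, 0, 100, 100)]
print(locate_accuracy(preds, boxes))  # first point hits, second misses -> 0.5
```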
conda create -n uiloop python=3.10
conda activate uiloop
pip install -r requirements.txt
Our repository supports the Qwen2.5-VL series of models (including the 3B and 7B variants).
bash ./examples/qwen2_5_vl_gui_grpo.sh
Inference and evaluation on AndroidControl-High and ScreenSpot-Pro.
bash ./uiloop/inference.sh
bash ./uiloop/eval.sh
Inference and evaluation on our UI Comprehension-Bench. Running these scripts produces scores for UI Locate, Lingualize, and Leverage.
bash ./uiloop/uiloop_bench_inference.sh
bash ./uiloop/eval_uiloop.sh
We would like to express our sincere gratitude to QwenVL, EasyR1, Verl, and GUI-R1 for providing the open-source resources that contributed to the development of this project.
If you find this repo useful for your research, please consider citing the paper.
@article{li2026s,
title={What's Missing in Screen-to-Action? Towards a UI-in-the-Loop Paradigm for Multimodal GUI Reasoning},
author={Li, Songze and Guo, Xiaoke and Liu, Tianqi and Yi, Biao and Gong, Zhaoyan and Liu, Zhiqiang and Chen, Huajun and Zhang, Wen},
journal={arXiv preprint arXiv:2604.06995},
year={2026}
}

