BAMI: Training-Free Bias Mitigation in GUI Grounding

Borui Zhang*, Bo Zhang*, Bo Wang*, Wenzhao Zheng, Yuhao Cheng, Liang Tang, Yiqiang Yan, Jie Zhou, Jiwen Lu†

Department of Automation, Tsinghua University; Lenovo


📖 Introduction

This repository contains the official resources for the paper "BAMI: Training-Free Bias Mitigation in GUI Grounding".

BAMI (Bias-Aware Manipulation Inference) is a training-free framework that improves the accuracy of Multimodal Large Language Models (MLLMs) on GUI grounding tasks. Diagnosing grounding failures with our Masked Prediction Distribution (MPD) method, we identify two primary sources of error: Precision Bias (stemming from high resolution and discretization) and Ambiguity Bias (stemming from token-space edit distances).

BAMI addresses these issues via a structured inference process involving Coarse-to-Fine Focus and Candidate Selection, achieving state-of-the-art performance on benchmarks like ScreenSpot-Pro without requiring any additional model training.

Figure 1: Comparison with conventional grounding models. BAMI achieves accurate localization via structured inference with bias-aware manipulations.

🔥 News

  • [2025-11-21] The technical report is released! Download PDF.
  • [Coming Soon] The inference code and evaluation scripts will be released soon. Stay tuned!

🚀 Key Features

  • Training-Free: Directly boosts the performance of existing open-source backbones (e.g., OS-Atlas, UI-TARS, TianXi-Action) without fine-tuning.
  • Precision Bias Mitigation: Implements a Coarse-to-Fine Focus strategy to handle high-resolution UI elements and small objects effectively.
  • Ambiguity Bias Correction: Utilizes a Candidate Selection mechanism with Euclidean-space priors to correct MLLM selection biases.
  • Diagnostic Tool: Introduces MPD, an attribution method to visualize and analyze error sources in GUI grounding.
  • SOTA Performance: Achieves 57.8% accuracy on the challenging ScreenSpot-Pro benchmark with the TianXi-Action-7B backbone, up from a 51.9% baseline.
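
The official inference code has not been released yet, so the following is only a minimal sketch of the general coarse-to-fine idea: predict a point on the full screenshot, crop a smaller window around it, predict again inside the crop, and map the result back to full-image pixels. It is not the authors' implementation. The names `Box`, `crop_around`, `to_full_image`, and `coarse_to_fine` are hypothetical, and `predict` is an assumed placeholder for any grounding model that returns a point normalised to the region it is shown.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned region in full-image pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def crop_around(point: Point, image_size: Point, crop_size: Point) -> Box:
    """Return a crop window centred on `point`, shifted so it stays inside the image."""
    (x, y), (img_w, img_h) = point, image_size
    crop_w, crop_h = min(crop_size[0], img_w), min(crop_size[1], img_h)
    left = min(max(x - crop_w / 2, 0), img_w - crop_w)
    top = min(max(y - crop_h / 2, 0), img_h - crop_h)
    return Box(left, top, left + crop_w, top + crop_h)


def to_full_image(norm_point: Point, region: Box) -> Point:
    """Map a point normalised to [0, 1] within `region` back to full-image pixels."""
    u, v = norm_point
    return (region.left + u * region.width, region.top + v * region.height)


def coarse_to_fine(
    predict: Callable[[Box], Point],
    image_size: Point,
    crop_size: Point,
    rounds: int = 2,
) -> Point:
    """Refine a prediction by re-querying `predict` on progressively focused crops.

    `predict` is an assumed stand-in for a model call: it receives the region
    to look at and returns a point normalised to that region.
    """
    region = Box(0, 0, image_size[0], image_size[1])
    point = to_full_image(predict(region), region)  # coarse pass on the full image
    for _ in range(rounds - 1):
        region = crop_around(point, image_size, crop_size)
        point = to_full_image(predict(region), region)  # finer pass inside the crop
    return point
```

Because the crop covers fewer pixels than the full screenshot, the same output discretization corresponds to a finer pixel grid, which is one plausible reading of why focusing helps with small targets. The crop size and the number of rounds here are illustrative choices, not values taken from the paper.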

📊 Results

BAMI consistently improves accuracy across various model backbones and datasets. On ScreenSpot-Pro, the absolute gains range from 5.9 to 22.7 percentage points.

| Model Backbone | Dataset | Baseline Accuracy | BAMI Accuracy |
| --- | --- | --- | --- |
| TianXi-Action-7B | ScreenSpot-Pro | 51.9% | 57.8% |
| UI-TARS-1.5-7B | ScreenSpot-Pro | 40.8% | 51.9% |
| OS-Atlas-7B | ScreenSpot-Pro | 18.9% | 41.6% |

For detailed experimental results, please refer to the Technical Report.

πŸ› οΈ Usage

The code for BAMI is currently being organized and will be open-sourced shortly. We are cleaning up the scripts for the Masked Prediction Distribution (MPD) analysis and the inference pipeline to ensure ease of use.

πŸ“ Citation

If you find this work helpful for your research, please consider citing our paper:

@article{zhang2025bami,
  title={BAMI: Training-Free Bias Mitigation in GUI Grounding},
  author={Zhang, Borui and Zhang, Bo and Wang, Bo and Zheng, Wenzhao and Cheng, Yuhao and Tang, Liang and Yan, Yiqiang and Zhou, Jie and Lu, Jiwen},
  journal={Technical Report},
  year={2025}
}
