RollingQ_ICML2025

Official repo for ICML 2025 paper "RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer"

Access our paper on arXiv (arXiv:2506.11465).

Authors: Haotian Ni, Yake Wei, Hang Liu, Gong Chen, Chong Peng, Hao Lin, Di Hu.

In this work, we extend imbalanced multimodal learning to dynamic fusion paradigms. We identify the deactivation of the dynamic property of the attention mechanism and propose a simple yet effective method, RollingQ, to revive the cooperation dynamics in multimodal Transformers.

Abstract

Multimodal learning faces challenges in effectively fusing information from diverse modalities, especially when modality quality varies across samples. Dynamic fusion strategies, such as the attention mechanism in Transformers, aim to address these challenges by adaptively emphasizing modalities based on the characteristics of the input data. However, through a series of carefully designed experiments, we surprisingly observed that the dynamic adaptability of widely used self-attention models diminishes: the model tends to prefer one modality regardless of data characteristics. This bias triggers a self-reinforcing cycle that progressively overemphasizes the favored modality, widening the distribution gap in attention keys across modalities and deactivating the attention mechanism's dynamic properties. To revive adaptability, we propose a simple yet effective method, Rolling Query (RollingQ), which balances attention allocation by rotating the query to break the self-reinforcing cycle and mitigate the key distribution gap. Extensive experiments on various multimodal scenarios validate the effectiveness of RollingQ, and the restoration of cooperation dynamics is pivotal for enhancing the broader capabilities of widely deployed multimodal Transformers.

Overview of RollingQ

(Figure: overview of the RollingQ algorithm)
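The core idea, rotating the query so that attention is no longer structurally biased toward one modality's keys, can be illustrated with a minimal NumPy sketch. This is a conceptual illustration under simplifying assumptions (a single query vector, two modalities, and a bisector-of-centroids rotation target); the helper names `rotation_onto` and `roll_query` are ours, not the repository's API, and the authors' actual algorithm may differ in how the rotation is estimated and applied:

```python
import numpy as np

def rotation_onto(u, v):
    """Return an orthogonal matrix R with R @ u_hat = v_hat, rotating only
    in the plane spanned by u and v (identity on the orthogonal complement)."""
    u = u / np.linalg.norm(u)
    v = v / np.linalg.norm(v)
    c = float(u @ v)                      # cos(theta) between u and v
    w = v - c * u                         # component of v orthogonal to u
    nw = np.linalg.norm(w)
    if nw < 1e-8:                         # u and v already aligned
        return np.eye(len(u))
    w = w / nw
    s = nw                                # sin(theta), since |v| = 1
    return (np.eye(len(u))
            + s * (np.outer(w, u) - np.outer(u, w))
            + (c - 1.0) * (np.outer(u, u) + np.outer(w, w)))

def roll_query(q, keys_a, keys_b):
    """Rotate query q onto the bisector of the two modalities' key
    centroids, so neither modality is structurally favored by the query
    direction. Norm of q is preserved (R is a rotation)."""
    ca = keys_a.mean(axis=0)              # key centroid, modality A
    cb = keys_b.mean(axis=0)              # key centroid, modality B
    bisector = ca / np.linalg.norm(ca) + cb / np.linalg.norm(cb)
    R = rotation_onto(q, bisector)
    return R @ q
```

After `roll_query`, the query is equally aligned (in cosine terms) with both key centroids, which is the kind of balanced attention allocation the paper argues is needed to break the self-reinforcing preference cycle.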

Environment

To set up the environment according to requirements.txt, run the following commands in the shell:

conda create -n rollingQ python=3.10
conda activate rollingQ
pip install -r requirements.txt

Datasets

We conduct our experiments on Kinetics-Sounds, CREMA-D, and CMU-MOSEI. You can download the original datasets and follow the preprocessing instructions provided in BalanceBench. Alternatively, the preprocessed data is available on Hugging Face.

Citation

If you find this work useful, please consider citing it.

@article{ni2025rollingq,
  title={RollingQ: Reviving the Cooperation Dynamics in Multimodal Transformer},
  author={Ni, Haotian and Wei, Yake and Liu, Hang and Chen, Gong and Peng, Chong and Lin, Hao and Hu, Di},
  journal={arXiv preprint arXiv:2506.11465},
  year={2025}
}

Acknowledgements

This work is supported by the National Natural Science Foundation of China (NO.62106272). It is also supported by the Public Computing Cloud of Renmin University of China and the fund for building world-class universities (disciplines) of Renmin University of China.

Contact

If you have any detailed questions or suggestions, you can email us at: [email protected]
