Figure 1: Framework overview of our diffusion-based policy.
Figure 2: Task visualization and results overview (2 ALOHA + 16 RoboTwin + 4 Real-world tasks).
June 25th, 2025: Our paper was accepted to ICCV 2025.
May 20th, 2025: We released our code and model.
git clone https://github.com/return-sleep/Diffusion_based_imaginative_Coordination.git
cd Diffusion_based_imaginative_Coordination
- Install the required packages, see INSTALLATION_ALOHA.md
- Download the dataset from ALOHA_Data
- Modify Line 5 of constants.py to point to your own dataset path
cd ALOHA
bash script/train_eval.sh sim_insertion_human 20000 0 0
# bash script/train_eval.sh <task_name> <num_steps> <seed> <cuda_id>
bash script/eval.sh sim_insertion_human 20000 0 0 0
# bash script/eval.sh <task_name> <num_steps> <seed> <cuda_id> <ckpt_type>
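To repeat the training and evaluation command above over several seeds, a small wrapper loop can help. The following is a minimal sketch and not part of the repository: the `run_seeds` function and the `DRY_RUN` variable are hypothetical names, and the argument order is taken from the usage comment for `script/train_eval.sh` (`<task_name> <num_steps> <seed> <cuda_id>`).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run script/train_eval.sh
# once per seed. Argument order follows the usage comment in this README:
#   bash script/train_eval.sh <task_name> <num_steps> <seed> <cuda_id>
run_seeds() {
    local task="$1" steps="$2" cuda_id="$3"
    shift 3
    local seed
    for seed in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            # Dry run: print the command instead of executing it.
            echo "bash script/train_eval.sh $task $steps $seed $cuda_id"
        else
            # Stop at the first failing run.
            bash script/train_eval.sh "$task" "$steps" "$seed" "$cuda_id" || return 1
        fi
    done
}

# Example: preview the commands for seeds 0, 1, and 2 on GPU 0.
DRY_RUN=1 run_seeds sim_insertion_human 20000 0 0 1 2
```

With `DRY_RUN=1` the helper only prints the commands, which makes it easy to check the argument order before launching the actual runs from the ALOHA directory.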
conda create -n RoboTwin python=3.10
- Install the required packages for RoboTwin, see INSTALLATION_RoboTwin.md
- Install the required packages for Cosmos-Tokenizer and download the checkpoints from Hugging Face, see Cosmos-Tokenizer
- Install the required packages for policy deployment
pip install wandb ipdb gpustat dm_control omegaconf hydra-core==1.2.0 einops==0.4.1 diffusers==0.11.1 numba==0.56.4 moviepy imageio av matplotlib termcolor
cd RoboTwin
bash run_task.sh block_hammer_beat 0
# bash run_task.sh ${task_name} ${gpu_id}
python script/pkl2zarr_mypolicy.py block_hammer_beat D435 100
# python script/pkl2zarr_mypolicy.py ${task_name} ${head_camera_type} ${expert_data_num}
cd policy/ACT-DP-TP
bash scripts/act_dp_tp/train.sh block_hammer_beat 0 0
# bash scripts/act_dp_tp/train.sh ${task_name} ${gpu_id} ${seed}
bash scripts/act_dp_tp/eval.sh block_hammer_beat 0 0 0
# bash scripts/act_dp_tp/eval.sh ${task_name} ${gpu_id} ${seed} ${ckpt_type}
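The RoboTwin steps above (data collection, conversion to zarr, training, evaluation) run in a fixed order for each task. As a rough sketch, the hypothetical helper below prints that sequence for a given task so it can be reviewed first. The function name `robotwin_pipeline` is an assumption, as are the defaults `D435` and `100`, which simply mirror the example values in this README. It also assumes the starting directory is the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print the RoboTwin
# commands from this README in order for one task, so they can be reviewed
# before running. Assumes the starting directory is the repository root.
robotwin_pipeline() {
    local task="$1" gpu_id="$2" seed="$3" ckpt_type="$4"
    # Defaults mirror the example values used in this README.
    local camera="${5:-D435}" expert_num="${6:-100}"
    echo "cd RoboTwin"
    echo "bash run_task.sh $task $gpu_id"
    echo "python script/pkl2zarr_mypolicy.py $task $camera $expert_num"
    echo "cd policy/ACT-DP-TP"
    echo "bash scripts/act_dp_tp/train.sh $task $gpu_id $seed"
    echo "bash scripts/act_dp_tp/eval.sh $task $gpu_id $seed $ckpt_type"
}

# Example: the block_hammer_beat commands on GPU 0 with seed 0 and ckpt_type 0.
robotwin_pipeline block_hammer_beat 0 0 0
```

Once the printed commands look right, they can be run one at a time, which keeps a failed data-collection or conversion step from silently feeding into training.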
Our project builds upon the following excellent repositories:
We sincerely thank the authors for their inspiring work and open-source contributions.
If you find our work helpful, please cite us:
@misc{xu2025diffusionbasedimaginativecoordinationbimanual,
title={Diffusion-Based Imaginative Coordination for Bimanual Manipulation},
author={Huilin Xu and Jian Ding and Jiakun Xu and Ruixiang Wang and Jun Chen and Jinjie Mai and Yanwei Fu and Bernard Ghanem and Feng Xu and Mohamed Elhoseiny},
year={2025},
eprint={2507.11296},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2507.11296},
}

All the code, model weights, and data are licensed under the MIT license.


