This is the code repository for the following publication:
Yihao Wang, Raphael Memmesheimer, and Sven Behnke: "LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps", at the 19th International Conference on Intelligent Autonomous Systems (IAS) in Genoa, Italy.
A preprint can be found on arXiv.
The essential packages for the code are listed in requirements.txt.
Download the ALFRED dataset by following the instructions at https://github.com/askforalfred/alfred
If you want to try a lighter backbone (MobileCLIP), install it from the official repository: https://github.com/apple/ml-mobileclip
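The setup steps above could look roughly like the following when run from the root of this repository. This is a minimal sketch, not a tested install script: it assumes `pip` and `git` are available, that an editable install (`pip install -e`) is how the MobileCLIP repository expects to be installed, and that the dataset download itself is done by following the ALFRED repository's own instructions. Check both upstream READMEs for the current steps.

```shell
# Install the Python dependencies listed in this repository's requirements.txt
pip install -r requirements.txt

# Fetch the ALFRED repository; the dataset download is described in its README
git clone https://github.com/askforalfred/alfred.git

# Optional: lighter MobileCLIP backbone
# (assumption: an editable install works; see that repository's README)
git clone https://github.com/apple/ml-mobileclip.git
pip install -e ./ml-mobileclip
```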
pretraining: all pretraining and preprocessing code
model: end-to-end model for generating action sequences
dataset: dataset for end-to-end training
@inproceedings{Wang2023LIAM,
title={LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps},
author={Wang, Yihao and Memmesheimer, Raphael and Behnke, Sven},
booktitle={19th International Conference on Intelligent Autonomous Systems (IAS)},
year={2023},
location={Genoa, Italy}
}