TransFormer with Visual Tokens for Human-Robot Interaction (TFVT-HRI).
@misc{xue2020proactive,
title={Proactive Interaction Framework for Intelligent Social Receptionist Robots},
author={Yang Xue and Fan Wang and Hao Tian and Min Zhao and Jiangyong Li and Haiqing Pan and Yueqiang Dong},
year={2020},
eprint={2012.04832},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2012.04832}
}

sh scripts/download_pretrain_models.sh
sh tools/darknet_to_paddle.sh

Organize the collected video clips into the folder data/clips as listed below, then preprocess them with multiple-object tracking by running the commands that follow:
# in data/clips
video_1.mp4
video_2.mp4
...

# Assuming that we run 2 workers
python scripts/collect_v2_data.py -w 2 -c 1 -d data/clips &
python scripts/collect_v2_data.py -w 2 -c 2 -d data/clips &
# For more help information
python scripts/collect_v2_data.py --help

Note that this script spawns several workers to speed up preprocessing. After it finishes, your clips folder will look like:
# in data/clips
video_1.mp4
video_1_track.mp4
video_1_states.pkl
video_2.mp4
video_2_track.mp4
video_2_states.pkl
...

Note: to limit the accumulated errors of multiple-object tracking, keep each video clip short, no longer than a few minutes.
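As a quick sanity check after preprocessing, the following minimal sketch lists clips that are missing a tracking output. It assumes only the naming shown in the listing above (`<name>_track.mp4` and `<name>_states.pkl` next to `<name>.mp4`); the helper `find_incomplete_clips` is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_incomplete_clips(clips_dir):
    """Return names of source clips lacking a _track.mp4 or _states.pkl file.

    Illustrative helper only; it assumes the folder layout shown above.
    """
    clips_dir = Path(clips_dir)
    incomplete = []
    for video in sorted(clips_dir.glob("*.mp4")):
        # Skip the tracking videos produced by preprocessing.
        # Caveat: a source clip whose own name ends in "_track" is skipped too.
        if video.stem.endswith("_track"):
            continue
        track = clips_dir / f"{video.stem}_track.mp4"
        states = clips_dir / f"{video.stem}_states.pkl"
        if not (track.exists() and states.exists()):
            incomplete.append(video.name)
    return incomplete


if __name__ == "__main__":
    for name in find_incomplete_clips("data/clips"):
        print("missing tracking outputs:", name)
```

Any clip reported here can be preprocessed again before moving on to annotation.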
We developed a web-based annotation platform and you can start the server by running:
sh scripts/run_anno_platform.sh

Then open index.html, load the video, select suitable timestamps by clicking "add annotation", and fill in the corresponding multi-modal actions.
Next, click the "save" button to download a txt file whose name is prefixed with the video filename. Finally, move these files to the folder data/annos.
Note: for video clips that are entirely negative examples, save an empty txt file; otherwise the video will be ignored.
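Because a clip without an annotation file is ignored, it can help to list such clips before building the dataset. The sketch below assumes that each annotation file in data/annos is a .txt file whose name starts with the video filename without its extension; the exact naming produced by the platform is not specified here, and `find_unannotated_clips` is a hypothetical helper rather than part of the repository.

```python
from pathlib import Path


def find_unannotated_clips(clips_dir, annos_dir):
    """Return names of source clips with no matching .txt file in annos_dir.

    Assumption: an annotation file name starts with the clip's file name
    without extension (e.g. video_1*.txt for video_1.mp4).
    """
    anno_names = [p.name for p in Path(annos_dir).glob("*.txt")]
    missing = []
    for video in sorted(Path(clips_dir).glob("*.mp4")):
        if video.stem.endswith("_track"):
            continue  # tracking video from preprocessing, not a source clip
        # Prefix matching is loose: video_1 would also match video_10.txt.
        if not any(name.startswith(video.stem) for name in anno_names):
            missing.append(video.name)
    return missing


if __name__ == "__main__":
    for name in find_unannotated_clips("data/clips", "data/annos"):
        print("no annotation file for:", name)
```

For a fully negative clip reported by this check, saving an empty txt file as described above keeps it from being ignored.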
After collecting and annotating the raw datasets, we need to split them and generate datasets that the dataloader can use.
Step I: create the initial representation of the multi-modal actions.
python scripts/collect_act_emb.py -ad data/annos

Step II: split positive examples and sample negative examples.
python scripts/prepare_dataset.py -dv ds -ad data/annos -vd data/clips
python scripts/prepare_dataset.py -dv ds_decord

sh scripts/attn_model.sh

First, use scripts/save_infer_model_params.py to get the Paddle inference model.
# Assume you got trained model 'saved_models/attn/epoch_10'
python scripts/save_infer_model_params.py saved_models/attn/epoch_10 \
jetson/attn data/raw_wae/wae_lst.pkl visual_token

Second, set up the Jetson environment following jetson/Jetson_INSTALL.md.
Third, configure the variables in jetson/run.sh, then use sh run.sh to compile and run jetson/infer_v3.cpp. This starts a gRPC server that accepts requests according to jetson/proactive_greeting.proto.


