Repository for the project on temporal reasoning for intelligent human-robot collaboration.
This project enables robots to perform temporal reasoning over past observations and carry out human instructions in the present, in a generalized manner via foundation models.
- Download the Whisper model from https://github.com/openai/whisper, and set up the server file.
- In the client file `ros_whisper.py`, set `self.host` and `self.port` to the server's IP address and the port number used by the server.
- Download CogVLM2 from https://github.com/THUDM/CogVLM2, and set up the server file.
- In `config/params.yaml`, set `cogvlm2_host_ip` and `cogvlm2_port` to the server's IP address and the port number used by the server.
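As an illustration, the CogVLM2 entries in `config/params.yaml` could look like the snippet below; the key names come from this README, while the address and port are placeholders for your own server.

```yaml
# CogVLM2 server address in config/params.yaml (placeholder values).
cogvlm2_host_ip: "192.168.1.10"   # IP of the machine running the CogVLM2 server
cogvlm2_port: 6000                # port the CogVLM2 server listens on
```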
- Download Grounded-SAM-2 from https://github.com/IDEA-Research/Grounded-SAM-2, and set up the server file.
- In `config/params.yaml`, set `sam2_host_ip` and `sam2_port` to the server's IP address and the port number used by the server.
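Likewise, a sketch of the Grounded-SAM-2 entries (placeholder values):

```yaml
# Grounded-SAM-2 server address in config/params.yaml (placeholder values).
sam2_host_ip: "192.168.1.11"   # IP of the machine running the Grounded-SAM-2 server
sam2_port: 6001                # port the Grounded-SAM-2 server listens on
```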
- Download the dataset from https://drive.google.com/drive/folders/1c78MIOhFKuIKvPrMw79iLxZvnk47zg0X?usp=sharing.
- Set the `dataset_folder_path` in `config/params.yaml` and in `config/baseline_params.yaml`.
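For example, the dataset entry in both config files might look like this; the path is a placeholder for wherever you extracted the download:

```yaml
# Dataset location in config/params.yaml and config/baseline_params.yaml (placeholder path).
dataset_folder_path: "/home/user/datasets/temporal_reasoning/"
```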
- Download CogVLM from https://github.com/THUDM/CogVLM, and set up the server file.
- In `config/baseline_params.yaml`, set `cogvlm_host_ip` and `cogvlm_port` to the server's IP address and the port number used by the server.
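A corresponding sketch of the CogVLM entries in `config/baseline_params.yaml` (placeholder values):

```yaml
# CogVLM (baseline) server address in config/baseline_params.yaml (placeholder values).
cogvlm_host_ip: "192.168.1.12"   # IP of the machine running the CogVLM server
cogvlm_port: 6002                # port the CogVLM server listens on
```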
- Set up a RealSense camera to record the input video.
- Set `video_length` in `config/params.yaml` to the maximum length of video that you want to record.
- The recorded video is stored in `output/run_output/input_video.mp4`.
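For illustration, the recording entry could look like the following; the value and its unit are assumptions, so check the comments in `config/params.yaml` for the exact semantics:

```yaml
# Maximum recording length in config/params.yaml (assumed to be in seconds; placeholder value).
video_length: 30
```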
- Set the `openai_api_key` and `pipeline_path` variables in `config/params.yaml` (see the example after this list).
- Run `test_run.py` to run the pipeline on the dataset. The output for each datapoint is saved in the `output/dataset/` folder.
- To test with real-time data, record a video using the RealSense camera and convert the input instruction using `ros_whisper.py`.
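A sketch of the remaining `config/params.yaml` entries referenced above; the key names come from this README, the values are placeholders, and the real API key should not be committed to version control:

```yaml
# Pipeline entries in config/params.yaml (placeholder values).
openai_api_key: "sk-your-key-here"               # your OpenAI API key
pipeline_path: "/home/user/temporal_reasoning/"  # assumed to point to this repository's checkout
```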
- Set