This is a minimal implementation of the paper SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs (ICRA 2024), arxiv.
conda env create -f environment.yml
cd extension
python setup.py install
Please also install PyTorch. We tested with PyTorch 1.12.1 and CUDA 11.6.
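To confirm that the installed PyTorch build matches the tested versions, a small check like the one below may help. It is only a sketch: the helper name `describe_torch_env` is hypothetical and not part of this repository, and it assumes that comparing the version strings exactly is strict enough for your setup.

```python
import importlib


def describe_torch_env(expected_torch="1.12.1", expected_cuda="11.6"):
    """Report whether the installed PyTorch build matches the tested versions.

    Hypothetical helper, not part of this repository.
    """
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # PyTorch is not installed in the current environment.
        return {"installed": False, "torch": None, "cuda": None, "matches": False}

    # torch.__version__ may carry a local suffix such as "+cu116"; drop it.
    torch_version = str(torch.__version__).split("+")[0]
    # torch.version.cuda is None for CPU-only builds.
    cuda_version = torch.version.cuda
    return {
        "installed": True,
        "torch": torch_version,
        "cuda": cuda_version,
        "matches": torch_version == expected_torch and cuda_version == expected_cuda,
    }


if __name__ == "__main__":
    print(describe_torch_env())
```

A mismatch does not necessarily mean the code will fail; it only indicates that the environment differs from the one that was tested.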
Please refer to this page for downloading the data used in the paper and more information.
We set up two shape autoencoders called AtlasNet and AtlasNet2. AtlasNet is trained with full shapes in canonical coordinates,
while AtlasNet2 is trained in the camera frame and provides shape priors to the goal scene graph to guide the imagination. We also provide trained models, which can be downloaded here: trained AtlasNet and trained AtlasNet2.
- For generating shapes
  - Train AtlasNet. Adjust --batchSize and --nepoch to get the best training results.
      cd AtlasNet
      python training/train_AE_AtlasNet.py
  - Inference point clouds [optional]: run AtlasNet/inference/run_AE_AtlasNet.py. The generated points are stored under AtlasNet/log/atlasnet_separate_cultery/network.
  - Obtain point features for training Graph-to-3D: run AtlasNet/inference/create_features_gt.py. The features are stored in objs_features_gt_atlasnet_separate_cultery.json. The keys in the JSON file are the object names, e.g., "cup_1", and the values are the latent features (128 dimensions).
- For producing shape priors
  - Store partial points in the initial scenes under the camera frame: These are used to train AtlasNet2. The files can be downloaded from here: partial_pcs. You can also modify the file path and run AtlasNet2/auxiliary/generate_partial_pc_for_object.py. The final outputs are stored as pickle files under AtlasNet2/partial_pc_data.
  - Split the trainval set: The function generate_train_sample in AtlasNet2/auxiliary/generate_partial_pc_for_object.py splits AtlasNet2/partial_pc_data into train (90%) and test (10%). The file names are stored in AtlasNet2/partial_pc_data_splits.json.
  - Train AtlasNet2: The procedure is the same as for AtlasNet.
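The 90%/10% split of the partial point cloud files can be pictured roughly as follows. This is a sketch under stated assumptions, not the repository's generate_train_sample: the function name `split_partial_pcs`, the `.pkl` extension, the shuffling with a fixed seed, and the `"train"`/`"test"` JSON keys are all hypothetical choices made for illustration.

```python
import json
import random
from pathlib import Path


def split_partial_pcs(data_dir, out_json, train_ratio=0.9, seed=0):
    """Split pickle file names into train/test lists and save them as JSON.

    Hypothetical illustration; the file extension and JSON layout are assumptions.
    """
    # Collect file names in a deterministic order before shuffling.
    names = sorted(p.name for p in Path(data_dir).glob("*.pkl"))
    rng = random.Random(seed)
    rng.shuffle(names)

    n_train = int(round(len(names) * train_ratio))
    splits = {"train": names[:n_train], "test": names[n_train:]}

    Path(out_json).write_text(json.dumps(splits, indent=2))
    return splits
```

With the paths from the steps above, a call might look like `split_partial_pcs("AtlasNet2/partial_pc_data", "AtlasNet2/partial_pc_data_splits.json")`. Fixing the seed keeps the split reproducible across runs.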
We built the scene generator based on Graph-to-3D, a GCN-VAE architecture. Unlike the original Graph-to-3D, we leverage a shape-aware scene graph to align the generated shapes with the observed shapes in the initial scene. The trained model is available here: trained graph_to_3d.
If you want to retrain the network, --batchSize, --nepoch, and --exp need to be set to appropriate values.
cd graphto3d
python scripts/train_vaegan.py
More details can be found in the original repository.
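Before retraining, it can be useful to sanity-check the AtlasNet feature file (objs_features_gt_atlasnet_separate_cultery.json), whose keys are object names such as "cup_1" and whose values are 128-dimensional latent features. The sketch below assumes each value is stored as a flat list of numbers; that layout, and the helper name `load_object_features`, are assumptions rather than something taken from the repository.

```python
import json


def load_object_features(json_path, expected_dim=128):
    """Load per-object latent features and verify their dimensionality.

    Hypothetical helper; assumes each value is a flat list of numbers.
    """
    with open(json_path) as f:
        features = json.load(f)

    # Collect every object whose feature vector has an unexpected length.
    bad = {name: len(vec) for name, vec in features.items() if len(vec) != expected_dim}
    if bad:
        raise ValueError(f"unexpected feature sizes: {bad}")
    return features
```

If the file nests each feature one level deeper (for example, as a list containing a single vector), the length check would need to be adapted accordingly.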
There are two modes: robot and oracle. The robot mode supports a robot arm manipulating the objects according to the imagination. This mode needs a grasp pose prediction network, for which we use Contact-GraspNet. This requires TensorFlow to be installed:
pip install tensorflow-estimator==2.7.0 tensorflow-gpu==2.7.0
The checkpoints can be downloaded from the original repository or here. After downloading the checkpoints, move them to ./contact_graspnet.
The oracle mode does not need an agent; it directly places the objects at the relative poses. To run the script, set the variable mode inside it, and then run:
python sgbot_pybullet.py
The results in the paper were obtained in the oracle mode. We directly use the pre-defined scene graph as the goal.
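The difference in prerequisites between the two modes can be summarized in a small pre-flight check. This is a hypothetical sketch and not code from sgbot_pybullet.py: the function `check_mode_prerequisites` is invented for illustration, and because the exact checkpoint layout is not specified here, it only checks that ./contact_graspnet exists and is non-empty.

```python
from pathlib import Path

VALID_MODES = ("robot", "oracle")


def check_mode_prerequisites(mode, checkpoint_dir="./contact_graspnet"):
    """Return a list of problems that would likely block the chosen mode.

    Hypothetical helper; it does not inspect the checkpoint contents.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")

    problems = []
    if mode == "robot":
        # The robot mode relies on Contact-GraspNet checkpoints.
        ckpt = Path(checkpoint_dir)
        if not ckpt.is_dir() or not any(ckpt.iterdir()):
            problems.append(f"no checkpoints found in {checkpoint_dir}")
    # The oracle mode places objects directly and needs no grasping network.
    return problems
```

For example, `check_mode_prerequisites("oracle")` returns an empty list, whereas the robot mode reports a problem when the checkpoint directory is missing or empty.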
We provide a recorded rosbag to demonstrate the performance. To run this trial, the MaskRCNN checkpoint needs to be downloaded from here. Additional requirements also need to be installed.
